CN113688933B

CN113688933B - Classification network training method, classification method and device and electronic equipment

Info

Publication number: CN113688933B
Application number: CN202111022512.4A
Authority: CN
Inventors: 甘伟豪; 王意如
Original assignee: Beijing Sensetime Technology Development Co Ltd
Current assignee: Beijing Sensetime Technology Development Co Ltd
Priority date: 2019-01-18
Filing date: 2019-01-18
Publication date: 2024-05-24
Anticipated expiration: 2039-01-18
Also published as: CN109800807A; CN109800807B; CN113688933A

Abstract

The embodiments of the application disclose a training method for a classification network, a classification method and apparatus, and electronic equipment. The training method comprises: determining, based on the sampling count corresponding to the current sampling among multiple samplings, the sampling proportion of sample images of different categories to be drawn from a sample image set in the current sampling; performing the current sampling on the sample image set according to the sampling proportion to obtain the sampled sample of the current sampling; and training the classification network on the plurality of sampled samples obtained from the multiple samplings to obtain a target classification network. Because the sample image set is sampled with a proportion that changes dynamically with the sampling count, the trained classification network achieves higher classification accuracy.

Description

Classification network training method, classification method and device and electronic equipment

Technical Field

The application relates to computer vision technology, and in particular to a training method for a classification network, a classification method and apparatus, and electronic equipment.

Background

Classification networks play an important role in many fields, such as pedestrian detection and tracking, target search and localization in large-scale smart-city tasks, and person profile description. When a classification network is applied to problems such as face attribute analysis and pedestrian appearance analysis, the actual training data is often imbalanced: for the "bald" attribute, for example, the ratio of positive to negative samples may be as skewed as 1:100. How to improve the performance of a trained classification network under such different training-data scenarios is a research hotspot in the field.

Disclosure of Invention

The embodiments of the application provide techniques for training a classification network and for classification.

According to an aspect of an embodiment of the present application, there is provided a training method for a classification network, including:

Determining, based on the sampling count corresponding to the current sampling among a plurality of samplings, a sampling proportion of sample images of different categories to be obtained from a sample image set by the current sampling, wherein the sample image set comprises at least two image categories, and each image category comprises at least one sample image;

performing the current sampling on the sample image set based on the sampling proportion to obtain a sampled sample of the current sampling;

and training a classification network based on the plurality of sampled samples obtained by the plurality of samplings to obtain a target classification network.

Optionally, in any foregoing method embodiment of the present application, the at least two image categories include a first image category and a second image category, wherein the first image category includes a greater number of sample images than the second image category.

Optionally, in any one of the above method embodiments of the present application, the sampled sample includes at least two sample images, and the at least two sample images correspond to at least one image category.

Optionally, in any of the above method embodiments of the present application, the difference between the numbers of sample images of different image categories specified by the sampling proportion decreases as the sampling count increases.

Optionally, in any one of the above method embodiments of the present application, training the classification network based on the plurality of sampling samples obtained by the plurality of sampling to obtain the target classification network includes:

processing the sampled samples through the classification network to obtain network losses of the sampled samples;

and adjusting network parameters of the classification network based on the network loss to obtain the target classification network.

Optionally, in any one of the above method embodiments of the present application, the processing the sampled samples through the classification network to obtain network losses of the sampled samples includes:

processing the sampled samples through the classification network to obtain at least two losses of the sampled samples;

based on at least two losses of the sampled samples, a network loss of the sampled samples is obtained.

Optionally, in any one of the above method embodiments of the present application, the obtaining the network loss of the sampled samples based on at least two losses of the sampled samples includes:

performing a weighted summation of the at least two losses of the sampled sample to obtain the network loss of the sampled sample, wherein the weight of at least one of the at least two losses depends on the current training count corresponding to the sampled sample.

Optionally, in any of the above method embodiments of the application, the at least one loss includes at least one of a prediction loss and an embedding loss.

Optionally, in any of the above method embodiments of the present application, the embedding loss of the at least two losses contributes a smaller proportion of the network loss when the current training count is a first value than when it is a second value, the first value being greater than the second value; and/or

the prediction loss of the at least two losses contributes a larger proportion of the network loss when the current training count is the first value than when it is the second value.

Optionally, in any of the above method embodiments of the present application, in response to the current training count being less than a first preset threshold, the weight of the embedding loss in the at least one loss decreases as the current training count increases; and/or

in response to the current training count being greater than or equal to the first preset threshold, the weight of the embedding loss in the at least one loss is kept at a fixed value.
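The weighted sum and the training-count-dependent weight described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method's actual schedule: the linear decay, the hypothetical start and end weights `w_start` and `w_end`, and the fixed prediction-loss weight of 1.0 are illustrative choices. The text only requires that the embedding-loss weight decrease below the first preset threshold and stay fixed at or above it.

```python
def embedding_loss_weight(step, threshold, w_start=1.0, w_end=0.1):
    """Hypothetical embedding-loss weight: linear decay, then constant.

    step      -- current training count
    threshold -- the "first preset threshold"
    """
    if step >= threshold:
        return w_end  # at or beyond the threshold: kept at a fixed value
    # Below the threshold: decreases as the training count grows.
    return w_start + (w_end - w_start) * (step / threshold)


def network_loss(prediction_loss, embedding_loss, step, threshold):
    """Weighted sum of the two losses; prediction-loss weight fixed at 1.0 (assumption)."""
    w_emb = embedding_loss_weight(step, threshold)
    return 1.0 * prediction_loss + w_emb * embedding_loss


for step in (0, 500, 1000, 5000):
    print(step, network_loss(prediction_loss=0.5, embedding_loss=2.0,
                             step=step, threshold=1000))
```

With these choices the embedding loss's share of the network loss shrinks as training proceeds, while the prediction loss's share correspondingly grows.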

Optionally, in any one of the above method embodiments of the present application, the processing the sampled samples through the classification network to obtain at least two losses of the sampled samples includes:

Processing the sampling samples through the classification network to obtain a prediction category of each sample image included in the sampling samples;

a prediction loss of the sampled sample is determined based on a prediction category of each sample image included in the sampled sample and a labeling category of each sample image.

Optionally, in any one of the above method embodiments of the present application, the determining the prediction loss of the sampling sample based on the prediction category of each sample image included in the sampling sample and the labeling category of each sample image includes:

determining a prediction error value of each sample image based on the prediction category and the labeling category of each sample image included in the sampled sample;

and determining the prediction loss of the sampled sample based on the weight of each sample image included in the sampled sample and the prediction error value of each sample image.

Optionally, in any of the above method embodiments of the present application, the weight of a sample image depends on a first proportion, namely the proportion that the image category of the sample image occupies in the sampled sample.

Optionally, in any of the above method embodiments of the present application, in response to the ratio of the first proportion to a second proportion, namely the proportion that the image category of the sample image occupies in the sample image set, being greater than or equal to a second preset threshold, the weight of the sample image is the ratio of the first proportion to the second proportion; and/or

in response to the ratio of the first proportion to the second proportion being less than the second preset threshold, the weight of the sample image is 0 or 1.
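A minimal sketch of the per-image weighting rule and the resulting weighted prediction loss follows. Several details are assumptions made for illustration: per-image cross-entropy as the prediction error value, normalising by batch size, a default threshold of 1.0, and leaving the 0-or-1 fallback weight to the caller (the text does not say how that value is chosen). The function names are hypothetical.

```python
import math
from collections import Counter


def sample_weights(batch_labels, dataset_counts, threshold=1.0,
                   below_threshold_weight=1.0):
    """Per-image weights following the ratio rule described above.

    first proportion  = share of the image's category in the sampled batch
    second proportion = share of that category in the whole sample image set
    If first / second >= threshold the weight is that ratio; otherwise it is
    `below_threshold_weight` (0 or 1; chosen by the caller in this sketch).
    """
    batch_counts = Counter(batch_labels)
    batch_size = len(batch_labels)
    dataset_size = sum(dataset_counts.values())
    weights = []
    for label in batch_labels:
        first = batch_counts[label] / batch_size
        second = dataset_counts[label] / dataset_size
        ratio = first / second
        weights.append(ratio if ratio >= threshold else below_threshold_weight)
    return weights


def weighted_prediction_loss(probs, batch_labels, weights):
    """Weighted per-image cross-entropy, averaged over the batch (assumption)."""
    errors = [-math.log(p[y]) for p, y in zip(probs, batch_labels)]
    return sum(w * e for w, e in zip(weights, errors)) / len(errors)


labels = [0, 0, 0, 1]  # a 3:1 batch; category 0 has 500 images, category 1 has 100
print(sample_weights(labels, {0: 500, 1: 100}))
```

Consider a batch drawn 3:1 from a set of 500 first-category and 100 second-category images. The second category's first proportion is 0.25 against a second proportion of 1/6, giving it a weight of 1.5. The first category's ratio of 0.9 falls below the assumed threshold of 1.0, so it receives the fallback weight.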

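Optionally, in any of the above method embodiments of the present application, processing the sampled samples through the classification network to obtain at least two losses of the sampled samples includes:

processing the sampled sample through the classification network to obtain feature data of each sample image included in the sampled sample;

determining an easy sample among the sampled samples based on the feature data of each sample image included in the sampled sample;

and obtaining the embedding loss of the sampled sample by using the easy sample as an anchor.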
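The text does not specify how easy samples are identified from the feature data, nor the exact form of the embedding loss. Because the drawings refer to a triplet loss, the sketch below assumes a triplet formulation. For each category, the hypothetical "easy" anchor is the image whose feature lies closest to its category centroid. It is paired with the farthest same-category image and the closest other-category image. The selection rule, the Euclidean distance, and the margin value are all illustrative assumptions, not the method's definition.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def easy_anchor_triplet_loss(features, labels, margin=0.2):
    """Hypothetical triplet-style embedding loss with easy samples as anchors."""
    losses = []
    for c in sorted(set(labels)):
        idx_c = [i for i, y in enumerate(labels) if y == c]
        idx_other = [i for i, y in enumerate(labels) if y != c]
        if len(idx_c) < 2 or not idx_other:
            continue  # need at least one positive and one negative
        center = centroid([features[i] for i in idx_c])
        # Assumed "easy sample": the image closest to its category centroid.
        anchor = min(idx_c, key=lambda i: math.dist(features[i], center))
        a = features[anchor]
        # Hardest positive: same category, farthest from the anchor.
        pos = max((i for i in idx_c if i != anchor),
                  key=lambda i: math.dist(features[i], a))
        # Hardest negative: other category, closest to the anchor.
        neg = min(idx_other, key=lambda i: math.dist(features[i], a))
        d_ap = math.dist(a, features[pos])
        d_an = math.dist(a, features[neg])
        losses.append(max(0.0, d_ap - d_an + margin))
    return sum(losses) / len(losses) if losses else 0.0
```

Anchoring on easy samples keeps the reference point stable while the hard positive and negative still supply a training signal. Any other "easy" criterion could be swapped into the `anchor` line.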

According to another aspect of the embodiment of the present application, there is provided a training method for a classification network, including:

Processing, through the classification network, a sampled sample obtained from a sample image set to obtain at least two losses of the sampled sample, wherein the sample image set comprises at least two image categories, each image category comprises at least one sample image, and the sampled sample comprises at least two sample images;

obtaining a network loss of the sampled sample based on the at least two losses of the sampled sample and weights of the at least two losses, wherein the weight of at least one of the at least two losses depends on the current training count corresponding to the sampled sample;

and adjusting network parameters of the classification network based on the network loss to obtain a target classification network.

Optionally, in any one of the method embodiments of the present application, the obtaining the network loss of the sampled samples based on at least two losses of the sampled samples and weights of the at least two losses includes:

and carrying out weighted summation on at least two losses of the sampling sample based on the weights of the at least two losses to obtain network losses of the sampling sample.

Optionally, in any of the above method embodiments of the present application, in response to the current training count being greater than or equal to the first preset threshold, the weight of the at least one loss is kept at a fixed value.

Optionally, in any one of the above method embodiments of the present application, the processing, by the classification network, the sampled samples obtained from the sample image set, to obtain at least two losses of the sampled samples, includes:

Optionally, in any of the above method embodiments of the present application, before processing the sampled samples obtained from the sample image set through the classification network to obtain at least two losses of the sampled samples, the method further includes:

Determining the sampling proportion of sample images of different categories obtained by the current sampling from a sample image set based on the sampling times corresponding to the current sampling in the multiple samplings;

And carrying out the current sampling on the sample image set based on the sampling proportion so as to obtain a sampling sample of the current sampling.

Optionally, in any one of the above method embodiments of the present application, the sampled sample includes at least two sample images, and the at least two sample images correspond to at least one category.

According to still another aspect of the embodiment of the present application, there is provided a classification method, including:

Acquiring an image to be processed;

Classifying the image to be processed through a target classification network to obtain an image prediction category of the image to be processed; wherein,

The target classification network is obtained by the training method according to any one of the above.

According to still another aspect of the embodiment of the present application, there is provided a training apparatus for a classification network, including:

A sampling proportion determining unit, configured to determine, based on the sampling count corresponding to the current sampling among a plurality of samplings, the sampling proportion of sample images of different categories to be obtained from a sample image set by the current sampling, where the sample image set includes at least two image categories, each image category including at least one sample image;

the sample sampling unit is used for carrying out current sampling on the sample image set based on the sampling proportion so as to obtain a sampling sample of the current sampling;

and the network training unit is used for training the classification network based on the plurality of sampling samples obtained by the plurality of sampling to obtain a target classification network.

Optionally, in any embodiment of the foregoing apparatus of the present application, the at least two image categories include a first image category and a second image category, wherein a number of sample images included in the first image category is greater than a number of sample images included in the second image category.

Optionally, in any embodiment of the foregoing apparatus of the present application, the sampled sample includes at least two sample images, and the at least two sample images correspond to at least one image category.

Optionally, in any of the above apparatus embodiments of the present application, the difference between the numbers of sample images of different image categories specified by the sampling proportion decreases as the sampling count increases.

Optionally, in any one of the above device embodiments of the present application, the network training unit includes:

The loss obtaining module is used for processing the sampling samples through the classification network to obtain network loss of the sampling samples;

and the parameter adjustment module is used for adjusting the network parameters of the classification network based on the network loss to obtain a target classification network.

Optionally, in any one of the above device embodiments of the present application, the loss obtaining module is configured to process the sampled samples through the classification network to obtain at least two losses of the sampled samples; based on at least two losses of the sampled samples, a network loss of the sampled samples is obtained.

Optionally, in any one of the above apparatus embodiments of the present application, when obtaining the network loss of the sampling sample based on at least two losses of the sampling sample, the loss obtaining module is configured to perform weighted summation on at least two losses of the sampling sample to obtain the network loss of the sampling sample, where a weight of at least one loss included in the at least two losses depends on a current trained number corresponding to the sampling sample.

Optionally, in any of the above apparatus embodiments of the application, the at least one loss includes at least one of a prediction loss and an embedding loss.

Optionally, in any of the above apparatus embodiments of the present application, the embedding loss of the at least two losses contributes a smaller proportion of the network loss when the current training count is a first value than when it is a second value, the first value being greater than the second value; and/or the prediction loss of the at least two losses contributes a larger proportion of the network loss when the current training count is the first value than when it is the second value.

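Optionally, in any of the above apparatus embodiments of the present application, in response to the current training count being less than a first preset threshold, the weight of the embedding loss in the at least one loss decreases as the current training count increases; and/or, in response to the current training count being greater than or equal to the first preset threshold, the weight of the embedding loss is kept at a fixed value.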

Optionally, in any one of the above device embodiments of the present application, when the loss obtaining module processes the sampled samples through the classification network to obtain at least two losses of the sampled samples, the loss obtaining module is configured to process the sampled samples through the classification network to obtain a prediction class of each sample image included in the sampled samples; a prediction loss of the sampled sample is determined based on a prediction class of each sample image included in the sampled sample and a labeling class of each sample image.

Optionally, in any of the above apparatus embodiments of the present application, when determining the prediction loss of the sampled sample based on the prediction category and the labeling category of each sample image included in the sampled sample, the loss obtaining module is configured to determine a prediction error value of each sample image based on its prediction category and labeling category, and to determine the prediction loss of the sampled sample based on the weight of each sample image included in the sampled sample and the prediction error value of each sample image.

Optionally, in any of the above device embodiments of the present application, the weight of the sample image depends on a first proportion of the image class to which the sample image belongs in the sampled sample.

Optionally, in any of the above apparatus embodiments of the present application, in response to the ratio of the first proportion to a second proportion, namely the proportion of the image category of the sample image in the sample image set, being greater than or equal to a second preset threshold, the weight of the sample image is the ratio of the first proportion to the second proportion; and/or, in response to that ratio being less than the second preset threshold, the weight of the sample image is 0 or 1.

Optionally, in any one of the above device embodiments of the present application, when the loss obtaining module processes the sampled samples through the classification network to obtain at least two losses of the sampled samples, the loss obtaining module is configured to process the sampled samples through the classification network to obtain feature data of each sample image included in the sampled samples; determining an easy sample of the sampled samples based on the feature data of each sample image included in the sampled samples; and taking the easy sample as an anchor point to obtain the embedding loss of the sampling sample.

According to still another aspect of the embodiments of the present application, there is provided a training apparatus for a classification network, including:

A sample loss obtaining unit, configured to process, through the classification network, a sampled sample obtained from a sample image set to obtain at least two losses of the sampled sample, wherein the sample image set comprises at least two image categories, each image category comprises at least one sample image, and the sampled sample comprises at least two sample images;

A network loss unit, configured to obtain a network loss of the sampled sample based on the at least two losses of the sampled sample and weights of the at least two losses, where the weight of at least one of the at least two losses depends on the current training count corresponding to the sampled sample;

And the parameter adjustment unit is used for adjusting the network parameters of the classification network based on the network loss to obtain a target classification network.

Optionally, in any embodiment of the foregoing apparatus of the present application, the sample loss obtaining unit is configured to perform weighted summation on at least two losses of the sampled samples based on weights of the at least two losses, to obtain a network loss of the sampled samples.

Optionally, in any one of the above device embodiments of the present application, the sample loss obtaining unit is specifically configured to process, through the classification network, the sampled samples to obtain a prediction class of each sample image included in the sampled samples; a prediction loss of the sampled sample is determined based on a prediction category of each sample image included in the sampled sample and a labeling category of each sample image.

Optionally, in any of the above apparatus embodiments of the present application, when determining the prediction loss of the sampled sample based on the prediction category and the labeling category of each sample image included in the sampled sample, the sample loss obtaining unit is configured to determine a prediction error value of each sample image based on its prediction category and labeling category, and to determine the prediction loss of the sampled sample based on the weight of each sample image included in the sampled sample and the prediction error value of each sample image.

Optionally, in an embodiment of any one of the foregoing apparatus of the present application, the sample loss obtaining unit is specifically configured to process, through the classification network, the sampled sample to obtain feature data of each sample image included in the sampled sample; determining an easy sample of the sampled samples based on the feature data of each sample image included in the sampled samples; and taking the easy sample as an anchor point to obtain the embedding loss of the sampling sample.

Optionally, in any one of the above device embodiments of the present application, the device further includes:

A sampling proportion determining unit, configured to determine, based on a number of sampled times corresponding to a current sample of a plurality of samples, a sampling proportion of sample images of different categories obtained from a sample image set by the current sample;

and the sample sampling unit is used for carrying out the current sampling on the sample image set based on the sampling proportion so as to obtain a sampling sample of the current sampling.

Optionally, in any embodiment of the foregoing apparatus of the present application, the sampled sample includes at least two sample images, and the at least two sample images correspond to at least one category.

According to still another aspect of the embodiment of the present application, there is provided a classification apparatus, including:

an image acquisition unit for acquiring an image to be processed;

The class prediction unit is used for classifying the image to be processed through a target classification network to obtain an image prediction class of the image to be processed; wherein the target classification network is obtained by the training method according to any one of the above.

According to another aspect of an embodiment of the present application, there is provided an electronic apparatus including: a memory for storing executable instructions;

And a processor in communication with the memory for executing the executable instructions to perform the operations of the training method of the classification network or the classification method as described above in any of the possible implementations.

According to another aspect of embodiments of the present application, a computer-readable storage medium is provided for storing computer-readable instructions that, when executed, perform the operations of the training method of the classification network or the classification method described above in any of the possible implementations described above.

According to another aspect of embodiments of the present application, there is provided a computer program product comprising computer readable code which, when run on a device, executes instructions for implementing the training method of the classification network or the classification method as described above in any of the possible implementations described above.

According to yet another aspect of embodiments of the present application, there is provided another computer program product for storing computer readable instructions that, when executed, cause a computer to perform the operations of the training method of the classification network or the classification method as described above in any of the possible implementations described above.

In an alternative embodiment, the computer program product is embodied as a computer storage medium; in another alternative embodiment, it is embodied as a software product, such as a software development kit (SDK).

According to the embodiments of the application, another training method and apparatus for a classification network, electronic equipment, a computer storage medium, and a computer program product are also provided, in which: the sampling proportion of sample images of different categories to be obtained from a sample image set by the current sampling is determined based on the sampling count corresponding to the current sampling among a plurality of samplings; the current sampling is performed on the sample image set based on the sampling proportion to obtain the sampled sample of the current sampling; and the classification network is trained on the plurality of sampled samples obtained by the plurality of samplings to obtain a target classification network.

In the training method for a classification network, the classification method and apparatus, and the electronic equipment provided by the embodiments of the application, the sampling proportion of sample images of different categories to be obtained from the sample image set by the current sampling is determined based on the sampling count corresponding to the current sampling among the multiple samplings. The current sampling is then performed on the sample image set based on that proportion to obtain the sampled sample of the current sampling, and the classification network is trained on the sampled samples obtained by the multiple samplings to obtain a target classification network. Because the sampling proportion changes dynamically with the sampling count, the trained classification network achieves higher classification accuracy.

The technical scheme of the application is further described in detail through the drawings and the embodiments.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description, serve to explain the principles of the application.

The application may be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:

Fig. 1 is a schematic flow chart of a training method of a classification network according to an embodiment of the present application.

Fig. 2 is a flowchart of another embodiment of a classification network training method according to an embodiment of the present application.

Fig. 3 is a schematic flow chart of obtaining network loss in the training method of the classification network according to the embodiment of the present application.

Fig. 4 is a schematic diagram of triplet-loss calculation in the prior art.

Fig. 5 is a schematic diagram of calculation of triplet loss in the training method of the classification network according to the embodiment of the present application.

Fig. 6 is a schematic structural diagram of a training device for classification network according to an embodiment of the present application.

Fig. 7 is a flowchart of a training method of a classification network according to another embodiment of the present application.

Fig. 8 is a flowchart of another embodiment of a training method for a classification network according to an embodiment of the present application.

Fig. 9 is another schematic structural diagram of a training device for classification network according to an embodiment of the present application.

Fig. 10 is a schematic flow chart of a classification method according to an embodiment of the present application.

Fig. 11 is a schematic structural diagram of a classification device according to an embodiment of the present application.

Fig. 12 is a schematic structural diagram of an electronic device suitable for use in implementing a terminal device or server according to an embodiment of the present application.

Detailed Description

Various exemplary embodiments of the present application will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present application unless it is specifically stated otherwise.

Meanwhile, it should be understood that the sizes of the respective parts shown in the drawings are not drawn in actual scale for convenience of description.

The following description of at least one exemplary embodiment is merely exemplary in nature and is in no way intended to limit the application, its application, or uses.

Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.

It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.

Fig. 1 is a schematic flow chart of a training method of a classification network according to an embodiment of the present application. The method may be performed by any electronic device, such as a terminal device, a server, a mobile device, etc.

Step 110, determining, based on the sampling count corresponding to the current sampling among the plurality of samplings, the sampling proportion of sample images of different categories to be obtained from the sample image set by the current sampling.

The sample image set comprises at least two image categories, and each image category comprises at least one sample image. For example, the sample image set may include two categories: a first category with a large amount of data and a second category with a small amount of data. In the embodiments of the present application, the sampling proportion is adjusted dynamically according to the sampling count, so the proportion between the first category and the second category changes from one sampling to the next; for example, a preset function of the sampling count may be used to produce this dynamically changing proportion.
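The text leaves the preset function unspecified. One hypothetical choice, sketched below, makes each category's target share proportional to its size raised to an exponent `g`. The exponent decays linearly from 1 (the original distribution) to 0 (a balanced distribution) as the sampling count grows. The power-law form, the linear decay, and the parameter `total_steps` are assumptions, chosen only because they satisfy the stated property that the gap between categories narrows with more samplings.

```python
def sampling_proportion(class_counts, step, total_steps):
    """Hypothetical schedule for the per-category sampling proportion.

    class_counts -- {category: number of images in the sample image set}
    step         -- sampling count of the current sampling
    total_steps  -- count at which the proportion becomes fully balanced
    """
    # g = 1 reproduces the original distribution; g = 0 gives equal shares.
    g = max(0.0, 1.0 - step / total_steps)
    powered = {c: n ** g for c, n in class_counts.items()}
    total = sum(powered.values())
    return {c: v / total for c, v in powered.items()}


counts = {"first": 500, "second": 100}
for step in (0, 50, 100):
    print(step, sampling_proportion(counts, step, total_steps=100))
```

For 500 and 100 images, this yields shares of about 0.83 and 0.17 at the first sampling, about 0.69 and 0.31 halfway through, and 0.5 each from `total_steps` onward.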

In some alternative embodiments, the at least two image categories include a first image category and a second image category, wherein the first image category includes a greater number of sample images than the second image category.

Determining the sampling proportion of the current sampling based on the sampling count works for any sample image set that contains multiple categories. It is particularly useful when the numbers of sample images in different image categories differ greatly. Adjusting the sampling proportion raises the share of the smaller second image category in the sampled samples, so the trained target classification network can classify the second image category accurately. Sampling with a fixed proportion, by contrast, would leave the second image category, which has few sample images, either underrepresented in the sampled samples or heavily repeated, and the trained target classification network would then classify that category inaccurately. For example, when learning to determine whether a pedestrian is bald, most of the training data is non-bald (corresponding to the first image category in the embodiments of the present application). Bald data (corresponding to the second image category) is rare and may account for less than 1%. With a general classification learning method, the learned model tends to favor the majority category: if it predicts all data as non-bald, its accuracy can reach 99% while its recall for bald is 0. Such a model cannot help a user find a particular target object; in other words, it has no real ability to determine whether a pedestrian is bald. The target classification network learned with the method provided by the embodiments of the present application judges bald samples better, raising the recall for bald while maintaining overall accuracy.
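The accuracy-versus-recall arithmetic in the bald example can be checked with a few lines. The data below is hypothetical: one bald image (label 1) among 100, and a degenerate model that predicts non-bald for everything.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (overall accuracy, recall of the `positive` class)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    # Predictions made on the images whose true label is the positive class.
    hits = [p == positive for t, p in zip(y_true, y_pred) if t == positive]
    recall = sum(hits) / len(hits) if hits else 0.0
    return correct / len(y_true), recall


y_true = [1] + [0] * 99  # hypothetical: 1 bald image per 100
y_pred = [0] * 100       # model predicts "non-bald" for every image
print(accuracy_and_recall(y_true, y_pred))  # (0.99, 0.0)
```

High accuracy alone therefore says little on such skewed data; the minority-class recall exposes the failure.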

Step 120, performing current sampling on the sample image set based on the sampling proportion to obtain a sampling sample of the current sampling.

In the embodiments of the present application, the sampling proportion of the current sampling is determined by the number of samplings performed so far. Because this number changes with every sampling, the sampling proportion differs from one sampling to the next. This helps ensure that every image category in the sample image set contributes positively to training the classification network.

Optionally, each sampled sample comprises at least two sample images, and the at least two sample images correspond to at least one image category.

In the embodiments of the present application, the classification network is trained on the sampled samples. To improve the classification accuracy of the trained network, each sampled sample needs to include a plurality of sample images. Each sampling draws from the sample image set according to one sampling proportion, and the proportion among sample images of different categories in the resulting sampled sample matches that sampling proportion. For example, take a sample image set containing 500 sample images of a first category and 100 sample images of a second category. Sampling from it with a sampling proportion of 3:1 may yield 30 sample images of the first category and 10 sample images of the second category.
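The following is a minimal sketch of drawing one sampled sample whose class proportion follows a given sampling proportion, using the 500/100 example and a 3:1 proportion. The batch size of 40 is implied by the example. The helper names, the largest-remainder rounding, and the choice to sample without replacement within a single batch are all assumptions made for illustration.

```python
import random

def counts_from_ratio(ratio, batch_size):
    """Split batch_size into per-class counts that follow `ratio`.

    Largest-remainder rounding keeps the counts summing to batch_size.
    """
    total = sum(ratio.values())
    exact = {c: batch_size * r / total for c, r in ratio.items()}
    counts = {c: int(v) for c, v in exact.items()}
    leftover = batch_size - sum(counts.values())
    by_remainder = sorted(exact, key=lambda c: exact[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1
    return counts

def sample_by_ratio(images_by_class, ratio, batch_size, seed=None):
    """Draw one sampled sample (hypothetical helper; no repeats within a batch)."""
    rng = random.Random(seed)
    counts = counts_from_ratio(ratio, batch_size)
    batch = []
    for cls, n in counts.items():
        batch.extend((cls, img) for img in rng.sample(images_by_class[cls], n))
    rng.shuffle(batch)
    return batch

# The example from the text: 500 first-category and 100 second-category images.
images_by_class = {
    "first": [f"first_{i}" for i in range(500)],
    "second": [f"second_{i}" for i in range(100)],
}
batch = sample_by_ratio(images_by_class, {"first": 3, "second": 1}, batch_size=40, seed=0)
n_first = sum(1 for cls, _ in batch if cls == "first")
print(n_first, len(batch) - n_first)  # 30 10
```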

Step 130, training the classification network based on the plurality of sampled samples obtained from the plurality of samplings to obtain a target classification network.

Optionally, the classification network is trained on each of the plurality of sampled samples in turn. The specific process may include the following. For each sampled sample, the classification network to be trained processes the sampled sample to obtain a network loss. The parameters of the classification network to be trained are adjusted based on the network loss to obtain a parameter-adjusted classification network. It is then determined whether a condition for ending training has been reached (for example, a preset number of trainings has been completed). If the condition has been reached, the parameter-adjusted classification network is taken as the target classification network. If not, the parameter-adjusted classification network becomes the classification network to be trained. It processes the sampled sample obtained by the next sampling to obtain the next network loss, and its parameters are adjusted based on that loss. This repeats until the condition for ending training is reached, and the parameter-adjusted classification network at that point is taken as the target classification network.
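The loop below sketches the train-until-end-condition procedure described above. It is not the patent's network: a one-dimensional logistic regression stands in for the classification network, plain gradient descent stands in for parameter adjustment, and a preset number of trainings serves as the end condition. The Gaussian data, the learning rate, and the fixed 30/10 class counts are hypothetical. A real sampler would derive the class counts from the sampling proportion of each step.

```python
import math
import random

def make_sampler(seed=0, n_major=30, n_minor=10):
    """Hypothetical sampler: class 0 features near -1, class 1 near +1."""
    rng = random.Random(seed)

    def sample(step):
        # `step` is the sampling count; fixed class counts keep the sketch short.
        batch = [(rng.gauss(-1.0, 0.5), 0) for _ in range(n_major)]
        batch += [(rng.gauss(1.0, 0.5), 1) for _ in range(n_minor)]
        return batch

    return sample

def train(sample_fn, preset_steps, lr=0.5):
    w, b = 0.0, 0.0  # parameters of the stand-in classifier
    losses = []
    for step in range(1, preset_steps + 1):
        batch = sample_fn(step)  # current sampling
        loss = grad_w = grad_b = 0.0
        for x, y in batch:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
            loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            grad_w += (p - y) * x
            grad_b += p - y
        n = len(batch)
        losses.append(loss / n)  # network loss of this sampled sample
        w -= lr * grad_w / n     # parameter adjustment
        b -= lr * grad_b / n
    # End condition (preset number of trainings) reached: target network.
    return (w, b), losses

params, losses = train(make_sampler(), preset_steps=50)
print(params, losses[0], losses[-1])
```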

The training method of the classification network provided by the embodiments of the present application works in three steps. First, the sampling proportion at which the current sampling obtains sample images of different categories from the sample image set is determined, based on the number of samplings corresponding to the current sampling among the plurality of samplings. Second, the current sampling is performed on the sample image set based on that sampling proportion to obtain the sampled sample of the current sampling. Third, the classification network is trained on the plurality of sampled samples obtained by the plurality of samplings to obtain the target classification network. Because the sample image set is sampled with a proportion that changes dynamically with the number of samplings, the trained classification network achieves higher classification accuracy.

In one or more alternative embodiments, the difference between the numbers of sample images of different image categories specified by the sampling proportion decreases as the number of samplings increases.

In the prior art, a sample image set is sampled while always keeping a balanced target data sampling distribution. For example, sampling is always performed with the original sampling proportion (the proportion between the at least two categories in the sample image set) or always with a set proportion. This hinders the generalization of the learned system. For example, in the initial stage of learning, too many sample images of the large categories are discarded, so the system loses too much useful information and the trained classification network classifies inaccurately. In the embodiments of the present application, the proportion among the different categories in the sample image set is first taken as the original sampling proportion. For example, if the sample image set includes 500 sample images of a first category and 100 sample images of a second category, the original sampling proportion is 1/5. Starting from the original sampling proportion, the sampling proportion is adjusted dynamically so that the difference between the numbers of sample images of different image categories in the sampled samples shrinks. In other words, the share of the minority-category sample images in the sampled samples gradually increases as the number of samplings increases. This enables learning from the imbalanced data of different categories in the sample image set and improves the recall of the classification network on minority-category sample images. It also lets the network learn effective feature representations from all the data, so that correct classification of the sample images can be learned in the later stage.

Optionally, obtaining the sampling proportion of each sampling in step 110 may include: processing the original sampling proportion based on a first dynamic change function and the number of samplings corresponding to the current sampling, to obtain the sampling proportion of the current sampling.

The variable of the first dynamic change function is the number of samplings. Optionally, the first dynamic change function may be any function whose value decreases from 1 to 0 as the variable increases, for example a convex function, a concave function, a linear function, or a composite function. The first dynamic change function reflects the state of the network learning process, and its slope represents the learning speed of the network. Different types of function describe different learning-speed styles; for example, a convex function represents a learning strategy whose speed becomes progressively faster.

As an alternative example, the first dynamically changing function may include, but is not limited to, the following functions:

For example, a convex function indicates a learning speed that goes from slow to fast. In this case, the first dynamic change function may be as shown in formula (1.1):

where SF_cos(l) denotes the first dynamic change function in the form of a convex function, l denotes the index of the current (l-th) sampling, and L denotes the set total number of samplings.

A linear function indicates a constant learning speed. In this case, the first dynamic change function may be as shown in formula (1.2):

where SF_linear(l) denotes the first dynamic change function in the form of a linear function, l denotes the index of the current (l-th) sampling, and L denotes the set total number of samplings.

A concave function indicates a learning speed that goes from fast to slow. In this case, the first dynamic change function may be as shown in formula (1.3):

$$SF_{exp}(l) = \lambda^{l} \qquad (1.3)$$

where SF_exp(l) denotes the first dynamic change function in the form of a concave function, l denotes the index of the current (l-th) sampling, and λ is a constant between 0 and 1, so that the function decreases from 1 toward 0.

A composite function indicates a learning speed that goes from slow to fast and then back to slow. In this case, the first dynamic change function may be as shown in formula (1.4):

where SF_composite(l) denotes the first dynamic change function in the form of a composite function, l denotes the index of the current (l-th) sampling, and L denotes the set total number of samplings.

The four formulas above are only optional forms of the first dynamic change function and do not limit its specific form in the embodiments of the present application.
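Formulas (1.1), (1.2) and (1.4) are not reproduced in this text. The sketch below therefore assumes common forms that match the verbal descriptions: SF_cos(l) = cos(πl/(2L)), SF_linear(l) = 1 − l/L and SF_composite(l) = ½cos(πl/L) + ½. It uses formula (1.3) as given, with a hypothetical λ = 0.9. Each function falls from 1 toward 0 as l goes from 0 to L. The curve the text calls convex (slow, then fast) bulges upward, meaning its second derivative is negative.

```python
import math

def sf_cos(l, L):
    """Assumed form of (1.1): slow at first, then fast."""
    return math.cos(math.pi / 2 * l / L)

def sf_linear(l, L):
    """Assumed form of (1.2): constant speed."""
    return 1.0 - l / L

def sf_exp(l, lam=0.9):
    """Formula (1.3); lam = 0.9 is a hypothetical choice in (0, 1)."""
    return lam ** l

def sf_composite(l, L):
    """Assumed form of (1.4): slow, fast, then slow again."""
    return 0.5 * math.cos(math.pi * l / L) + 0.5

L = 10
for l in range(0, L + 1, 5):
    print(l, round(sf_cos(l, L), 3), sf_linear(l, L),
          round(sf_exp(l), 3), round(sf_composite(l, L), 3))
```

Comparing the first and last step of each curve shows the learning-speed style. For example, sf_cos drops very little between l = 0 and l = 1 and the most between l = 9 and l = 10.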

The number of samplings accumulates. For example, the variable of the first dynamic change function is 1 at the first sampling, 2 at the second sampling, and so on. Each sampling corresponds to a different number of samplings, so the value of the first dynamic change function differs for each sampling. The sampling proportion therefore also differs for each sampling, which realizes dynamically changing sampling.

Optionally, processing the original sampling proportion based on the first dynamic change function and the number of samplings corresponding to the current sampling, to obtain the sampling proportion of the current sampling, may include:

obtaining a function value of the first dynamic change function corresponding to the current sampling;

taking the original sampling proportion as the base and the obtained function value as the exponent to obtain the sampling proportion of the current sampling.

Optionally, with the sampling proportion obtained in the embodiments of the present application, network training learns from the real training data distribution (close to the original sampling proportion) in the initial stage. It then learns from a balanced data distribution in the later stage of training. In an alternative example, the sampling proportion may be obtained based on the following formula (2):

$$D(l) = D_{train}^{\,g(l)} \qquad (2)$$

where D(l) denotes the sampling proportion of the l-th sampling, g(l) denotes the first dynamic change function at the l-th sampling, and D_train denotes the original sampling proportion of the sample image set. In formula (2), g(l) is a first dynamic change function that decreases from 1 to 0; it may be implemented, for example, with formula (1.1), (1.2), (1.3) or (1.4). As a result, the sampling proportion D(l) is close to the original sampling proportion at the initial samplings. In subsequent samplings, the share of the less numerous categories is gradually increased, which improves the classification performance of the classification network on those categories.
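As a worked example of formula (2), the sketch below uses the 500/100 set from the text, so D_train = 100/500 = 1/5. It treats D_train as the minority-to-majority ratio and assumes a linear g(l) = 1 − l/L with a hypothetical L = 10. Because the base is below 1 and the exponent falls from 1 to 0, D(l) rises from 1/5 to 1, which is a balanced batch. The helper `class_counts`, which turns D(l) into per-class counts for a hypothetical batch of 48, is illustrative only.

```python
def linear_schedule(l, L):
    """Assumed g(l) = 1 - l/L, decreasing from 1 to 0."""
    return 1.0 - l / L

def sampling_proportion(d_train, l, L, g=linear_schedule):
    """Formula (2): D(l) = D_train ** g(l)."""
    return d_train ** g(l, L)

def class_counts(d, batch_size):
    """Hypothetical helper: (majority, minority) counts with minority/majority ~ d."""
    minority = round(batch_size * d / (1.0 + d))
    return batch_size - minority, minority

d_train = 100 / 500  # original sampling proportion (minority : majority = 1/5)
L = 10
for l in (0, 5, 10):
    d = sampling_proportion(d_train, l, L)
    print(l, round(d, 4), class_counts(d, 48))
```

At l = 0 the batch follows the real 1:5 distribution (8 minority images out of 48). At l = L it is balanced (24 and 24).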

Fig. 2 is a flowchart of another embodiment of a classification network training method according to an embodiment of the present application. As shown in fig. 2, the method of this embodiment includes:

Step 210, determining, based on the number of samplings corresponding to the current sampling among the plurality of samplings, the sampling proportion at which the current sampling obtains sample images of different categories from the sample image set.

Step 220, based on the sampling proportion, performing current sampling on the sample image set to obtain a sampling sample of the current sampling.

Step 230, processing the sampled sample through the classification network to obtain the network loss of the sampled sample.

Optionally, the sampled sample is input into the classification network, which classifies each sample image in the sampled sample to obtain a predicted classification result. The predicted classification result and the annotated classification result of each sample image are then processed to obtain the network loss of the sampled sample. The network loss may consist of at least one loss; for example, it may consist of a prediction loss and an embedding loss. The embodiments of the present application do not limit the number or types of losses that make up the network loss.

Step 240, adjusting network parameters of the classification network based on the network loss to obtain the target classification network.

Network training is the process of adjusting network parameters through the network loss. Optionally, the adjustment process includes the following. After a sampled sample is obtained, it is input into the classification network to be trained to obtain a network loss. The network parameters of the classification network to be trained are adjusted based on the network loss to obtain an adjusted classification network. It is then determined whether the preset number of trainings has been reached; the number of trainings in the embodiments of the present application may be preset, for example to 10. If it has not been reached (for example, the preset number is 10 and the current training is the 8th), the next sampling (the 9th sampling and training) is performed to obtain the next sampled sample. The adjusted classification network is taken as the classification network to be trained, and the next sampled sample is input into it to obtain the next network loss. Its network parameters are adjusted based on that loss to obtain an adjusted classification network. This repeats until the preset number of trainings (10 in this example) is reached, and the adjusted classification network is then taken as the target classification network.

Fig. 3 is a schematic flow chart of obtaining network loss in the training method of the classification network according to the embodiment of the present application. As shown in fig. 3, step 230 in the above embodiment may include:

Step 302, processing the sampled samples through a classification network to obtain at least two losses of the sampled samples.

In the embodiments of the present application, to improve the training speed and the accuracy of the target classification network, at least two losses are obtained from the sampled sample and combined into the network loss. Optionally, the at least two losses may include, but are not limited to, a prediction loss and an embedding loss.

Step 304, obtaining a network loss of the sampled samples based on at least two losses of the sampled samples.

Network training often involves more than one loss. In the embodiments of the present application, a batch of sampled samples is input into the classification network at a time to obtain at least two losses, which are combined into the network loss. Each loss is generally obtained from supervision information (usually the annotated category of each sample image) and the predicted classification result. For example, a loss is determined from how well the predicted classification result matches the annotated category.

Optionally, step 304 may include:

performing a weighted summation of the at least two losses of the sampled sample to obtain the network loss of the sampled sample.

When the network loss is obtained as a weighted sum of multiple losses, each loss contributes to training. The weights control how much each loss contributes to the parameter adjustment in each training, so that the more beneficial loss can be given a larger share at each training stage.

The weight of at least one of the at least two losses depends on the current number of trainings corresponding to the sampled sample.

Optionally, the at least one loss whose weight depends on the current number of trainings corresponding to the sampled sample may include, but is not limited to, at least one of the prediction loss and the embedding loss.

In the embodiments of the present application, different losses matter to different degrees at different stages of training; for example, a loss that is important early in training may be unimportant later. The share of each loss in the network loss therefore needs to be adjusted dynamically. This avoids a problem in the prior art, where simply adding several losses together leaves network learning without focus and degrades the performance of the classification network.

Optionally, the contribution of the embedding loss among the at least two losses to the network loss when the current number of trainings is a first value is smaller than its contribution when the current number of trainings is a second value; and/or

the contribution of the prediction loss among the at least two losses to the network loss when the current number of trainings is the first value is greater than its contribution when the current number of trainings is the second value.

The first value is greater than the second value. In the embodiments of the present application, the embedding loss helps in the initial stage of training. It offers little advantage once the features have basically stabilized in the middle and later stages. The contribution of the embedding loss is therefore gradually reduced as the number of trainings increases, while the contribution of the prediction loss gradually increases.

The embodiments of the present application combine a classification task (e.g., learning with a cross-entropy loss) with metric learning (e.g., learning with a triplet, quadruplet, or quintuplet loss). The two can be regarded as having different emphases over the whole learning process. Optionally, the classification task focuses on predicting the specific category, while metric learning aims to adjust the distances between samples in the feature space, pulling similar samples closer together. By adjusting the ratio between the prediction loss and the embedding loss, the embodiments of the present application first learn effective feature representations in the initial stage of training and then learn the correct classification of the samples later.

In an alternative example, the relative weights of the prediction loss and the embedding loss throughout the learning process are controlled by dynamic adjustment. The network loss may be calculated based on the following formula (3.1), where the prediction loss may be a weighted cross-entropy loss and the embedding loss may be a triplet loss.

$$L_{DCL} = L_{DSL} + f(l) \cdot L_{TEA} \qquad (3.1)$$

where L_DCL denotes the network loss, L_DSL denotes the weighted cross-entropy loss, L_TEA denotes the triplet loss value, and f(l) denotes the second dynamic change function.

The second dynamic change function in the embodiments of the present application is similar to the first dynamic change function in the above embodiments. It may be any function whose value decreases from 1 to 0, for example a convex, concave, linear, or composite function. Optionally, it may be implemented with formula (1.1), (1.2), (1.3) or (1.4) above.

Optionally, in response to the current number of trainings being less than a first preset threshold, the weight of the embedding loss among the at least two losses decreases as the current number of trainings increases; and/or

in response to the current number of trainings being greater than or equal to the first preset threshold, the weight of the embedding loss among the at least two losses is kept at a fixed value.

The fixed value ensures that the weight of the embedding loss never becomes 0, so the classification network is trained with at least two losses throughout training. This prevents the number of active losses from shrinking as the number of trainings grows and improves the training efficiency of the classification network.

Optionally, in the embodiments of the present application, the weight of the embedding loss may be calculated with the second dynamic change function. This function may take the form of any function whose value decreases from 1 to 0, for example a convex, concave, linear, or composite function.

In an alternative embodiment, the second dynamic change function f(l) may be calculated based on the following formula (3.2).

where l denotes the index of the current (l-th) training, L denotes the set total number of trainings, and ε is a small preset constant. Formula (3.2) is a modification of formula (1.4): adding ε keeps the value of the second dynamic change function f(l) from reaching 0.
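Formula (3.2) itself is not reproduced here, so the following is only a sketch under stated assumptions. It assumes formula (1.4) has the composite form ½cos(πl/L) + ½ and adds ε, as the text describes. It also follows the threshold rule above: the weight decreases until a first preset threshold and is then held at a fixed value, taken here to be ε. The values ε = 0.01 and the default threshold of L are hypothetical. The combined loss follows formula (3.1).

```python
import math

def second_dynamic_weight(l, L, eps=0.01, threshold=None):
    """Hypothetical f(l): composite decay plus eps, then held fixed at eps."""
    if threshold is None:
        threshold = L  # hypothetical first preset threshold
    if l < threshold:
        return 0.5 * math.cos(math.pi * l / L) + 0.5 + eps
    return eps  # fixed, non-zero weight after the threshold

def network_loss(l_dsl, l_tea, l, L, **kwargs):
    """Formula (3.1): L_DCL = L_DSL + f(l) * L_TEA."""
    return l_dsl + second_dynamic_weight(l, L, **kwargs) * l_tea

for l in (0, 50, 100, 150):
    print(l, round(second_dynamic_weight(l, 100), 4))
```

With the threshold set to L, the decay reaches ε exactly at the threshold, so the weight has no jump. A smaller threshold would make the weight drop abruptly to ε.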

In one or more alternative embodiments, step 302 includes:

processing the sampling samples through a classification network to obtain a prediction category of each sample image included in the sampling samples;

a prediction loss of the sampled sample is determined based on the prediction category of each sample image included in the sampled sample and the annotation category of each sample image.

In the embodiments of the present application, the prediction loss supports label-based classification learning. Each sample image has a unique annotated category, and the difference between the category predicted by the classification network and the annotated category is the prediction loss of the sampled sample. In other words, the prediction loss expresses how accurately the classification network predicts categories, and training the classification network with it improves the network's accuracy in judging specific categories.

Optionally, determining the prediction loss of the sampled sample based on the prediction category and the annotation category of each sample image included in the sampled sample includes:

determining a prediction error value for each sample image based on the prediction category and the annotation category of each sample image included in the sampled sample;

determining the prediction loss of the sampled sample based on the weight of each sample image included in the sampled sample and the prediction error value of each sample image.

In the embodiments of the present application, a weight is applied when calculating the prediction error value of each sample image, which improves the effectiveness of the feature representation. Optionally, the weight of a sample image depends on a first proportion, namely the proportion of the image category of that sample image in the sampled sample.

Optionally, take a weighted cross-entropy loss as an example of the prediction loss. It modifies the ordinary cross-entropy formula by adding a weight, which improves the effectiveness of the feature representation. For example, the weighted cross-entropy loss can be calculated based on the following formula (4.1):

where L_DSL denotes the weighted cross-entropy loss, N denotes the number of sample images in the sampled sample used in the current training, N_j denotes the number of sample images of the j-th category in a batch of sampled samples, and M is the number of categories in the sample image set. The remaining label symbol in formula (4.1) denotes the actual label of the i-th sample image in the j-th category, and w_j denotes the weight of sample images of the j-th category.

Optionally, consider the ratio between the first proportion and a second proportion, where the second proportion is the proportion of the sample image's image category in the sample image set. In response to this ratio being greater than or equal to a second preset threshold, the weight of the sample image is the ratio between the first proportion and the second proportion; and/or

in response to the ratio between the first proportion and the second proportion being less than the second preset threshold, the weight of the sample image is 0 or 1.

In one or more alternative examples, the weight w_j of the sample image may be calculated based on the following formula (4.2):

$$w_j = \begin{cases} D_j(l)/B_j, & D_j(l)/B_j \ge \text{second preset threshold} \\ 0 \text{ or } 1, & D_j(l)/B_j < \text{second preset threshold} \end{cases} \qquad (4.2)$$

That is, when D_j(l)/B_j is greater than or equal to the second preset threshold, w_j takes the value D_j(l)/B_j; when D_j(l)/B_j is less than the second preset threshold, w_j takes the value 0 or 1. Here D_j(l) is the target distribution of sample images of the j-th category in the current training, i.e., the proportion of sample images of the j-th category relative to those of the other categories. B_j is the distribution of sample images of the j-th category in the sample image set, defined the same way.
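The sketch below shows the weighting rule and a weighted cross-entropy loss. Because formula (4.1) is not reproduced here, it assumes the common form −(1/N)·Σ w_j·log p(true class), averaged over the N images of the sampled sample. It also assumes a second preset threshold of 1.0 and a fallback weight of 1. Both values are hypothetical, as are the class shares and predicted probabilities.

```python
import math

def class_weight(d_j, b_j, threshold=1.0, fallback=1.0):
    """Formula (4.2) as described: D_j(l)/B_j if it reaches the threshold,
    otherwise a fixed 0 or 1 (`fallback`)."""
    ratio = d_j / b_j
    return ratio if ratio >= threshold else fallback

def weighted_cross_entropy(probs, labels, weights):
    """Assumed form of (4.1): -(1/N) * sum_i w_{y_i} * log p_i(y_i)."""
    total = 0.0
    for p, y in zip(probs, labels):
        total -= weights[y] * math.log(p[y])
    return total / len(labels)

# Hypothetical shares: class 1 is 1/6 of the sample image set (B_j)
# but is targeted at 1/2 of the current training batch (D_j(l)).
b = {0: 5 / 6, 1: 1 / 6}
d = {0: 0.5, 1: 0.5}
weights = {j: class_weight(d[j], b[j]) for j in b}  # class 0 -> 1.0, class 1 -> ~3.0
probs = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]  # predicted class probabilities
labels = [0, 1, 1]
print(weights, weighted_cross_entropy(probs, labels, weights))
```

The over-sampled minority class gets a weight above 1, so its prediction errors count more in the loss. The majority class falls below the threshold and keeps the fallback weight.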

In one or more alternative embodiments, step 302 may include:

processing the sampling sample through a classification network to obtain characteristic data of each sample image included in the sampling sample;

In network training, introducing metric learning helps obtain better sample feature representations. The embodiments of the present application realize metric learning through an embedding loss, which may include various anchor-based losses such as the triplet loss, quadruplet loss, and quintuplet loss. Taking the triplet loss as an example, each triplet consists of an anchor plus one positive sample and one negative sample corresponding to that anchor. FIG. 4 is a schematic diagram of calculating the triplet loss in the prior art. As shown in FIG. 4, the prior art usually takes all sample images of the less numerous category (the bald minority samples in FIG. 4) as anchors when calculating the triplet loss. Using every minority sample as an anchor causes a problem when hard minority samples serve as anchors. These are minority-class sample images that do not cluster with the rest of their class, such as the anchor shown in FIG. 4. Such anchors make network learning difficult and prevent the classification boundary from being resolved stably. For example, in FIG. 4 the network must pull such an anchor toward a same-class sample image (the hard positive Hard⁺), which makes the boundary between the two classes unstable.

To address the problem in FIG. 4, the embodiments of the present application introduce two concepts: easy samples and hard samples. When sample images of the minority image category are used as anchors, easy samples are sample images of that category whose mutual distances are smaller than a set value, i.e., at least two sample images that cluster together. Hard samples are defined relative to easy samples: they are sample images whose distance from the easy samples is greater than or equal to the set value.
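The easy/hard split can be sketched as follows. This is one reading of the definition above: a minority sample is treated as easy if at least one other minority sample lies closer than the set value, and as hard otherwise. The sketch assumes Euclidean distance on feature vectors. The feature values and the set value of 0.5 are hypothetical.

```python
import math

def split_easy_hard(features, set_value):
    """Split minority-class feature vectors into easy and hard indices.

    Easy: at least one other minority sample is closer than `set_value`.
    Hard: every other minority sample is at least `set_value` away.
    """
    easy, hard = [], []
    for i, fi in enumerate(features):
        clustered = any(
            math.dist(fi, fj) < set_value
            for j, fj in enumerate(features)
            if j != i
        )
        (easy if clustered else hard).append(i)
    return easy, hard

# Three clustered minority samples and one outlier.
minority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0)]
easy, hard = split_easy_hard(minority, set_value=0.5)
print(easy, hard)  # [0, 1, 2] [3]
```

Only the indices in `easy` would then be used as anchors, so the outlier at (3, 3) never serves as an anchor.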

To solve this problem in prior-art triplet loss calculation, the embodiments of the present application use easy samples as anchors when calculating the triplet loss, which determines the classification boundary more robustly and stably. FIG. 5 is a schematic diagram of calculating the triplet loss in the training method of the classification network provided by the embodiments of the present application. In an alternative example, the triplet loss may be calculated with formula (5).

where L_TEA denotes the triplet loss value and |T| denotes the total number of triplets. Each triplet includes one easy anchor, one positive sample (a sample image of the same category as the easy anchor), and one negative sample (a sample image of a different category from the easy anchor). m_j is a preset hyperparameter. x_{easy,j} denotes a sample image of the j-th category in the sampled sample used as an easy anchor, and x_{+,j} and x_{-,j} denote a positive sample and a negative sample of that easy anchor, respectively. d(x_{easy,j}, x_{+,j}) denotes the distance (e.g., Euclidean or cosine distance) between the easy anchor and the positive sample, and d(x_{easy,j}, x_{-,j}) denotes the distance between the easy anchor and the negative sample.
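Formula (5) is not reproduced in this text. The sketch below assumes the standard hinge form of the triplet loss, averaged over the |T| triplets: max(0, m_j + d(x_{easy,j}, x_{+,j}) − d(x_{easy,j}, x_{-,j})). It uses Euclidean distance; the text also allows cosine distance. The triplets and the margin value 0.5 are hypothetical.

```python
import math

def easy_anchor_triplet_loss(triplets, margins):
    """Hinge triplet loss averaged over |T| triplets (assumed form of (5)).

    Each triplet is (anchor, positive, negative, j): the anchor is an easy
    anchor of class j, and margins[j] plays the role of m_j.
    """
    if not triplets:
        return 0.0
    total = 0.0
    for anchor, pos, neg, j in triplets:
        d_pos = math.dist(anchor, pos)  # d(x_easy,j, x_+,j)
        d_neg = math.dist(anchor, neg)  # d(x_easy,j, x_-,j)
        total += max(0.0, margins[j] + d_pos - d_neg)
    return total / len(triplets)

triplets = [
    ((0.0, 0.0), (0.0, 1.0), (3.0, 0.0), 1),  # already separated: contributes 0
    ((0.0, 0.0), (0.0, 2.0), (1.0, 0.0), 1),  # negative too close: 0.5 + 2 - 1
]
print(easy_anchor_triplet_loss(triplets, margins={1: 0.5}))  # 0.75
```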

The weighted cross-entropy loss and the easy-anchor triplet loss obtained in the above embodiments are used together for multi-task learning. The two losses emphasize different aspects of learning and play different roles at different learning stages. The embodiments of the present application therefore dynamically schedule their relative emphasis across the learning stages.

In practice, the training method of the classification network provided by the embodiments of the present application can be applied to training neural networks on image sets that contain training images of different categories. It is particularly suited to imbalanced-data scenarios such as pedestrian detection and tracking, large-scale smart-city target search and localization, and personal portrait description, where it can effectively improve the performance of the trained neural network.

Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by hardware instructed by a program. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.

Fig. 6 is a schematic structural diagram of a training device for classification network according to an embodiment of the present application. The device of this embodiment can be used to implement the above-described method embodiments of the present application. As shown in fig. 6, the apparatus of this embodiment includes:

The sampling proportion determining unit 61 is configured to determine, based on the number of samplings corresponding to the current sampling among a plurality of samplings, the sampling proportion at which the current sampling obtains sample images of different categories from the sample image set.

Wherein the sample image set comprises at least two image categories, each image category comprising at least one sample image.

The sample sampling unit 62 is configured to perform the current sampling on the sample image set based on the sampling proportion, so as to obtain a sampled sample of the current sampling.

The network training unit 63 is configured to train the classification network based on a plurality of sampled samples obtained by a plurality of samplings, to obtain a target classification network.

The training device of the classification network provided by the embodiments of the present application works in three steps. First, the sampling proportion at which the current sampling obtains sample images of different categories from the sample image set is determined, based on the number of samplings corresponding to the current sampling among the plurality of samplings. Second, the current sampling is performed on the sample image set based on that sampling proportion to obtain the sampled sample of the current sampling. Third, the classification network is trained on the plurality of sampled samples obtained by the plurality of samplings to obtain the target classification network. Because the sample image set is sampled with a proportion that changes dynamically with the number of samplings, the trained classification network achieves higher classification accuracy.

This sampling method, in which the sampling proportion of the current sampling is determined from the number of samplings already performed, is suitable for any sample image set that includes multiple categories, and is particularly useful when the numbers of sample images in different image categories differ greatly. By adjusting the sampling proportion, the share of the smaller second image category in the sampled sample can be increased, so that the trained target classification network classifies the second image category accurately. This avoids a problem of sampling with a fixed proportion: because the second image category has few sample images, it would make up only a small part of each sampled sample while its images were repeated many times, and the trained target classification network would classify the second image category inaccurately.

Optionally, the sampled sample comprises at least two sample images, the at least two sample images corresponding to at least one image class.

In one or more alternative embodiments, the network training unit 63 includes:

a parameter adjustment module, configured to adjust network parameters of the classification network based on the network loss to obtain the target classification network.

Network training is the process of adjusting network parameters according to the network loss. Optionally, the adjustment process includes: each time a sampled sample is obtained, inputting it into the classification network to be trained to obtain a network loss, and adjusting the network parameters of that classification network based on the network loss to obtain an adjusted classification network.
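To make the per-sampling loop concrete, the following is a minimal Python sketch under loudly hypothetical assumptions: each image is reduced to one scalar feature, the classification network is a 1-D logistic classifier, only a cross-entropy prediction loss is used, and the sampling proportion moves linearly from the original 1/5 toward 1/1. The names `train` and `sigmoid` and all numeric settings are illustrative, not taken from the application.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(n_steps: int = 200, batch_size: int = 20, lr: float = 0.1, seed: int = 0):
    """Toy per-sampling training loop (all details hypothetical).

    Data: 500 majority images (label 0) near x = -2 and 100 minority images
    (label 1) near x = +2, each reduced to a single scalar feature.
    Model: p(y=1 | x) = sigmoid(w * x + b). Requires n_steps >= 2.
    """
    rng = random.Random(seed)
    major = [rng.gauss(-2.0, 1.0) for _ in range(500)]
    minor = [rng.gauss(2.0, 1.0) for _ in range(100)]
    original = len(minor) / len(major)  # original proportion, 1/5
    w, b = 0.0, 0.0
    for step in range(n_steps):
        # 1) sampling proportion for this sampling, moving from 1/5 toward 1/1
        ratio = original + (1.0 - original) * step / (n_steps - 1)
        n_min = round(batch_size * ratio / (1.0 + ratio))
        # 2) draw a sampled batch with that class proportion
        batch = [(x, 0) for x in rng.sample(major, batch_size - n_min)]
        batch += [(x, 1) for x in rng.sample(minor, n_min)]
        # 3) cross-entropy gradient on the batch, then one parameter update
        gw = gb = 0.0
        for x, y in batch:
            err = sigmoid(w * x + b) - y  # d(cross-entropy)/dz for logistic regression
            gw += err * x
            gb += err
        w -= lr * gw / len(batch)
        b -= lr * gb / len(batch)
    return w, b
```

Calling `w, b = train()` returns the fitted parameters; a real classification network would replace the scalar model and hand-written gradient with a deep network and an autograd framework.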

Optionally, the network training unit 63 further includes a loss obtaining module, configured to process the sampled sample through the classification network to obtain at least two losses of the sampled sample, and to obtain the network loss of the sampled sample based on those at least two losses.

Optionally, when obtaining the network loss of the sampled sample based on its at least two losses, the loss obtaining module is configured to compute a weighted sum of the at least two losses as the network loss of the sampled sample.

Optionally, the at least one loss comprises at least one of a prediction loss and an embedding loss.

Optionally, the prediction loss of the at least two losses contributes less to the network loss when the current number of trained times is a first value than when it is a second value, where the first value is greater than the second value; and/or the embedding loss of the at least two losses contributes more to the network loss when the current number of trained times is the first value than when it is the second value.

In one or more alternative embodiments, when processing the sampled sample through the classification network to obtain its at least two losses, the loss obtaining module is configured to: process the sampled sample through the classification network to obtain a predicted category for each sample image in the sampled sample; and determine the prediction loss of the sampled sample based on the predicted category and the annotated category of each sample image.

Optionally, when determining the prediction loss of the sampled sample based on the predicted and annotated categories of its sample images, the loss obtaining module is configured to: determine a prediction error value for each sample image based on its predicted category and annotated category; and determine the prediction loss of the sampled sample based on the weight value and the prediction error value of each sample image in the sampled sample.

Optionally, the weight of a sample image depends on a first proportion, namely the proportion that the image category of the sample image occupies in the sampled sample.

Optionally, in response to the ratio of the first proportion to a second proportion (the proportion that the image category of the sample image occupies in the sample image set) being greater than or equal to a second preset threshold, the weight of the sample image is the ratio of the first proportion to the second proportion; and/or
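The weighting rule described above can be sketched as follows, assuming each image is represented only by its category label. `threshold` stands in for the second preset threshold, and `fallback` is a hypothetical placeholder for the alternative branch, which is not spelled out in this passage; neither value comes from the application.

```python
from collections import Counter


def sample_weights(batch_labels, dataset_labels, threshold=1.0, fallback=1.0):
    """Hypothetical per-image weights for one sampled batch.

    first proportion  = share of the image's category in the sampled batch
    second proportion = share of the same category in the whole sample image set
    If first / second >= threshold, the weight is that ratio, as stated in the text;
    otherwise the placeholder `fallback` is used.
    """
    batch_counts = Counter(batch_labels)
    set_counts = Counter(dataset_labels)
    n_batch, n_set = len(batch_labels), len(dataset_labels)
    weights = []
    for label in batch_labels:
        first = batch_counts[label] / n_batch
        second = set_counts[label] / n_set
        ratio = first / second
        weights.append(ratio if ratio >= threshold else fallback)
    return weights
```

With a set of 500 category-"a" and 100 category-"b" images and a balanced batch of 30 each, category "b" has a first proportion of 1/2 and a second proportion of 1/6, so its images receive a weight of 3.0 under this sketch.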

In one or more alternative embodiments, when processing the sampled sample through the classification network to obtain its at least two losses, the loss obtaining module is configured to: process the sampled sample through the classification network to obtain feature data of each sample image in the sampled sample; determine an easy sample in the sampled sample based on that feature data; and obtain the embedding loss of the sampled sample using the easy sample as the anchor.

Introducing metric learning into network training helps the network learn better sample feature representations. The embodiments of the present application implement metric learning through an embedding loss, which may be any anchor-based loss, such as a triplet loss, quadruplet loss, or quintuplet loss. Taking the triplet loss as an example, each triplet consists of an anchor together with one positive sample (same category as the anchor) and one negative sample (different category) corresponding to that anchor.
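A minimal sketch of a triplet loss with an easy-sample anchor follows. How the application identifies the easy sample is not stated here, so the sketch assumes the easy sample of each category is the image whose feature lies closest to the category centroid, and pairs it with the farthest same-category image and the closest other-category image (a batch-hard choice). The function name and margin value are hypothetical.

```python
import math


def easy_anchor_triplet_loss(features, labels, margin=0.2):
    """Triplet loss averaged over one easy anchor per category (hypothetical selection)."""
    losses = []
    for c in set(labels):
        idx = [i for i, y in enumerate(labels) if y == c]
        others = [i for i, y in enumerate(labels) if y != c]
        if len(idx) < 2 or not others:
            continue  # need at least one positive and one negative
        dim = len(features[0])
        centroid = [sum(features[i][d] for i in idx) / len(idx) for d in range(dim)]
        # Assumed "easy" sample: the image nearest its category centroid.
        a = min(idx, key=lambda i: math.dist(features[i], centroid))
        d_pos = max(math.dist(features[a], features[i]) for i in idx if i != a)
        d_neg = min(math.dist(features[a], features[j]) for j in others)
        losses.append(max(0.0, d_pos - d_neg + margin))  # hinge on the margin
    return sum(losses) / len(losses) if losses else 0.0
```

For well-separated categories the loss is zero; it becomes positive only when a same-category image lies farther from the anchor than the nearest other-category image minus the margin.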

Fig. 7 is a flowchart of a training method of a classification network according to another embodiment of the present application. The method may be performed by any electronic device, such as a terminal device, a server, a mobile device, etc.

At step 710, processing the sampled samples obtained from the sample image set through the classification network to obtain at least two losses of the sampled samples.

Here, the sample image set includes at least two image categories, each containing at least one sample image, and the sampled sample includes at least two sample images.

The sampled sample in this embodiment may be obtained with any sampling proportion: for example, with a dynamically changing sampling proportion or with a fixed one.

At step 720, the network loss of the sampled sample is obtained based on the at least two losses of the sampled sample and the weights of those losses.

The weight of at least one of the at least two losses depends on the current number of trained times corresponding to the sampled sample. Adjusting the share of each loss in the network loss through these weights addresses a problem of the prior art, in which simply adding several losses leaves network learning without emphasis and degrades the performance of the classification network.

Network training conventionally uses a single loss. In the embodiments of the present application, each batch of sampled samples input into the classification network yields at least two losses, which are combined into the network loss, improving the training efficiency of the classification network. A loss is usually computed from supervision information (typically the annotated category of each sample image) and the predicted classification result, for example from how well the predicted classification agrees with the annotated category.

Optionally, the at least two losses of the sampled sample are summed with their respective weights to obtain the network loss of the sampled sample.

Step 730, adjusting network parameters of the classification network based on the network loss to obtain the target classification network.

In the embodiments of the present application, the network loss of the sampled sample is determined from at least two losses of the sampled sample, and the weight of at least one of these losses depends on the current number of trained times corresponding to the sampled sample, so the share of each loss in the network loss is adjusted dynamically. Different losses matter to different degrees at different stages of training; for example, some losses that are important early in training become unimportant later. By dynamically adjusting the weight of at least one loss (for example, computing it with a dynamically changing function), the training method addresses the prior-art problem that simply adding several losses leaves network learning without emphasis and degrades the classification network, thereby achieving a better learning effect and improving the performance of the trained target classification network.

Optionally, the at least one loss may include, but is not limited to, at least one of a prediction loss and an embedding loss.

These losses are only examples that the training method of the classification network provided in the embodiments of the present application can use; they do not limit the specific type of the at least one loss.

Here, the first value is greater than the second value. In the embodiments of the present application, the prediction loss is advantageous in the early stage of training (when the number of trained times is small, i.e., the second value), but loses this advantage in the middle and later stages once the features have largely stabilized. The contribution of the prediction loss is therefore gradually reduced as the number of training iterations increases, while the contribution of the embedding loss is gradually increased.

In an alternative example, the relative weights of the prediction loss and the embedding loss are adjusted dynamically throughout learning. The network loss may be calculated based on equation (3.1) above, in which case the prediction loss may be a weighted cross-entropy loss and the embedding loss a triplet loss.
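Equation (3.1) is not reproduced in this section, so the sketch below assumes one plausible form consistent with the description: the prediction loss is scaled by a decreasing function f(l) of the number of trained times l, the embedding loss by 1 - f(l), and the weight is frozen once l reaches a value standing in for the first preset threshold. The function names and the linear f(l) are illustrative assumptions.

```python
def loss_weight(step: int, total_steps: int, freeze_at: int) -> float:
    """Hypothetical f(l): falls linearly from 1 to 0 over training.

    Once `step` reaches `freeze_at` (standing in for the first preset
    threshold), the weight is held at its value there.
    """
    step = min(step, freeze_at)
    return max(0.0, 1.0 - step / total_steps)


def network_loss(pred_loss: float, embed_loss: float,
                 step: int, total_steps: int, freeze_at: int) -> float:
    """Assumed form: f(l) * prediction loss + (1 - f(l)) * embedding loss."""
    f = loss_weight(step, total_steps, freeze_at)
    return f * pred_loss + (1.0 - f) * embed_loss
```

At l = 0 the network loss equals the prediction loss alone; as l grows, the embedding loss takes a larger share until the weight is frozen.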

The second dynamic change function in this embodiment is similar to the first dynamic change function in the above embodiment: any function whose value decreases from 1 to 0 may be used, for example a convex, concave, linear, or composite function. Likewise, formula (1.1), (1.2), (1.3), or (1.4) above may be selected.
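Formulas (1.1) to (1.4) are not reproduced in this section. As a hedged illustration only, the following snippet lists a few functions of normalized training progress x in [0, 1] that fall from 1 to 0 and are, respectively, linear, convex, concave, and composite; they are examples of the family described, not the application's own formulas.

```python
import math

# Hypothetical candidates for a dynamic change function on x = l / L in [0, 1].
DECAY_FUNCTIONS = {
    "linear": lambda x: 1.0 - x,
    "convex": lambda x: (1.0 - x) ** 2,   # falls fast early, then flattens
    "concave": lambda x: 1.0 - x ** 2,    # stays high early, then falls fast
    "cosine": lambda x: 0.5 * (1.0 + math.cos(math.pi * x)),  # a composite function
}
```

The choice among them sets how quickly the prediction loss hands over to the embedding loss: the convex form shifts weight early, the concave form late.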

In an alternative embodiment, the second dynamic change function f(l) may be calculated based on equation (3.2) above.

In one or more alternative embodiments, step 710 includes:

In the embodiments of the present application, the prediction loss supports label-based classification learning: each sample image has a unique annotated category, and the difference between the category predicted by the classification network and the annotated category is the prediction loss of the sampled sample. The prediction loss thus expresses how accurately the classification network predicts categories, and training the classification network with it improves the network's accuracy in identifying specific categories. Optionally, determining the prediction loss of the sampled sample may include: determining a prediction error value for each sample image based on its predicted category and annotated category; and determining the prediction loss of the sampled sample based on the weight value and the prediction error value of each sample image in the sampled sample.

Optionally, taking a weighted cross-entropy loss as an example of the prediction loss: it modifies the standard cross-entropy loss by adding weight values, which improves the effectiveness of the feature representation. For example, the weighted cross-entropy loss can be calculated based on formula (4.1).
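Formula (4.1) is not reproduced here. The snippet below is a generic weighted cross-entropy, offered as a hedged sketch: each image's negative log-likelihood of its annotated category is multiplied by that image's weight, and the result is averaged over the batch. The batch-mean normalization and the function name are assumptions.

```python
import math


def weighted_cross_entropy(probs, labels, weights):
    """Weighted cross-entropy over one sampled batch.

    probs[i][c] : predicted probability that image i belongs to category c
    labels[i]   : annotated category index of image i
    weights[i]  : per-image weight (e.g. derived from category proportions)
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        total += -w * math.log(max(p[y], eps))
    return total / len(probs)  # batch mean (assumed normalization)
```

Setting every weight to 1.0 recovers the ordinary mean cross-entropy, so the weights are the only change relative to the standard loss.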

In one or more alternative examples, the weights of the sample images may be calculated based on equation (4.2) above.

In one or more alternative embodiments, step 710 may include:

Fig. 8 is a flowchart of another embodiment of a training method for a classification network according to an embodiment of the present application. As shown in fig. 8, the method of this embodiment includes:

Step 810, determining, based on the number of samplings already performed as of the current sampling among multiple samplings, the sampling proportion according to which the current sampling draws sample images of different categories from the sample image set.

Step 820, performing current sampling on the sample image set based on the sampling proportion to obtain a sampling sample of the current sampling.

In the embodiments of the present application, to train the classification network on sampled samples, each sampled sample needs to include multiple sample images, which improves the classification accuracy of the trained classification network. Each sampling draws from the sample image set according to one sampling proportion, so that the ratio among the sample images of different categories in the resulting sampled sample matches that sampling proportion.
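One way to draw a sampled sample whose category composition matches a given proportion is sketched below. The dictionary layout, the function name `draw_batch`, and the fallback to sampling with replacement when a category has too few images are illustrative assumptions, not details from the application.

```python
import random
from collections import Counter


def draw_batch(samples_by_class, counts, rng):
    """Draw one sampled batch whose per-category counts equal `counts`.

    samples_by_class: dict mapping category -> list of image identifiers
    counts:           dict mapping category -> number of images to draw
    rng:              a random.Random instance, for reproducibility
    """
    batch = []
    for label, k in counts.items():
        pool = samples_by_class[label]
        if k <= len(pool):
            picked = rng.sample(pool, k)       # without replacement
        else:
            picked = rng.choices(pool, k=k)    # with replacement (assumed fallback)
        batch.extend((img, label) for img in picked)
    rng.shuffle(batch)
    return batch


data = {"major": [f"m{i}" for i in range(500)], "minor": [f"s{i}" for i in range(100)]}
print(Counter(label for _, label in draw_batch(data, {"major": 30, "minor": 30}, random.Random(0))))
```

The per-category counts would come from the sampling proportion of the current sampling; shuffling only mixes categories within the batch.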

At step 830, the sampled samples obtained from the sample image set are processed through a classification network to obtain at least two losses of the sampled samples.

Step 830 in the embodiment of the present application is similar to step 710 in the above embodiment, and can be understood by referring to the above embodiment, and will not be described herein.

In step 840, the network loss of the sampled sample is obtained based on the at least two losses of the sampled sample and the weights of those losses.

Step 850, adjusting network parameters of the classification network based on the network loss to obtain the target classification network.

Step 850 in the embodiment of the present application is similar to step 730 in the above embodiment, and can be understood by referring to the above embodiment, and will not be described herein.

The training method of the classification network provided by the embodiments of the present application implements both dynamic-proportion sampling and dynamic adjustment of the weights of different losses. Dynamic-proportion sampling balances the influence of each image category in network training, giving the target classification network a high recall for both major and minor categories. Dynamically adjusting the loss weights lets each loss take a larger share of the network loss when it plays a larger role and a smaller share when its role diminishes, which addresses the prior-art problem that simply adding several losses leaves network learning without emphasis and degrades the performance of the classification network.

In the prior art, a sample image set is sampled with the same target sampling distribution throughout (for example, always with the original sampling proportion, i.e. the proportion between the at least two categories in the sample image set, or always with a set proportion). This does not help the system generalize: for example, in the early stage of learning, discarding too many major-category sample images loses too much useful information, so the trained classification network classifies inaccurately. The embodiments of the present application first take the proportion between the categories in the sample image set as the original sampling proportion (for example, if the set contains 500 sample images of a first category and 100 of a second category, the original sampling proportion is 1/5). Starting from this original proportion, the sampling proportion is adjusted dynamically as the number of samplings increases, reducing the difference between the numbers of sample images of different categories in the sampled sample, i.e., gradually increasing the share of minor-category samples. This improves the recall of the classification network for minor-category samples, lets the network learn effective feature representations from all the data, and enables correct classification learning of the sample images in the later stage.
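The 500/100 example can be worked through with a small Python sketch. The linear schedule from the original proportion 1/5 to a balanced 1/1, and the names `sampling_ratio` and `batch_counts`, are hypothetical; the application only requires that the gap between category counts shrink as the number of samplings grows.

```python
def sampling_ratio(step: int, total_steps: int, n_major: int, n_minor: int) -> float:
    """Hypothetical minority:majority proportion for the sampling at `step`.

    Starts at the original proportion n_minor / n_major and moves linearly
    toward 1.0 (a balanced batch) by `total_steps`.
    """
    original = n_minor / n_major
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    return original + (1.0 - original) * progress


def batch_counts(batch_size: int, ratio: float) -> tuple:
    """Split a batch into (majority, minority) counts that follow `ratio`."""
    n_minor = round(batch_size * ratio / (1.0 + ratio))
    return batch_size - n_minor, n_minor
```

With a batch of 60 images, this schedule moves each sampled sample from 50 first-category and 10 second-category images at the first sampling to 30 of each at the last.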

Fig. 9 is another schematic structural diagram of a training device for classification network according to an embodiment of the present application. The device of this embodiment can be used to implement the above-described method embodiments of the present application. As shown in fig. 9, the apparatus of this embodiment includes:

A sample loss obtaining unit 91, configured to process the sampled sample obtained from the sample image set through the classification network to obtain at least two losses of the sampled sample.

A network loss unit 92 for obtaining a network loss of the sampled samples based on at least two losses of the sampled samples and weights of the at least two losses.

A parameter adjustment unit 93, configured to adjust network parameters of the classification network based on the network loss to obtain the target classification network.

In the embodiments of the present application, the network loss of the sampled sample is determined from at least two losses of the sampled sample, and the weight of at least one of them depends on the current number of trained times corresponding to the sampled sample, so the share of each loss in the network loss is adjusted dynamically. Because different losses matter to different degrees at different stages of training (for example, some losses are important early in training and unimportant later), dynamically adjusting each loss's share of the network loss through the weight of at least one loss addresses the prior-art problem that simply adding several losses leaves network learning without emphasis and degrades the classification network, thereby achieving a better learning effect and improving the performance of the trained target classification network.

Optionally, the network loss unit 92 is configured to compute a weighted sum of the at least two losses of the sampled sample, using the weights of those losses, to obtain the network loss of the sampled sample.

Optionally, in response to the current number of trained times being greater than or equal to a first preset threshold, the weight of the at least one loss is kept at a fixed value.

In one or more alternative embodiments, the sample loss obtaining unit 91 is specifically configured to: process the sampled sample through the classification network to obtain a predicted category for each sample image in the sampled sample; and determine the prediction loss of the sampled sample based on the predicted category and the annotated category of each sample image.

Optionally, when determining the prediction loss of the sampled sample based on the predicted and annotated categories of its sample images, the sample loss obtaining unit 91 is configured to: determine a prediction error value for each sample image based on its predicted category and annotated category; and determine the prediction loss of the sampled sample based on the weight value and the prediction error value of each sample image in the sampled sample.

In one or more alternative embodiments, the sample loss obtaining unit 91 is specifically configured to: process the sampled sample through the classification network to obtain feature data of each sample image in the sampled sample; determine an easy sample in the sampled sample based on that feature data; and obtain the embedding loss of the sampled sample using the easy sample as the anchor.

In one or more optional embodiments, an apparatus provided by an embodiment of the present application further includes:

A sampling proportion determining unit, configured to determine, based on the number of samplings already performed as of the current sampling among multiple samplings, the sampling proportion according to which the current sampling draws sample images of different categories from the sample image set; and

a sample sampling unit, configured to perform the current sampling on the sample image set based on the sampling proportion to obtain the sampled sample of the current sampling.

The training device of the classification network provided by the embodiments of the present application implements both dynamic-proportion sampling and dynamic adjustment of the weights of different losses. Dynamic-proportion sampling balances the influence of each image category in network training, giving the target classification network a high recall for both major and minor categories. Dynamically adjusting the loss weights lets each loss take a larger share of the network loss when it plays a larger role and a smaller share when its role diminishes, which addresses the prior-art problem that simply adding several losses leaves network learning without emphasis and degrades the performance of the classification network.

Optionally, the at least two image categories include a first image category and a second image category, wherein the first image category contains a greater number of sample images than the second image category.

Optionally, the sampled sample comprises at least two sample images, the at least two sample images corresponding to at least one category.

Optionally, the difference between the numbers of sample images of different image categories specified by the sampling proportion decreases as the number of samplings increases.

Fig. 10 is a schematic flow chart of a classification method according to an embodiment of the present application. The method may be performed by any electronic device, such as a terminal device, a server, a mobile device, etc.

In step 1010, an image to be processed is acquired.

The image to be processed may be acquired in various ways in the embodiments of the present application: for example, it may be a photo taken by a camera, one or more frames of a video captured by a camera, or any image in an album.

Step 1020, classifying the image to be processed through the target classification network to obtain the predicted image category of the image to be processed.

The target classification network is obtained by the training method of the classification network provided by any one of the embodiments.

The target classification network trained with the training method of any of the above embodiments implements dynamic-proportion sampling and/or dynamic adjustment of the weights of different losses, and thus addresses at least one of two problems: inaccurate judgment of minor categories caused by an imbalanced sample image set, and degraded classification network performance caused by network learning lacking emphasis. Classifying images to be processed with this target classification network therefore yields more accurate classifications and better network performance, with a higher recall for minor categories than classification networks trained by other prior-art methods.

Fig. 11 is a schematic structural diagram of a classification device according to an embodiment of the present application. The device of this embodiment can be used to implement the above-described method embodiments of the present application. As shown in fig. 11, the apparatus of this embodiment includes:

an image acquisition unit 1101 for acquiring an image to be processed.

The class prediction unit 1102 is configured to perform classification processing on an image to be processed through a target classification network, so as to obtain an image prediction class of the image to be processed; the target classification network is obtained through the training method provided by any one of the embodiments.

According to another aspect of an embodiment of the present application, there is provided an electronic device, including a processor, the processor including the training apparatus or the classifying apparatus of the classification network provided in any one of the embodiments above.

According to another aspect of an embodiment of the present application, there is provided an electronic device including: a memory for storing executable instructions;

And a processor in communication with the memory for executing the executable instructions to perform the operations of the training method or classification method of the classification network as provided in any of the embodiments above.

According to another aspect of an embodiment of the present application, there is provided a computer readable storage medium storing computer readable instructions that, when executed, perform the operations of the training method or the classification method of the classification network provided in any of the embodiments above.

According to another aspect of an embodiment of the present application, there is provided a computer program product comprising computer readable code which, when run on a device, causes a processor in the device to execute instructions for implementing the training method or classification method of the classification network as provided in any of the embodiments above.

According to yet another aspect of embodiments of the present application, there is provided another computer program product for storing computer readable instructions that, when executed, cause a computer to perform the operations of the training method or the classification method of the classification network provided by any of the embodiments described above.

The computer program product may be realized in particular by means of hardware, software or a combination thereof. In one alternative, the computer program product is embodied as a computer storage medium, and in another alternative, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.

According to the embodiment of the application, training and classifying methods and devices of a classifying network, electronic equipment, a computer storage medium and a computer program product are also provided, wherein the sampling proportion of sample images of different categories obtained by the current sampling from a sample image set is determined based on the sampling times corresponding to the current sampling in a plurality of times of sampling; based on the sampling proportion, current sampling is carried out on the sample image set so as to obtain a sampling sample of the current sampling; the classification network is trained based on a plurality of sampling samples obtained by a plurality of sampling to obtain a target classification network.

In some embodiments, the network acquisition instruction or the image processing instruction may be specifically a call instruction, and the first device may instruct the second device to perform training of the classification network or image classification processing by using a call manner, and accordingly, in response to receiving the call instruction, the second device may perform steps and/or flows in any embodiment of the training method or the classification method of the classification network.

It should be understood that the terms "first," "second," and the like in the embodiments of the present application are merely for distinction and should not be construed as limiting the embodiments of the present application.

It should also be understood that in the present application, "plurality" may refer to two or more, and "at least one" may refer to one, two or more.

It should also be understood that any component, data, or structure mentioned in the present application may generally be understood as one or more, unless explicitly limited or the context indicates otherwise.

It should also be understood that the description of the embodiments of the present application emphasizes the differences between the embodiments, and that the same or similar features may be referred to each other, and for brevity, will not be described in detail.

The embodiments of the present application also provide an electronic device, which may be a mobile terminal, a personal computer (PC), a tablet computer, a server, or the like. Referring now to fig. 12, there is shown a schematic structural diagram of an electronic device 1200 suitable for implementing a terminal device or server of an embodiment of the present application. As shown in fig. 12, the electronic device 1200 includes one or more processors, such as one or more central processing units (CPUs) 1201 and/or one or more graphics processing units (GPUs) 1213, which may perform various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 1202 or loaded from a storage section 1208 into a random access memory (RAM) 1203. The communication part 1212 may include, but is not limited to, a network card, which may include, but is not limited to, an InfiniBand (IB) network card.

The processor may communicate with ROM 1202 and/or RAM 1203 to execute executable instructions, is connected to the communication part 1212 through the bus 1204, and communicates with other target devices through the communication part 1212, thereby performing the operations corresponding to any method provided in the embodiments of the present application. For example: determining, based on the number of samplings already performed as of the current sampling among multiple samplings, the sampling proportion according to which the current sampling draws sample images of different categories from the sample image set; performing the current sampling on the sample image set based on the sampling proportion to obtain the sampled sample of the current sampling; and training the classification network on the sampled samples obtained from the multiple samplings to obtain a target classification network. Or: processing a sampled sample obtained from a sample image set through a classification network to obtain at least two losses of the sampled sample, where the sample image set includes at least two image categories, each containing at least one sample image, and the sampled sample includes at least two sample images; obtaining the network loss of the sampled sample based on the at least two losses and their weights, where the weight of at least one of the at least two losses depends on the current number of trained times corresponding to the sampled sample; and adjusting network parameters of the classification network based on the network loss to obtain the target classification network.

In addition, the RAM 1203 may also store various programs and data required for the operation of the device. The CPU 1201, the ROM 1202, and the RAM 1203 are connected to each other through the bus 1204. When the RAM 1203 is present, the ROM 1202 is an optional module. The RAM 1203 stores executable instructions, or executable instructions are written into the ROM 1202 at runtime, and these instructions cause the CPU 1201 to perform the operations corresponding to the methods described above. An input/output (I/O) interface 1205 is also connected to the bus 1204. The communication portion 1212 may be provided as a single integrated unit, or as a plurality of sub-modules (for example, a plurality of IB network cards) connected to the bus.

The following components are connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output section 1207 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, and the like; a storage section 1208 including a hard disk or the like; and a communication section 1209 including a network interface card such as a LAN card or a modem. The communication section 1209 performs communication processing via a network such as the Internet. A drive 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1210 as needed, so that a computer program read from it can be installed into the storage section 1208 as needed.

It should be noted that the architecture shown in fig. 12 is only one optional implementation. In practice, the number and types of the components in fig. 12 may be selected, removed, added, or replaced according to actual needs, and different functional components may be arranged separately or integrated. For example, the GPU 1213 and the CPU 1201 may be arranged separately, or the GPU 1213 may be integrated on the CPU 1201; the communication portion may be arranged separately, or may be integrated on the CPU 1201 or the GPU 1213; and so on. All such alternative embodiments fall within the scope of the present disclosure.

In particular, according to embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program tangibly embodied on a machine-readable medium. The computer program comprises program code for performing the methods shown in the flowcharts, and the program code may include instructions corresponding to the method steps provided by the embodiments of the present application. For example: determining, based on the sampling count corresponding to the current sampling among multiple samplings, the sampling proportions of sample images of different categories to be obtained from a sample image set in the current sampling; performing the current sampling on the sample image set according to the sampling proportions to obtain the sampled samples of the current sampling; and training the classification network on the sampled samples obtained from the multiple samplings to obtain a target classification network. Alternatively: processing, through a classification network, sampled samples obtained from a sample image set to obtain at least two losses of the sampled samples, wherein the sample image set comprises at least two image categories, each image category comprises at least one sample image, and the sampled samples comprise at least two sample images; obtaining a network loss of the sampled samples based on the at least two losses and their weights, wherein the weight of at least one of the at least two losses depends on the current training count corresponding to the sampled samples; and adjusting network parameters of the classification network based on the network loss to obtain the target classification network.
In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 1209, and/or installed from the removable medium 1211. When the computer program is executed by the central processing unit (CPU) 1201, the above-described functions defined in the method of the present application are performed.

The method and apparatus of the present application may be implemented in many ways, for example by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the method steps is for illustration only; unless otherwise specifically stated, the steps of the method of the present application are not limited to the order specifically described above. Furthermore, in some embodiments, the present application may also be embodied as programs recorded on a recording medium, the programs including machine-readable instructions for implementing the methods according to the present application. Thus, the present application also covers a recording medium storing a program for executing the methods according to the present application.

The description of the present application has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the application in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to best explain the principles of the application and the practical application, and to enable others of ordinary skill in the art to understand the application for various embodiments with various modifications as are suited to the particular use contemplated.

Claims

1. A method of training a classification network, comprising:

Determining, based on the sampling count corresponding to the current sampling among multiple samplings, the sampling proportions of sample images of different image categories to be obtained from a sample image set in the current sampling, wherein different image categories correspond to different numbers of sample images;

Processing, through the classification network, the sampled samples obtained from the sample image set to obtain at least two losses of the sampled samples, wherein the sample image set comprises at least two image categories, each image category comprises at least one sample image, and the sampled samples comprise at least two sample images;

Obtaining a network loss of the sampled samples based on the at least two losses of the sampled samples and the weights of the at least two losses, wherein the weights are related to the current training count, and at each stage of training the weight of whichever of the at least two losses is relatively more advantageous at that stage is increased;

2. The method of claim 1, wherein the at least two losses include a prediction loss and an embedding loss.

3. The method of claim 2, wherein the weight of the embedding loss is lower when the current training count is a first value than when the current training count is a second value; and/or

The weight of the prediction loss is higher when the current training count is the first value than when the current training count is the second value;

Wherein the first value is greater than the second value.

4. The method of claim 2, wherein, in response to the current training count being less than a first preset threshold, the weight of the embedding loss decreases as the current training count increases; and/or

In response to the current training count being greater than or equal to the first preset threshold, the weight of the embedding loss is maintained at a fixed value.
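Claims 2 to 4 describe a network loss built from a prediction loss and an embedding loss whose weights depend on the current training count. A minimal Python sketch of one schedule consistent with claims 3 and 4 follows. The linear decay, the values `start=0.9` and `floor=0.1`, and setting the prediction weight to one minus the embedding weight are assumptions for illustration only; how the two losses themselves are computed is left open.

```python
def embedding_loss_weight(trained_count, threshold, start=0.9, floor=0.1):
    """Hypothetical embedding-loss weight: decays linearly from `start` toward
    `floor` while trained_count < threshold, then stays fixed at `floor`."""
    if trained_count >= threshold:
        return floor
    return start - (start - floor) * trained_count / threshold


def network_loss(prediction_loss, embedding_loss, trained_count, threshold):
    """Weighted sum of the two losses. Tying the prediction weight to
    1 - embedding weight is an assumption; it makes the prediction loss weigh
    more at later training stages."""
    w_emb = embedding_loss_weight(trained_count, threshold)
    w_pred = 1.0 - w_emb
    return w_pred * prediction_loss + w_emb * embedding_loss


# Early in training the embedding loss dominates; later the prediction loss does.
early = network_loss(prediction_loss=2.0, embedding_loss=4.0, trained_count=0, threshold=100)
late = network_loss(prediction_loss=2.0, embedding_loss=4.0, trained_count=200, threshold=100)
```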

5. The method of claim 1, wherein, in the case that the at least two losses include a prediction loss, processing the sampled samples obtained from the sample image set through the classification network to obtain the prediction loss of the sampled samples comprises:

Determining the prediction loss of the sampled samples based on the weight of each sample image included in the sampled samples and the prediction error of each sample image.

6. The method of claim 5, wherein the weight of each sample image is related to a first proportion, the first proportion being the proportion of the image category to which the sample image belongs within the sampled samples, and different image categories correspond to different numbers of sample images.
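Claims 5 and 6 describe a prediction loss that weights each sample image's prediction error by a weight related to the share of its image category within the sampled samples. The Python sketch below assumes the per-image prediction errors are already computed and uses inverse-share weighting as a hypothetical weight function (`inverse_share_weight`). Claim 7 describes a different, threshold-based rule that also involves the category's share in the whole sample image set; this sketch does not implement that rule.

```python
from collections import Counter


def class_shares(labels):
    """First proportion: the share of each class within the sampled batch."""
    counts = Counter(labels)
    return {c: k / len(labels) for c, k in counts.items()}


def inverse_share_weight(share, num_classes):
    """Hypothetical weight function: images of classes that are rare in the batch
    get larger weights; the weights average to 1 over the batch."""
    return 1.0 / (num_classes * share)


def prediction_loss(labels, errors, weight_fn=inverse_share_weight):
    """Weighted mean of per-image prediction errors."""
    if len(labels) != len(errors):
        raise ValueError("labels and errors must have the same length")
    shares = class_shares(labels)
    weights = [weight_fn(shares[c], len(shares)) for c in labels]
    return sum(w * e for w, e in zip(weights, errors)) / len(labels)


# Three majority-class images with error 1.0 and one minority-class image with error 3.0.
loss = prediction_loss(["a", "a", "a", "b"], [1.0, 1.0, 1.0, 3.0])
# The unweighted mean would be 1.5; with these weights each class counts equally, giving 2.0.
```

Passing `weight_fn` as a parameter keeps the loss computation separate from the weighting rule, so a rule like the one in claim 7 could be supplied instead.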

7. The method of claim 6, wherein, in response to the ratio between the first proportion and a second proportion being greater than or equal to a second preset threshold, the second proportion being the proportion of the image category to which the sample image belongs within the sample image set, the weight of the sample image is the ratio between the first proportion and the second proportion; and/or

8. The method of claim 1, wherein the difference between the numbers of sample images of different categories corresponding to the sampling proportions decreases as the sampling count increases.

9. A method of classification, comprising:

Acquiring an image to be processed;

The target classification network is obtained by the training method according to any one of claims 1 to 8.

10. A training device for a classification network, comprising:

A sampling proportion determining unit, configured to determine, based on the sampling count corresponding to the current sampling among multiple samplings, the sampling proportions of sample images of different categories to be obtained from a sample image set in the current sampling, wherein different image categories correspond to different numbers of sample images;

A sample loss obtaining unit, configured to process, through the classification network, the sampled samples obtained from the sample image set to obtain at least two losses of the sampled samples, wherein the sample image set comprises at least two image categories, each image category comprises at least one sample image, and the sampled samples comprise at least two sample images;

A network loss unit, configured to obtain a network loss of the sampled samples based on the at least two losses of the sampled samples and the weights of the at least two losses, wherein the weights are related to the current training count, and at each stage of training the weight of whichever of the at least two losses is relatively more advantageous at that stage is increased;

11. An electronic device, comprising: a memory for storing executable instructions;

And a processor in communication with the memory to execute the executable instructions to perform the operations of the training method of the classification network of any one of claims 1 to 8 or the classification method of claim 9.

12. A computer readable storage medium storing computer readable instructions which, when executed, perform the operations of the training method of the classification network of any one of claims 1 to 8 or the classification method of claim 9.

13. A computer program product comprising computer readable code, characterized in that, when the computer readable code runs on a device, a processor in the device executes instructions for implementing the training method of the classification network of any one of claims 1 to 8 or the classification method of claim 9.