CN113688933B - Classification network training method, classification method and device and electronic equipment

Info

Publication number
CN113688933B
CN113688933B
Authority
CN
China
Prior art keywords
sample
sampling
loss
network
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111022512.4A
Other languages
Chinese (zh)
Other versions
CN113688933A (en)
Inventor
Gan Weihao (甘伟豪)
Wang Yiru (王意如)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN202111022512.4A priority Critical patent/CN113688933B/en
Publication of CN113688933A publication Critical patent/CN113688933A/en
Application granted granted Critical
Publication of CN113688933B publication Critical patent/CN113688933B/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application disclose a training method for a classification network, a classification method and apparatus, and an electronic device. The training method includes: determining, based on the sampled count corresponding to the current sampling among multiple samplings, the sampling ratio at which the current sampling obtains sample images of different categories from a sample image set; performing the current sampling on the sample image set based on that sampling ratio to obtain a sampled sample of the current sampling; and training the classification network based on the sampled samples obtained over the multiple samplings to obtain a target classification network. Because the sample image set is sampled at a ratio that changes dynamically with the sampled count, the trained classification network achieves higher classification accuracy.

Description

Classification network training method, classification method and device and electronic equipment
Technical Field
The application relates to computer vision technology, and in particular to a training method for a classification network, a classification method and apparatus, and an electronic device.
Background
Classification networks play an important role in many areas, such as pedestrian detection and tracking, target search and localization in large-scale smart cities, and description of personal appearance. When a classification network handles problems such as face attribute analysis and pedestrian appearance analysis, the actual training data are often imbalanced; for the attribute "whether bald", for example, the ratio of positive to negative samples may be as high as 1:100. How to improve the performance of the trained classification network under such training-data scenarios is a research hotspot in the field.
Disclosure of Invention
The embodiment of the application provides training and classification technology of a classification network.
According to an aspect of an embodiment of the present application, there is provided a training method for a classification network, including:
Determining, based on the sampled count corresponding to the current sampling among a plurality of samplings, a sampling ratio at which the current sampling obtains sample images of different categories from a sample image set, wherein the sample image set comprises at least two image categories, and each image category comprises at least one sample image;
based on the sampling proportion, carrying out the current sampling on the sample image set to obtain a sampling sample of the current sampling;
and training a classification network based on the plurality of sampling samples obtained by the plurality of sampling to obtain a target classification network.
Optionally, in any foregoing method embodiment of the present application, the at least two image categories include a first image category and a second image category, wherein the first image category includes a greater number of sample images than the second image category.
Optionally, in any one of the above method embodiments of the present application, the sampled sample includes at least two sample images, and the at least two sample images correspond to at least one image category.
Optionally, in any of the above method embodiments of the present application, the difference between the number of sample images of different image categories corresponding to the sampling ratio decreases with increasing number of samplings.
Optionally, in any one of the above method embodiments of the present application, training the classification network based on the plurality of sampling samples obtained by the plurality of sampling to obtain the target classification network includes:
processing the sampled samples through the classification network to obtain network losses of the sampled samples;
And adjusting network parameters of the classified network based on the network loss to obtain a target classified network.
Optionally, in any one of the above method embodiments of the present application, the processing the sampled samples through the classification network to obtain network losses of the sampled samples includes:
processing the sampled samples through the classification network to obtain at least two losses of the sampled samples;
based on at least two losses of the sampled samples, a network loss of the sampled samples is obtained.
Optionally, in any one of the above method embodiments of the present application, the obtaining the network loss of the sampled samples based on at least two losses of the sampled samples includes:
And carrying out weighted summation on at least two losses of the sampling sample to obtain a network loss of the sampling sample, wherein the weight of at least one loss contained in the at least two losses depends on the current trained times corresponding to the sampling sample.
Optionally, in any foregoing method embodiment of the application, the at least one loss includes at least one of a predicted loss and an embedded loss.
Optionally, in any one of the above method embodiments of the present application, the embedded loss of the at least two losses has a lower contribution ratio to the network loss when the current trained number is a first value than when the current trained number is a second value, where the first value is greater than the second value; and/or
The predicted loss of the at least two losses contributes more to the network loss when the current number of trained times is the first value than to the network loss when the current number of trained times is the second value.
Optionally, in any of the above method embodiments of the present application, in response to the current number of trained times being less than a first preset threshold, the weight of the embedded loss in the at least one loss decreases with increasing current number of trained times; and/or
In response to the current number of trained times being greater than or equal to the first preset threshold, the weight of the embedded loss of the at least one loss is maintained at a fixed value.
Optionally, in any one of the above method embodiments of the present application, the processing the sampled samples through the classification network to obtain at least two losses of the sampled samples includes:
Processing the sampling samples through the classification network to obtain a prediction category of each sample image included in the sampling samples;
a prediction loss of the sampled sample is determined based on a prediction category of each sample image included in the sampled sample and a labeling category of each sample image.
Optionally, in any one of the above method embodiments of the present application, the determining the prediction loss of the sampling sample based on the prediction category of each sample image included in the sampling sample and the labeling category of each sample image includes:
determining a prediction error value of each sample image based on the prediction category of each sample image included in the sampled sample and the labeling category of each sample image;
and determining the prediction loss of the sampled sample based on the weight value of each sample image included in the sampled sample and the prediction error value of each sample image.
Optionally, in any of the above method embodiments of the present application, the weight of the sample image depends on a first proportion of the image class to which the sample image belongs in the sampled sample.
Optionally, in any one of the above method embodiments of the present application, in response to a ratio between the first ratio and a second ratio of an image class to which the sample image belongs in the sample image set being greater than or equal to a second preset threshold, the weight of the sample image is a ratio between the first ratio and the second ratio; and/or
In response to the ratio between the first ratio and the second ratio being less than the second preset threshold, the weight of the sample image is 0 or 1.
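A hypothetical sketch of this per-image weighting rule (the threshold value and the choice between 0 and 1 for below-threshold samples are assumptions the text leaves open):

```python
def sample_image_weight(first_ratio, second_ratio, threshold=0.5, keep_below=True):
    """Weight of a sample image from the ratio between its category's share in
    the sampled sample (first_ratio) and in the whole set (second_ratio)."""
    r = first_ratio / second_ratio
    if r >= threshold:   # second preset threshold (assumed value)
        return r
    return 1.0 if keep_below else 0.0
```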
Optionally, in any one of the above method embodiments of the present application, the processing the sampled samples through the classification network to obtain at least two losses of the sampled samples includes:
Processing the sampling sample through the classification network to obtain characteristic data of each sample image included in the sampling sample;
determining an easy sample of the sampled samples based on the feature data of each sample image included in the sampled samples;
And taking the easy sample as an anchor point to obtain the embedding loss of the sampling sample.
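A hedged PyTorch sketch of this easy-sample-as-anchor embedding loss, assuming the "easy sample" is the most confidently classified image in the sampled sample and that the sample contains both matching and non-matching categories:

```python
import torch
import torch.nn.functional as F

def easy_anchor_embedding_loss(features, labels, confidences, margin=0.3):
    """Take the easiest sample (highest classification confidence) as the
    triplet anchor, pull same-category features toward it, and push
    other-category features at least `margin` farther away."""
    anchor_idx = int(torch.argmax(confidences))
    anchor = features[anchor_idx]
    same = labels == labels[anchor_idx]
    same[anchor_idx] = False              # exclude the anchor itself
    pos, neg = features[same], features[labels != labels[anchor_idx]]
    d_pos = F.pairwise_distance(pos, anchor.expand_as(pos)).mean()
    d_neg = F.pairwise_distance(neg, anchor.expand_as(neg)).mean()
    return F.relu(d_pos - d_neg + margin)
```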
According to another aspect of the embodiment of the present application, there is provided a training method for a classification network, including:
Processing, by the classification network, a sampled sample obtained from a sample image set, obtaining at least two losses of the sampled sample, wherein the sample image set comprises at least two image categories, each image category comprising at least one sample image, the sampled sample comprising at least two sample images;
obtaining a network loss of the sampling sample based on at least two losses of the sampling sample and weights of the at least two losses, wherein the weight of at least one loss contained by the at least two losses depends on the current trained times corresponding to the sampling sample;
And adjusting network parameters of the classified network based on the network loss to obtain a target classified network.
Optionally, in any one of the method embodiments of the present application, the obtaining the network loss of the sampled samples based on at least two losses of the sampled samples and weights of the at least two losses includes:
and carrying out weighted summation on at least two losses of the sampling sample based on the weights of the at least two losses to obtain network losses of the sampling sample.
Optionally, in any foregoing method embodiment of the application, the at least one loss includes at least one of a predicted loss and an embedded loss.
Optionally, in any one of the above method embodiments of the present application, the embedded loss of the at least two losses has a lower contribution ratio to the network loss when the current trained number is a first value than when the current trained number is a second value, where the first value is greater than the second value; and/or
The predicted loss of the at least two losses contributes more to the network loss when the current number of trained times is the first value than to the network loss when the current number of trained times is the second value.
Optionally, in any of the above method embodiments of the present application, in response to the current number of trained times being less than a first preset threshold, the weight of the embedded loss in the at least one loss decreases with increasing current number of trained times; and/or
In response to the current number of trained times being greater than or equal to the first preset threshold, the weight of the at least one loss is maintained at a fixed value.
Optionally, in any one of the above method embodiments of the present application, the processing, by the classification network, the sampled samples obtained from the sample image set, to obtain at least two losses of the sampled samples, includes:
Processing the sampling samples through the classification network to obtain a prediction category of each sample image included in the sampling samples;
a prediction loss of the sampled sample is determined based on a prediction category of each sample image included in the sampled sample and a labeling category of each sample image.
Optionally, in any one of the above method embodiments of the present application, the determining the prediction loss of the sampling sample based on the prediction category of each sample image included in the sampling sample and the labeling category of each sample image includes:
determining a prediction error value of each sample image based on the prediction category of each sample image included in the sampled sample and the labeling category of each sample image;
and determining the prediction loss of the sampled sample based on the weight value of each sample image included in the sampled sample and the prediction error value of each sample image.
Optionally, in any of the above method embodiments of the present application, the weight of the sample image depends on a first proportion of the image class to which the sample image belongs in the sampled sample.
Optionally, in any one of the above method embodiments of the present application, in response to a ratio between the first ratio and a second ratio of an image class to which the sample image belongs in the sample image set being greater than or equal to a second preset threshold, the weight of the sample image is a ratio between the first ratio and the second ratio; and/or
In response to the ratio between the first ratio and the second ratio being less than the second preset threshold, the weight of the sample image is 0 or 1.
Optionally, in any one of the above method embodiments of the present application, the processing, by the classification network, the sampled samples obtained from the sample image set, to obtain at least two losses of the sampled samples, includes:
Processing the sampling sample through the classification network to obtain characteristic data of each sample image included in the sampling sample;
determining an easy sample of the sampled samples based on the feature data of each sample image included in the sampled samples;
And taking the easy sample as an anchor point to obtain the embedding loss of the sampling sample.
Optionally, in any of the above method embodiments of the present application, before processing the sampled samples obtained from the sample image set through the classification network to obtain at least two losses of the sampled samples, the method further includes:
Determining the sampling proportion of sample images of different categories obtained by the current sampling from a sample image set based on the sampling times corresponding to the current sampling in the multiple samplings;
And carrying out the current sampling on the sample image set based on the sampling proportion so as to obtain a sampling sample of the current sampling.
Optionally, in any foregoing method embodiment of the present application, the at least two image categories include a first image category and a second image category, wherein the first image category includes a greater number of sample images than the second image category.
Optionally, in any one of the above method embodiments of the present application, the sampled sample includes at least two sample images, and the at least two sample images correspond to at least one category.
Optionally, in any of the above method embodiments of the present application, the difference between the number of sample images of different image categories corresponding to the sampling ratio decreases with increasing number of samplings.
According to still another aspect of the embodiment of the present application, there is provided a classification method, including:
Acquiring an image to be processed;
Classifying the image to be processed through a target classification network to obtain an image prediction category of the image to be processed; wherein,
The target classification network is obtained by the training method according to any one of the above.
According to still another aspect of the embodiment of the present application, there is provided a training apparatus for a classification network, including:
A sample ratio determining unit, configured to determine, based on a number of sampled times corresponding to a current sample of a plurality of samples, a sample ratio of sample images of different categories obtained by the current sample from a sample image set, where the sample image set includes at least two image categories, each image category including at least one sample image;
the sample sampling unit is used for carrying out current sampling on the sample image set based on the sampling proportion so as to obtain a sampling sample of the current sampling;
and the network training unit is used for training the classification network based on the plurality of sampling samples obtained by the plurality of sampling to obtain a target classification network.
Optionally, in any embodiment of the foregoing apparatus of the present application, the at least two image categories include a first image category and a second image category, wherein a number of sample images included in the first image category is greater than a number of sample images included in the second image category.
Optionally, in any embodiment of the foregoing apparatus of the present application, the sampled sample includes at least two sample images, and the at least two sample images correspond to at least one image category.
Optionally, in any of the above embodiments of the present application, the difference between the number of sample images of different image categories corresponding to the sampling ratio decreases with increasing number of samplings.
Optionally, in any one of the above device embodiments of the present application, the network training unit includes:
The loss obtaining module is used for processing the sampling samples through the classification network to obtain network loss of the sampling samples;
and the parameter adjustment module is used for adjusting the network parameters of the classification network based on the network loss to obtain a target classification network.
Optionally, in any one of the above device embodiments of the present application, the loss obtaining module is configured to process the sampled samples through the classification network to obtain at least two losses of the sampled samples; based on at least two losses of the sampled samples, a network loss of the sampled samples is obtained.
Optionally, in any one of the above apparatus embodiments of the present application, when obtaining the network loss of the sampling sample based on at least two losses of the sampling sample, the loss obtaining module is configured to perform weighted summation on at least two losses of the sampling sample to obtain the network loss of the sampling sample, where a weight of at least one loss included in the at least two losses depends on a current trained number corresponding to the sampling sample.
Optionally, in any of the above apparatus embodiments of the application, the at least one loss includes at least one of a predicted loss and an embedded loss.
Optionally, in any one of the above apparatus embodiments of the present application, the embedded loss of the at least two losses has a lower contribution ratio to the network loss when the current trained number is a first value than when the current trained number is a second value, where the first value is greater than the second value; and/or
The predicted loss of the at least two losses contributes more to the network loss when the current number of trained times is the first value than to the network loss when the current number of trained times is the second value.
Optionally, in any of the above apparatus embodiments of the present application, in response to the current trained number being less than a first preset threshold, the weight of the embedded loss in the at least one loss decreases with an increase in the current trained number; and/or
In response to the current number of trained times being greater than or equal to the first preset threshold, the weight of the embedded loss of the at least one loss is maintained at a fixed value.
Optionally, in any one of the above device embodiments of the present application, when the loss obtaining module processes the sampled samples through the classification network to obtain at least two losses of the sampled samples, the loss obtaining module is configured to process the sampled samples through the classification network to obtain a prediction class of each sample image included in the sampled samples; a prediction loss of the sampled sample is determined based on a prediction class of each sample image included in the sampled sample and a labeling class of each sample image.
Optionally, in any one of the above apparatus embodiments of the present application, the loss obtaining module is configured to determine, when determining the prediction loss of the sample based on the prediction category of each sample image included in the sample and the labeling category of each sample image, a prediction error value of each sample image based on the prediction category of each sample image included in the sample and the labeling category of each sample image; and determining a prediction error of the sampling sample based on the weight value of each sample image included in the sampling sample and the prediction error value of each sample image.
Optionally, in any of the above device embodiments of the present application, the weight of the sample image depends on a first proportion of the image class to which the sample image belongs in the sampled sample.
Optionally, in any of the above device embodiments of the present application, in response to a ratio between the first ratio and a second ratio of an image class to which the sample image belongs in the sample image set being greater than or equal to a second preset threshold, the weight of the sample image is a ratio between the first ratio and the second ratio; and/or
In response to the ratio between the first ratio and the second ratio being less than the second preset threshold, the weight of the sample image is 0 or 1.
Optionally, in any one of the above device embodiments of the present application, when the loss obtaining module processes the sampled samples through the classification network to obtain at least two losses of the sampled samples, the loss obtaining module is configured to process the sampled samples through the classification network to obtain feature data of each sample image included in the sampled samples; determining an easy sample of the sampled samples based on the feature data of each sample image included in the sampled samples; and taking the easy sample as an anchor point to obtain the embedding loss of the sampling sample.
According to still another aspect of the embodiment of the present application, there is provided a training apparatus for a classification network, including:
A sample loss obtaining unit for processing a sampled sample obtained from a sample image set through the classification network, obtaining at least two losses of the sampled sample, wherein the sample image set comprises at least two image categories, each image category comprises at least one sample image, the sampled sample comprises at least two sample images;
A network loss unit, configured to obtain a network loss of the sampled sample based on at least two losses of the sampled sample and weights of the at least two losses, where the weight of at least one loss included in the at least two losses depends on a current trained number of times corresponding to the sampled sample;
And the parameter adjustment unit is used for adjusting the network parameters of the classification network based on the network loss to obtain a target classification network.
Optionally, in any embodiment of the foregoing apparatus of the present application, the sample loss obtaining unit is configured to perform weighted summation on at least two losses of the sampled samples based on weights of the at least two losses, to obtain a network loss of the sampled samples.
Optionally, in any of the above apparatus embodiments of the application, the at least one loss includes at least one of a predicted loss and an embedded loss.
Optionally, in any one of the above apparatus embodiments of the present application, the embedded loss of the at least two losses has a lower contribution ratio to the network loss when the current trained number is a first value than when the current trained number is a second value, where the first value is greater than the second value; and/or
The predicted loss of the at least two losses contributes more to the network loss when the current number of trained times is the first value than to the network loss when the current number of trained times is the second value.
Optionally, in any of the above apparatus embodiments of the present application, in response to the current trained number being less than a first preset threshold, the weight of the embedded loss in the at least one loss decreases with an increase in the current trained number; and/or
In response to the current number of trained times being greater than or equal to the first preset threshold, the weight of the at least one loss is maintained at a fixed value.
Optionally, in any one of the above device embodiments of the present application, the sample loss obtaining unit is specifically configured to process, through the classification network, the sampled samples to obtain a prediction class of each sample image included in the sampled samples; a prediction loss of the sampled sample is determined based on a prediction category of each sample image included in the sampled sample and a labeling category of each sample image.
Optionally, in any one of the above apparatus embodiments of the present application, the sample loss obtaining unit is configured to determine, when determining the prediction loss of the sample based on the prediction category of each sample image included in the sample and the labeling category of each sample image, a prediction error value of each sample image based on the prediction category of each sample image included in the sample and the labeling category of each sample image; and determining a prediction error of the sampling sample based on the weight value of each sample image included in the sampling sample and the prediction error value of each sample image.
Optionally, in any of the above device embodiments of the present application, the weight of the sample image depends on a first proportion of the image class to which the sample image belongs in the sampled sample.
Optionally, in any of the above device embodiments of the present application, in response to a ratio between the first ratio and a second ratio of an image class to which the sample image belongs in the sample image set being greater than or equal to a second preset threshold, the weight of the sample image is a ratio between the first ratio and the second ratio; and/or
In response to the ratio between the first ratio and the second ratio being less than the second preset threshold, the weight of the sample image is 0 or 1.
Optionally, in an embodiment of any one of the foregoing apparatus of the present application, the sample loss obtaining unit is specifically configured to process, through the classification network, the sampled sample to obtain feature data of each sample image included in the sampled sample; determining an easy sample of the sampled samples based on the feature data of each sample image included in the sampled samples; and taking the easy sample as an anchor point to obtain the embedding loss of the sampling sample.
Optionally, in any one of the above device embodiments of the present application, the device further includes:
A sampling proportion determining unit, configured to determine, based on a number of sampled times corresponding to a current sample of a plurality of samples, a sampling proportion of sample images of different categories obtained from a sample image set by the current sample;
and the sample sampling unit is used for carrying out the current sampling on the sample image set based on the sampling proportion so as to obtain a sampling sample of the current sampling.
Optionally, in any embodiment of the foregoing apparatus of the present application, the at least two image categories include a first image category and a second image category, wherein a number of sample images included in the first image category is greater than a number of sample images included in the second image category.
Optionally, in any embodiment of the foregoing apparatus of the present application, the sampled sample includes at least two sample images, and the at least two sample images correspond to at least one category.
Optionally, in any of the above embodiments of the present application, the difference between the number of sample images of different image categories corresponding to the sampling ratio decreases with increasing number of samplings.
According to still another aspect of the embodiment of the present application, there is provided a classification apparatus, including:
an image acquisition unit for acquiring an image to be processed;
The class prediction unit is used for classifying the image to be processed through a target classification network to obtain an image prediction class of the image to be processed; wherein the target classification network is obtained by the training method according to any one of the above.
According to another aspect of an embodiment of the present application, there is provided an electronic apparatus including: a memory for storing executable instructions;
And a processor in communication with the memory for executing the executable instructions to perform the operations of the training method of the classification network or the classification method as described above in any of the possible implementations.
According to another aspect of embodiments of the present application, a computer-readable storage medium is provided for storing computer-readable instructions that, when executed, perform the operations of the training method of the classification network or the classification method described above in any of the possible implementations described above.
According to another aspect of embodiments of the present application, there is provided a computer program product comprising computer readable code which, when run on a device, executes instructions for implementing the training method of the classification network or the classification method as described above in any of the possible implementations described above.
According to yet another aspect of embodiments of the present application, there is provided another computer program product for storing computer readable instructions that, when executed, cause a computer to perform the operations of the training method of the classification network or the classification method as described above in any of the possible implementations described above.
In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a software product, such as an SDK, etc.
The embodiment of the application also provides another training method and apparatus for a classification network, an electronic device, a computer storage medium and a computer program product, in which the sampling ratio at which the current sampling obtains sample images of different categories from a sample image set is determined based on the sampled count corresponding to the current sampling among multiple samplings; the current sampling is performed on the sample image set based on that ratio to obtain a sampled sample of the current sampling; and the classification network is trained based on the sampled samples obtained over the multiple samplings to obtain a target classification network.
With the training method for a classification network, the classification method and apparatus, and the electronic device provided by the embodiments of the application, the sampling ratio at which the current sampling obtains sample images of different categories from the sample image set is determined based on the sampled count corresponding to the current sampling among the multiple samplings; the current sampling is performed on the sample image set based on that ratio to obtain a sampled sample of the current sampling; and the classification network is trained based on the sampled samples obtained over the multiple samplings to obtain a target classification network. Because the sample image set is sampled at a ratio that changes dynamically with the sampled count, the trained classification network achieves higher classification accuracy.
The technical scheme of the application is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description, serve to explain the principles of the application.
The application may be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
fig. 1 is a schematic flow chart of a training method of a classification network according to an embodiment of the present application.
Fig. 2 is a flowchart of another embodiment of a classification network training method according to an embodiment of the present application.
Fig. 3 is a schematic flow chart of obtaining network loss in the training method of the classification network according to the embodiment of the present application.
Fig. 4 is a schematic diagram of triplet loss calculation in the prior art.
Fig. 5 is a schematic diagram of calculation of triplet loss in the training method of the classification network according to the embodiment of the present application.
Fig. 6 is a schematic structural diagram of a training device for classification network according to an embodiment of the present application.
Fig. 7 is a flowchart of a training method of a classification network according to another embodiment of the present application.
Fig. 8 is a flowchart of another embodiment of a training method for a classification network according to an embodiment of the present application.
Fig. 9 is another schematic structural diagram of a training device for classification network according to an embodiment of the present application.
Fig. 10 is a schematic flow chart of a classification method according to an embodiment of the present application.
Fig. 11 is a schematic structural diagram of a sorting device according to an embodiment of the present application.
Fig. 12 is a schematic structural diagram of an electronic device suitable for use in implementing a terminal device or server according to an embodiment of the present application.
Detailed Description
Various exemplary embodiments of the present application will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present application unless it is specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective parts shown in the drawings are not drawn in actual scale for convenience of description.
The following description of at least one exemplary embodiment is merely exemplary in nature and is in no way intended to limit the application, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
Fig. 1 is a schematic flow chart of a training method of a classification network according to an embodiment of the present application. The method may be performed by any electronic device, such as a terminal device, a server, a mobile device, etc.
Step 110, determining, based on the sampled count corresponding to the current sampling among the plurality of samplings, the sampling ratio at which the current sampling obtains sample images of different categories from the sample image set.
The sample image set comprises at least two image categories, each image category comprising at least one sample image. For example, the set may include two categories, a first category with a large amount of data and a second category with a small amount of data. By dynamically adjusting the sampling ratio according to the sampled count, the embodiment of the application makes the sampling ratio between the first and second categories change at each sampling; for instance, a preset function may be used to make the sampling ratio vary dynamically with the sampled count.
In some alternative embodiments, the at least two image categories include a first image category and a second image category, wherein the first image category includes a greater number of sample images than the second image category.
The sampling method that determines the sampling ratio of the current sampling from the sampled count is applicable to any sample image set containing multiple categories, and in particular to cases where the numbers of sample images in different image categories differ greatly. By adjusting the sampling ratio, the share of the less-numerous second image category in the sampled sample can be increased, so that the trained target classification network classifies the second image category accurately. This avoids the situation where, under a fixed sampling ratio, the second image category appears too rarely (or with many repeats) in the sampled sample because it contains few images, leaving the trained target classification network inaccurate on that category. For example, when learning to determine whether a pedestrian is bald, the vast majority of the training data is non-bald (corresponding to the first image category of the embodiment of the application) while bald data (corresponding to the second image category) is rare, possibly less than 1%. Under a conventional learning method the model tends toward the majority category; if it predicts all data as non-bald, its accuracy reaches 99% while its recall of bald samples is 0. Such a model cannot help a user find a particular target object; in other words, it lacks the ability to judge whether a pedestrian is bald. The target classification network obtained with the method provided by the embodiment of the application improves the judgment of bald samples, raising the recall of the bald category while maintaining overall accuracy.
Step 120, performing current sampling on the sample image set based on the sampling proportion to obtain a sampling sample of the current sampling.
In the embodiment of the application, the sampling ratio of the current sampling is determined by the sampled count, which changes at each sampling; the sampling ratio therefore differs from one sampling to the next, ensuring that every image category in the sample image set plays a positive role in training the classification network.
Optionally, each sample comprises at least two sample images, the at least two sample images corresponding to at least one image class.
In the embodiment of the application, to train the classification network on sampled samples, each sampled sample should include multiple sample images, which improves the classification accuracy of the trained classification network. Each sampling draws from the sample image set at one sampling ratio, and the proportion between sample images of different categories in the resulting sampled sample matches that ratio. For example, from a sample image set containing 500 sample images of a first category and 100 sample images of a second category, sampling at a ratio of 3:1 may yield 30 sample images of the first category and 10 of the second.
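As a minimal illustrative sketch of this ratio-driven sampling (not code from the patent; the file names, the batch size of 40, and sampling with replacement are assumptions), drawing a sampled sample that matches a given class ratio might look like the following:

```python
import random

def sample_batch(first_cat, second_cat, ratio, batch_size):
    """Draw a sampled sample whose class composition follows `ratio` (e.g. 3.0 for 3:1)."""
    n_first = round(batch_size * ratio / (ratio + 1))
    n_second = batch_size - n_first
    # Sampling with replacement lets the minority category fill its quota
    # even when it holds fewer images than requested.
    batch = random.choices(first_cat, k=n_first) + random.choices(second_cat, k=n_second)
    random.shuffle(batch)
    return batch

# 500 majority-category images and 100 minority-category images, sampled at 3:1.
first_cat = [f"cat1_img_{i}.jpg" for i in range(500)]
second_cat = [f"cat2_img_{i}.jpg" for i in range(100)]
batch = sample_batch(first_cat, second_cat, ratio=3.0, batch_size=40)
# -> 30 first-category and 10 second-category images, matching the text's example.
```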
Step 130, training the classification network based on the plurality of sampled samples obtained from the plurality of samplings to obtain a target classification network.
Optionally, the classification network is trained in turn on each of the plurality of sampled samples. The specific process may include: for each sampled sample, the classification network to be trained processes the sample to obtain a network loss; the parameters of the network to be trained are adjusted based on the network loss, yielding a parameter-adjusted classification network; it is then judged whether the end-of-training condition (for example, reaching a preset number of training iterations) has been met. If so, the parameter-adjusted classification network is taken as the target classification network. If not, the parameter-adjusted network becomes the classification network to be trained, the sampled sample obtained by the next sampling is processed to obtain the next network loss, and the parameters are adjusted again, until the end-of-training condition is met and the parameter-adjusted classification network is taken as the target classification network.
With the training method for a classification network provided by the embodiment of the application, the sampling ratio at which the current sampling obtains sample images of different categories from the sample image set is determined based on the sampled count corresponding to the current sampling among the plurality of samplings; the current sampling is performed on the sample image set based on that ratio to obtain a sampled sample of the current sampling; and the classification network is trained based on the sampled samples obtained over the multiple samplings to obtain a target classification network. Because the sample image set is sampled at a ratio that changes dynamically with the sampled count, the trained classification network achieves higher classification accuracy.
In one or more alternative embodiments, the difference between the numbers of sample images of different image categories prescribed by the sampling ratio decreases as the number of samplings increases.
In the prior art, sampling from the sample image set always keeps one fixed target data distribution (for example, always sampling at the original proportion of the set, or always at a preset proportion, where the proportion between the at least two categories in the sample image set serves as the original sampling proportion). This is unfavorable to generalized learning: in the initial stage of learning, discarding too many majority-category sample images loses too much effective information and makes the trained classification network inaccurate. The embodiment of the application instead takes the proportion between the different categories in the sample image set as the original sampling proportion (for example, for a set of 500 first-category and 100 second-category sample images, the original sampling proportion is 1/5) and dynamically adjusts the sampling on that basis, so that the disparity between the sample images of different image categories shrinks, i.e., the share of the minority image categories grows as the sampled count increases. This realizes learning on imbalanced multi-category data, improves the recall of the classification network on minority-category sample images, allows effective feature expression to be learned from all the data, and enables correct classification learning in the later stage.
Optionally, the process of obtaining the sampling ratio of each sampling in step 110 may include: processing the original sampling proportion based on a first dynamically changing function and the sampled count corresponding to the current sampling, to obtain the sampling ratio of the current sampling.
The variable of the first dynamically changing function is the sampled count. Optionally, the first dynamically changing function may take the form of any function whose value decreases from 1 to 0 as its variable increases, for example a convex function, a concave function, a linear function, or a composite function. The first dynamically changing function reflects the state of the network learning process: its slope represents the learning rate of the network, and different classes of function depict different learning-rate styles; a convex function, for example, yields a learning strategy whose learning rate gradually accelerates.
As an alternative example, the first dynamically changing function may include, but is not limited to, the following functions:
For example, a convex function indicates a learning rate that goes from slow to fast. The first dynamically changing function may then be written as formula (1.1):
SF_cos(l) = cos((l / L) · (π / 2))    formula (1.1)
where SF_cos(l) denotes the first dynamically changing function in convex form, l denotes the l-th sampling, and L denotes the preset total number of samplings.
A linear function indicates a constant learning rate. The first dynamically changing function may then be written as formula (1.2):
SF_linear(l) = 1 - l / L    formula (1.2)
where SF_linear(l) denotes the first dynamically changing function in linear form, l denotes the l-th sampling, and L denotes the preset total number of samplings.
A concave function indicates a learning rate that goes from fast to slow. The first dynamically changing function may then be written as formula (1.3):
SF_exp(l) = λ^l    formula (1.3)
where SF_exp(l) denotes the first dynamically changing function in concave form, l denotes the l-th sampling, and λ is a preset constant smaller than 1.
A composite function indicates a learning rate that goes from slow to fast and then back to slow. The first dynamically changing function may then be written as formula (1.4):
SF_composite(l) = (1/2) · cos((l / L) · π) + 1/2    formula (1.4)
where SF_composite(l) denotes the first dynamically changing function in composite form, l denotes the l-th sampling, and L denotes the preset total number of samplings.
The above four formulas are optional examples of the first dynamically changing function and are not intended to limit its specific form in the embodiment of the present application.
The sampled count accumulates: at the first sampling the variable of the first dynamically changing function takes the value 1, at the second sampling the value 2, and so on. Since the sampled count differs at each sampling, the function value of the first dynamically changing function differs at each sampling, and so does the sampling ratio, realizing dynamically changing sampling.
Optionally, the process of processing the original sampling proportion based on the first dynamically changing function and the sampled count corresponding to the current sampling, to obtain the sampling ratio of the current sampling, may include:
obtaining the function value of the first dynamically changing function corresponding to the current sampling;
and raising the original sampling proportion to the power of the obtained function value, i.e., with the original sampling proportion as the base and the function value as the exponent, to obtain the sampling ratio of the current sampling.
Optionally, the sampling ratio obtained in the embodiment of the application lets the network train on the real training-data distribution (close to the original sampling proportion) in the initial stage and on a balanced data distribution in the later stage of training. In an alternative example, the sampling ratio may be obtained based on the following formula (2):
D(l) = D_train^g(l)    formula (2)
wherein D(l) denotes the sampling proportion corresponding to the l-th sampling, g(l) denotes the value of the first dynamically changing function at the l-th sampling, and D_train denotes the original sampling proportion of the sample image set. Since g(l), being a first dynamically changing function, decreases from 1 to 0 (it may, for example, be chosen as any of formulas (1.1) to (1.4)), the sampling proportion D(l) given by formula (2) is close to the original sampling proportion at the initial samplings and gradually raises the share of the minority categories in subsequent samplings, thereby improving the classification performance of the classification network on those categories.
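The following is a small sketch of how formulas (1.1) to (1.4) and formula (2) fit together, assuming a Python implementation; the decay constant lam and the 1/5 original proportion reused from the text's example are illustrative choices:

```python
import math

def scheduler(l, L, kind="cos", lam=0.9):
    """First dynamically changing function g(l), decreasing from 1 toward 0
    as the sampled count l approaches the preset total L; the four variants
    follow formulas (1.1) to (1.4)."""
    if kind == "cos":        # convex: learning rate from slow to fast
        return math.cos(l / L * math.pi / 2)
    if kind == "linear":     # constant learning rate
        return 1 - l / L
    if kind == "exp":        # concave: learning rate from fast to slow
        return lam ** l
    if kind == "composite":  # slow, then fast, then slow again
        return 0.5 * math.cos(l / L * math.pi) + 0.5
    raise ValueError(kind)

def sampling_ratio(d_train, l, L, kind="cos"):
    """Formula (2): D(l) = D_train ** g(l)."""
    return d_train ** scheduler(l, L, kind)

# With an original proportion of 1/5, the ratio starts near 1/5 and
# approaches 1 (balanced sampling) as the sampled count reaches L.
print(sampling_ratio(0.2, 1, 100))    # ~0.2 at the first sampling
print(sampling_ratio(0.2, 100, 100))  # ~1.0 at the last sampling
```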
Fig. 2 is a flowchart of another embodiment of a classification network training method according to an embodiment of the present application. As shown in fig. 2, the method of this embodiment includes:
step 210, determining a sampling proportion of the current sample to obtain sample images of different categories from the sample image set based on the number of times the current sample corresponds to the current sample in the plurality of samples.
The sample image set comprises at least two image categories, each image category comprising at least one sample image. For example, the set may include two categories, a first category with a large amount of data and a second category with a small amount of data. By dynamically adjusting the sampling ratio according to the sampled count, the embodiment of the application makes the sampling ratio between the first and second categories change at each sampling; for instance, a preset function may be used to make the sampling ratio vary dynamically with the sampled count.
Step 220, based on the sampling proportion, performing current sampling on the sample image set to obtain a sampling sample of the current sampling.
In step 230, the sampled samples are processed through the classification network to obtain network loss of the sampled samples.
Optionally, the sampled sample is input into the classification network, each sample image included in the sampled sample is classified by the network to obtain a predicted classification result, and the predicted classification results are processed together with the labeling classification result corresponding to each sample image to obtain the network loss of the sampled sample. The network loss may be composed of at least one loss, for example a prediction loss and an embedding loss; the embodiment of the application does not limit the specific number or types of the losses that make up the network loss.
Step 240, adjusting network parameters of the classification network based on the network loss to obtain the target classification network.
Network training is the process of adjusting network parameters through the network loss. Optionally, the parameter adjustment process includes: after a sampled sample is obtained, inputting the sampled sample into the classification network to be trained to obtain a network loss, and adjusting the network parameters in the classification network to be trained based on the network loss to obtain an adjusted classification network. At this point, it is judged whether a preset number of training times has been reached (the number of training times in the embodiment of the present application can be preset, for example, to 10). When the preset number has not been reached (for example, the preset number is 10 and the current training is the 8th), the next sampling (the 9th sampling and training) is performed to obtain the next sampled sample, the adjusted classification network is taken as the classification network to be trained, the next sampled sample is input into the classification network to be trained to obtain the next network loss, and the network parameters in the classification network to be trained are adjusted based on that network loss to obtain an adjusted classification network. This continues until the preset number of training times is reached (for example, 10 times), and the adjusted classification network is taken as the target classification network.
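As a hedged illustration of this sample-train-adjust cycle (the helper names sample_batch and compute_network_loss are placeholders, not interfaces defined by the patent), the loop with a preset number of training times might look like:

```python
import torch

def train_classifier(classifier: torch.nn.Module,
                     sample_batch,            # callable: sampled times -> batch
                     compute_network_loss,    # callable: (net, batch) -> loss
                     optimizer: torch.optim.Optimizer,
                     preset_times: int = 10) -> torch.nn.Module:
    """Sample, compute the network loss, adjust parameters; repeat until
    the preset number of training times is reached."""
    for l in range(preset_times):             # l = current sampled times
        batch = sample_batch(l)               # sampled at proportion D(l)
        loss = compute_network_loss(classifier, batch)
        optimizer.zero_grad()
        loss.backward()                       # adjust network parameters
        optimizer.step()
    return classifier                         # the target classification network
```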
Fig. 3 is a schematic flow chart of obtaining network loss in the training method of the classification network according to the embodiment of the present application. As shown in fig. 3, step 230 in the above embodiment may include:
Step 302, processing the sampled samples through a classification network to obtain at least two losses of the sampled samples.
In an embodiment of the present application, in order to improve the training speed and the accuracy of the target classification network, at least two losses are obtained based on the sampled samples, and network losses are obtained by at least two losses, optionally, the at least two losses may include, but are not limited to: prediction loss, embedding loss, etc.
Step 304, obtaining a network loss of the sampled samples based on at least two losses of the sampled samples.
In the network training process, more than one loss is often included. In the embodiment of the present application, a batch of sampled samples is input into the classification network at a time, at least two losses are obtained, and the network loss is obtained by combining the at least two losses. A loss is generally obtained based on supervision information (usually the labeling category corresponding to the sample image) and the prediction classification result; for example, a loss is determined based on how well the prediction classification result matches the labeling category.
Optionally, step 304 may include:
And carrying out weighted summation on at least two losses of the sampling samples to obtain network losses of the sampling samples.
In the network training process, the network loss is obtained by weighted summation over a plurality of losses, so that each loss can make its corresponding contribution in training; the contribution proportions of different losses to the parameter adjustment in each training can be tuned through different weights, so that the contribution proportion of the more advantageous losses is increased at different training stages.
Wherein the weight of at least one of the at least two losses is dependent on the current number of trained times corresponding to the sampled sample.
Optionally, the at least one loss whose weight value depends on the current number of trained times corresponding to the sampled sample may include, but is not limited to: at least one of a prediction loss and an embedding loss.
In the embodiment of the present application, different losses differ in importance at different stages of training; for example, some losses that are important in the early stage of training are no longer important in the later stage. The proportion of each loss in the network loss therefore needs to be dynamically adjusted, so as to solve the prior-art problem that simply adding a plurality of losses leaves network learning without emphasis and reduces the performance of the classification network.
Optionally, the embedded loss of the at least two losses contributes less to the network loss when the current number of trained times is a first value than to the network loss when the current number of trained times is a second value; and/or
The predicted loss of the at least two losses contributes more to the network loss when the current number of trained times is a first value than to the network loss when the current number of trained times is a second value.
Wherein the first value is greater than the second value. In the embodiment of the application, since the embedding loss has advantages in the initial stage of training and has no advantages after the characteristics are basically stable in the middle and later stages of training, the contribution ratio of the embedding loss is gradually adjusted to be smaller according to the increase of training times, and in the process, the contribution ratio of the prediction loss is gradually increased along with the increase of training times.
The embodiment of the present application combines a classification task (e.g., cross entropy loss learning) with metric learning (e.g., triplet loss, quadruplet loss, or quintuplet loss learning); the two may be considered to have different emphases over the whole learning process. Optionally, the classification task pays more attention to predicting the specific class, while metric learning aims at shaping the feature-space distances between samples. Therefore, by adjusting the ratio of the prediction loss to the embedding loss, the embodiment of the present application can first learn effective feature expressions in the initial stage of training and then learn the correct classification of the samples later.
In an alternative example, the proportional weights of the predicted and embedded losses throughout the learning process are controlled by dynamic adjustment. The network loss may be calculated based on the following equation (3.1), in which case the predicted loss may be a weighted cross entropy loss and the embedded loss is a triplet loss.
$$L_{DCL} = L_{DSL} + f(l)\cdot L_{TEA} \qquad (3.1)$$
where $L_{DCL}$ denotes the network loss, $L_{DSL}$ denotes the weighted cross entropy loss, $L_{TEA}$ denotes the triplet loss, and $f(l)$ denotes the second dynamic change function.
The second dynamic change function in the implementation of the present application is similar to the first dynamic change function in the above embodiment; any function whose value decreases from 1 to 0 may be used, for example: a convex function, a concave function, a linear function, a composite function, etc. Alternatively, the above formula (1.1), formula (1.2), formula (1.3) or formula (1.4) may be selected for implementation.
Optionally, in response to the current number of trained times being less than a first preset threshold, the weight of the embedded loss in the at least one loss decreases as the current number of trained times increases; and/or
In response to the current number of trained times being greater than or equal to a first preset threshold, the weight of the embedded loss in the at least one loss is maintained at a fixed value.
The fixed value ensures that the weight of the embedding loss is not 0, so that the classification network is trained based on at least two losses throughout the whole training process; this avoids a reduction in the number of effective losses as the training times increase, and improves the training efficiency of the classification network.
Alternatively, in embodiments of the present application, the weight of the embedding loss may be calculated using the second dynamic change function, which may take the form of any function whose value decreases from 1 to 0, for example: a convex function, a concave function, a linear function, a composite function, etc.
In an alternative embodiment, the second dynamic change function $f(l)$ may be calculated based on equation (3.2), an improvement of formula (1.4) in which a small preset constant $e$ is added so that the value of $f(l)$ never becomes 0; here $l$ represents the $l$-th training and $L$ represents the set total number of training times.
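The following sketch combines formula (3.1) with a floored weight schedule in the spirit of formula (3.2); since formula (3.2) is an e-augmented variant of formula (1.4), a linear base schedule is assumed here purely for illustration:

```python
def f(l: int, L: int, e: float = 0.01) -> float:
    """Second dynamic change function with floor e: decreases toward e
    instead of 0, so the embedding loss keeps contributing. The linear
    base schedule 1 - l/L is an assumption of this sketch."""
    return max(1.0 - l / L, 0.0) + e

def dynamic_network_loss(l_dsl, l_tea, l: int, L: int):
    """Formula (3.1): L_DCL = L_DSL + f(l) * L_TEA.
    Works with floats or torch tensors for the two loss terms."""
    return l_dsl + f(l, L) * l_tea
```

Early in training $f(l)$ is near $1 + e$, so the embedding term dominates feature learning; near the end it bottoms out at $e$, keeping both losses active while the weighted cross entropy drives classification.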
In one or more alternative embodiments, step 302 includes:
processing the sampling samples through a classification network to obtain a prediction category of each sample image included in the sampling samples;
a prediction loss of the sampled sample is determined based on the prediction category of each sample image included in the sampled sample and the annotation category of each sample image.
In the embodiment of the present application, the prediction loss enables label-based classification learning. Each sample image has a unique labeling category, and the difference between the prediction category obtained through the classification network and the labeling category is the prediction loss of the sampled sample; that is, the prediction loss expresses how accurate the prediction categories of the classification network are. Training the classification network with the prediction loss can therefore improve the accuracy with which the classification network judges specific categories.
Optionally, determining the prediction loss of the sample based on the prediction category of each sample image and the labeling category of each sample image included in the sample comprises:
Determining a prediction error value for each sample image based on a prediction category of each sample image and a labeling category of each sample image included in the sampled sample;
A prediction error of the sampling sample is determined based on the weight value of each sample image included in the sampling sample and the prediction error value of each sample image.
In the embodiment of the application, the effectiveness of feature expression is improved by adding the weight value when calculating the prediction error value for each sample image, and optionally, the weight of the sample image depends on the first proportion of the image class of the sample image in the sampled sample.
Alternatively, taking the weighted cross entropy loss as an example of the prediction loss: on the basis of the general cross entropy loss, the formula for calculating the cross entropy loss is improved by adding weight values, so as to improve the effectiveness of the feature expression. For example, the weighted cross entropy loss may be calculated based on the following formula (4.1):
$$L_{DSL} = -\frac{1}{N}\sum_{j=1}^{M}\sum_{i=1}^{N_j} w_j \log p\left(\hat{y}_{i,j}\right) \qquad (4.1)$$
where $L_{DSL}$ represents the weighted cross entropy loss, $N$ represents the number of sample images included in the sampled sample corresponding to the present training, $N_j$ represents the number of sample images of the $j$-th class in the batch of sampled samples, $M$ is the number of classes included in the sample image set, $\hat{y}_{i,j}$ represents the actual label corresponding to the $i$-th sample image in the $j$-th class, $p(\hat{y}_{i,j})$ is the probability the classification network predicts for that label, and $w_j$ denotes the weight corresponding to the sample images of the $j$-th class. Optionally, in response to the ratio between the first proportion and the second proportion of the image class to which the sample image belongs in the sample image set being greater than or equal to a second preset threshold, the weight of the sample image is the ratio between the first proportion and the second proportion; and/or
In response to the ratio between the first ratio and the second ratio being less than a second preset threshold, the weight of the sample image is 0 or 1.
In one or more alternative examples, the weight $w_j$ may be calculated based on the following equation (4.2):
$$w_j = \begin{cases} \dfrac{D_j(l)}{B_j}, & \dfrac{D_j(l)}{B_j} \ge \tau \\[4pt] 0 \text{ or } 1, & \dfrac{D_j(l)}{B_j} < \tau \end{cases} \qquad (4.2)$$
That is, when $D_j(l)/B_j$ is greater than or equal to the second preset threshold $\tau$, $w_j$ takes the value $D_j(l)/B_j$, while when $D_j(l)/B_j$ is less than $\tau$, $w_j$ takes the value 0 or 1. Here $D_j(l)$ is the target distribution of the sample images of the $j$-th class in the present training (i.e., the proportion between the sample images of the $j$-th class and the sample images of the other classes), and $B_j$ is the distribution of the sample images of the $j$-th class in the sample image set (i.e., the proportion between the sample images of the $j$-th class and the sample images of the other classes).
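A hedged PyTorch sketch of formulas (4.1) and (4.2) follows; the threshold value tau and the choice of 1 for the below-threshold branch are assumptions of this example (the source allows 0 or 1 there):

```python
import torch
import torch.nn.functional as F

def class_weights(target_dist: torch.Tensor,  # D_j(l): target distribution per class
                  set_dist: torch.Tensor,     # B_j: distribution in the sample image set
                  tau: float = 1.0) -> torch.Tensor:
    """Formula (4.2): w_j = D_j(l) / B_j when the ratio reaches the second
    preset threshold tau, otherwise 1 (0 is the other option in the source)."""
    ratio = target_dist / set_dist
    return torch.where(ratio >= tau, ratio, torch.ones_like(ratio))

def weighted_cross_entropy(logits: torch.Tensor,
                           labels: torch.Tensor,
                           w: torch.Tensor) -> torch.Tensor:
    """Formula (4.1)-style loss: per-sample cross entropy scaled by the
    weight of each sample's class, averaged over the N images in the batch."""
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    return (w[labels] * per_sample).mean()
```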
In one or more alternative embodiments, step 302 may include:
processing the sampling sample through a classification network to obtain characteristic data of each sample image included in the sampling sample;
determining an easy sample of the sampled samples based on the feature data of each sample image included in the sampled samples;
and taking the easy sample as an anchor point to obtain the embedding loss of the sampling sample.
In network training, introducing metric learning is beneficial to better sample feature expression. The embodiment of the present application realizes metric learning through the embedding loss, where the embedding loss can include various anchor-point-based losses, such as the triplet loss, quadruplet loss, quintuplet loss, etc. Taking the triplet loss as an example, it consists of an anchor point together with one positive sample and one negative sample corresponding to the anchor point. FIG. 4 is a schematic diagram of the prior-art calculation of the triplet loss. As shown in fig. 4, in the prior art, when the triplet loss is calculated, all sample images of the minority categories (e.g., the minority bald-attribute samples in fig. 4) are generally taken as anchor points, i.e., every minority-class sample image corresponds to an anchor point. Taking all minority-class samples as anchor points, however, has a problem: if a difficult minority sample (i.e., a sample image that does not aggregate with the other images of its minority class, such as the anchor shown in fig. 4) is used as an anchor point, it makes network learning difficult, so that the classification boundary cannot be solved stably; for example, pulling the anchor point toward the distant positive sample (Hard+) in fig. 4 leads to an unstable boundary between the two classes.
In response to the above problem of fig. 4, the embodiment of the present application presents two concepts: the easy sample and the difficult sample. An easy sample is one of at least two sample images of the minority image class (when sample images of the minority class are adopted as anchor points) whose mutual distances are smaller than a set value; that is, sample images that aggregate together are called easy samples. A difficult sample is defined relative to the easy samples, i.e., a sample image whose distance from the easy samples is greater than or equal to the set value.
In order to solve the problem existing in the prior art when the triplet loss is calculated, the embodiment of the present application uses easy samples as anchor points to calculate the triplet loss, which determines the classification boundary more robustly and stably. FIG. 5 is a schematic calculation diagram of the triplet loss in the training method of the classification network provided by the embodiment of the present application. In an alternative example, equation (5) may be selected to calculate the triplet loss:
$$L_{TEA} = \frac{1}{|T|}\sum_{(x_{easy,j},\,x_{+,j},\,x_{-,j})\in T} \max\left(0,\; d(x_{easy,j}, x_{+,j}) - d(x_{easy,j}, x_{-,j}) + m_j\right) \qquad (5)$$
where $L_{TEA}$ represents the triplet loss function value and $|T|$ represents the total number of triplets, each triplet including one easy anchor point, one positive sample (a sample image belonging to the same class as the easy anchor point), and one negative sample (a sample image belonging to a different class from the easy anchor point); $m_j$ is a set hyperparameter, $x_{easy,j}$ represents a sample image of the $j$-th category in the sampled sample serving as an easy anchor point, $x_{+,j}$ represents a positive sample of the easy anchor point $x_{easy,j}$, $x_{-,j}$ represents a negative sample of the easy anchor point $x_{easy,j}$, $d(x_{easy,j}, x_{+,j})$ represents the distance (e.g., Euclidean distance, cosine distance, etc.) between the easy anchor point $x_{easy,j}$ and the positive sample $x_{+,j}$, and $d(x_{easy,j}, x_{-,j})$ represents the distance between the easy anchor point $x_{easy,j}$ and the negative sample $x_{-,j}$.
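A minimal sketch of formula (5) with easy anchors is given below; the Euclidean distance, the mean over triplets, and the aggregation criterion in select_easy are choices of this example, not mandated by the patent:

```python
import torch

def select_easy(features: torch.Tensor, set_value: float) -> torch.Tensor:
    """Boolean mask over minority-class features: True where a feature's mean
    distance to the other features of its class is below the set value,
    i.e. the sample aggregates with its class (an 'easy' sample)."""
    d = torch.cdist(features, features)               # pairwise distances
    mean_d = d.sum(dim=1) / max(features.size(0) - 1, 1)
    return mean_d < set_value

def easy_anchor_triplet_loss(anchor: torch.Tensor,    # easy anchors x_easy,j
                             positive: torch.Tensor,  # same-class samples x+,j
                             negative: torch.Tensor,  # other-class samples x-,j
                             margin: float = 0.3) -> torch.Tensor:
    """Formula (5): hinge on d(anchor, positive) - d(anchor, negative) + m_j,
    averaged over the |T| triplets in the batch."""
    d_pos = torch.norm(anchor - positive, dim=1)
    d_neg = torch.norm(anchor - negative, dim=1)
    return torch.clamp(d_pos - d_neg + margin, min=0).mean()
```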
The weighted cross entropy loss obtained in connection with the above embodiments and the easy-anchor-based triplet loss are used for multi-task learning. The two have different learning emphases and different effects at different learning stages, so the embodiment of the present application dynamically schedules the emphasis of each across the learning stages.
In application, the training method of the classification network provided by the embodiment of the present application can be applied to training neural networks on image sets containing training images of different categories, and in particular to unbalanced data scenes, such as pedestrian detection and tracking, large-scale target search and localization in smart cities, personal portrait description, and the like, and can effectively improve the performance of the trained neural network.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware associated with program instructions, where the foregoing program may be stored in a computer readable storage medium, and when executed, the program performs steps including the above method embodiments; and the aforementioned storage medium includes: various media that can store program code, such as ROM, RAM, magnetic or optical disks.
Fig. 6 is a schematic structural diagram of a training device for classification network according to an embodiment of the present application. The device of this embodiment can be used to implement the above-described method embodiments of the present application. As shown in fig. 6, the apparatus of this embodiment includes:
The sampling ratio determining unit 61 is configured to determine, based on the number of times of sampling corresponding to a current sample of the plurality of times of sampling, a sampling ratio at which the current sample obtains sample images of different categories from the sample image set.
Wherein the sample image set comprises at least two image categories, each image category comprising at least one sample image.
The sample sampling unit 62 is configured to perform the current sampling on the sample image set based on the sampling proportion, so as to obtain a sampled sample of the current sampling.
In the embodiment of the present application, the sampling proportion of the current sampling is determined by the sampled times, which change with each sampling, so that the sampling proportion differs from sampling to sampling, ensuring that every image category in the sample image set plays a positive role in training the classification network.
The network training unit 63 is configured to train the classification network based on a plurality of sampling samples obtained by a plurality of sampling, and obtain a target classification network.
Based on the training device of the classification network provided by the embodiment of the application, the sampling proportion of the sample images of different categories obtained by the current sampling from the sample image set is determined based on the sampled times corresponding to the current sampling in the plurality of samplings; based on the sampling proportion, current sampling is carried out on the sample image set so as to obtain a sampling sample of the current sampling; the classification network is trained based on a plurality of sampling samples obtained by multiple sampling, a target classification network is obtained, and the sample image set is sampled according to the sampling proportion which dynamically changes along with the sampling times, so that the classification network obtained by training has higher classification accuracy.
In some alternative embodiments, the at least two image categories include a first image category and a second image category, wherein the first image category includes a greater number of sample images than the second image category.
The sampling method that determines the sampling proportion of the current sampling based on the sampled times is suitable for sampling any sample image set comprising a plurality of categories. In particular, when the numbers of sample images of different image categories differ greatly, adjusting the sampling proportion can increase the proportion of the less numerous second image category in the sampled samples, so that the trained target classification network can accurately classify the second image category. This avoids the problem that arises when sampling with a fixed sampling proportion: the smaller number of sample images of the second image category leads to too few, heavily repeated second-category images in the sampled samples, making the trained target classification network inaccurate in classifying the second image category.
Optionally, the sampled sample comprises at least two sample images, the at least two sample images corresponding to at least one image class.
In one or more alternative embodiments, the difference between the numbers of sample images of different image categories corresponding to the sampling proportion decreases as the number of samplings increases.
In the prior art, when a sample image set is sampled, a fixed target data sampling distribution is always kept (for example, sampling is always performed at the original sampling proportion, i.e., the proportion between the at least two categories in the sample image set, or always at a set proportion), which is not conducive to the generalized learning of the system; for example, in the initial stage of learning, discarding too many majority-class sample images makes the system lose too much effective information, causing the trained classification network to classify inaccurately. The embodiment of the present application first takes the proportion between different categories in the sample image set as the original sampling proportion (for example, if the sample image set includes 500 sample images of a first category and 100 sample images of a second category, the original sampling proportion is 1/5), and then, based on the original sampling proportion, dynamically adjusts the sampling proportion so as to reduce the data difference between sample images of different image categories in the sampled samples; that is, as the number of samplings increases, the proportion of the minority category in the sampled samples is gradually increased. In this way, learning on the unbalanced data of different categories in the sample image set is realized, the recall rate of the classification network for minority-class sample images is improved, effective feature expressions are learned from all the data, and the correct classification of the sample images can be learned in the later stage.
In one or more alternative embodiments, the network training unit 63 includes:
The loss obtaining module is used for processing the sampling samples through the classification network to obtain network loss of the sampling samples;
and the parameter adjustment module is used for adjusting network parameters of the classified network based on the network loss to obtain the target classified network.
Network training is the process of adjusting network parameters through the network loss. Optionally, the parameter adjustment process includes: after each sampling obtains a sampled sample, inputting the sampled sample into the classification network to be trained to obtain a network loss, and adjusting the network parameters in the classification network to be trained based on the network loss to obtain an adjusted classification network.
Optionally, a loss obtaining module, configured to process the sampled samples through a classification network to obtain at least two losses of the sampled samples; based on at least two losses of the sampled samples, a network loss of the sampled samples is obtained.
Optionally, the loss obtaining module is configured to, when obtaining the network loss of the sampled samples based on at least two losses of the sampled samples, weight and sum the at least two losses of the sampled samples to obtain the network loss of the sampled samples.
Wherein the weight of at least one of the at least two losses is dependent on the current number of trained times corresponding to the sampled sample.
Optionally, the at least one loss comprises at least one of a predicted loss and an embedded loss.
Optionally, the embedded loss of the at least two losses has a lower contribution to the network loss when the current number of trained times is a first value than when the current number of trained times is a second value, wherein the first value is greater than the second value; and/or
The predicted loss of the at least two losses contributes more to the network loss when the current number of trained times is a first value than to the network loss when the current number of trained times is a second value.
Optionally, in response to the current number of trained times being less than a first preset threshold, the weight of the embedded loss in the at least one loss decreases as the current number of trained times increases; and/or
In response to the current number of trained times being greater than or equal to a first preset threshold, the weight of the embedded loss in the at least one loss is maintained at a fixed value.
In one or more alternative embodiments, the loss obtaining module is configured to process the sampled samples through the classification network to obtain a prediction category for each sample image included in the sampled samples when the sampled samples are processed through the classification network to obtain at least two losses of the sampled samples; a prediction loss of the sampled sample is determined based on the prediction class of each sample image included in the sampled sample and the annotation class of each sample image.
In the embodiment of the present application, the prediction loss enables label-based classification learning. Each sample image has a unique labeling category, and the difference between the prediction category obtained through the classification network and the labeling category is the prediction loss of the sampled sample; that is, the prediction loss expresses how accurate the prediction categories of the classification network are. Training the classification network with the prediction loss can therefore improve the accuracy with which the classification network judges specific categories.
Optionally, the loss obtaining module is configured to determine, when determining the prediction loss of the sample based on the prediction category of each sample image and the labeling category of each sample image included in the sample, a prediction error value of each sample image based on the prediction category of each sample image and the labeling category of each sample image included in the sample; a prediction error of the sampling sample is determined based on the weight value of each sample image included in the sampling sample and the prediction error value of each sample image.
Optionally, the weight of the sample image depends on a first proportion of the sample image's belonging image class in the sample.
Optionally, in response to the ratio between the first proportion and the second proportion of the image class to which the sample image belongs in the sample image set being greater than or equal to a second preset threshold, the weight of the sample image is the ratio between the first proportion and the second proportion; and/or
In response to the ratio between the first ratio and the second ratio being less than a second preset threshold, the weight of the sample image is 0 or 1.
In one or more alternative embodiments, the loss obtaining module is configured to process the sampled samples through the classification network to obtain feature data of each sample image included in the sampled samples when the sampled samples are processed through the classification network to obtain at least two losses of the sampled samples; determining an easy sample of the sampled samples based on the feature data of each sample image included in the sampled samples; and taking the easy sample as an anchor point to obtain the embedding loss of the sampling sample.
In network training, the introduction of metric learning is beneficial to better sample feature expression, and the embodiment of the application realizes metric learning through embedding loss, wherein the embedding loss can comprise various losses based on anchor points, such as triplet loss, quadruple loss, quintuple loss and the like. Taking the triplet loss as an example, it consists of an anchor point and one each of positive and negative samples corresponding to the anchor point.
Fig. 7 is a flowchart of a training method of a classification network according to another embodiment of the present application. The method may be performed by any electronic device, such as a terminal device, a server, a mobile device, etc.
At step 710, processing the sampled samples obtained from the sample image set through the classification network to obtain at least two losses of the sampled samples.
Wherein the sample image set comprises at least two image categories, each image category comprising at least one sample image, the sample comprising at least two sample images.
The sampling sample in the embodiment of the application can be obtained by sampling based on any sampling proportion, for example, the sampling can be performed based on the sampling proportion which dynamically changes, and the sampling can also be performed based on the sampling proportion which is fixedly set.
In an embodiment of the present application, in order to improve the training speed and the accuracy of the target classification network, at least two losses are obtained based on the sampled samples, and network losses are obtained by at least two losses, optionally, the at least two losses may include, but are not limited to: prediction loss, embedding loss, etc.
At step 720, a network penalty for the sampled samples is obtained based on the at least two penalties and the weights of the at least two penalties for the sampled samples.
The weight of at least one of the at least two losses depends on the current number of trained times corresponding to the sampled sample, and the proportions of the different losses in the network loss are adjusted through the weights, which solves the prior-art problem that adding a plurality of losses leaves network learning without emphasis and reduces the performance of the classification network.
In the network training process, more than one loss is often included. In the embodiment of the present application, a batch of sampled samples is input into the classification network at a time, at least two losses are obtained, and the network loss is obtained by combining the at least two losses, which improves the training efficiency of the classification network. A loss is usually obtained based on supervision information (usually the labeling category corresponding to the sample image) and the prediction classification result; for example, a loss is determined based on how well the prediction classification result agrees with the labeling category.
Optionally, the at least two losses of the sampled samples are weighted summed based on the weights of the at least two losses to obtain a network loss of the sampled samples.
In the network training process, the network loss is obtained by weighted summation over a plurality of losses, so that each loss can make its corresponding contribution in training; the contribution proportions of different losses to the parameter adjustment in each training can be tuned through different weights, so that the contribution proportion of the more advantageous losses is increased at different training stages.
Step 730, adjusting network parameters of the classification network based on the network loss to obtain the target classification network.
Optionally, the classification network is trained based on each of the plurality of sampled samples in turn. The specific process may include: for each sampled sample, the classification network to be trained processes the sampled sample to obtain a network loss; the parameters of the classification network to be trained are adjusted based on the network loss to obtain a parameter-adjusted classification network; it is then judged whether the condition for ending the training is reached (for example, reaching the preset number of training times). When the condition for ending the training is reached, the parameter-adjusted classification network is taken as the target classification network. When it is not reached, the parameter-adjusted classification network is taken as the classification network to be trained, the sampled sample obtained by the next sampling is processed based on this network to obtain the next network loss, and the parameters are adjusted based on that network loss to obtain a parameter-adjusted classification network, until the condition for ending the training is reached and the parameter-adjusted classification network is taken as the target classification network.
In the embodiment of the present application, the network loss of the sampled sample is determined through at least two losses of the sampled sample, and the weight of at least one of the at least two losses depends on the current number of trained times corresponding to the sampled sample, so that the proportions of the at least two losses in the network loss are dynamically adjusted. Because different losses have different importance at different stages of training (for example, some losses that are important in the initial stage of training are no longer important in the later stage), the classification network training method provided by the embodiment of the present application dynamically adjusts the weight value of at least one loss (for example, calculating the weight value with a dynamically changing function). This solves the prior-art problem that adding a plurality of losses leaves network learning without emphasis and reduces the performance of the classification network, achieves a better network learning effect, and improves the performance of the trained target classification network.
Alternatively, the at least one loss may include, but is not limited to, at least one of a predicted loss and an embedded loss.
The embodiment of the present application is only an example of the training method applicable to the classification network provided in the embodiment of the present application, and is not limited to the specific type of at least one loss in the embodiment of the present application.
Optionally, the embedded loss of the at least two losses contributes less to the network loss when the current number of trained times is a first value than to the network loss when the current number of trained times is a second value; and/or
The predicted loss of the at least two losses contributes more to the network loss when the current number of trained times is a first value than to the network loss when the current number of trained times is a second value.
The first value is greater than the second value. In the embodiment of the present application, the embedding loss is advantageous in the initial stage of training (when the number of trained times is small) but no longer advantageous in the middle and later stages once the features are basically stable, so the contribution proportion of the embedding loss is gradually reduced as the training times increase; in this process, the contribution proportion of the prediction loss gradually increases with the training times.
The embodiment of the present application combines a classification task (e.g., cross entropy loss learning) with metric learning (e.g., triplet loss, quadruplet loss, or quintuplet loss learning); the two may be considered to have different emphases over the whole learning process. Optionally, the classification task pays more attention to predicting the specific class, while metric learning aims at shaping the feature-space distances between samples. Therefore, by adjusting the ratio of the prediction loss to the embedding loss, the embodiment of the present application can first learn effective feature expressions in the initial stage of training and then learn the correct classification of the samples later.
In an alternative example, the proportional weights of the predicted and embedded losses throughout the learning process are controlled by dynamic adjustment. The network loss may be calculated based on equation (3.1) above, in which case the predicted loss may be a weighted cross entropy loss and the embedded loss is a triplet loss.
The second dynamic change function in the implementation of the present application is similar to the first dynamic change function in the above embodiment; any function whose value decreases from 1 to 0 may be used, for example: a convex function, a concave function, a linear function, a composite function, etc. Likewise, the above formula (1.1), formula (1.2), formula (1.3) or formula (1.4) may be selected.
Optionally, in response to the current number of trained times being less than a first preset threshold, the weight of the embedded loss in the at least one loss decreases as the current number of trained times increases; and/or
In response to the current number of trained times being greater than or equal to a first preset threshold, the weight of the embedded loss in the at least one loss is maintained at a fixed value.
The fixed value ensures that the weight of the embedding loss is not 0, so that the classification network is trained based on at least two losses throughout the whole training process; this avoids a reduction in the number of effective losses as the training times increase, and improves the training efficiency of the classification network.
Alternatively, in embodiments of the present application, the weight of the embedding loss may be calculated using the second dynamic change function, which may take the form of any function whose value decreases from 1 to 0, for example: a convex function, a concave function, a linear function, a composite function, etc.
In an alternative embodiment, the second dynamic change function f (l) may be calculated based on equation (3.2) above.
In one or more alternative embodiments, step 710 includes:
processing the sampling samples through a classification network to obtain a prediction category of each sample image included in the sampling samples;
a prediction loss of the sampled sample is determined based on the prediction category of each sample image included in the sampled sample and the annotation category of each sample image.
In the embodiment of the present application, the prediction loss enables label-based classification learning. Each sample image has a unique labeling category, and the difference between the prediction category obtained through the classification network and the labeling category is the prediction loss of the sampled sample; that is, the prediction loss expresses how accurate the prediction categories of the classification network are, and training the classification network with the prediction loss can improve the accuracy with which the classification network judges specific categories. Optionally, the process of determining the prediction loss of the sampled sample may include: determining a prediction error value for each sample image based on the prediction category and the labeling category of each sample image included in the sampled sample; and determining the prediction error of the sampled sample based on the weight value and the prediction error value of each sample image included in the sampled sample.
In the embodiment of the application, the effectiveness of feature expression is improved by adding the weight value when calculating the prediction error value for each sample image, and optionally, the weight of the sample image depends on the first proportion of the image class of the sample image in the sampled sample.
Alternatively, taking the weighted cross entropy loss as an example of the prediction loss: on the basis of the general cross entropy loss, the formula for calculating the cross entropy loss is improved by adding weight values, so as to improve the effectiveness of the feature expression; for example, the weighted cross entropy loss may be calculated based on formula (4.1) above.
Optionally, in response to the ratio between the first proportion and the second proportion of the image class to which the sample image belongs in the sample image set being greater than or equal to a second preset threshold, the weight of the sample image is the ratio between the first proportion and the second proportion; and/or
In response to the ratio between the first ratio and the second ratio being less than a second preset threshold, the weight of the sample image is 0 or 1.
In one or more alternative examples, the weights of the sample images may be calculated based on equation (4.2) above.
In one or more alternative embodiments, step 710 may include:
processing the sampling sample through a classification network to obtain characteristic data of each sample image included in the sampling sample;
determining an easy sample of the sampled samples based on the feature data of each sample image included in the sampled samples;
and taking the easy sample as an anchor point to obtain the embedding loss of the sampling sample.
In network training, the introduction of metric learning is beneficial to better sample feature expression, and the embodiment of the application realizes metric learning through embedding loss, wherein the embedding loss can comprise various losses based on anchor points, such as triplet loss, quadruple loss, quintuple loss and the like. Taking the triplet loss as an example, it consists of an anchor point and one each of positive and negative samples corresponding to the anchor point.
Fig. 8 is a flowchart of another embodiment of a training method for a classification network according to an embodiment of the present application. As shown in fig. 8, the method of this embodiment includes:
step 810, determining, based on the sampled times corresponding to the current sampling among the plurality of samplings, a sampling proportion at which the current sampling obtains sample images of different categories from the sample image set.
Wherein the sample image set comprises at least two image categories, each image category comprising at least one sample image; for example, the sample image set includes two categories, a first category including a large amount of data and a second category including a small amount of data, and by dynamically adjusting the sampling ratio of sampling according to the sampled times in the embodiment of the present application, the sampling ratio between the first category and the second category is dynamically changed during each sampling, for example, a set function may be used to realize that the sampling ratio dynamically changes with the sampled times.
In some alternative embodiments, the at least two image categories include a first image category and a second image category, wherein the first image category includes a greater number of sample images than the second image category.
The sampling method that determines the sampling proportion of the current sampling based on the sampled times is suitable for sampling any sample image set comprising a plurality of categories. In particular, when the numbers of sample images of different image categories differ greatly, adjusting the sampling proportion can increase the proportion of the less numerous second image category in the sampled samples, so that the trained target classification network can accurately classify the second image category. This avoids the problem that arises when sampling with a fixed sampling proportion: the smaller number of sample images of the second image category leads to too few, heavily repeated second-category images in the sampled samples, making the trained target classification network inaccurate in classifying the second image category.
Step 820, performing current sampling on the sample image set based on the sampling proportion to obtain a sampling sample of the current sampling.
In the embodiment of the present application, the sampling proportion of the current sampling is determined by the sampled times, which change with each sampling, so that the sampling proportion differs from sampling to sampling, ensuring that every image category in the sample image set plays a positive role in training the classification network.
Optionally, each sample comprises at least two sample images, the at least two sample images corresponding to at least one image class.
In the embodiment of the present application, in order to train the classification network based on the sampled samples, each sampled sample needs to include a plurality of sample images, so as to improve the classification accuracy of the trained classification network; each sampling is performed from the sample image set based on one sampling proportion, and the proportions among the sample images of different categories in the obtained sampled sample conform to that sampling proportion.
At step 830, the sampled samples obtained from the sample image set are processed through a classification network to obtain at least two losses of the sampled samples.
Wherein the sample image set comprises at least two image categories, each image category comprising at least one sample image, the sample comprising at least two sample images.
Step 830 in the embodiment of the present application is similar to step 710 in the above embodiment, and can be understood by referring to the above embodiment, and will not be described herein.
In step 840, a network penalty for the sampled samples is obtained based on at least two penalties and weights for the at least two penalties for the sampled samples.
The weight of at least one of the at least two losses depends on the current number of trained times corresponding to the sampled sample, and the proportions of the different losses in the network loss are adjusted through the weights, which solves the prior-art problem that adding a plurality of losses leaves network learning without emphasis and reduces the performance of the classification network.
Step 850, adjusting network parameters of the classification network based on the network loss to obtain the target classification network.
Step 850 in the embodiment of the present application is similar to step 730 in the above embodiment, and can be understood by referring to the above embodiment, and will not be described herein.
The training method of the classification network provided by the embodiment of the present application realizes sampling with a dynamic proportion and dynamic adjustment of the weights of different losses. Sampling with a dynamic proportion balances the effect of each image category in the network training, so that the target classification network has a high recall rate for both majority and minority classes. Dynamically adjusting the weights of the different losses lets each loss occupy a larger proportion of the network loss when its effect is greater and a smaller proportion when its effect is smaller, which solves the prior-art problem that simply adding a plurality of losses leaves network learning without emphasis and reduces the performance of the classification network.
Optionally, each sample comprises at least two sample images, the at least two sample images corresponding to at least one image class.
In the embodiment of the present application, in order to train the classification network based on the sampled samples, each sampled sample needs to include a plurality of sample images, so as to improve the classification accuracy of the trained classification network; each sampling is performed from the sample image set based on one sampling proportion, and the proportions among the sample images of different categories in the obtained sampled sample conform to that sampling proportion.
In one or more alternative embodiments, the difference between the numbers of sample images of different image categories corresponding to the sampling proportion decreases as the number of samplings increases.
In the prior art, when a sample image set is sampled, a fixed target data sampling distribution is always kept (for example, sampling is always performed at the original sampling proportion, i.e., the proportion between the at least two categories in the sample image set, or always at a set proportion), which is not conducive to the generalized learning of the system; for example, in the initial stage of learning, discarding too many majority-class sample images makes the system lose too much effective information, causing the trained classification network to classify inaccurately. The embodiment of the present application first takes the proportion between different categories in the sample image set as the original sampling proportion (for example, if the sample image set includes 500 sample images of a first category and 100 sample images of a second category, the original sampling proportion is 1/5), and then, based on the original sampling proportion, dynamically adjusts the sampling proportion as the number of samplings increases so as to reduce the data difference between sample images of different image categories in the sampled samples; that is, the proportion of the minority-class samples in the sampled samples is gradually increased. In this way, the recall rate of the classification network for minority-class samples is improved, effective feature expressions are learned from all the data, and the correct classification of the sample images can be learned in the later stage.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware associated with program instructions, where the foregoing program may be stored in a computer readable storage medium, and when executed, the program performs steps including the above method embodiments; and the aforementioned storage medium includes: various media that can store program code, such as ROM, RAM, magnetic or optical disks.
Fig. 9 is another schematic structural diagram of a training device for classification network according to an embodiment of the present application. The device of this embodiment can be used to implement the above-described method embodiments of the present application. As shown in fig. 9, the apparatus of this embodiment includes:
a sample loss obtaining unit 91 for obtaining at least two losses of a sample by processing the sample obtained from the sample image set through the classification network.
Wherein the sample image set comprises at least two image categories, each image category comprising at least one sample image, the sample comprising at least two sample images.
A network loss unit 92 for obtaining a network loss of the sampled samples based on at least two losses of the sampled samples and weights of the at least two losses.
The weight of at least one of the at least two losses depends on the current number of trained times corresponding to the sampled sample, and the proportions of the different losses in the network loss are adjusted through the weights, which solves the prior-art problem that adding a plurality of losses leaves network learning without emphasis and reduces the performance of the classification network.
And a parameter adjustment unit 93 for adjusting network parameters of the classified network based on the network loss to obtain the target classified network.
In the embodiment of the present application, the network loss of the sampled sample is determined through at least two losses of the sampled sample, and the weight of at least one of the at least two losses depends on the current number of trained times corresponding to the sampled sample, so that the proportions of the at least two losses in the network loss are dynamically adjusted. Because different losses have different importance at different stages of training (for example, some losses are important in the initial stage of training and no longer important in the later stage), dynamically adjusting the proportion of each loss in the network loss solves the prior-art problem that adding a plurality of losses leaves network learning without emphasis and reduces the performance of the classification network; dynamically adjusting the weight value of at least one loss achieves a better network learning effect and improves the performance of the trained target classification network.
Optionally, the sample loss obtaining unit is configured to weight and sum at least two losses of the sampled samples based on weights of the at least two losses, and obtain a network loss of the sampled samples.
In the network training process, the network loss is obtained by weighted summation over a plurality of losses, so that each loss can make its corresponding contribution in training; the contribution proportions of different losses to the parameter adjustment in each training can be tuned through different weights, so that the contribution proportion of the more advantageous losses is increased at different training stages.
Optionally, the at least one loss comprises at least one of a predicted loss and an embedded loss.
Optionally, the embedded loss of the at least two losses has a lower contribution to the network loss when the current number of trained times is a first value than when the current number of trained times is a second value, wherein the first value is greater than the second value; and/or
The predicted loss of the at least two losses contributes more to the network loss when the current number of trained times is a first value than to the network loss when the current number of trained times is a second value.
Optionally, in response to the current number of trained times being less than a first preset threshold, the weight of the embedded loss in the at least one loss decreases as the current number of trained times increases; and/or
In response to the current number of trained times being greater than or equal to a first preset threshold, the weight of the at least one loss is maintained at a fixed value.
In one or more alternative embodiments, the sample loss obtaining unit 91 is specifically configured to process the sampled samples through the classification network to obtain a prediction class of each sample image included in the sampled samples; a prediction loss of the sampled sample is determined based on the prediction category of each sample image included in the sampled sample and the annotation category of each sample image.
The prediction loss enables label-based classification learning: each sample image has a unique annotation category, and the difference between the prediction category produced by the classification network and that annotation category is the prediction loss of the sampled sample. In other words, the prediction loss expresses how accurately the classification network predicts categories, and training the classification network with it improves the network's accuracy in judging specific categories.
Optionally, when determining the prediction loss of the sampled sample based on the prediction category and the annotation category of each sample image it includes, the sample loss obtaining unit 91 first determines a prediction error value for each sample image from its prediction category and annotation category, and then determines the prediction error of the sampled sample based on the weight value and the prediction error value of each sample image included in the sampled sample.
Optionally, the weight of a sample image depends on a first proportion, namely the proportion within the sampled sample of the image category to which the sample image belongs.
Optionally, in response to the ratio between the first proportion and a second proportion, namely the proportion of that image category within the sample image set, being greater than or equal to a second preset threshold, the weight of the sample image is the ratio between the first proportion and the second proportion; and/or
in response to the ratio between the first proportion and the second proportion being less than the second preset threshold, the weight of the sample image is 0 or 1.
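Putting these pieces together, a sketch of the weighted prediction loss: using cross-entropy as the per-image prediction error value, the concrete `ratio_threshold`, and choosing 1 rather than 0 in the below-threshold branch are all assumptions rather than requirements of this embodiment:

```python
import torch
import torch.nn.functional as F

def weighted_prediction_loss(logits, labels, batch_props, dataset_props,
                             ratio_threshold=0.5):
    # Per-image prediction error value; cross-entropy is one plausible choice.
    per_image = F.cross_entropy(logits, labels, reduction='none')
    # Ratio of each image's class proportion in the sampled sample
    # (first proportion) to its proportion in the whole set (second proportion).
    ratio = batch_props[labels] / dataset_props[labels]
    # Weight equals the ratio when it reaches the threshold, and 1 otherwise
    # (the text also allows 0 in the below-threshold branch).
    weight = torch.where(ratio >= ratio_threshold, ratio, torch.ones_like(ratio))
    return (weight * per_image).mean()
```

Here `batch_props` and `dataset_props` would be one-dimensional tensors of per-class proportions indexed by class label.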
In one or more alternative embodiments, the sample loss obtaining unit 91 is specifically configured to process the sampled sample through the classification network to obtain feature data of each sample image included in the sampled sample; determine an easy sample within the sampled sample based on that feature data; and take the easy sample as an anchor to obtain the embedding loss of the sampled sample.
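One plausible realization of an embedding loss that uses the easy sample as the anchor is a triplet-style formulation; the distance metric and the `margin` value below are assumptions, since the exact form is not fixed here:

```python
import torch.nn.functional as F

def embedding_loss(anchor, positive, negative, margin=0.3):
    # Triplet-style loss: pull same-class embeddings toward the
    # easy-sample anchor and push other-class embeddings away by
    # at least `margin`.
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()
```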
In one or more optional embodiments, the apparatus provided by this embodiment of the application further includes:
a sampling proportion determining unit, configured to determine, based on the number of completed samplings corresponding to the current sampling among multiple samplings, the sampling proportion with which sample images of different categories are obtained from the sample image set in the current sampling; and
a sample sampling unit, configured to perform the current sampling on the sample image set based on the sampling proportion to obtain the sampled sample of the current sampling.
The training apparatus for the classification network provided by this embodiment realizes dynamic-proportion sampling and dynamic adjustment of the weights of different losses. Dynamic-proportion sampling balances the effect of each image category during network training, so the target classification network achieves a high recall rate on both majority and minority categories. Dynamically adjusting the loss weights gives each loss a larger share of the network loss at the stage where it contributes most and a smaller share where it contributes least, which addresses the prior-art problem that simply adding multiple losses leaves network learning without a clear emphasis and degrades the performance of the classification network.
Optionally, the at least two image categories include a first image category and a second image category, wherein the first image category contains a greater number of sample images than the second image category.
Optionally, the sampled sample comprises at least two sample images, the at least two sample images corresponding to at least one category.
Optionally, the difference between the numbers of sample images of different image categories specified by the sampling proportion decreases as the number of samplings increases.
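A hypothetical schedule consistent with this behavior interpolates from the natural (imbalanced) class distribution toward a uniform one, so the per-class differences shrink over the course of the samplings; the linear interpolation below is an assumption:

```python
import numpy as np

def sampling_proportions(class_counts, sampling_index, total_samplings):
    # Interpolate from the natural class distribution toward a uniform
    # one, so the gap between per-class sample numbers shrinks as the
    # sampling index grows.
    natural = np.asarray(class_counts, dtype=float)
    natural /= natural.sum()
    uniform = np.full_like(natural, 1.0 / len(natural))
    t = min(sampling_index / total_samplings, 1.0)
    return (1.0 - t) * natural + t * uniform
```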
Fig. 10 is a schematic flow chart of a classification method according to an embodiment of the present application. The method may be performed by any electronic device, such as a terminal device, a server, a mobile device, etc.
In step 1010, an image to be processed is acquired.
The image to be processed may be acquired in various ways, for example: a photo taken by a camera, one or more frames of a video captured by a camera, or any image in an album.
In step 1020, the image to be processed is classified through a target classification network to obtain the image prediction category of the image to be processed.
The target classification network is obtained by the training method of the classification network provided by any one of the embodiments.
The target classification network trained by the method of any of the above embodiments realizes dynamic-proportion sampling and/or dynamic adjustment of the weights of different losses, addressing at least one of two problems: inaccurate judgment of minority categories caused by an imbalanced sample image set, and reduced classification performance caused by network learning without a clear emphasis. Classifying images with this target classification network therefore yields more accurate categories, better network performance, and a higher recall rate on minority categories than classification networks trained by other existing methods.
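A minimal usage sketch of steps 1010 and 1020, assuming a trained target classification network has been serialized to disk; the backbone choice, file names, input size, and preprocessing below are all illustrative:

```python
import torch
from PIL import Image
from torchvision import models, transforms

# Stand-in classification network; the real architecture is whatever
# was trained by the method above.
model = models.resnet18(num_classes=10)
model.load_state_dict(torch.load('target_classification_network.pt'))
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Step 1010: acquire the image to be processed.
image = preprocess(Image.open('photo.jpg').convert('RGB')).unsqueeze(0)

# Step 1020: classify it to obtain the image prediction category.
with torch.no_grad():
    prediction = model(image).argmax(dim=1).item()
print('predicted image category:', prediction)
```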
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware associated with program instructions, where the foregoing program may be stored in a computer readable storage medium, and when executed, the program performs steps including the above method embodiments; and the aforementioned storage medium includes: various media that can store program code, such as ROM, RAM, magnetic or optical disks.
Fig. 11 is a schematic structural diagram of a classification apparatus according to an embodiment of the present application. The apparatus of this embodiment can be used to implement the above method embodiments of the present application. As shown in fig. 11, the apparatus of this embodiment includes:
an image acquisition unit 1101 for acquiring an image to be processed.
The class prediction unit 1102 is configured to perform classification processing on an image to be processed through a target classification network, so as to obtain an image prediction class of the image to be processed; the target classification network is obtained through the training method provided by any one of the embodiments.
According to another aspect of an embodiment of the present application, there is provided an electronic device including a processor, where the processor includes the training apparatus of the classification network or the classification apparatus provided in any one of the embodiments above.
According to another aspect of an embodiment of the present application, there is provided an electronic device including: a memory for storing executable instructions;
And a processor in communication with the memory for executing the executable instructions to perform the operations of the training method or classification method of the classification network as provided in any of the embodiments above.
According to another aspect of an embodiment of the present application, there is provided a computer readable storage medium storing computer readable instructions that, when executed, perform the operations of the training method or the classification method of the classification network provided in any of the embodiments above.
According to another aspect of an embodiment of the present application, there is provided a computer program product comprising computer readable code which, when run on a device, causes a processor in the device to execute instructions for implementing the training method or classification method of the classification network as provided in any of the embodiments above.
According to yet another aspect of embodiments of the present application, there is provided another computer program product for storing computer readable instructions that, when executed, cause a computer to perform the operations of the training method or the classification method of the classification network provided by any of the embodiments described above.
The computer program product may be realized in particular by means of hardware, software or a combination thereof. In one alternative, the computer program product is embodied as a computer storage medium, and in another alternative, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
Embodiments of the present application also provide training and classification methods and apparatuses for a classification network, an electronic device, a computer storage medium, and a computer program product, in which the sampling proportion with which sample images of different categories are obtained from a sample image set in the current sampling is determined based on the number of completed samplings corresponding to the current sampling among multiple samplings; the current sampling is performed on the sample image set based on that proportion to obtain the sampled sample of the current sampling; and the classification network is trained on the sampled samples obtained from the multiple samplings to obtain the target classification network.
In some embodiments, the network acquisition instruction or the image processing instruction may specifically be a call instruction: the first device instructs the second device, by way of a call, to perform classification-network training or image classification; accordingly, in response to receiving the call instruction, the second device performs the steps and/or flows of any embodiment of the training method or classification method of the classification network.
It should be understood that the terms "first," "second," and the like in the embodiments of the present application are merely for distinction and should not be construed as limiting the embodiments of the present application.
It should also be understood that in the present application, "plurality" may refer to two or more, and "at least one" may refer to one, two or more.
It should also be appreciated that any component, data, or structure referred to in this disclosure may generally be understood as one or more, unless explicitly limited or indicated otherwise by the context.
It should also be understood that the description of the embodiments of the present application emphasizes the differences between the embodiments, and that the same or similar features may be referred to each other, and for brevity, will not be described in detail.
The embodiment of the application also provides an electronic device, which may be, for example, a mobile terminal, a personal computer (PC), a tablet computer, or a server. Referring now to fig. 12, there is shown a schematic structural diagram of an electronic device 1200 suitable for implementing a terminal device or server of an embodiment of the present application. As shown in fig. 12, the electronic device 1200 includes one or more processors, for example one or more central processing units (CPUs) 1201 and/or one or more graphics processing units (GPUs) 1213, which can perform various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 1202 or loaded from a storage section 1208 into a random access memory (RAM) 1203. The communication portion 1212 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (InfiniBand) network card.
The processor may communicate with the ROM 1202 and/or the RAM 1203 to execute executable instructions; it is connected to the communication portion 1212 through the bus 1204 and communicates with other target devices through the communication portion 1212, thereby performing the operations corresponding to any method provided in the embodiments of the present application, for example: determining, based on the number of completed samplings corresponding to the current sampling among multiple samplings, the sampling proportion with which sample images of different categories are obtained from the sample image set in the current sampling; performing the current sampling on the sample image set based on the sampling proportion to obtain the sampled sample of the current sampling; and training the classification network on the sampled samples obtained from the multiple samplings to obtain the target classification network. Alternatively: processing a sampled sample obtained from a sample image set through a classification network to obtain at least two losses of the sampled sample, wherein the sample image set comprises at least two image categories, each image category comprises at least one sample image, and the sampled sample comprises at least two sample images; obtaining a network loss of the sampled sample based on the at least two losses and their weights, wherein the weight of at least one of the at least two losses depends on the number of times the sampled sample has currently been trained; and adjusting the network parameters of the classification network based on the network loss to obtain the target classification network.
In addition, the RAM 1203 can also store various programs and data required for the operation of the device. The CPU 1201, the ROM 1202, and the RAM 1203 are connected to one another through the bus 1204. When the RAM 1203 is present, the ROM 1202 is an optional module: the RAM 1203 stores the executable instructions, or the executable instructions are written into the ROM 1202 at runtime, and the executable instructions cause the central processing unit 1201 to perform the operations corresponding to the methods described above. An input/output (I/O) interface 1205 is also connected to the bus 1204. The communication portion 1212 may be integrated, or may be configured with multiple sub-modules (for example, multiple IB network cards) connected to the bus.
The following components are connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output section 1207 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 1208 including a hard disk or the like; and a communication section 1209 including a network interface card such as a LAN card or a modem. The communication section 1209 performs communication processing via a network such as the Internet. A drive 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1210 as needed, so that a computer program read from it can be installed into the storage section 1208 as needed.
It should be noted that the architecture shown in fig. 12 is only one alternative implementation. In practice, the number and types of the components in fig. 12 may be selected, removed, added, or replaced according to actual needs, and different functional components may be arranged separately or integrated: for example, the GPU 1213 and the CPU 1201 may be arranged separately, or the GPU 1213 may be integrated on the CPU 1201; the communication portion may be arranged separately, or may be integrated on the CPU 1201 or the GPU 1213; and so on. These alternative embodiments all fall within the scope of the present disclosure.
In particular, according to embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present application includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the method shown in the flowchart; the program code may include instructions corresponding to the method steps provided by the embodiments of the present application, for example: determining, based on the number of completed samplings corresponding to the current sampling among multiple samplings, the sampling proportion with which sample images of different categories are obtained from the sample image set in the current sampling; performing the current sampling on the sample image set based on the sampling proportion to obtain the sampled sample of the current sampling; and training the classification network on the sampled samples obtained from the multiple samplings to obtain the target classification network. Alternatively: processing a sampled sample obtained from a sample image set through a classification network to obtain at least two losses of the sampled sample, wherein the sample image set comprises at least two image categories, each image category comprises at least one sample image, and the sampled sample comprises at least two sample images; obtaining a network loss of the sampled sample based on the at least two losses and their weights, wherein the weight of at least one of the at least two losses depends on the number of times the sampled sample has currently been trained; and adjusting the network parameters of the classification network based on the network loss to obtain the target classification network. In such an embodiment, the computer program can be downloaded and installed from a network via the communication section 1209 and/or installed from the removable medium 1211. When the computer program is executed by the central processing unit (CPU) 1201, it performs the above-described functions defined in the methods of the present application.
The method and apparatus of the present application may be implemented in a number of ways. For example, the methods and apparatus of the present application may be implemented by software, hardware, firmware, or any combination of software, hardware, firmware. The above-described sequence of steps for the method is for illustration only, and the steps of the method of the present application are not limited to the sequence specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present application may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present application. Thus, the present application also covers a recording medium storing a program for executing the method according to the present application.
The description of the present application has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the application in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to best explain the principles of the application and the practical application, and to enable others of ordinary skill in the art to understand the application for various embodiments with various modifications as are suited to the particular use contemplated.

Claims (13)

1. A method of training a classification network, comprising:
determining, based on the number of completed samplings corresponding to the current sampling among multiple samplings, the sampling proportion with which sample images of different image categories are obtained from a sample image set in the current sampling, wherein the numbers of sample images corresponding to the different image categories are different;
based on the sampling proportion, carrying out the current sampling on the sample image set to obtain a sampling sample of the current sampling;
Processing the sampled samples obtained from the sample image set through the classification network to obtain at least two losses of the sampled samples, wherein the sample image set comprises at least two image categories, each image category comprising at least one sample image, the sampled samples comprising at least two sample images;
Obtaining a network loss of the sampled samples based on at least two losses of the sampled samples and weights of the at least two losses, wherein the weights are related to a current number of trained times, the weights of relatively more advantageous ones of the at least two losses being increased at different stages of training;
And adjusting network parameters of the classified network based on the network loss to obtain a target classified network.
2. The method of claim 1, wherein the at least two losses include a prediction loss and an embedding loss.
3. The method of claim 2, wherein the embedding loss has a lower weight when the current trained number is a first value than when the current trained number is a second value; and/or
The weight of the prediction loss is higher when the current trained times are the first numerical value than when the current trained times are the second numerical value;
Wherein the first value is greater than the second value.
4. The method of claim 2, wherein, in response to the current number of trained times being less than a first preset threshold, the weight of the embedding loss decreases as the current number of trained times increases; and/or
And in response to the current trained times being greater than or equal to the first preset threshold, the weight is maintained at a fixed value.
5. The method of claim 1, wherein, in the case that the at least two losses include a prediction loss, processing the sampled sample obtained from the sample image set through the classification network to obtain the prediction loss of the sampled sample comprises:
Processing the sampling samples through the classification network to obtain a prediction category of each sample image included in the sampling samples;
determining a prediction error value of each sample image based on the prediction category of each sample image included in the sampled sample and the labeling category of each sample image;
determining the prediction loss of the sampled sample based on a weight value of each sample image included in the sampled sample and the prediction error value of each sample image.
6. The method of claim 5, wherein the weight of the sample image is related to a first proportion of the image class to which the sample image belongs in the sampled sample, and the numbers of sample images corresponding to different image classes are different.
7. The method of claim 6, wherein the weight of the sample image is a ratio between the first scale and the second scale in response to a ratio between the first scale and a second scale of an image class to which the sample image belongs in the sample image set being greater than or equal to a second preset threshold; and/or
In response to the ratio between the first ratio and the second ratio being less than the second preset threshold, the weight of the sample image is 0 or 1.
8. The method of claim 1, wherein the difference between the numbers of sample images of different categories specified by the sampling proportion decreases as the number of samplings increases.
9. A method of classification, comprising:
Acquiring an image to be processed;
Classifying the image to be processed through a target classification network to obtain an image prediction category of the image to be processed; wherein,
The object classification network is obtained by a training method according to any one of claims 1 to 8.
10. A training device for a classification network, comprising:
a sampling proportion determining unit, configured to determine, based on the number of completed samplings corresponding to a current sampling among a plurality of samplings, the sampling proportion with which sample images of different categories are obtained from a sample image set in the current sampling, wherein the numbers of sample images corresponding to different image categories are different;
a sample sampling unit, configured to perform the current sampling on the sample image set based on the sampling proportion to obtain the sampled sample of the current sampling;
A sample loss obtaining unit for processing the sampled samples obtained from the sample image set through the classification network to obtain at least two losses of the sampled samples, wherein the sample image set comprises at least two image categories, each image category comprises at least one sample image, and the sampled samples comprise at least two sample images;
A network loss unit for obtaining a network loss of the sampled sample based on at least two losses of the sampled sample and weights of the at least two losses, wherein the weights are related to a current number of trained times, the weights of the relatively more advantageous of the at least two losses being increased at different stages of training;
And the parameter adjustment unit is used for adjusting the network parameters of the classification network based on the network loss to obtain a target classification network.
11. An electronic device, comprising: a memory for storing executable instructions;
And a processor in communication with the memory to execute the executable instructions to perform the operations of the training method of the classification network of any one of claims 1 to 8 or the classification method of claim 9.
12. A computer readable storage medium storing computer readable instructions which, when executed, perform the operations of the training method of the classification network of any one of claims 1 to 8 or the classification method of claim 9.
13. A computer program product comprising computer readable code, characterized in that a processor in a device executes instructions for implementing the training method of the classification network of any of claims 1 to 8 or the classification method of claim 9 when said computer readable code is run on the device.
CN202111022512.4A 2019-01-18 2019-01-18 Classification network training method, classification method and device and electronic equipment Active CN113688933B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111022512.4A CN113688933B (en) 2019-01-18 2019-01-18 Classification network training method, classification method and device and electronic equipment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111022512.4A CN113688933B (en) 2019-01-18 2019-01-18 Classification network training method, classification method and device and electronic equipment
CN201910049144.9A CN109800807B (en) 2019-01-18 2019-01-18 Training method and classification method and device of classification network, and electronic equipment

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201910049144.9A Division CN109800807B (en) 2019-01-18 2019-01-18 Training method and classification method and device of classification network, and electronic equipment

Publications (2)

Publication Number Publication Date
CN113688933A CN113688933A (en) 2021-11-23
CN113688933B true CN113688933B (en) 2024-05-24

Family

ID=66559673

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202111022512.4A Active CN113688933B (en) 2019-01-18 2019-01-18 Classification network training method, classification method and device and electronic equipment
CN201910049144.9A Active CN109800807B (en) 2019-01-18 2019-01-18 Training method and classification method and device of classification network, and electronic equipment

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201910049144.9A Active CN109800807B (en) 2019-01-18 2019-01-18 Training method and classification method and device of classification network, and electronic equipment

Country Status (1)

Country Link
CN (2) CN113688933B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110210560B (en) * 2019-05-31 2021-11-30 北京市商汤科技开发有限公司 Incremental training method, classification method and device, equipment and medium of classification network
CN112241673B (en) * 2019-07-19 2022-11-22 浙江商汤科技开发有限公司 Video processing method and device, electronic equipment and storage medium
CN110555830A (en) * 2019-08-15 2019-12-10 浙江工业大学 Deep neural network skin detection method based on deep Labv3+
CN110533106A (en) * 2019-08-30 2019-12-03 腾讯科技(深圳)有限公司 Image classification processing method, device and storage medium
CN112529172B (en) * 2019-09-18 2024-09-10 华为技术有限公司 Data processing method and data processing apparatus
CN113408558B (en) * 2020-03-17 2024-03-08 百度在线网络技术(北京)有限公司 Method, apparatus, device and medium for model verification
CN111429414B (en) * 2020-03-18 2023-04-07 腾讯科技(深圳)有限公司 Artificial intelligence-based focus image sample determination method and related device
CN111832614A (en) * 2020-06-04 2020-10-27 北京百度网讯科技有限公司 Training method and device of target detection model, electronic equipment and storage medium
CN112241715A (en) * 2020-10-23 2021-01-19 北京百度网讯科技有限公司 Model training method, expression recognition method, device, equipment and storage medium
CN113111960B (en) * 2021-04-25 2024-04-26 北京文安智能技术股份有限公司 Image processing method and device and training method and system of target detection model
CN113420792A (en) * 2021-06-03 2021-09-21 阿波罗智联(北京)科技有限公司 Training method of image model, electronic equipment, road side equipment and cloud control platform
CN113792734A (en) * 2021-09-18 2021-12-14 深圳市商汤科技有限公司 Neural network training and image processing method, device, equipment and storage medium
CN114494782B (en) * 2022-01-26 2023-08-08 北京百度网讯科技有限公司 Image processing method, model training method, related device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229298A (en) * 2017-09-30 2018-06-29 北京市商汤科技开发有限公司 The training of neural network and face identification method and device, equipment, storage medium
WO2018121690A1 (en) * 2016-12-29 2018-07-05 北京市商汤科技开发有限公司 Object attribute detection method and device, neural network training method and device, and regional detection method and device
CN108520220A (en) * 2018-03-30 2018-09-11 百度在线网络技术(北京)有限公司 model generating method and device
CN108573284A (en) * 2018-04-18 2018-09-25 陕西师范大学 Deep learning facial image extending method based on orthogonal experiment analysis
CN108764281A (en) * 2018-04-18 2018-11-06 华南理工大学 A kind of image classification method learning across task depth network based on semi-supervised step certainly

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7545986B2 (en) * 2004-09-16 2009-06-09 The United States Of America As Represented By The Secretary Of The Navy Adaptive resampling classifier method and apparatus
CN102521656B (en) * 2011-12-29 2014-02-26 北京工商大学 Integrated transfer learning method for classification of unbalance samples
CN106021364B (en) * 2016-05-10 2017-12-12 百度在线网络技术(北京)有限公司 Foundation, image searching method and the device of picture searching dependency prediction model
CN106203534A (en) * 2016-07-26 2016-12-07 南京航空航天大学 A kind of cost-sensitive Software Defects Predict Methods based on Boosting
US10255416B2 (en) * 2017-01-25 2019-04-09 Ca, Inc. Secure biometric authentication with client-side feature extraction
CN108229647A (en) * 2017-08-18 2018-06-29 北京市商汤科技开发有限公司 The generation method and device of neural network structure, electronic equipment, storage medium
CN108647665B (en) * 2018-05-18 2021-07-27 西安电子科技大学 Aerial photography vehicle real-time detection method based on deep learning
CN109003272B (en) * 2018-07-26 2021-02-09 北京小米移动软件有限公司 Image processing method, device and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018121690A1 (en) * 2016-12-29 2018-07-05 北京市商汤科技开发有限公司 Object attribute detection method and device, neural network training method and device, and regional detection method and device
CN108229298A (en) * 2017-09-30 2018-06-29 北京市商汤科技开发有限公司 The training of neural network and face identification method and device, equipment, storage medium
CN108520220A (en) * 2018-03-30 2018-09-11 百度在线网络技术(北京)有限公司 model generating method and device
CN108573284A (en) * 2018-04-18 2018-09-25 陕西师范大学 Deep learning facial image extending method based on orthogonal experiment analysis
CN108764281A (en) * 2018-04-18 2018-11-06 华南理工大学 A kind of image classification method learning across task depth network based on semi-supervised step certainly

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Qi Dong et al. Imbalanced Deep Learning by Minority Class Incremental Rectification. arXiv. 2018, 1-14. *
Zhu Jiang et al. Security Situation Element Acquisition Mechanism Based on Deep Auto-Encoder Networks. Journal of Computer Applications. 2017, 165-170. *
Gao Fei et al. Semi-Supervised Classification Based on Sample Category Certainty. Journal of Beijing University of Aeronautics and Astronautics. 2018, 158-168. *

Also Published As

Publication number Publication date
CN113688933A (en) 2021-11-23
CN109800807B (en) 2021-08-31
CN109800807A (en) 2019-05-24

Similar Documents

Publication Publication Date Title
CN113688933B (en) Classification network training method, classification method and device and electronic equipment
TW202139183A (en) Method of detecting object based on artificial intelligence, device, equipment and computer-readable storage medium
CN110929836B (en) Neural network training and image processing method and device, electronic equipment and medium
CN111784595B (en) Dynamic tag smooth weighting loss method and device based on historical record
CN103262118A (en) Attribute value estimation device, attribute value estimation method, program, and recording medium
JP6926934B2 (en) Equipment and methods for assessing complexity of classification tasks
CN103064985B (en) Priori knowledge based image retrieval method
CN109902588B (en) Gesture recognition method and device and computer readable storage medium
CN110458022A (en) It is a kind of based on domain adapt to can autonomous learning object detection method
CN112329793B (en) Significance detection method based on structure self-adaption and scale self-adaption receptive fields
CN113989556A (en) Small sample medical image classification method and system
CN113989519A (en) Long-tail target detection method and system
CN112115826A (en) Face living body detection method and system based on bilateral branch network
CN111738319A (en) Clustering result evaluation method and device based on large-scale samples
CN113657510A (en) Method and device for determining data sample with marked value
CN113158904A (en) Twin network target tracking method and device based on double-mask template updating
CN117541853A (en) Classification knowledge distillation model training method and device based on category decoupling
CN116030323B (en) Image processing method and device
CN116597197A (en) Long-tail target detection method capable of adaptively eliminating negative gradient of classification
CN106033550B (en) Method for tracking target and device
CN109215057B (en) High-performance visual tracking method and device
Kaiyan et al. An apple grading method based on improved VGG16 network
Yang et al. Multi-class Weather Classification using EfficientNet-B4 with Attention
Vysotska et al. Research of Methods for Image Sharpness Evaluation in Photos of People.
CN114399530B (en) Sample differential learning-based related filtering target tracking method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant