CN117372756A - Training method and device for image classification model, electronic equipment and storage medium - Google Patents


Info

Publication number
CN117372756A
CN117372756A (application CN202311309107.XA)
Authority
CN
China
Legal status: Pending
Application number
CN202311309107.XA
Other languages
Chinese (zh)
Inventor
黄高 (Huang Gao)
杜超群 (Du Chaoqun)
Current Assignee: Tsinghua University
Original Assignee
Tsinghua University
Application filed by Tsinghua University
Priority to CN202311309107.XA
Publication of CN117372756A

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 — Arrangements for image or video recognition or understanding
    • G06V 10/40 — Extraction of image or video features
    • G06V 10/42 — Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V 10/70 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 — Arrangements using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/765 — Arrangements using pattern recognition or machine learning, using rules for classification or partitioning the feature space
    • G06V 10/77 — Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 — Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting


Abstract

The disclosure provides a training method and apparatus for an image classification model, an electronic device, and a storage medium, relates to the technical field of image processing, and aims to solve the problem of unbalanced category data. The method comprises the following steps: inputting a plurality of image samples carrying image category labels into an image classification model to be trained to obtain the target image features of each of the plurality of image samples; determining a vMF distribution of image features for each image category according to the target image features of the image samples carrying the same image category label; sampling the vMF distribution of image features of each image category to obtain a plurality of sampled image features; generating a plurality of image feature contrast pairs from the sampled image features and the target image features; and training the image classification model to be trained based on the image feature contrast pairs to obtain a trained image classification model.

Description

Training method and device for image classification model, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of image processing, and in particular to a training method and apparatus for an image classification model, an electronic device, and a storage medium.
Background
Long-tailed distributions are common in real-world data: a small number of head categories account for most of the samples, while the majority of categories contain only small amounts of data, so the data distribution is unbalanced. Training a model on such unbalanced data yields a good learning effect on the well-represented categories but a poor learning effect on the categories with few samples. The long-tailed distribution of the data therefore greatly reduces the generalization performance of the model and easily leads to overfitting.
Related studies have shown that supervised contrastive learning (SCL, Supervised Contrastive Learning) has great potential for alleviating data imbalance. However, supervised contrastive learning requires a sufficiently large amount of training data to build contrast pairs covering all categories, which is difficult to satisfy when the category data are unbalanced.
Disclosure of Invention
In view of the foregoing, embodiments of the present disclosure provide a training method and apparatus for an image classification model, an electronic device, and a storage medium, so as to overcome, or at least partially solve, the foregoing problems.
In a first aspect of an embodiment of the present disclosure, there is provided a training method of an image classification model, the method including:
inputting a plurality of image samples carrying image category labels into an image classification model to be trained to obtain the target image features of each of the plurality of image samples;
determining a vMF distribution of image features for each image category according to the target image features of the image samples carrying the same image category label;
sampling the vMF distribution of image features of each image category to obtain a plurality of sampled image features;
generating a plurality of image feature contrast pairs from the sampled image features and the target image features, wherein the image feature contrast pairs comprise two types: positive pairs, whose two image features belong to the same image category, and negative pairs, whose two image features belong to different image categories;
and training the image classification model to be trained based on the image feature contrast pairs to obtain a trained image classification model.
Optionally, training the image classification model to be trained based on the image feature contrast pairs to obtain a trained image classification model comprises:
deriving, from the vMF distribution of image features of each image category, a closed-form expression of the contrast loss function under the assumption that infinitely many image feature contrast pairs are sampled from the vMF distributions;
determining a contrast loss function value according to the image feature contrast pairs, the image categories of the target image features in the contrast pairs, and the closed-form expression of the contrast loss function;
and training the image classification model to be trained based on the contrast loss function value to obtain the image classification model.
Optionally, inputting the plurality of image samples carrying image category labels into the image classification model to be trained to obtain the target image features of each of the plurality of image samples comprises:
acquiring a plurality of image samples for each of a plurality of training batches;
inputting the plurality of image samples of each training batch into the image classification model to be trained of that training batch to obtain the target image features of each of the plurality of image samples of each training batch;
and determining the vMF distribution of image features of each image category according to the target image features of the image samples carrying the same image category label comprises:
obtaining, for each training batch, the image features of each image category according to the target image features of the plurality of image samples of that training batch and the target image features of the plurality of image samples of each training batch before it;
constructing a probability density function of the vMF distribution of image features of each image category according to the image features of that image category;
acquiring statistical data of the image features of each image category;
determining, according to the statistical data, the mean parameter and the concentration parameter of the probability density function of the vMF distribution of image features of each image category by a maximum likelihood estimation method;
and determining the vMF distribution of image features of each image category from the mean and concentration parameters of its probability density function.
Optionally, after obtaining the plurality of sampled image features, the method further comprises:
acquiring the image features of an image to be processed;
calculating, according to the image features of the image to be processed and the image features of each image category, the contrast loss function value of the image to be processed for each image category;
determining the image category with the minimum contrast loss function value as the image category of the image to be processed;
and generating an image category pseudo label for the image to be processed according to its image category, the image category pseudo label being used for semi-supervised training.
In a second aspect of embodiments of the present disclosure, there is provided a training apparatus for an image classification model, the apparatus comprising:
an input module, configured to input a plurality of image samples carrying image category labels into an image classification model to be trained to obtain the target image features of each of the plurality of image samples;
a determining module, configured to determine a vMF distribution of image features for each image category according to the target image features of the image samples carrying the same image category label;
a sampling module, configured to sample the vMF distribution of image features of each image category to obtain a plurality of sampled image features;
a generating module, configured to generate a plurality of image feature contrast pairs from the sampled image features and the target image features, wherein the image feature contrast pairs comprise two types: positive pairs, whose two image features belong to the same image category, and negative pairs, whose two image features belong to different image categories;
and a training module, configured to train the image classification model to be trained based on the image feature contrast pairs to obtain a trained image classification model.
Optionally, the training module is specifically configured to:
derive, from the vMF distribution of image features of each image category, a closed-form expression of the contrast loss function under the assumption that infinitely many image feature contrast pairs are sampled from the vMF distributions;
determine a contrast loss function value according to the image feature contrast pairs, the image categories of the target image features in the contrast pairs, and the closed-form expression of the contrast loss function;
and train the image classification model to be trained based on the contrast loss function value to obtain the image classification model.
Optionally, the input module is specifically configured to:
acquire a plurality of image samples for each of a plurality of training batches;
and input the plurality of image samples of each training batch into the image classification model to be trained of that training batch to obtain the target image features of each of the plurality of image samples of each training batch;
and the determining module is specifically configured to:
obtain, for each training batch, the image features of each image category according to the target image features of the plurality of image samples of that training batch and the target image features of the plurality of image samples of each training batch before it;
construct a probability density function of the vMF distribution of image features of each image category according to the image features of that image category;
acquire statistical data of the image features of each image category;
determine, according to the statistical data, the mean parameter and the concentration parameter of the probability density function of the vMF distribution of image features of each image category by a maximum likelihood estimation method;
and determine the vMF distribution of image features of each image category from the mean and concentration parameters of its probability density function.
Optionally, after the plurality of sampled image features are obtained, the apparatus further comprises:
an acquisition module, configured to acquire the image features of an image to be processed;
a calculation module, configured to calculate, according to the image features of the image to be processed and the image features of each image category, the contrast loss function value of the image to be processed for each image category;
a category determining module, configured to determine the image category with the minimum contrast loss function value as the image category of the image to be processed;
and a pseudo label generating module, configured to generate an image category pseudo label for the image to be processed according to its image category, the image category pseudo label being used for semi-supervised training.
In a third aspect of embodiments of the present disclosure, there is provided an electronic device, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the training method of the image classification model according to the first aspect.
In a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon instructions which, when executed by a processor of an electronic device, cause the electronic device to perform the training method of the image classification model according to the first aspect.
Embodiments of the present disclosure include the following advantages:
In the embodiments of the present disclosure, a plurality of image samples carrying image category labels are input into an image classification model to be trained to obtain the target image features of each of the plurality of image samples; a vMF distribution (von Mises-Fisher distribution, a distribution on the unit hypersphere) of image features is determined for each image category according to the target image features of the image samples carrying the same image category label; the vMF distribution of image features of each image category is sampled to obtain a plurality of sampled image features; a plurality of image feature contrast pairs are generated from the sampled image features and the target image features, the contrast pairs comprising two types: positive pairs, whose two image features belong to the same image category, and negative pairs, whose two image features belong to different image categories; and the image classification model to be trained is trained based on the image feature contrast pairs to obtain a trained image classification model. In this way, by balancing the number of times the vMF distribution of image features of each image category is sampled, the problem of unbalanced data across image categories can be solved. Furthermore, by training the initial image classification model with contrast pairs whose category data are balanced, an image classification model with excellent performance can be obtained.
Drawings
In order to illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present disclosure; a person of ordinary skill in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of steps of a training method for an image classification model in an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of a training device for an image classification model in an embodiment of the disclosure.
Detailed Description
In order that the above objects, features, and advantages of the present disclosure may become more readily apparent, the disclosure is described in further detail below with reference to the accompanying drawings and the following detailed description.
Referring to fig. 1, a step flowchart of a training method of an image classification model in an embodiment of the disclosure is shown, and as shown in fig. 1, the training method of an image classification model may specifically include steps S11 to S15.
In step S11: a plurality of image samples carrying image category labels are input into an image classification model to be trained to obtain the target image features of each of the plurality of image samples.
The plurality of image samples may come from different image data sets; an image data set may have a balanced or an unbalanced distribution of image categories. Each image sample carries an image category label that characterizes the image category to which the image sample belongs.
Each image sample is input into the image classification model to be trained. The image classification model comprises a feature extraction network, which extracts the image features of each image sample to obtain its target image features. For the structure of the feature extraction network, reference may be made to the related art.
Alternatively, a plurality of first image samples carrying image category labels may first be acquired, and data augmentation may then be applied to each first image sample to obtain a corresponding second image sample. The second image sample has the same image category label as its corresponding first image sample.
The data augmentation applied to the first image sample may include, but is not limited to: random cropping, flipping, rotation, scaling, translation, adding noise, blurring, masking, color changes, and the like.
After the first image samples and the second image samples are obtained, both are input into the image classification model to obtain the target image features of each image sample.
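A minimal sketch of such two-view augmentation, assuming images are float arrays in [0, 1]; the flip probability and noise scale are illustrative choices, not fixed by the disclosure:

```python
import numpy as np

def augment(image, rng):
    """Produce a second view of an (H, W, C) image by random horizontal
    flip plus additive Gaussian noise -- a minimal stand-in for the
    cropping/flipping/noise/color operations listed above."""
    view = image.copy()
    if rng.random() < 0.5:
        view = view[:, ::-1, :]            # random horizontal flip
    view = view + rng.normal(scale=0.05, size=view.shape)
    return np.clip(view, 0.0, 1.0)         # keep pixel values in [0, 1]

rng = np.random.default_rng(0)
first = rng.random((32, 32, 3))    # a "first image sample"
second = augment(first, rng)       # its augmented "second image sample"
```

The second view inherits the image category label of its first sample, as described above.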
In step S12: the vMF distribution of image features for each image class is determined from the respective target image features of the image samples carrying the same image class labels.
Image features contain rich semantic information, and their statistics can represent intra-class and inter-class variation of images. Prior work models unconstrained features as normally distributed from the perspective of data augmentation and derives an upper bound on the expected cross-entropy loss. However, because features are normalized in contrast learning, modeling them directly with a normal distribution is not reliable. Moreover, for long-tailed data, the distributions of all categories cannot be estimated from a single small batch. Therefore, in the embodiments of the present disclosure, a reasonable and simple vMF distribution on the unit hypersphere is built to model the image feature distribution of each image category. The vMF distribution is a probability distribution on the unit hypersphere controlled by a mean direction parameter μ and a concentration parameter κ.
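For reference, the vMF density on the unit (d−1)-sphere is p(x) = C_d(κ)·exp(κ μᵀx) with normalizing constant C_d(κ) = κ^{d/2−1} / ((2π)^{d/2} I_{d/2−1}(κ)). A small sketch (SciPy supplies the modified Bessel function I) shows that the density peaks at μ:

```python
import numpy as np
from scipy.special import iv  # modified Bessel function of the first kind

def vmf_log_pdf(x, mu, kappa):
    """Log-density of vMF(mu, kappa) on the unit (d-1)-sphere:
    p(x) = C_d(kappa) * exp(kappa * mu^T x)."""
    d = mu.shape[0]
    log_c = ((d / 2 - 1) * np.log(kappa)
             - (d / 2) * np.log(2 * np.pi)
             - np.log(iv(d / 2 - 1, kappa)))
    return log_c + kappa * (x @ mu)

mu = np.array([1.0, 0.0, 0.0])
# Density is highest at the mean direction and lowest at its antipode;
# a larger kappa concentrates the mass more tightly around mu.
at_mu = vmf_log_pdf(mu, mu, 10.0)
at_antipode = vmf_log_pdf(-mu, mu, 10.0)
```

For d = 2 this reduces to the von Mises distribution on the unit circle, which integrates to one, a quick sanity check on the normalizing constant.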
Image samples carrying the same image category label are image samples of the same image category, and the target image features of the image samples of one image category are the image features of that category. Thus, the vMF distribution of image features of an image category can be determined from the target image features of the image samples carrying the same image category label. The method of determining the vMF distribution is described in detail later.
In step S13: the vMF distribution of image features of each image category is sampled to obtain a plurality of sampled image features.
When the vMF distribution of image features of an image category is sampled, the resulting sampled image features are image features of that category.
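The disclosure does not commit to a specific sampling algorithm; one standard choice for drawing from a vMF distribution is the rejection method of Ulrich and Wood, sketched here under the assumption of unit-norm features:

```python
import numpy as np

def sample_vmf(mu, kappa, n, rng):
    """Draw n samples from vMF(mu, kappa) on the unit (d-1)-sphere
    using the Ulrich/Wood rejection method."""
    d = mu.shape[0]
    # Step 1: sample the component w along mu via rejection sampling.
    b = (-2 * kappa + np.sqrt(4 * kappa**2 + (d - 1) ** 2)) / (d - 1)
    x0 = (1 - b) / (1 + b)
    c = kappa * x0 + (d - 1) * np.log(1 - x0**2)
    ws = []
    while len(ws) < n:
        z = rng.beta((d - 1) / 2, (d - 1) / 2)
        w = (1 - (1 + b) * z) / (1 - (1 - b) * z)
        if kappa * w + (d - 1) * np.log(1 - x0 * w) - c >= np.log(rng.uniform()):
            ws.append(w)
    w = np.array(ws)
    # Step 2: a uniform direction v orthogonal to mu for the rest.
    v = rng.normal(size=(n, d))
    v -= np.outer(v @ mu, mu)
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return w[:, None] * mu + np.sqrt(1 - w**2)[:, None] * v
```

Sampling each category's vMF distribution an equal number of times is what balances the category data, as the embodiments describe.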
In step S14: a plurality of image feature contrast pairs are generated from the sampled image features and the target image features, the contrast pairs comprising two types: positive pairs, whose two image features belong to the same image category, and negative pairs, whose two image features belong to different image categories.
Each sampled image feature is combined with each target image feature to obtain a plurality of image feature contrast pairs. Each contrast pair comprises one sampled image feature and one target image feature. If the sampled image feature and the target image feature in a contrast pair belong to the same image category, the pair is a positive pair; if they belong to different image categories, the pair is a negative pair.
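The pairing step can be sketched as an exhaustive combination over labeled features (function and variable names are my own):

```python
import numpy as np

def build_contrast_pairs(sampled, sampled_labels, targets, target_labels):
    """Combine every sampled image feature with every target image
    feature; a pair is positive when the two labels match, else negative."""
    pairs = []
    for s, s_label in zip(sampled, sampled_labels):
        for t, t_label in zip(targets, target_labels):
            pairs.append((s, t, s_label == t_label))  # True -> positive
    return pairs

sampled = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # one per category
targets = [np.array([0.9, 0.1])]                        # a batch feature
pairs = build_contrast_pairs(sampled, [0, 1], targets, [0])
```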
In step S15: the image classification model to be trained is trained based on the image feature contrast pairs to obtain a trained image classification model.
The image classification model to be trained performs contrast learning using the image feature contrast pairs. Contrast learning learns a mapping under which features of the same image category that are far apart in the high-dimensional space become closer after being mapped to the low-dimensional space, while features of different image categories that are close in the high-dimensional space become farther apart after the mapping.
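For orientation, the finite-sample supervised contrastive loss that this kind of training typically instantiates can be sketched as follows; the disclosure's own loss is a closed-form expectation of such a loss under the vMF distributions, so this version is only an illustrative baseline (τ is a temperature hyperparameter):

```python
import numpy as np

def supcon_loss(features, labels, tau=0.1):
    """Supervised contrastive loss over L2-normalised features: each
    anchor is pulled toward same-label features (positives) and pushed
    away from all other features."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / tau
    n = len(labels)
    loss, count = 0.0, 0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue  # anchors without positives contribute nothing
        others = [j for j in range(n) if j != i]
        log_denom = np.log(np.sum(np.exp(sim[i, others])))
        loss += np.mean([log_denom - sim[i, j] for j in pos])
        count += 1
    return loss / count
```

Well-clustered same-category features yield a lower loss than features scattered across categories, which is exactly the geometry the mapping described above aims for.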
Optionally, training the image classification model to be trained based on the image feature contrast pairs to obtain a trained image classification model may comprise: deriving, from the vMF distribution of image features of each image category, a closed-form expression of the contrast loss function under the assumption that infinitely many image feature contrast pairs are sampled from the vMF distributions; determining a contrast loss function value according to the image feature contrast pairs, the image categories of the target image features in the contrast pairs, and the closed-form expression; and training the image classification model to be trained based on the contrast loss function value to obtain the image classification model.
The image classification model to be trained predicts whether each image feature contrast pair is positive or negative, and the contrast loss function value is determined from this prediction and the true type of the pair. The image classification model to be trained can then be trained according to the contrast loss function value to obtain the trained model.
Alternatively, to improve efficiency, the number of sampled image feature contrast pairs can be extended to infinity by methods of mathematical analysis, and a closed-form expression of the contrast loss function can be derived rigorously. That is, from the vMF distribution of image features of each image category, a closed-form expression of the contrast loss function with infinitely many contrast pairs sampled from the vMF distributions can be derived. The contrast loss function value is then determined from the contrast pairs, the image categories of their target image features, and the closed-form expression, and the initial image classification model is trained based on this value to obtain the image classification model.
An analytical solution is an exact formula: given any values of the independent variables, the dependent variable can be computed directly, and that value is the solution of the problem. Because an analytical solution is a closed-form function, it is also called a closed-form solution. Therefore, substituting the image feature contrast pairs and the image categories of their target image features into the closed-form expression of the contrast loss function directly yields the contrast loss function value, based on which the initial image classification model is trained to obtain the image classification model.
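One fact that makes such a closed-form expression possible, stated here as background rather than as the patent's exact derivation, is that the moment generating function of the vMF distribution is itself available in closed form: for $z \sim \mathrm{vMF}(\mu, \kappa)$ on the unit sphere and any vector $t$,

```latex
\mathbb{E}_{z}\!\left[e^{t^{\top} z}\right]
  = \int C_d(\kappa)\, e^{(\kappa\mu + t)^{\top} z}\, \mathrm{d}z
  = \frac{C_d(\kappa)}{C_d\!\left(\lVert \kappa\mu + t \rVert\right)},
\qquad
C_d(\kappa) = \frac{\kappa^{d/2-1}}{(2\pi)^{d/2}\, I_{d/2-1}(\kappa)} .
```

Taking $t = v/\tau$ gives the expectation of $\exp(z^{\top}v/\tau)$ over infinitely many sampled features of one category, which is the kind of term an expected contrastive loss is built from.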
On the basis of the above technical solution, if the initial image classification model is trained using only the contrast loss function value, the trained model can only judge whether two image features belong to the same image category, not which image category an image feature belongs to. Therefore, the initial image classification model can additionally be trained with an adjustment loss function value to obtain the trained image classification model.
A plurality of image samples carrying image category labels are input into the initial image classification model, which predicts a category for each image sample. The adjustment loss function value is determined from the image category label and the predicted category of each image sample. The initial image classification model is then trained based on both the adjustment loss function value and the contrast loss function value to obtain the image classification model.
Optionally, on the basis of the above technical solution, inputting the plurality of image samples carrying image category labels into the image classification model to be trained to obtain their target image features may comprise: acquiring a plurality of image samples for each of a plurality of training batches; and inputting the plurality of image samples of each training batch into the image classification model to be trained of that batch to obtain the target image features of each of the plurality of image samples of each training batch.
Training the image classification model may span a plurality of training batches, with the model parameters of the image classification model to be trained differing between batches. The image samples can be divided into training batches; the image samples of each training batch are input into the image classification model to be trained of that batch, yielding the target image features of each of its image samples.
Determining the vMF distribution of image features of each image category according to the target image features of the image samples carrying the same image category label may comprise: obtaining, for each training batch, the image features of each image category according to the target image features of the image samples of that batch and of each earlier batch; constructing a probability density function of the vMF distribution of image features of each image category according to the image features of that category; acquiring statistical data of the image features of each image category; determining, according to the statistical data, the mean parameter and the concentration parameter of the probability density function of the vMF distribution of each image category by a maximum likelihood estimation method; and determining the vMF distribution of image features of each image category from the mean and concentration parameters of its probability density function.
In each training batch, the target image features extracted in historical training batches and those extracted in the current batch are combined to obtain the image features of each image category, and from these a probability density function of the vMF distribution of each image category is constructed.
Determining the mean and concentration parameters of the vMF probability density function of each image category by maximum likelihood estimation keeps the computation relatively simple.
The more target image features an estimate is based on, the more accurate the estimated mean and concentration parameters of the vMF distribution in each training batch. In the embodiments of the present disclosure, samples are accumulated across training batches during training, increasing the sample count, and each training batch only needs to estimate one mean parameter and one concentration parameter per category.
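A sketch of the estimation step, assuming unit-norm features: the mean direction has an exact closed-form MLE, and for the concentration I substitute the widely used Banerjee et al. approximation, since the patent does not spell out its estimator:

```python
import numpy as np

def vmf_mle(features):
    """Estimate the vMF mean direction (exact MLE) and concentration
    (Banerjee et al. approximation) from unit-norm feature vectors."""
    n, d = features.shape
    s = features.sum(axis=0)           # resultant vector of the batch
    r = np.linalg.norm(s)
    mu_hat = s / r                     # MLE of the mean direction
    r_bar = r / n                      # mean resultant length in [0, 1)
    kappa_hat = r_bar * (d - r_bar**2) / (1 - r_bar**2)
    return mu_hat, kappa_hat

# Accumulating features across training batches, as described above,
# simply concatenates the per-batch features before estimation.
rng = np.random.default_rng(0)
mu = np.array([1.0, 0.0, 0.0])
pts = mu + 0.1 * rng.normal(size=(500, 3))       # clustered near mu
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
mu_hat, kappa_hat = vmf_mle(pts)
```

Tightly clustered features yield a mean direction close to the true one and a large concentration estimate, matching the intuition that κ measures how concentrated a category's features are.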
Alternatively, the vMF distribution of image features of each image category may be sampled directly to obtain image features of each category, which are input into the initial image classification model; the initial model predicts the image category to which each image feature belongs, and the adjustment loss function value is determined from the true image category of the feature and the predicted category.
With the technical scheme of the disclosed embodiments, the data imbalance between image categories can be addressed by balancing the number of times each category's vMF distribution of image features is sampled. Training the initial image classification model on class-balanced image feature contrast pairs then yields an image classification model with excellent performance.
Optionally, building on the above technical solution, the contrast loss function can be applied to semi-supervised learning: it is used directly to generate image category pseudo labels for unlabeled images, which are in turn used to update the distribution estimates.
For example, predictions on a weakly augmented image can be used to generate an image category pseudo label for the corresponding strongly augmented image. Because an image feature distribution has been introduced, the contrast loss value of the weakly augmented image for each image category can be computed to represent a posterior probability, from which the pseudo label is generated.
Sample the vMF distribution of the image features of each image category to obtain image features of each category; acquire the image features of the image to be processed; compute a contrast loss value for each image category from the features of the image to be processed and those of each category; determine the category with the smallest contrast loss value as the category of the image to be processed; and generate an image category pseudo label of the image to be processed from that category, the pseudo label being used for semi-supervised training of a model.
The image to be processed may be any image, and its image features may be acquired as described above. Substituting the features of the image to be processed and the features of each image category into the closed-form expression of the contrast loss function yields a contrast loss value between the image and each category; the category with the smallest value is determined as the category of the image to be processed, and an image category pseudo label is generated from it. The image to be processed, carrying this pseudo label, can then serve as a training sample for semi-supervised training of the model.
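The pseudo-labeling step can be sketched as follows. Since the source text does not reproduce the closed-form contrast loss itself, this sketch substitutes a simplified per-class score, the negative log-posterior of a mixed vMF model with a shared (assumed) concentration κ, and picks the category whose loss value is smallest; the class means and priors are toy values.

```python
import numpy as np

def pseudo_label(z, mus, priors, kappa=16.0):
    """z: (p,) unit feature of the image to be processed;
    mus: (K, p) unit class means; priors: (K,) class frequencies.
    Stand-in score: log pi_y + kappa * mu_y . z (shared kappa assumed)."""
    logits = kappa * (mus @ z) + np.log(priors)
    losses = -logits                      # per-class negative log-score
    return int(np.argmin(losses))         # smallest loss wins

mus = np.array([[1.0, 0.0], [0.0, 1.0]])  # toy class means
priors = np.array([0.9, 0.1])             # toy class frequencies
z = np.array([0.1, 0.995])
z = z / np.linalg.norm(z)                 # features live on the unit sphere
label = pseudo_label(z, mus, priors)
```

Note that a sufficiently large κ lets a strong feature match override a small class prior, which is exactly what tail-class pseudo-labeling needs.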
The disclosed embodiments provide a new probabilistic contrastive learning (ProCo) algorithm, which improves on supervised contrastive learning (SCL).
First, the SCL algorithm is introduced to lay the foundation for the ProCo algorithm. Taking image classification as an example, given a set of image samples $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ denotes the $i$-th image sample, $y_i$ the image category it belongs to, and $N$ the number of image samples, the SCL algorithm maps image samples in the input space $\mathcal{X}$ to the image category space $\mathcal{Y} = \{1, \dots, K\}$, where $K$ is the number of image categories. The mapping function is typically modeled as a neural network composed of a feature extractor $F: \mathcal{X} \to \mathbb{R}^p$ and a linear classifier $G: \mathbb{R}^p \to \mathbb{R}^K$.
Logit adjustment is a loss-margin modification method that uses class prior probabilities as margins during training and inference. The adjustment loss function can be written as

$$\mathcal{L}_{\mathrm{adj}}(x_i, y_i) = -\log \frac{\pi_{y_i}\, \exp(z_{y_i})}{\sum_{y'} \pi_{y'}\, \exp(z_{y'})},$$

where $y'$ ranges over the predicted image categories of $x_i$, $\pi_{y_i}$ is the class frequency of category $y_i$ in the image sample set, $\pi_{y'}$ the class frequency of predicted category $y'$, $z_{y_i}$ and $z_{y'}$ are the logits of categories $y_i$ and $y'$, and $\exp$ is the exponential function with base $e$. The remaining symbols are as defined above.
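A minimal sketch of the logit-adjusted loss described above, with the class priors folded into the softmax as additive log-frequencies:

```python
import numpy as np

def logit_adjusted_loss(logits, target, class_freq):
    """Adjusted loss: -log( pi_t * exp(z_t) / sum_y pi_y * exp(z_y) ).

    logits: (K,) raw scores z_y; target: true class index;
    class_freq: (K,) class frequencies pi_y of the image sample set.
    """
    adj = logits + np.log(class_freq)   # fold the prior into each logit
    adj = adj - adj.max()               # stabilize the softmax
    log_prob = adj - np.log(np.exp(adj).sum())
    return -log_prob[target]
```

With uniform class frequencies this reduces to ordinary cross-entropy; with imbalanced frequencies, tail-class targets incur a larger loss, which enlarges their decision margin.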
SCL distinguishes positive pairs $(x_i, x_j)$ having the same label $y_i = y_j$ from negative pairs $(x_i, x_j)$ having different labels $y_i \neq y_j$. Given an arbitrary batch of image sample–label pairs $B = \{(x_i, y_i)\}$ and a temperature parameter $\tau$, two forms of the SCL loss can be written:

$$\mathcal{L}_{\mathrm{out}} = \sum_{i} \frac{-1}{|A(i)|} \sum_{p \in A(i)} \log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \neq i} \exp(z_i \cdot z_a / \tau)}, \qquad \mathcal{L}_{\mathrm{in}} = \sum_{i} -\log \left( \frac{1}{|A(i)|} \sum_{p \in A(i)} \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \neq i} \exp(z_i \cdot z_a / \tau)} \right).$$

The two losses differ only in the position of the $\log$. Here $A(i)$ is the index set of instances in $B \setminus \{(x_i, y_i)\}$ carrying the same image category label as $x_i$, $|A(i)|$ is its cardinality, and $z_i$, $z_p$, $z_a$ are the normalized features of image samples $x_i$, $x_p$, $x_a$ extracted by the feature extractor. The remaining symbols are as defined above.
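The first (positives-outside-the-log) form of the SCL loss can be sketched directly as a plain-NumPy reference implementation, not the batched GPU version a training loop would use:

```python
import numpy as np

def supcon_loss_out(z, labels, tau=0.5):
    """L_out form of the SCL loss for one batch.

    z: (N, p) L2-normalized features; labels: (N,) integer class labels.
    """
    n = z.shape[0]
    sim = z @ z.T / tau                      # pairwise scaled similarities
    total, counted = 0.0, 0
    for i in range(n):
        pos = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not pos:                          # no positives -> no supervision
            continue
        others = [a for a in range(n) if a != i]
        log_denom = np.log(np.exp(sim[i, others]).sum())
        total += -np.mean([sim[i, p] - log_denom for p in pos])
        counted += 1
    return total / max(counted, 1)
```

When labels group the genuinely similar features, the loss is lower than when they pair dissimilar ones, which is the supervisory signal SCL relies on.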
For any image sample in a batch, SCL treats the other samples carrying the same image category label as positives and the rest as negatives. The batch must therefore contain enough image samples to ensure that each instance receives an adequate supervisory signal, but large batches bring significant computational and memory burden. Moreover, in practical machine learning scenarios the data distribution typically exhibits a long-tail pattern, so few tail-class samples appear in a small batch; this characteristic would require enlarging the batch even further to effectively supervise the tail classes.
To address this problem, the disclosed embodiments propose the ProCo algorithm, which constructs contrast pairs by estimating the feature distribution and sampling from it. Prior work models unconstrained features with a normal distribution from a data-augmentation perspective and optimizes an upper bound of the expected loss. However, features in contrastive learning are constrained to the unit hypersphere, so modeling them directly with a normal distribution is unsuitable; moreover, because the training data are imbalanced, the distribution parameters of all classes cannot be estimated within one small batch. A vMF distribution defined on the hypersphere is therefore introduced, whose parameters can be estimated effectively by maximum likelihood across different batches. Furthermore, rather than an upper bound, a closed form of the expected contrast loss function is rigorously derived for efficient optimization and applied to semi-supervised learning.
First, assume the image features follow a mixture of vMF distributions. The probability density function of the vMF distribution of a random $p$-dimensional unit vector $z$ is

$$f_p(z; \mu, \kappa) = C_p(\kappa)\, e^{\kappa \mu^{\top} z}, \qquad C_p(\kappa) = \frac{\kappa^{p/2-1}}{(2\pi)^{p/2}\, I_{p/2-1}(\kappa)},$$

where $\kappa \geq 0$, $\|\mu\|_2 = 1$, and $I_{p/2-1}$ is the modified Bessel function of the first kind of order $p/2 - 1$, defined as

$$I_v(x) = \sum_{k=0}^{\infty} \frac{1}{k!\, \Gamma(v + k + 1)} \left( \frac{x}{2} \right)^{2k + v}.$$
Here $\mu$ is the mean parameter and $\kappa$ the concentration parameter. As $\kappa$ increases, the distribution concentrates more tightly around $\mu$; when $\kappa = 0$, the distribution is uniform on the sphere.
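A sketch of the vMF log-density under the definitions above, using `scipy.special.iv` for the modified Bessel function and handling the uniform case $\kappa = 0$ separately:

```python
import numpy as np
from scipy.special import iv, gammaln

def vmf_log_pdf(z, mu, kappa):
    """Log-density of vMF(mu, kappa) at unit vector z on S^{p-1}."""
    p = mu.shape[0]
    if kappa == 0:
        # uniform on the sphere: 1 / surface area of S^{p-1}
        log_area = np.log(2.0) + (p / 2) * np.log(np.pi) - gammaln(p / 2)
        return -log_area
    log_c = ((p / 2 - 1) * np.log(kappa)
             - (p / 2) * np.log(2 * np.pi)
             - np.log(iv(p / 2 - 1, kappa)))   # log C_p(kappa)
    return log_c + kappa * float(mu @ z)
```

For $p = 3$ the normalizer has the closed form $\kappa / (4\pi \sinh \kappa)$, which gives a convenient sanity check, and the density at $z = \mu$ grows with $\kappa$, matching the concentration behavior described above.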
Under this assumption, a mixture of vMF distributions is used to model the image feature distribution:

$$p(z) = \sum_{y=1}^{K} \pi_y\, f_p(z; \mu_y, \kappa_y),$$

where the probability $\pi_y$ of image category $y$ is estimated as the frequency of category $y$ in the image sample set. The mean parameter $\mu$ and concentration parameter $\kappa$ of each component of the image feature distribution can be estimated by maximum likelihood.
Suppose a series of $N$ independent unit vectors $\{z_i\}_{i=1}^{N}$ on the unit hypersphere $S^{p-1}$ is drawn from a vMF distribution. The maximum likelihood estimates of the mean parameter $\mu$ and concentration parameter $\kappa$ satisfy

$$\hat{\mu} = \frac{\bar{z}}{\bar{R}}, \qquad \frac{I_{p/2}(\hat{\kappa})}{I_{p/2-1}(\hat{\kappa})} = \bar{R},$$

where $\bar{z} = \frac{1}{N} \sum_{i=1}^{N} z_i$ is the sample mean and $\bar{R} = \|\bar{z}\|$ is the length of the sample mean. An approximation of $\hat{\kappa}$ can be expressed as

$$\hat{\kappa} = \frac{\bar{R}\,(p - \bar{R}^2)}{1 - \bar{R}^2}.$$
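The maximum likelihood estimate of $\mu$ and the approximation of $\kappa$ above translate directly into code:

```python
import numpy as np

def vmf_mle(z):
    """Approximate MLE of (mu, kappa) from unit vectors z of shape (N, p)."""
    p = z.shape[1]
    zbar = z.mean(axis=0)                 # sample mean
    R = np.linalg.norm(zbar)              # mean resultant length
    mu = zbar / R
    kappa = R * (p - R**2) / (1 - R**2)   # simple approximation of kappa-hat
    return mu, kappa
```

Tightly clustered unit vectors yield a large $\hat{\kappa}$; widely spread ones yield a small $\hat{\kappa}$, as expected from the concentration interpretation.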
the average value of the samples for each image class is estimated in an on-line manner by summarizing the statistics of the previous batch and the statistics of the current batch. Specifically, the maximum likelihood estimation is performed using the estimated sample mean value of the previous batch, while a new sample mean value is maintained in the initialization of the current batch through online estimation:
Wherein,is the estimated sample average value of the j-th class at step t; />Is the sample average value of the j-th class in the current lot,/>Represents the number of samples of the previous batch, +.>Representing the number of samples of the current lot.
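The online update described above is a count-weighted average of the running mean and the current batch mean:

```python
import numpy as np

def online_mean_update(prev_mean, prev_count, batch_mean, batch_count):
    """Merge the running per-class sample mean with the current batch mean."""
    total = prev_count + batch_count
    new_mean = (prev_count * prev_mean + batch_count * batch_mean) / total
    return new_mean, total
```

Merging batch statistics this way reproduces the mean over all samples seen so far exactly, which is why each batch only needs to store one mean vector and one count per class.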
Based on the estimated parameters, contrastive image features can be sampled directly from the mixture of vMF distributions. However, sampling a sufficient amount of data from the vMF distributions in every training iteration is inefficient. The disclosed embodiments therefore take the number of samples to infinity by mathematical analysis and rigorously derive closed-form expressions of the contrast loss function, one for each of the two SCL loss forms, in which $\tau$ is the temperature parameter.
In this way no complex sampling operations are required; an infinite number of contrast samples is realized implicitly through a proxy loss that can be optimized efficiently. This design resolves SCL's dependence on large amounts of data. In addition, modeling the image feature distribution and estimating its parameters captures the feature diversity of the different image categories, yielding better performance.
The image classification model may adopt a two-branch design comprising a classification branch based on a linear classifier and a representation branch based on a projection head; the projection head is a multi-layer perceptron that maps the representation vector into another feature space, decoupling it from the classifier. For the two branches, the adjustment loss function and the contrast loss function are combined by weighted summation into a total loss function, with which the initial image classification model is trained. The extra representation branch introduced during training can be optimized jointly with the classification branch by stochastic gradient descent and introduces no additional overhead at inference time.
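A toy sketch of the two-branch layout, with hypothetical parameter names and an assumed weighting coefficient `alpha`; plain NumPy, a tanh activation, and ordinary cross-entropy stand in for a real backbone, MLP, adjusted loss, and closed-form contrast loss:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_two_branch(backbone_dim, num_classes, proj_dim):
    """Hypothetical parameter layout: one linear classifier branch and one
    two-layer MLP projection branch sharing the same backbone features."""
    return {
        "W_cls": rng.normal(0.0, 0.1, (backbone_dim, num_classes)),
        "W1": rng.normal(0.0, 0.1, (backbone_dim, backbone_dim)),
        "W2": rng.normal(0.0, 0.1, (backbone_dim, proj_dim)),
    }

def forward(params, h):
    logits = h @ params["W_cls"]                        # classification branch
    hidden = np.tanh(h @ params["W1"])                  # projection head (MLP)
    z = hidden @ params["W2"]
    z = z / np.linalg.norm(z, axis=-1, keepdims=True)   # unit-norm features
    return logits, z

def total_loss(logits, targets, contrast_loss, alpha=0.5):
    """Weighted sum of the two branch losses (alpha is an assumed weight)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -log_prob[np.arange(len(targets)), targets].mean()
    return ce + alpha * contrast_loss
```

At inference only the classification branch is evaluated, so the projection head indeed adds no runtime cost.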
The disclosed embodiments employ the vMF distribution to model the feature distribution, which has the following advantages:
1) The parameters of the vMF distribution can be estimated by maximum likelihood using only the first sample moment, and can be computed efficiently across different batches during training;
2) Based on this formulation, a closed form of the expected loss can be rigorously derived as the number of samples tends to infinity, avoiding the need to explicitly sample a large number of contrast pairs; minimizing this proxy loss achieves effective optimization without introducing any additional overhead at inference time.
It can be appreciated that image classification models suited to different application scenarios can be trained from image samples carrying different labels. For example, when the image samples used for training are cardiac ultrasound image samples, a target image classification model for classifying cardiac ultrasound images (also called echocardiograms) is obtained.
A cardiac ultrasound image can show internal sectional structures of the heart, such as the ventricles, atria, and arteries. Cardiac ultrasound images can be classified along multiple dimensions, for example by ultrasound technique, by view (section plane), by image quality, or by view completeness. By ultrasound technique they may be divided into two-dimensional echocardiography, spectral Doppler echocardiography, color Doppler echocardiography, and other technique types; by view into the parasternal long-axis view, apical four-chamber view, parasternal left-ventricular short-axis view, subxiphoid view, and other view types; by image quality into grades one through N of progressively increasing quality; and by view completeness into levels one through N of progressively increasing completeness.
Conventionally, the above classification tasks are carried out manually and are subject to human factors, so neither high classification efficiency nor classification accuracy can be guaranteed. Classifying cardiac ultrasound images with the target image classification model trained in the disclosed embodiments allows them to be classified efficiently and accurately.
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the disclosed embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the disclosed embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the disclosed embodiments.
Fig. 2 is a schematic structural diagram of a training device for an image classification model according to an embodiment of the disclosure, as shown in fig. 2, where the device includes an input module, a determining module, a sampling module, a generating module, and a training module, where:
The input module is used for inputting a plurality of image samples carrying image category labels into an image classification model to be trained to obtain respective target image characteristics of the plurality of image samples;
the determining module is used for determining vMF distribution of the image characteristics of each image category according to the respective target image characteristics of the image samples carrying the same image category labels;
the sampling module is used for sampling vMF distribution of the image characteristics of each image category to obtain a plurality of sampled image characteristics;
the generation module is used for generating a plurality of image feature comparison pairs according to each sampling image feature and each target image feature, wherein the image feature comparison pairs comprise the following two types: the positive pair of image features of two same image categories and the negative pair of image features of two different image categories;
and the training module is used for training the image classification model to be trained based on the plurality of image feature comparison pairs to obtain a trained image classification model.
Optionally, the training module is specifically configured to:
deriving a closed expression of a contrast loss function from the vMF distribution of image features for each image class with infinite image feature contrast sampled from the vMF distribution;
Determining a contrast loss function value according to the image feature contrast, the image category of the target image feature in the image feature contrast and the closed expression of the contrast loss function;
and training the image classification model to be trained based on the contrast loss function value to obtain the image classification model.
Optionally, the input module is specifically configured to:
acquiring a plurality of image samples of a plurality of training batches;
inputting a plurality of image samples of each training batch into an image classification model to be trained of the training batch to obtain respective target image characteristics of the plurality of image samples of each training batch;
the determining vMF distribution of the image features of each image category according to the respective target image features of the image samples carrying the same image category labels comprises:
obtaining image characteristics of each image category in each training batch according to respective target image characteristics of a plurality of image samples of the training batch and respective target image characteristics of a plurality of image samples of each training batch before the training batch;
constructing probability density functions of image characteristics vMF distribution of each image category according to the image characteristics of each image category;
Acquiring statistical data of image characteristics of each image category;
determining mean parameters and concentration parameters of probability density functions of the image feature vMF distribution of each image category by adopting a maximum likelihood estimation method according to the statistical data;
the image feature vMF distribution for each of the image categories is determined from the mean and concentration parameters of the probability density function of the image feature vMF distribution for each of the image categories.
Optionally, after the obtaining the plurality of sampled image features, the apparatus further comprises:
the acquisition module is used for acquiring image characteristics of the image to be processed;
the computing module is used for respectively computing the contrast loss function value of the image to be processed in each image category according to the image characteristics of the image to be processed and the image characteristics of each image category;
the class determining module is used for determining the image class with the minimum contrast loss function value as the image class of the image to be processed;
the pseudo tag generation module is used for generating an image category pseudo tag of the image to be processed according to the image category of the image to be processed, and the image category pseudo tag is used for performing semi-supervised training.
It should be noted that, the device embodiment is similar to the method embodiment, so the description is simpler, and the relevant places refer to the method embodiment.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described by differences from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other.
It will be apparent to those skilled in the art that embodiments of the present disclosure may be provided as a method, apparatus, or computer program product. Accordingly, the disclosed embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present disclosure may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present disclosure are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus, electronic devices, and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the disclosed embodiments have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the disclosed embodiments.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or terminal device comprising the element.
The foregoing has described in detail the method, apparatus, electronic device and storage medium for training an image classification model provided by the present disclosure, and specific examples have been applied to illustrate the principles and embodiments of the present disclosure, where the foregoing examples are only for aiding in understanding the method of the present disclosure and the core ideas thereof; meanwhile, as one of ordinary skill in the art will have modifications in the specific embodiments and application scope in accordance with the ideas of the present disclosure, the contents of the present specification should not be construed as limiting the present disclosure in summary.

Claims (10)

1. A method of training an image classification model, the method comprising:
inputting a plurality of image samples carrying image category labels into an image classification model to be trained to obtain respective target image characteristics of the plurality of image samples;
determining vMF distribution of image features of each image category according to respective target image features of image samples carrying the same image category labels;
sampling vMF distribution of the image characteristics of each image category to obtain a plurality of sampled image characteristics;
generating a plurality of image feature comparison pairs according to each sampling image feature and each target image feature, wherein the image feature comparison pairs comprise the following two types: the positive pair of image features of two same image categories and the negative pair of image features of two different image categories;
and training the image classification model to be trained based on the image feature comparison pairs to obtain a trained image classification model.
2. The method of claim 1, wherein training the image classification model to be trained based on the plurality of image feature comparison results in a trained image classification model, comprising:
Deriving a closed expression of a contrast loss function from the vMF distribution of image features for each image class with infinite image feature contrast sampled from the vMF distribution;
determining a contrast loss function value according to the image feature contrast, the image category of the target image feature in the image feature contrast and the closed expression of the contrast loss function;
and training the image classification model to be trained based on the contrast loss function value to obtain the image classification model.
3. The method of claim 1, wherein inputting the plurality of image samples carrying image class labels into the image classification model to be trained, resulting in respective target image features of the plurality of image samples, comprises:
acquiring a plurality of image samples of a plurality of training batches;
inputting a plurality of image samples of each training batch into an image classification model to be trained of the training batch to obtain respective target image characteristics of the plurality of image samples of each training batch;
the determining vMF distribution of the image features of each image category according to the respective target image features of the image samples carrying the same image category labels comprises:
Obtaining image characteristics of each image category in each training batch according to respective target image characteristics of a plurality of image samples of the training batch and respective target image characteristics of a plurality of image samples of each training batch before the training batch;
constructing probability density functions of image characteristics vMF distribution of each image category according to the image characteristics of each image category;
acquiring statistical data of image characteristics of each image category;
determining mean parameters and concentration parameters of probability density functions of the image feature vMF distribution of each image category by adopting a maximum likelihood estimation method according to the statistical data;
the image feature vMF distribution for each of the image categories is determined from the mean and concentration parameters of the probability density function of the image feature vMF distribution for each of the image categories.
4. The method of claim 1, wherein after the obtaining the plurality of sampled image features, the method further comprises:
acquiring image characteristics of an image to be processed;
according to the image characteristics of the image to be processed and the image characteristics of each image category, respectively calculating the contrast loss function value of the image to be processed in each image category;
Determining the image category with the minimum contrast loss function value as the image category of the image to be processed;
and generating an image category pseudo tag of the image to be processed according to the image category of the image to be processed, wherein the image category pseudo tag is used for performing semi-supervised training.
5. An apparatus for training an image classification model, the apparatus comprising:
the input module is used for inputting a plurality of image samples carrying image category labels into an image classification model to be trained to obtain respective target image characteristics of the plurality of image samples;
the determining module is used for determining vMF distribution of the image characteristics of each image category according to the respective target image characteristics of the image samples carrying the same image category labels;
the sampling module is used for sampling vMF distribution of the image characteristics of each image category to obtain a plurality of sampled image characteristics;
the generation module is used for generating a plurality of image feature comparison pairs according to each sampling image feature and each target image feature, wherein the image feature comparison pairs comprise the following two types: the positive pair of image features of two same image categories and the negative pair of image features of two different image categories;
And the training module is used for training the image classification model to be trained based on the plurality of image feature comparison pairs to obtain a trained image classification model.
6. The apparatus of claim 5, wherein the training module is specifically configured to:
deriving a closed expression of a contrast loss function from the vMF distribution of image features for each image class with infinite image feature contrast sampled from the vMF distribution;
determining a contrast loss function value according to the image feature contrast, the image category of the target image feature in the image feature contrast and the closed expression of the contrast loss function;
and training the image classification model to be trained based on the contrast loss function value to obtain the image classification model.
7. The apparatus of claim 5, wherein the input module is specifically configured to:
acquiring a plurality of image samples of a plurality of training batches;
inputting a plurality of image samples of each training batch into an image classification model to be trained of the training batch to obtain respective target image characteristics of the plurality of image samples of each training batch;
The determining vMF distribution of the image features of each image category according to the respective target image features of the image samples carrying the same image category labels comprises:
obtaining image characteristics of each image category in each training batch according to respective target image characteristics of a plurality of image samples of the training batch and respective target image characteristics of a plurality of image samples of each training batch before the training batch;
constructing probability density functions of image characteristics vMF distribution of each image category according to the image characteristics of each image category;
acquiring statistical data of image characteristics of each image category;
determining mean parameters and concentration parameters of probability density functions of the image feature vMF distribution of each image category by adopting a maximum likelihood estimation method according to the statistical data;
the image feature vMF distribution for each of the image categories is determined from the mean and concentration parameters of the probability density function of the image feature vMF distribution for each of the image categories.
8. The apparatus of claim 5, wherein after said deriving a plurality of sampled image features, the apparatus further comprises:
The acquisition module is used for acquiring image characteristics of the image to be processed;
the computing module is used for respectively computing the contrast loss function value of the image to be processed in each image category according to the image characteristics of the image to be processed and the image characteristics of each image category;
the class determining module is used for determining the image class with the minimum contrast loss function value as the image class of the image to be processed;
the pseudo tag generation module is used for generating an image category pseudo tag of the image to be processed according to the image category of the image to be processed, and the image category pseudo tag is used for performing semi-supervised training.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the training method of the image classification model of any one of claims 1 to 4.
10. A computer-readable storage medium having stored thereon instructions which, when executed by a processor of an electronic device, cause the electronic device to perform the training method of the image classification model of any one of claims 1 to 4.
Application CN202311309107.XA (filed 2023-10-10, priority date 2023-10-10): Training method and device for image classification model, electronic equipment and storage medium. Status: Pending. Publication: CN117372756A (en).

Publication: CN117372756A, published 2024-01-09.

Family ID: 89390385; country: CN.


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination