CN113378904A - Image classification method based on anti-domain adaptive network - Google Patents
- Publication number
- CN113378904A
- Authority
- CN
- China
- Prior art keywords
- domain
- image
- network
- source domain
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2431—Multiple classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
Abstract
The invention discloses an image classification method based on an adversarial domain adaptation network (ADAN). The method redefines the adversarial loss function $L_{adv}$ to overcome the generalization problem between the source domain and the target domain. Redefining the adversarial loss comes at a cost, however: optimizing only the new ADAN objective, i.e. the adversarial loss function, cannot guarantee that the source domain and target domain distributions are close. The invention therefore also minimizes an additional divergence, the metric loss function $L_{metric}$, to align the source domain and the target domain. By combining adversarial learning with metric learning, the method on the one hand addresses the generalization problem of existing ADANs and on the other hand ensures that the domain divergence is minimized during training. Experiments show that the method is well suited to the unsupervised domain adaptation task and improves the classification performance (accuracy) on target domain images.
Description
Technical Field
The invention belongs to the technical field of image classification, and particularly relates to an image classification method based on an adversarial domain adaptation network.
Background
One common assumption behind most machine learning models is: the source domain and the target domain have the same data distribution. However, this common assumption cannot be guaranteed in real-world applications, which may result in a drastic degradation of the performance of classifying the target domain data. Therefore, domain adaptation is proposed to solve this problem by reducing domain differences. The latest Domain Adaptation methods are based on Adversarial learning, and these methods are generally called Adversarial Domain Adaptation Networks (ADAN).
Fig. 1 is a schematic diagram illustrating the principle of an image classification method based on an adversarial domain adaptation network.
ADAN is similar to a generative adversarial network (GAN): it trains a feature representation network F (analogous to the generator in a GAN) and a domain discriminator D in an adversarial manner. Specifically, in the image classification method based on the adversarial domain adaptation network, images of the source domain and the target domain are each input into the feature representation network F for feature extraction to obtain image features, and the image features are then sent to the domain discriminator D. Once the domain discriminator D cannot distinguish whether the image features come from the source domain or the target domain, the learned image features are considered to be domain-invariant. The feature representation network F and the classifier C are therefore trained on the source domain images, and the trained F and C can then be applied to the classification of target domain images.
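For illustration, the conventional adversarial objective described above can be sketched as a binary cross-entropy on the discriminator outputs. This is the standard GAN-style form commonly used in ADAN methods, not the redefined loss of the invention; the function name `domain_confusion_loss` and the sample probabilities are hypothetical.

```python
import math

def domain_confusion_loss(d_source, d_target, eps=1e-12):
    """Conventional domain-discriminator loss (standard GAN-style form).

    d_source: values D(F(x_s)), probability that a source feature is "source".
    d_target: values D(F(x_t)), probability that a target feature is "source".
    The discriminator D is trained to minimize this value, while the
    feature network F is trained to maximize it, i.e. to confuse D.
    """
    loss_s = -sum(math.log(p + eps) for p in d_source) / len(d_source)
    loss_t = -sum(math.log(1.0 - p + eps) for p in d_target) / len(d_target)
    return loss_s + loss_t

# A confident discriminator separates the two domains: low loss.
confident = domain_confusion_loss([0.99, 0.98], [0.02, 0.01])
# A fully confused discriminator outputs 0.5 everywhere: loss is 2 * ln(2).
confused = domain_confusion_loss([0.5, 0.5], [0.5, 0.5])
```

The confused case is the state that ADAN treats as evidence of domain-invariant features.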
In domain adaptation, the source domain and the target domain have different data distributions, and the goal of domain adaptation is to learn a new feature representation so that the source domain and the target domain are well aligned. ADAN borrows the idea of adversarial learning and assumes that the two domains are aligned as long as the domain discriminator is confused. However, recent advances have shown that this assumption may not be reliable. ADAN inherits the shortcomings of GANs: even if training is successful, the learned distribution may be far from the expected distribution, which is referred to as the generalization problem in GANs. Therefore, even if the domain discriminator is successfully confused, it cannot be guaranteed that the learned representation is domain-invariant; that is, the trained feature representation network F and classifier C may not classify target domain images well, and the classification performance needs to be improved.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an image classification method based on an adversarial domain adaptation network, which overcomes the generalization problem between the source domain and the target domain, aligns the source domain and the target domain, and improves the classification performance (accuracy) on target domain images.
In order to achieve the above object, the present invention provides an image classification method based on an adversarial domain adaptation network, comprising the following steps:
(1) Constructing an adversarial domain adaptation network for image classification
Constructing a feature representation network F, a domain discriminator D and a classifier C aiming at images in a source domain and a target domain;
wherein the feature representation network F performs feature extraction on an image $x_s$ in the source domain to obtain the source domain image feature $F(x_s)$, and on an image $x_t$ in the target domain to obtain the target domain image feature $F(x_t)$; the domain discriminator D discriminates the source domain image feature $F(x_s)$ and the target domain image feature $F(x_t)$ to obtain the probabilities $D(F(x_s))$ and $D(F(x_t))$ of belonging to the source domain; and the classifier C classifies the source domain image feature $F(x_s)$ to obtain the image classification probabilities $p_i(\mathrm{softmax}(F(x_s)))$, $i = 1, 2, \ldots, I$, and classifies the target domain image feature $F(x_t)$ to obtain the image classification probabilities $p_i(\mathrm{softmax}(F(x_t)))$, $i = 1, 2, \ldots, I$;
(2) Training the adversarial domain adaptation network
2.1) Extracting a batch of M images from the source domain, denoted $x_{s\_1}, x_{s\_2}, \ldots, x_{s\_M}$, and a batch of N images from the target domain, denoted $x_{t\_1}, x_{t\_2}, \ldots, x_{t\_N}$, and inputting them respectively into the feature representation network F to obtain M source domain image features:
$F(x_{s\_1}), F(x_{s\_2}), \ldots, F(x_{s\_M})$;
and N target domain image features:
$F(x_{t\_1}), F(x_{t\_2}), \ldots, F(x_{t\_N})$;
inputting the M source domain image features $F(x_{s\_1}), F(x_{s\_2}), \ldots, F(x_{s\_M})$ and the N target domain image features $F(x_{t\_1}), F(x_{t\_2}), \ldots, F(x_{t\_N})$ respectively into the domain discriminator D to obtain the probabilities of belonging to the source domain:
$D(F(x_{s\_1})), D(F(x_{s\_2})), \ldots, D(F(x_{s\_M}))$
$D(F(x_{t\_1})), D(F(x_{t\_2})), \ldots, D(F(x_{t\_N}))$
inputting the M source domain image features $F(x_{s\_1}), F(x_{s\_2}), \ldots, F(x_{s\_M})$ respectively into the classifier C to obtain the image classification probabilities:
$p_i(\mathrm{softmax}(F(x_{s\_1}))), p_i(\mathrm{softmax}(F(x_{s\_2}))), \ldots, p_i(\mathrm{softmax}(F(x_{s\_M})))$, $i = 1, 2, \ldots, I$;
wherein i is the index of the image class and I is the number of image classes;
2.2) Calculating the overall objective function L of training:
$$L = \min_F \max_D L_{adv} + \lambda \min_F L_{metric} + \beta \min_C L_{cls}$$
wherein:
$$L_{metric} = E\{k(F(x_{s\_m}), F(x_{s\_m'}))\}_{m, m' = 1, 2, \ldots, M,\ m \neq m'} + E\{k(F(x_{t\_n}), F(x_{t\_n'}))\}_{n, n' = 1, 2, \ldots, N,\ n \neq n'} - 2E\{k(F(x_{s\_m}), F(x_{t\_n}))\}_{m = 1, 2, \ldots, M,\ n = 1, 2, \ldots, N}$$
wherein $\min_F \max_D L_{adv}$ means: the network parameters of the feature representation network F are updated so that the adversarial loss function $L_{adv}$ is minimized, and the network parameters of the domain discriminator D are updated so that $L_{adv}$ is maximized, which forms adversarial training; $\min_F L_{metric}$ means: the network parameters of the feature representation network F are updated so that the metric loss function $L_{metric}$ is minimized; $\min_C L_{cls}$ means: the network parameters of the classifier C are updated so that the cross-entropy loss function $L_{cls}$ is minimized; λ and β are balance parameters greater than 0;
wherein E denotes the expectation over all element values in the braces, $k(\cdot,\cdot)$ is a Gaussian kernel function, the indicator function equals 1 when $i = y_{s\_m}$ and 0 otherwise, $y_{s\_m}$ is the true class (label value) of the source domain image $x_{s\_m}$, and $p_i(\mathrm{softmax}(F(x_{s\_m})))$ denotes the probability, output by the classifier, that the source domain image feature $F(x_{s\_m})$ belongs to class i;
(3) Image classification
Inputting an image of an unknown class from a source domain or a target domain into a feature representation network F for feature extraction to obtain image features, and then sending the image features into a classifier C to obtain probabilities belonging to each class, wherein the class corresponding to the maximum probability value is the class of the input image.
The object of the invention is achieved as follows:
The invention discloses an image classification method based on an adversarial domain adaptation network. ADAN inherits the following defect of GANs: in the field of image classification, even if training is successful, the learned target domain distribution may be far from the expected distribution; that is, there is a generalization problem, and it cannot be guaranteed that the learned image features are domain-invariant. To address this problem, the invention redefines the adversarial loss function $L_{adv}$ to overcome the generalization problem between the source domain and the target domain. Redefining the adversarial loss comes at a cost, however: optimizing only the new ADAN objective, i.e. the adversarial loss function, cannot guarantee that the source domain and target domain distributions are close. The invention therefore also minimizes an additional divergence, the metric loss function $L_{metric}$, to align the source domain and the target domain. By combining adversarial learning with metric learning, the method on the one hand addresses the generalization problem of existing ADANs and on the other hand ensures that the domain divergence is minimized during training. Experiments show that the method is well suited to the unsupervised domain adaptation task and improves the classification performance (accuracy) on target domain images.
Drawings
FIG. 1 is a schematic diagram of the principle of a prior-art image classification method based on an adversarial domain adaptation network;
FIG. 2 is a flowchart of an embodiment of the image classification method based on an adversarial domain adaptation network according to the present invention;
FIG. 3 is a schematic diagram of an embodiment of the adversarial domain adaptation network for image classification according to the present invention;
FIG. 4 shows the results of the model analysis of the present invention, wherein (a) gives the parameter sensitivity of λ, (b) gives the test errors at different iterations, (c) gives the total loss value during training, and (d) gives the distribution difference of different features;
FIG. 5 shows the t-SNE visualization of the learned representations and the ablation study, wherein (a), (b) and (c) visualize the original (non-adapted) representation, the CDAN representation and the representation of the present invention, respectively, the number near each cluster is the corresponding category label, and (d) gives the results of the ablation study, where w/o is an abbreviation for "without" and adv is an abbreviation for adversarial learning;
FIG. 6 is a diagram of qualitative image classification results.
Detailed Description
Embodiments of the present invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention. Note that in the following description, detailed descriptions of known functions and designs are omitted where they would obscure the subject matter of the present invention.
Since ADAN is similar to GANs, it also has a generalization problem. To this end, the invention reformulates ADAN so that it is suitable for image classification. The reformulation comes at a cost, however: optimizing only the new ADAN adversarial loss function $L_{adv}$ cannot guarantee that the source domain and target domain distributions are close. The invention therefore further aligns the two domains by minimizing an additional divergence. Finally, a supervised classification loss on the source domain is used to ensure that the learned image features are discriminative.
FIG. 2 is a flowchart of an embodiment of the image classification method based on an adversarial domain adaptation network.
In this embodiment, as shown in FIG. 2, the image classification method based on the adversarial domain adaptation network of the present invention includes the following steps:
Step S1: Constructing an adversarial domain adaptation network for image classification
As shown in FIG. 3, a feature representation network F, a domain discriminator D and a classifier C are constructed for the images in the source domain and the target domain. The feature representation network F performs feature extraction on an image $x_s$ in the source domain to obtain the source domain image feature $F(x_s)$, and on an image $x_t$ in the target domain to obtain the target domain image feature $F(x_t)$. The domain discriminator D discriminates the source domain image feature $F(x_s)$ and the target domain image feature $F(x_t)$ to obtain the probabilities $D(F(x_s))$ and $D(F(x_t))$ of belonging to the source domain. The classifier C classifies the source domain image feature $F(x_s)$ to obtain the image classification probabilities $p_i(\mathrm{softmax}(F(x_s)))$, $i = 1, 2, \ldots, I$, and classifies the target domain image feature $F(x_t)$ to obtain the image classification probabilities $p_i(\mathrm{softmax}(F(x_t)))$, $i = 1, 2, \ldots, I$.
The feature representation network F, the domain discriminator D and the classifier C in fig. 3 are all conventional image feature extraction networks, discriminators and classifiers, for example, the feature representation network F may use a common ResNet-50 residual neural network, and will not be described herein again.
Step S2: training of an adaptive network of the countermeasure domain
Step S2.1: set Batch Size (Batch Size) to M, extract one from the source domainBatch M images, denoted xs_1,xs_2,...,xs_MSetting the Batch Size (Batch Size) to N, extracting a Batch of N images from the target domain, denoted as xt_1,xt_2,...,xt_NRespectively inputting the image features into a feature representation network F to obtain M source domain image features:
F(xs_1),F(xs_2),...,F(xs_M);
and N target domain image features:
F(xt_1),F(xt_2),...,F(xt_N)。
m source domain image features F (x)s_1),F(xs_2),...,F(xs_M) And N target domain image features F (x)t_1),F(xt_2),...,F(xt_N) Respectively inputting the data into a domain discriminator D to obtain the probability of belonging to a source domain:
D(F(xs_1)),D(F(xs_2)),...,D(F(xs_M))
D(F(xt_1)),D(F(xt_2)),...,D(F(xt_N))
m source domain image features F (x)s_1),F(xs_2),...,F(xs_M) Respectively inputting the image data into a classifier C to obtain image classification probability:
pi(softmax(F(xs_1))),pi(softmax(F(xs_2))),…,pi(softmax(F(xs_M))),i=1,2,…,I;
wherein, I is the number of the image types, and I is the number of the image types.
Step S2.2: Calculating the overall objective function L of training:
$$L = \min_F \max_D L_{adv} + \lambda \min_F L_{metric} + \beta \min_C L_{cls}$$
wherein:
$$L_{metric} = E\{k(F(x_{s\_m}), F(x_{s\_m'}))\}_{m, m' = 1, 2, \ldots, M,\ m \neq m'} + E\{k(F(x_{t\_n}), F(x_{t\_n'}))\}_{n, n' = 1, 2, \ldots, N,\ n \neq n'} - 2E\{k(F(x_{s\_m}), F(x_{t\_n}))\}_{m = 1, 2, \ldots, M,\ n = 1, 2, \ldots, N}$$
wherein $\min_F \max_D L_{adv}$ means: the network parameters of the feature representation network F are updated so that the adversarial loss function $L_{adv}$ is minimized, and the network parameters of the domain discriminator D are updated so that $L_{adv}$ is maximized, which forms adversarial training; $\min_F L_{metric}$ means: the network parameters of the feature representation network F are updated so that the metric loss function $L_{metric}$ is minimized; $\min_C L_{cls}$ means: the network parameters of the classifier C are updated so that the cross-entropy loss function $L_{cls}$ is minimized; λ and β are balance parameters greater than 0.
Here E denotes the expectation over all element values in the braces, $k(\cdot,\cdot)$ is a Gaussian kernel function, the indicator function equals 1 when $i = y_{s\_m}$ and 0 otherwise, $y_{s\_m}$ is the true class (label value) of the source domain image $x_{s\_m}$, and $p_i(\mathrm{softmax}(F(x_{s\_m})))$ denotes the probability, output by the classifier, that the source domain image feature $F(x_{s\_m})$ belongs to class i.
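The formula for the cross-entropy loss is not reproduced in the text above. A minimal sketch of a standard cross-entropy that is consistent with the stated definitions (indicator function, true class, class probabilities after softmax) might look as follows. The function names and the example scores are hypothetical, and treating the classifier input as raw class scores is an assumption.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_loss(batch_logits, labels):
    """Mean cross-entropy over M labeled source-domain samples.

    The indicator function keeps only the log-probability of the true
    class of each sample, so each sample contributes -log p_true.
    """
    losses = []
    for logits, y in zip(batch_logits, labels):
        probs = softmax(logits)
        losses.append(-math.log(probs[y]))
    return sum(losses) / len(losses)

# Hypothetical scores for M = 2 source images and I = 3 classes
# (classes are indexed from 0 here, whereas the text uses i = 1, ..., I).
batch_logits = [[2.0, 0.5, 0.1], [0.2, 0.1, 3.0]]
labels = [0, 2]
l_cls = cross_entropy_loss(batch_logits, labels)
```

Only source domain images enter this loss, because only they carry labels under the unsupervised setting.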
To address the generalization problem of ADAN, and inspired by the F-divergence, the invention provides a new adversarial loss function $L_{adv}$, namely:
The adversarial loss function $L_{adv}$ is an F-distance. Although it overcomes the generalization problem, it comes at a cost. In particular, the F-distance may be small even if the source domain distribution P and the target domain distribution Q are not very close. Such a cost is unacceptable in domain adaptation, because it undermines the very goal of domain adaptation, namely aligning the source domain distribution P and the target domain distribution Q. Therefore, the invention additionally optimizes a metric loss function that measures the divergence between the source domain distribution P and the target domain distribution Q, so that both the generalization distance and the distribution divergence are kept under control.
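The metric loss of Step S2.2 has the form of a squared maximum mean discrepancy estimate with a Gaussian kernel. A minimal sketch of its three expectation terms is given below. The kernel bandwidth `sigma` is not specified in the text and is an assumption here, and the function names and toy feature vectors are hypothetical.

```python
import math

def gaussian_kernel(u, v, sigma=1.0):
    """k(u, v) = exp(-||u - v||^2 / (2 * sigma^2)); sigma is an assumed bandwidth."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def metric_loss(fs, ft, sigma=1.0):
    """Metric loss for M source features fs and N target features ft (M, N >= 2).

    Mirrors the three expectation terms of the formula: source-source
    pairs (m != m'), target-target pairs (n != n'), and all
    source-target pairs with weight -2.
    """
    M, N = len(fs), len(ft)
    ss = sum(gaussian_kernel(fs[i], fs[j], sigma)
             for i in range(M) for j in range(M) if i != j) / (M * (M - 1))
    tt = sum(gaussian_kernel(ft[i], ft[j], sigma)
             for i in range(N) for j in range(N) if i != j) / (N * (N - 1))
    st = sum(gaussian_kernel(s, t, sigma) for s in fs for t in ft) / (M * N)
    return ss + tt - 2.0 * st

# Toy 2-D features: one target batch far from the source, one close to it.
source = [[0.0, 0.1], [0.1, 0.0], [-0.1, 0.05]]
shifted_target = [[3.0, 3.1], [3.1, 2.9], [2.9, 3.0]]
aligned_target = [[0.05, 0.0], [0.0, -0.05], [-0.05, 0.1]]

far = metric_loss(source, shifted_target)
near = metric_loss(source, aligned_target)
```

The value is large when the two feature batches are far apart and close to zero when they overlap, which is why minimizing it over F pulls the two domains together.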
Step S3: image classification
Inputting an image of an unknown class from a source domain or a target domain into a feature representation network F for feature extraction to obtain image features, and then sending the image features into a classifier C to obtain probabilities belonging to each class, wherein the class corresponding to the maximum probability value is the class of the input image.
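The decision rule of Step S3 can be sketched as a softmax followed by an arg-max. In this sketch the input list stands in for the classifier scores of one image; the function name and the example values are hypothetical.

```python
import math

def predict_class(scores):
    """Return (class_index, probabilities) for the classifier scores of one image."""
    m = max(scores)
    exps = [math.exp(z - m) for z in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The predicted class is the one with the maximum probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs

# Hypothetical classifier scores for one image and I = 4 classes
# (classes are indexed from 0 in this sketch).
predicted, probs = predict_class([0.3, 2.1, -1.0, 0.8])
```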
Experimental verification
1. Introduction to data set
MNIST, USPS and SVHN are three widely used digit data sets. Specifically, the MNIST data set consists of 60,000 training images and 10,000 test images, each of size 28 × 28. USPS contains 7,291 training images of size 16 × 16 and 2,007 test images of the same size. SVHN is an abbreviation for Street View House Numbers; it is more challenging because the backgrounds differ from image to image, and it contains more than 600,000 labeled digits cropped from street view images. Each digit data set has 10 categories, i.e. the digits 0 to 9. In this experiment, we used the same settings as the existing image classification methods in order to make a fair comparison.
The data set Office-31 includes 4,652 images in 31 categories that are common in office settings, such as calculators, projectors and printers. The images come from three different subsets (domains): Amazon (A), DSLR (D) and Webcam (W), which were respectively downloaded from amazon.com, taken with a DSLR (digital single-lens reflex) camera, and taken with a webcam.
Office-Home is a recently released large-scale data set consisting of 15,500 images in 65 categories from office and home environments, such as knives, pens and keyboards. The images in this data set come from 4 domains, namely Art (Ar), Clip Art (Cl), Product (Pr) and Real World (RW). The domain name reflects the style of the images. Specifically, Art includes paintings, sketches and/or artistic depictions; Clip Art consists of clip-art images; Product consists of images without background; and Real World consists of images taken with a camera.
2. Experimental setup
2.1), protocol
For a fair comparison, we fully follow the unsupervised domain adaptation protocol widely used by existing image classification methods. Specifically, we use labeled source domain images and unlabeled target domain images. The evaluation metric is the classification accuracy on the target domain. The parameter λ is chosen by importance-weighted cross-validation on the source domain, and β is fixed to 1.
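As a rough sketch of how such a selection could work: importance-weighted cross-validation scores each candidate on held-out source samples, with each sample's loss weighted by an estimated density ratio between the target and source distributions. The text does not say how the weights are estimated, so they are taken as given inputs here; the function names and all numbers are hypothetical.

```python
def iwcv_risk(losses, weights):
    """Importance-weighted validation risk on held-out source samples.

    losses[i]  : validation loss of the model on source sample i.
    weights[i] : estimated density ratio p_target(x_i) / p_source(x_i).
    """
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)

def select_lambda(candidates):
    """Pick the candidate lambda whose model has the lowest weighted risk.

    candidates maps each lambda value to a (losses, weights) pair.
    """
    return min(candidates, key=lambda lam: iwcv_risk(*candidates[lam]))

# Hypothetical numbers for illustration only.
weights = [0.5, 1.5, 1.0, 1.0]
candidates = {
    0.1: ([0.9, 0.4, 0.6, 0.5], weights),
    1.0: ([0.7, 0.2, 0.5, 0.4], weights),
    10.0: ([0.3, 0.9, 0.8, 0.7], weights),
}
best_lambda = select_lambda(candidates)
```

Samples that look more like target data receive larger weights, so the selected value reflects expected target performance without using target labels.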
2.2) Implementation details
We implemented the method in PyTorch. The main parts of the implementation are the feature representation network and the domain discriminator. For digit recognition, we use the same feature network reported by Hoffman et al., which is very similar to the convolutional neural network LeNet. For the evaluations on Office-31 and Office-Home, we use the residual network ResNet-50 as the feature representation network, the same as in existing image classification methods. As shown in Fig. 3, the discriminator is implemented as an MLP (multi-layer perceptron) with three fully connected layers. The classifier is implemented as one FC (fully connected) layer followed by a softmax classifier. The ResNet-50 used is pre-trained on the ImageNet data set. The other networks, such as the discriminator, the classifier and the feature representation network for digit recognition, are trained from scratch on the evaluated data sets. We use mini-batch stochastic gradient descent with a momentum of 0.95 for optimization. The learning rate is dynamically updated according to the strategy reported in (Ganin and Lempitsky 2014).
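The learning rate strategy of Ganin and Lempitsky anneals the rate as $\eta_p = \eta_0 / (1 + \alpha p)^{b}$, where p is the training progress running from 0 to 1. A minimal sketch follows. The constants $\eta_0 = 0.01$, $\alpha = 10$ and $b = 0.75$ are the values reported in that work; whether the present implementation uses the same constants is an assumption. The exponent is named `power` in the code to avoid confusion with the balance parameter β.

```python
def annealed_lr(progress, lr0=0.01, alpha=10.0, power=0.75):
    """Learning rate at a given training progress in [0, 1].

    lr0, alpha and power follow the values reported by Ganin and
    Lempitsky; using them here is an assumption of this sketch.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must lie in [0, 1]")
    return lr0 / (1.0 + alpha * progress) ** power

# Learning rate at 11 evenly spaced points of the training run.
schedule = [annealed_lr(step / 10) for step in range(11)]
```

The rate starts at `lr0` and decreases monotonically as training proceeds.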
2.3) existing image classification method for comparison
To fully validate the effectiveness and superiority of the present invention, we compared it with the baseline ResNet-50 without adaptation and with several recent image classification methods. Specifically, two conventional two-step methods, GFK and TCA, were compared using the latest feature representations. Since the method of the present invention can be regarded as both an ADAN-based and a metric-learning-based image classification method, we compared it with the metric-learning-based method DDC and with the ADAN methods RevGrad, MCD, CyCADA and CDAN. Comparisons were also made with typical image classification methods from other protocols, such as DRCN, CoGAN and UNIT. Since our implementation follows the standard unsupervised domain adaptation protocol and uses the same setup as the existing image classification methods, the baseline results are cited directly from the original papers.
3. Image classification result comparison
3.1), number recognition
First, we evaluated our approach on a relatively simple task: recognizing handwritten digits. The digit recognition results are shown in Table 1, expressed as percentages.
TABLE 1
In Table 1, the best digit recognition results are highlighted in bold. All image classification methods use the same experimental setup, except that UNIT uses a larger training set on SVHN → MNIST. The results of the compared methods are cited from the corresponding literature; some results are missing because the authors did not perform the corresponding test. ORACLE denotes the result of a supervised model trained on the target domain.
Table 1 shows the digit recognition results obtained on MNIST → USPS, USPS → MNIST and SVHN → MNIST. As can be seen from Table 1, domain adaptation is indeed effective for handling domain differences. On USPS → MNIST, the baseline without adaptation only reaches around 70%, whereas domain adaptation methods can achieve 98% accuracy, which is very close to the accuracy of a model trained directly on the target domain. This shows that for relatively simple domain adaptation tasks, the current state-of-the-art approaches provide a very practical and near-ideal solution. We can also see that the method of the invention achieves the best digit recognition results in all three evaluations. The improvements on MNIST → USPS and USPS → MNIST are relatively small because the results are already very close to the upper bound. The results on SVHN → MNIST, the most challenging of the three evaluations, clearly demonstrate that the present invention is much better than the existing methods. It is worth noting that the present invention is 4.4% higher than CDAN, which is notable because both methods fall under the ADAN framework. This shows that the method generalizes better and that its learned representation is more transferable. The digit recognition results also verify that addressing the generalization problem in ADAN is reasonable and valuable.
3.2) object recognition
The object recognition results for the Office-31 data set are shown in table 2, with the object recognition results being expressed in percentage.
TABLE 2
In Table 2, the best object recognition results are highlighted in bold. Since there are three domains in the data set, we performed 3 × 2 = 6 pairwise evaluations. A first conclusion is that the method of the invention achieves the best performance in all 6 evaluations. Looking more closely, we find that adversarial approaches generally perform better than metric-based approaches. We can also observe that the 6 evaluations differ clearly in difficulty. When D or W is used as the target domain, most methods achieve good results. When A is used as the target domain, however, the accuracy may drop significantly. The object recognition results on W → A and D → A therefore reflect the ability to handle more challenging domain adaptation tasks. The present invention is clearly superior to the current state-of-the-art methods in both of these evaluations: the performance improvements on W → A and D → A are 4.2% and 2.7%, respectively. Finally, regarding the average improvement, it is worth noting that most compared methods already reach nearly 100% accuracy on W → D and D → W, so the averaging operation dilutes the improvement.
3.3) Large Scale object recognition
Finally, the method is tested on the recently released large-scale data set Office-Home. The domain adaptation object recognition results on the Office-Home data set are shown in Table 3, expressed as percentages.
TABLE 3
This data set has 4 domains, so Table 3 shows the results of 4 × 3 = 12 cross-domain object recognition evaluations. These results also verify that the present invention achieves the best object recognition results across the different evaluations. On average, the present invention improves the object recognition result by 2.4% compared with the best existing result. Notably, the mean is calculated over 12 evaluations, so such an improvement is difficult to obtain. Compared with CDAN, the object recognition performance of the present invention on both the classical data set and the large-scale data set shows that our proposal for dealing with the generalization problem in ADAN is reasonable and practical, and it therefore achieves state-of-the-art object recognition results.
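The evaluation counts quoted for Office-31 and Office-Home follow from enumerating every ordered pair of distinct domains as a (source, target) task. A small sketch of that enumeration, with a hypothetical helper name:

```python
from itertools import permutations

def transfer_tasks(domains):
    """All ordered (source, target) pairs with source != target."""
    return list(permutations(domains, 2))

office31_tasks = transfer_tasks(["A", "D", "W"])               # 3 * 2 pairs
office_home_tasks = transfer_tasks(["Ar", "Cl", "Pr", "RW"])   # 4 * 3 pairs
```

Each pair corresponds to one column of the respective results table, for example ("W", "A") for W → A.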
4. Model analysis
Taking SVHN → MNIST as an example, we further analyze the method of the present invention by giving results of parameter sensitivity, convergence, distribution divergence, visualization and qualitative understanding.
4.1) parameter sensitivity
The invention involves two hyperparameters, λ and β. Specifically, β is fixed to 1 because this value performs well in the experimental evaluation. We present the digit recognition results on SVHN → MNIST for different values of λ in Fig. 4(a). We observe that the method is relatively insensitive to λ when λ is small. The hyperparameters may be tuned using importance-weighted cross-validation.
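As an illustration of how importance-weighted validation could be used to pick λ, the sketch below weights each labeled source validation loss by an estimated density ratio between target and source. It is a simplified, single-fold sketch under stated assumptions: the helper names, the candidate λ values, the weights, and the loss values are all made up for illustration and are not taken from the experiments.

```python
def importance_weighted_risk(losses, weights):
    """Average of per-sample source validation losses, each scaled by its
    importance weight (an estimate of target density / source density)."""
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


def select_lambda(losses_by_lambda, weights):
    """Pick the candidate lambda with the smallest importance-weighted risk."""
    return min(
        losses_by_lambda,
        key=lambda lam: importance_weighted_risk(losses_by_lambda[lam], weights),
    )


# Toy numbers for illustration only (assumed, not experimental results).
weights = [0.5, 1.5, 1.0]
losses_by_lambda = {
    0.1: [0.2, 0.4, 0.3],
    1.0: [0.1, 0.8, 0.3],
}
best = select_lambda(losses_by_lambda, weights)
print("selected lambda:", best)
```

Weighting by the density ratio makes the source validation risk an estimate of the target risk, which is why it suits domain adaptation, where target labels are unavailable.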
Fig. 4(b) and (c) also reflect the convergence of our method, and fig. 4(d) shows the difference in the distribution of the different features.
4.2) Convergence
It is widely accepted that adversarial models are difficult to train. Therefore, Fig. 4(b) presents the test error of the inventive method on SVHN → MNIST over training iterations, reflecting the convergence of the method. The total loss of the method is given in Fig. 4(c), which reflects the difference in the distribution of the different features; these figures also illustrate the convergence trend of the invention. It is readily observed that the test error and the total loss decrease almost monotonically as the number of optimization iterations increases. The training of the present invention is efficient, stable, and smooth.
4.3) distribution divergence
While optimizing the neural network distance is beneficial for generalization, it comes at a cost: a small neural network distance does not guarantee a small distribution divergence. Therefore, we further optimize a divergence measure in the model.
The assessment was performed using SVHN → MNIST and W → A.
Fig. 5(d) reports the difference between the two domain distributions in terms of the maximum mean discrepancy (MMD). From Fig. 5(a), (b), (c), we can also see that the distribution divergence of the representation learned by the present invention is the smallest, which demonstrates that the present invention handles the drawback of the neural network distance well.
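To make the divergence measure concrete, the following sketch estimates the squared maximum mean discrepancy between a batch of source features and a batch of target features with a Gaussian kernel, excluding same-sample pairs within each domain as in the metric loss of the claims. It is a plain-Python sketch: the bandwidth `sigma`, the function names, and the toy feature vectors are assumptions, and each batch needs at least two samples.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(xs, ys, sigma, skip_same_index):
    """Mean kernel value over all pairs; optionally skip pairs with i == j."""
    values = [
        gaussian_kernel(x, y, sigma)
        for i, x in enumerate(xs)
        for j, y in enumerate(ys)
        if not (skip_same_index and i == j)
    ]
    return sum(values) / len(values)


def mmd_squared(source, target, sigma=1.0):
    """Within-source mean + within-target mean - 2 * cross-domain mean."""
    return (
        mean_kernel(source, source, sigma, skip_same_index=True)
        + mean_kernel(target, target, sigma, skip_same_index=True)
        - 2.0 * mean_kernel(source, target, sigma, skip_same_index=False)
    )


# Toy one-dimensional features (assumed values, for illustration only).
source = [[0.0], [0.1]]
near_target = [[0.05], [0.15]]
far_target = [[5.0], [5.1]]
print("near:", mmd_squared(source, near_target))
print("far:", mmd_squared(source, far_target))
```

Because same-sample pairs are excluded, this estimate can come out slightly negative for small batches of nearly identical distributions; well-separated domains give a clearly larger value.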
4.4) Feature visualization
For a better understanding, we use t-SNE to visualize the representations learned by the present invention and give the results in Fig. 5(a), (b), (c), taking SVHN → MNIST as an example. The original representation and the CDAN representation are also visualized for comparison. We visualize the features of the source and target domains to show transferability and discriminability. From Fig. 5(a), (b), (c) it can be seen that the present invention performs best: the two domains are well aligned and the different classes are clearly separated.
4.5) Ablation study
The invention consists of an adversarial learning part and a metric learning part. Therefore, we report the image classification results of the present invention without adversarial learning and without metric learning, respectively, taking W → A as an example. The results are shown in Fig. 5(d). It can be seen that good overall performance is obtained only with joint optimization. The results show that the formulation of the invention is reasonable and practical.
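The ablation can be read against the overall objective given in the claims, which combines the adversarial loss, λ times the metric loss, and β times the classification loss. The sketch below shows only how the scalar terms are combined and how one part is switched off; the function name and the flags are hypothetical, and the min-max optimization over F, D and C is not modeled here.

```python
def total_objective(l_adv, l_metric, l_cls, lam=1.0, beta=1.0,
                    use_adversarial=True, use_metric=True):
    """Combine already-computed loss values into the training objective.

    Setting use_adversarial=False or use_metric=False drops that term,
    which corresponds to the two ablated variants.
    """
    total = beta * l_cls
    if use_adversarial:
        total += l_adv
    if use_metric:
        total += lam * l_metric
    return total


# Assumed loss values, for illustration only.
full = total_objective(0.5, 0.2, 1.0, lam=2.0)
no_adv = total_objective(0.5, 0.2, 1.0, lam=2.0, use_adversarial=False)
no_metric = total_objective(0.5, 0.2, 1.0, lam=2.0, use_metric=False)
print(full, no_adv, no_metric)
```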
4.6) qualitative image classification results
In addition to the visualization results, we further examined samples that CDAN misclassifies but that our method handles correctly. The experimental results verify the effectiveness of the method in handling the generalization problem.
Fig. 6 gives the qualitative image classification results. It shows 30 images randomly selected from the SVHN → MNIST evaluation, 3 for each category; the label under each image is the incorrect prediction (digit recognition result) of CDAN. These images are misclassified by CDAN but correctly classified by the present invention. The correct labels of the columns are 0-9 from left to right.
Only some randomly selected images are given in Fig. 6 due to space limitations. We observe that some easily confused categories, such as handwritten 4 and 9, are misclassified by CDAN. However, the method of the invention considers the generalization property and the distribution divergence simultaneously, correctly recognizes the digits in all of these images, and is more robust.
5. Conclusion
Existing image classification methods pay little attention to the generalization problem in adversarial domain adaptive networks. The invention reformulates the conventional ADAN adversarial loss using the neural network distance, which has good generalization properties, and simultaneously optimizes the distribution divergence to reduce the cost incurred by the neural network distance. Extensive experiments show that, compared with existing adversarial domain-adaptive image classification methods, the accuracy of the method is significantly improved.
Although illustrative embodiments of the present invention have been described above to facilitate understanding by those skilled in the art, it should be understood that the present invention is not limited to the scope of these embodiments. To those skilled in the art, various changes are apparent as long as they fall within the spirit and scope of the present invention as defined and determined by the appended claims, and all inventions that make use of the inventive concept are protected.
Claims (2)
1. An image classification method based on an adversarial domain adaptive network, characterized by comprising the following steps:
(1) Construction of an adversarial domain adaptive network for image classification
Constructing a feature representation network F, a domain discriminator D and a classifier C for images in a source domain and a target domain;
wherein the feature representation network F is used to perform feature extraction on an image $x_s$ in the source domain to obtain the source domain image feature $F(x_s)$, and on an image $x_t$ in the target domain to obtain the target domain image feature $F(x_t)$; the domain discriminator D is used to discriminate the source domain image feature $F(x_s)$ and the target domain image feature $F(x_t)$ to obtain the probabilities $D(F(x_s))$ and $D(F(x_t))$ of belonging to the source domain; the classifier C is used to classify the source domain image feature $F(x_s)$ to obtain the image classification probabilities $p_i(\mathrm{softmax}(F(x_s)))$, $i = 1, 2, \dots, I$, and to classify the target domain image feature $F(x_t)$ to obtain the image classification probabilities $p_i(\mathrm{softmax}(F(x_t)))$, $i = 1, 2, \dots, I$;
(2) Training of the adversarial domain adaptive network
2.1) Extracting a batch of M images from the source domain, denoted $x_{s_1}, x_{s_2}, \dots, x_{s_M}$, and extracting a batch of N images from the target domain, denoted $x_{t_1}, x_{t_2}, \dots, x_{t_N}$; inputting them respectively into the feature representation network F to obtain M source domain image features:
$F(x_{s_1}), F(x_{s_2}), \dots, F(x_{s_M})$;
and N target domain image features:
$F(x_{t_1}), F(x_{t_2}), \dots, F(x_{t_N})$;
inputting the M source domain image features $F(x_{s_1}), F(x_{s_2}), \dots, F(x_{s_M})$ and the N target domain image features $F(x_{t_1}), F(x_{t_2}), \dots, F(x_{t_N})$ respectively into the domain discriminator D to obtain the probabilities of belonging to the source domain:
$D(F(x_{s_1})), D(F(x_{s_2})), \dots, D(F(x_{s_M}))$
$D(F(x_{t_1})), D(F(x_{t_2})), \dots, D(F(x_{t_N}))$
inputting the M source domain image features $F(x_{s_1}), F(x_{s_2}), \dots, F(x_{s_M})$ respectively into the classifier C to obtain the image classification probabilities:
$p_i(\mathrm{softmax}(F(x_{s_1}))), p_i(\mathrm{softmax}(F(x_{s_2}))), \dots, p_i(\mathrm{softmax}(F(x_{s_M})))$, $i = 1, 2, \dots, I$;
wherein i is the index of the image class, and I is the number of image classes;
2.2) Calculating the overall training objective function L:
$$L = \min_F \max_D L_{adv} + \lambda \min_F L_{metric} + \beta \min_C L_{cls}$$
wherein:
$$L_{metric} = E\{k(F(x_{s_m}), F(x_{s_{m'}}))\}_{m, m' = 1, 2, \dots, M,\ m \neq m'} + E\{k(F(x_{t_n}), F(x_{t_{n'}}))\}_{n, n' = 1, 2, \dots, N,\ n \neq n'} - 2E\{k(F(x_{s_m}), F(x_{t_n}))\}_{m = 1, 2, \dots, M,\ n = 1, 2, \dots, N}$$
wherein $\min_F \max_D L_{adv}$ means: updating the network parameters of the feature representation network F such that the adversarial loss function $L_{adv}$ is minimized, and updating the network parameters of the domain discriminator D such that the adversarial loss function $L_{adv}$ is maximized, thereby forming adversarial training; $\min_F L_{metric}$ means: updating the network parameters of the feature representation network F such that the metric loss function $L_{metric}$ is minimized; $\min_C L_{cls}$ means: updating the network parameters of the classifier C such that the cross-entropy loss function $L_{cls}$ is minimized; λ and β are balance parameters greater than 0;
wherein E denotes the expectation over all element values in the expression, $k(\cdot, \cdot)$ is a Gaussian kernel function, the indicator function equals 1 when $i = y_{s_m}$ and 0 otherwise, $y_{s_m}$ is the true class (label value) of the source domain image $x_{s_m}$, and $p_i(\mathrm{softmax}(F(x_{s_m})))$ is the probability, output by the classifier, that the source domain image feature $F(x_{s_m})$ belongs to class i;
(3) image classification
Inputting an image of an unknown class from a source domain or a target domain into a feature representation network F for feature extraction to obtain image features, and then sending the image features into a classifier C to obtain probabilities belonging to each class, wherein the class corresponding to the maximum probability value is the class of the input image.
2. The image classification method based on an adversarial domain adaptive network according to claim 1, wherein the feature representation network F is a residual network ResNet-50.
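As an illustration of the classification step (3) and of the cross-entropy term described in step 2.2), the sketch below converts classifier scores into softmax probabilities, picks the class with the maximum probability, and evaluates the loss for one labeled source sample. It assumes the standard cross-entropy form, in which the indicator function keeps only the log-probability of the true class. The function names and score values are hypothetical, and classes are indexed from 0 here rather than from 1.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of classifier scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(probs):
    """Return the index of the class with the maximum probability."""
    return max(range(len(probs)), key=lambda i: probs[i])


def cross_entropy(probs, true_class):
    """Cross-entropy for one sample: only the true-class term survives
    the indicator function, giving -log p_{true_class}."""
    return -math.log(probs[true_class])


# Assumed classifier scores for a three-class problem (illustration only).
probs = softmax([1.0, 3.0, 0.5])
print("predicted class:", predict_class(probs))
print("loss if true class is 1:", cross_entropy(probs, 1))
```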
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110607513.9A CN113378904B (en) | 2021-06-01 | 2021-06-01 | Image classification method based on countermeasure domain self-adaptive network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113378904A (en) | 2021-09-10 |
CN113378904B (en) | 2022-06-14 |
Family
ID=77575267
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110607513.9A Active CN113378904B (en) | 2021-06-01 | 2021-06-01 | Image classification method based on countermeasure domain self-adaptive network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113378904B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN113378904B (en) | 2022-06-14 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 20231027. Address after: Room 209, Building 1, No. 36 Xiyong Avenue, Shapingba District, Chongqing, 400000. Patentee after: Xinchen (Chongqing) Microelectronics Co.,Ltd. Address before: 611731, No. 2006, West Avenue, Chengdu hi tech Zone (West District, Sichuan). Patentee before: University of Electronic Science and Technology of China |