CN116385808A - Big data cross-domain image classification model training method, image classification method and system - Google Patents

Big data cross-domain image classification model training method, image classification method and system

Info

Publication number
CN116385808A
Authority
CN
China
Prior art keywords
domain
target
image
big data
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310644725.3A
Other languages
Chinese (zh)
Other versions
CN116385808B (en)
Inventor
谢贻富
范武松
刘宇
刘文涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei City Cloud Data Center Co ltd
Original Assignee
Hefei City Cloud Data Center Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei City Cloud Data Center Co ltd filed Critical Hefei City Cloud Data Center Co ltd
Priority to CN202310644725.3A priority Critical patent/CN116385808B/en
Publication of CN116385808A publication Critical patent/CN116385808A/en
Application granted granted Critical
Publication of CN116385808B publication Critical patent/CN116385808B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/84 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using probabilistic graphical models from image or video features, e.g. Markov models or Bayesian networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to the technical field of domain adaptation within computer and deep learning applications, in particular to a training method, an image classification method and a system for a big data cross-domain image classification model. The basic model constructed by the invention comprises a backbone network, a regional level attention extraction module, a global average pooling layer, a global maximum pooling layer, a cascade layer, a full-connection bottleneck layer, a residual characteristic correction module and a full-connection classification layer. By inserting the residual characteristic correction module after the task-specific layer, the invention explicitly learns the cross-domain difference; by minimizing the proposed domain difference loss, larger weights are assigned to the most relevant source domain categories, giving excellent cross-domain transfer performance while effectively mitigating semantic negative transfer. When the source domain has more label categories than the target domain, the residual characteristic correction block helps reduce semantic negative transfer and ensures the accuracy of the resulting big data cross-domain image classification model on the target domain.

Description

Big data cross-domain image classification model training method, image classification method and system
Technical Field
The invention relates to the technical field of domain adaptation within computer and deep learning applications, in particular to a training method, an image classification method and a system for a big data cross-domain image classification model.
Background
In recent years, big data and deep learning have been widely used for tasks such as image analysis, natural language processing and image generation. However, most state-of-the-art models rely mainly on large amounts of labeled data, which is time-consuming and costly to obtain, and in some special scenarios (e.g., medical image analysis) the data may not be accessible at all. Models therefore need excellent generalization capability to transfer knowledge learned from limited, accessible datasets to new fields.
Driven by the rapid development of deep learning and computer vision, Artificial Intelligence (AI) techniques continue to improve the accuracy and efficiency of image classification, image segmentation and text generation. However, deep neural networks tend to fit the visual patterns extracted from abundant labeled data and are susceptible to inter-domain heterogeneity caused by different devices, scanning protocols and background interference, a phenomenon known as domain shift.
A commonly used approach to this problem is domain adaptation, but existing unsupervised domain adaptation methods still have the following limitations when applied to cross-domain image analysis. 1) Spatial negative transfer: existing domain adaptation methods mostly treat the image as a whole and align the representations extracted by a convolutional neural network across domains, without considering the complex distributions of different regions. Clearly, different regions of an image have different transferability. Certain regions (e.g., the background) may contribute little to domain adaptation, and forcing them into alignment can cause negative transfer of irrelevant knowledge. 2) Semantic negative transfer: most current unsupervised domain adaptation methods only reduce the marginal distribution difference and do not consider the semantic (i.e., label distribution) diversity of the two domains. When the source domain label space is a superset of the target domain label space, the domain adaptation model cannot determine which semantic features belong to the shared label space, where a shared label is one that belongs to both the source domain and the target domain.
Therefore, how to overcome spatial negative transfer and semantic negative transfer simultaneously and achieve effective and feasible unsupervised domain adaptation has become an urgent technical problem.
Disclosure of Invention
To overcome the spatial negative transfer and semantic negative transfer defects of domain adaptation in the prior art, the invention provides a big data cross-domain image classification model training method that addresses the large cross-domain distribution differences of multi-center image data, poor model generalization and susceptibility to negative transfer.
The invention provides a big data cross-domain image classification model training method, which comprises the following steps:
s1, constructing a basic model, a source domain data set Xs containing labeled source domain samples (Xs, ys) and a target domain data set Xt containing unlabeled target domain samples Xt; xs is the original image from the source domain, ys is the category to which xs belongs in the source domain; xt is the original image from the unknown class of the target domain;
the basic model comprises a backbone network, a regional level attention extraction module, a global average pooling layer, a global maximum pooling layer, a cascade layer, a full-connection bottleneck layer, a residual error characteristic correction module and a full-connection classification layer;
the backbone network is used for obtaining feature maps of the original image at a plurality of different sizes and combining all the feature maps into a target feature map G(1), G(1) ∈ R^{H(1,min)×W(1,min)×C(1,max)}, where H(1,min) is the minimum length among the feature maps, W(1,min) is the minimum width among the feature maps, and C(1,max) is the maximum feature dimension among the feature maps;
the regional level attention extraction module extracts the spatial attention values on the H(1,min)×W(1,min) sub-regions from the target feature map G(1), and outputs the spatial attention vector G(2) formed by combining the spatial attention values of all sub-regions, G(2) ∈ R^{H(1,min)×W(1,min)×C(1,max)};
let G(3) denote the element-wise sum of the target feature map G(1) and the spatial attention vector G(2) along the channel dimension; the global average pooling layer performs average pooling on the feature values of the sub-regions in G(3) to output the average spatial attention vector A(avg); the global maximum pooling layer performs maximum pooling on the feature values of the sub-regions in G(3) to output the maximum spatial attention vector A(max);
the cascade layer concatenates the average spatial attention vector A(avg) and the maximum spatial attention vector A(max) along the channel dimension and outputs the concatenated vector A; the full-connection bottleneck layer reduces the dimension of the concatenated vector A to obtain the low-dimensional vector A1; the residual characteristic correction module performs residual correction on the input low-dimensional vector A1 and outputs the correction vector A2; A1 and A2 are added element-wise along the channel dimension to obtain the target vector A3;
for the source domain, the full-connection classification layer judges the category of the original image from the low-dimensional vector A1; for the target domain, the full-connection classification layer judges the category of the original image from the target vector A3;
s2, selecting m1 labeled source domain samples (xs, ys) from the source domain data set Xs to form a source domain training set, and selecting m2 unlabeled target domain samples xt from the target domain data set Xt to form a target domain training set; the basic model performs machine learning on the source domain training set and the target domain training set, and its parameters are updated by back-propagation of the set loss function L;
s3, judging whether the basic model converges or not; if not, returning to the step S2; if yes, extracting a big data cross-domain image classification model from the converged basic model; the big data cross-domain image classification model is used for image classification over the target domain.
Preferably, the loss function L is a weighted sum of a plurality of feature losses; the feature losses include an attention loss L(att);
the regional level attention extraction module consists of Q domain feature extractors, Q = H(1,min)×W(1,min); the domain feature extractors correspond one-to-one to the sub-regions in the target feature map G(1); each domain feature extractor is also connected to a corresponding domain discriminator, and the domain discriminator judges, from the spatial attention value of the sub-region extracted by the corresponding domain feature extractor, the probability that the sub-region belongs to the source domain;
L(att) = [1/(Q×N0)] × Σ_{q=1}^{Q} Σ_{u=1}^{N0} Ld[G2(q,u), d(u)]
let the union of the target domain training sample set and the source domain training sample set be the total training sample set; G2(q,u) denotes the probability, output by the q-th domain discriminator, that the corresponding sub-region of the u-th original image in the total training sample set belongs to the source domain, 1 ≤ q ≤ Q; N0 is the number of samples in the total training sample set;
d (u) is a binary number, if the u-th original image in the total training sample set belongs to the source domain training sample, d (u) =0; d (u) =1 if the u-th original image in the total training sample set belongs to the target domain training sample;
Ld[G2(q,u), d(u)] denotes the binary cross-entropy loss of G2(q,u) and d(u).
Preferably, the feature loss further includes a domain difference loss L (mmd):
L(mmd) = Σ_{k=1}^{|Cs|} [w(k) × L(mmd,k)]
Cs denotes the source domain class set, |Cs| the number of source domain classes, w(k) the weight of the k-th source domain class, and L(mmd,k) the domain difference loss of the k-th source domain class;
w(k) = Σ_{u=1}^{Nt} pt(u,k)
pt(u,k) denotes the probability, predicted by the basic model, that the u-th unlabeled target domain sample xt in the target domain training set belongs to the k-th source domain class; Nt denotes the number of unlabeled target domain samples xt in the target domain training set;
L(mmd,k) = ||ε1 − ε2||^2
ε1 = Σ_{uk=1}^{N(k,s)} [ψ(A(1,k,s,uk)) / N(k,s)]
ε2 = Σ_{uk=1}^{N(k,t)} [pt(uk,k) × ψ(A(1,k,t,uk)) / N(k,t)]
wherein ε1 and ε2 are transition terms; let X(k,s) denote the set of labeled source domain samples (xs, ys) in the source domain training set that belong to the k-th source domain class, N(k,s) the number of samples in X(k,s), and A(1,k,s,uk) the dimension-reduced vector A1 corresponding to the uk-th sample xs in X(k,s); let X(k,t) denote the set of unlabeled target domain samples in the target domain training set that the basic model judges to be of the k-th source domain class, N(k,t) the number of samples in X(k,t), pt(uk,k) the probability of the k-th source domain class in the class probability distribution output by the basic model for the uk-th sample xt in X(k,t), and A(1,k,t,uk) the dimension-reduced vector A1 corresponding to the uk-th sample xt in X(k,t); ||#|| denotes the 2-norm of #; ψ(#) denotes the kernel mapping of #.
Preferably, the feature loss further comprises a source domain classification loss L (cls):
L(cls) = [1/Ns] × Σ_{us=1}^{Ns} Ly(Ys(us), ys(us))
Ys(us) denotes the |Cs|-dimensional one-hot vector corresponding to the basic model's classification result for the us-th sample xs in the source domain training set; ys(us) denotes the |Cs|-dimensional one-hot vector corresponding to the true class ys of the us-th sample xs in the source domain training set; Ly(Ys(us), ys(us)) denotes the cross-entropy loss of Ys(us) and ys(us); Ns is the number of samples in the source domain training set.
Preferably, the feature losses further include a target entropy regularization loss L(ent):
L(ent) = [1/Nt] × Σ_{u=1}^{Nt} H(pt(u,max))
pt (u, max) represents the maximum probability in the class probability distribution of the u-th sample xt in the target domain training set; h (pt (u, max)) represents the information entropy of pt (u, max); nt represents the number of samples in the target domain training set.
Preferably, the backbone network adopts a ResNet-50 network; when obtaining the feature maps of the original image at a plurality of different sizes, the ResNet-50 takes the image feature data output by its last three stages as the feature maps, and all the feature maps are then combined to obtain the target feature map G(1).
The invention also provides a big data cross-domain image classification method, which achieves high-accuracy classification on the target domain and comprises the following steps:
SA1, acquiring the labeled source domain images and unlabeled target domain images corresponding to a designated application scene; an unlabeled target domain image is an image collected by the designated device in the designated application scene; a labeled source domain image is an image of a known source domain category acquired by a known device in the designated application scene; the source domain categories and the target domain categories partially or completely overlap;
SA2, enabling the labeled source domain image to be used as a labeled source domain sample (xs, ys), and enabling the unlabeled target domain image to be used as an unlabeled target domain sample xt; acquiring a big data cross-domain image classification model by combining the training method of the big data cross-domain image classification model;
SA3, inputting the target domain image to be tested into a big data cross-domain image classification model, and outputting a classification result of the target domain image to be tested on the target domain by the big data cross-domain image classification model.
The invention also provides a big data cross-domain image classification system for carrying out the above big data cross-domain image classification method; it comprises a memory storing a big data cross-domain image classification model and a computer program which, when executed, implements the big data cross-domain image classification method.
Preferably, the method further comprises a processor, wherein the processor is connected with the memory, and the processor is used for executing the computer program to realize the big data cross-domain image classification method.
The invention also provides another big data cross-domain image classification system for carrying out the above big data cross-domain image classification model training method; it carries a computer program which, when executed, implements the big data cross-domain image classification model training method so as to obtain the big data cross-domain image classification model for classifying target domain images.
The invention has the advantages that:
1. In the training method for the big data cross-domain image classification model of the invention, fine-grained transferability is explored through multiple local-level domain discriminators, so the model can focus more on the foreground regions of the image, effectively mitigating spatial negative transfer.
2. By inserting the residual characteristic correction module after the task-specific layer, the invention explicitly learns the cross-domain difference, and by minimizing the proposed domain difference loss it assigns larger weights to the most relevant source domain categories, achieving excellent cross-domain transfer performance while effectively mitigating semantic negative transfer. When the source domain has more label categories than the target domain, the residual characteristic correction block helps reduce semantic negative transfer and ensures the accuracy of the resulting big data cross-domain image classification model on the target domain.
3. The big data cross-domain image classification method of the invention is an attention-guided partial domain adaptation method for the partial domain adaptation scenario; classifying target domain images with the proposed big data cross-domain image classification model achieves high classification accuracy.
4. The method first performs image preprocessing and feature extraction through the backbone network, then generates region-level cross-domain transferability via multi-adversarial learning, corrects the target semantic features by residual learning, and finally completes parameter optimization and result generation end to end. Experiments on two public datasets verify the feasibility and superiority of the method in terms of four evaluation indices and sample weight distribution.
Drawings
FIG. 1 is a topological diagram of a big data cross-domain image classification model;
FIG. 2 is a flow chart of a training method of a big data cross-domain image classification model;
FIG. 3 is a basic model topology;
FIG. 4 is a flow chart of the big data cross-domain image classification method;
FIG. 5 (a) is a comparison of accuracy under the X-ray image dataset of the embodiment;
FIG. 5 (b) is a comparison of sensitivity under an X-ray image dataset of an embodiment;
FIG. 5 (c) is a comparison of specificity under the X-ray image dataset of the example;
FIG. 5 (d) is a F1 score comparison result under the X-ray image dataset of the example;
FIG. 6 (a) is a graph of TADA method class weight assignment for site1-site2 tasks;
FIG. 6 (b) is a chart of the class weight assignment of the method of the present invention for site1-site2 tasks.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, the big data cross-domain image classification model provided in this embodiment includes: the system comprises a backbone network, a regional level attention extraction module, a global average pooling layer, a global maximum pooling layer, a cascade layer, a full-connection bottleneck layer, a residual error characteristic correction module and a full-connection classification layer;
the backbone network is used for acquiring feature images of the original image under a plurality of different sizes, and acquiring a target feature image by combining all the feature images;
specifically, the backbone network transforms all feature maps of the original image into feature maps with uniform structures through spatial downsampling and channel dimension upsampling, and then adds the feature maps with uniform structures along channel dimension elements to obtain a target feature map G (1).
Specifically, the backbone network is input as an original image, and output as N feature graphs G (1, 1), G (1, 2), …, G (1, N), …, G (1, N); g (1, N) represents an nth characteristic diagram, wherein N is more than or equal to 1 and less than or equal to N; g (1, n) ∈R H(1,n)×W(1,n)×C(1,n) That is, G (1, n) is used to describe the characteristics of an image of length H (1, n) and width W (1, n) in C (1, n) dimensions;
the backbone network performs space dimension conversion on the feature map, and the feature map after the feature map G (1, n) conversion is marked as G (1, n, a);
G(1,n,a)∈R H(1,min)×W(1,min)×C(1,max)
H(1,min)=min{H(1,1)、H(1,2)、…、H(1,n)、…、H(1,N)}
W(1,min)=min{W(1,1)、W(1,2)、…、W(1,n)、…、W(1,N)}
C(1,max)=max{C(1,1)、C(1,2)、…、C(1,n)、…、C(1,N)}
R H(1,min)×W(1,min)×C(1,max) features of an image of length H (1, min) and width W (1, min) in C (1, max) dimensions are represented; min represents a minimum value and max represents a maximum value;
adding the converted feature graphs { G (1, N, a) |1 is more than or equal to N and less than or equal to N } along channel dimension elements by a backbone network to obtain a target feature graph G (1); g (1) ∈R H(1,min)×W(1,min)×C(1,max)
Specific:
G(1,n,a)={g(i,j,n,a)|1≤i≤H(1,min);1≤j≤W(1,min)}
g (i, j, n, a) represents feature data of a sub-region of the converted feature map G (1, n, a) whose image coordinates are (i, j);
g(i,j,n,a)={g(i,j,n,c1)|1≤c1≤C(1,max)}
G(1)={g1(i,j,c1)|1≤i≤H(1,min);1≤j≤W(1,min);1≤c1≤C(1,max)}
g1(i,j,c1) = Σ_{n=1}^{N} g(i,j,n,c1)
g (i, j, n, c 1) represents data in the c1 st dimension in g (i, j, n, a); g1 (i, j, c 1) represents data in the c1 st dimension among the feature data of the sub-region of the target feature map G (1) whose image coordinates are (i, j).
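For illustration only, the following is a minimal PyTorch sketch of this unification step (the embodiment states the framework is PyTorch, but the concrete operators are not fixed by the text; the 1×1 convolutions used here for the channel up-mapping and the adaptive average pooling used for the spatial down-sampling are assumptions):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FeatureUnifier(nn.Module):
        """Maps N multi-scale feature maps G(1,n) onto the common shape
        H(1,min) x W(1,min) x C(1,max) and sums them into G(1)."""
        def __init__(self, in_channels, out_channels):
            super().__init__()
            # one assumed 1x1 projection per scale for the channel up-mapping
            self.projs = nn.ModuleList(
                [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels])

        def forward(self, feats):                # feats: list of (B, C_n, H_n, W_n)
            h = min(f.shape[2] for f in feats)   # H(1,min)
            w = min(f.shape[3] for f in feats)   # W(1,min)
            unified = [F.adaptive_avg_pool2d(p(f), (h, w))   # spatial down-sampling
                       for f, p in zip(feats, self.projs)]
            return torch.stack(unified).sum(dim=0)           # element-wise sum -> G(1)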
The regional level attention extraction module consists of Q domain feature extractors, Q = H(1,min)×W(1,min); the domain feature extractors correspond one-to-one to the sub-regions in the target feature map G(1).
The input of the regional level attention extraction module is connected with the output of the backbone network and is used for acquiring a target feature graph G (1); the domain feature extractor is used for acquiring the space attention value of the corresponding subarea in G (1); the output of the regional level attention extraction module is a spatial attention vector G (2) consisting of the spatial attention values of all the subregions in G (1);
G(2)={g2(i,j,c1)|1≤i≤H(1,min);1≤j≤W(1,min);1≤c1≤C1}
g2(i,j) denotes the spatial attention value of the sub-region of G(1) with image coordinates (i,j), and g2(i,j,c1) denotes the value in the c1-th dimension of g2(i,j); C1 denotes the total number of dimensions of G(2), C1 = C(1,max).
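A minimal sketch of this module follows, reusing the imports of the previous sketch and assuming each of the Q sub-regions owns a small linear domain feature extractor plus a one-output domain discriminator (the exact layer shapes are not fixed by the text):

    class RegionAttention(nn.Module):
        """Produces the spatial attention vector G(2) plus, per sub-region,
        the probability that the region belongs to the source domain."""
        def __init__(self, channels, height, width):
            super().__init__()
            self.h, self.w = height, width       # H(1,min), W(1,min)
            q = height * width                   # Q domain feature extractors
            self.extractors = nn.ModuleList(
                [nn.Linear(channels, channels) for _ in range(q)])
            self.discriminators = nn.ModuleList(
                [nn.Linear(channels, 1) for _ in range(q)])

        def forward(self, g1):                   # g1: (B, C, H, W)
            attn, probs = torch.zeros_like(g1), []
            for idx in range(self.h * self.w):
                i, j = divmod(idx, self.w)
                feat = self.extractors[idx](g1[:, :, i, j])  # per-region attention vector
                attn[:, :, i, j] = feat
                probs.append(torch.sigmoid(self.discriminators[idx](feat)))
            return attn, torch.cat(probs, dim=1)  # G(2) and the Q source-domain probabilities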
The input of the global average pooling layer and the input of the global maximum pooling layer are vectors G (3) obtained by adding G (1) and G (2) along channel dimension elements;
G(3)={g3(i,j,c1)|1≤i≤H(1,min);1≤j≤W(1,min);1≤c1≤C1}
g3(i,j,c1)=g1(i,j,c1)+g2(i,j,c1)
where G3 (i, j, c 1) represents the feature value of the sub-region of the stitching feature vector G (3) with the image coordinate (i, j).
The global average pooling layer is used for carrying out average pooling processing on the characteristic values of the subareas in G (3) so as to output an average spatial attention vector A (avg); the global maximum pooling layer is used for carrying out maximum pooling processing on the characteristic values of the subareas in G (3) so as to output a maximum space attention vector A (max);
A(avg) = {a(avg,c1) | 1 ≤ c1 ≤ C1};  a(avg,c1) = Σ_{i=1}^{H(1,min)} Σ_{j=1}^{W(1,min)} g3(i,j,c1) / Q
A(max) = {a(max,c1) | 1 ≤ c1 ≤ C1};  a(max,c1) = max{g3(i,j,c1) | 1 ≤ i ≤ H(1,min); 1 ≤ j ≤ W(1,min)}
where a(avg,c1) denotes the value in the c1-th dimension of A(avg), and a(max,c1) denotes the value in the c1-th dimension of A(max).
The input of the cascade layer is connected to the output of the global average pooling layer and the output of the global maximum pooling layer; the cascade layer concatenates the average spatial attention vector A(avg) and the maximum spatial attention vector A(max) along the channel dimension and outputs the concatenated vector A, where A = A(avg) ‖ A(max); that is, the dimension of A is 2×C1.
The input of the full-connection bottleneck layer is connected with the output of the cascade layer, and the full-connection bottleneck layer performs dimension reduction processing on the spliced vector A to obtain a low-dimension vector A1 and outputs the low-dimension vector A1.
In practice, C1 may be set to 2048, i.e., a (avg) and a (max) are both 2048-dimensional vectors, and a is a 4096-dimensional vector. A1 may be specifically set to a 256-dimensional vector.
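These pooling, concatenation and bottleneck steps can be sketched as follows, a non-authoritative reading using the 2048/256 dimensions given above:

    class PoolBottleneck(nn.Module):
        def __init__(self, channels=2048, bottleneck_dim=256):
            super().__init__()
            self.bottleneck = nn.Linear(2 * channels, bottleneck_dim)

        def forward(self, g1, g2):
            g3 = g1 + g2                                      # G(3): element-wise sum
            a_avg = F.adaptive_avg_pool2d(g3, 1).flatten(1)   # A(avg): (B, C1)
            a_max = F.adaptive_max_pool2d(g3, 1).flatten(1)   # A(max): (B, C1)
            a = torch.cat([a_avg, a_max], dim=1)              # A: (B, 2 x C1)
            return self.bottleneck(a)                         # A1: (B, 256)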
The input of the residual characteristic correction module is connected with the output of the full-connection bottleneck layer, and the residual characteristic correction module carries out residual correction on the input low-dimensional vector A1 and outputs a correction vector A2.
The input of the full-connection classification layer is a target vector A3 obtained by adding the elements A1 and A2 along the channel dimension, and the full-connection classification layer judges and outputs the classification result of the original image in the target domain according to the target vector A3.
A={a0(c2)|1≤c2≤2×C1}
A1={a1(c3)|1≤c3≤C3}
A2={a2(c3)|1≤c3≤C3}
A3={a3(c3)|1≤c3≤C3}
a3(c3)=a1(c3)+a2(c3)
a0 (C2) represents data in the C2-th dimension of the stitching vector a, A1 (C3) represents data in the C3-th dimension of the low-dimensional vector A1, A2 (C3) represents data in the C3-th dimension of the correction vector A2, A3 (C3) represents data in the C3-th dimension of the vector A3, and C3 represents the number of dimensions of the full-connection bottleneck layer output data.
As shown in fig. 2 and 3, the training method of the big data cross-domain image classification model includes the following steps:
s1, constructing a basic model, a source domain data set Xs containing labeled source domain samples (Xs, ys) and a target domain data set Xt containing unlabeled target domain samples Xt; xs is the original image from the source domain, ys is the category to which xs belongs in the source domain; xt is the original image from the unknown class of the target domain;
the basic model comprises a backbone network, a regional level attention extraction module, a global average pooling layer, a global maximum pooling layer, a cascade layer, a full-connection bottleneck layer, a residual error characteristic correction module and a full-connection classification layer; the input of the backbone network is the input of the basic model, namely the original image, and the output of the backbone network is connected with the input of the regional level attention extraction module; the input of the global average pooling layer and the input of the global maximum pooling layer are respectively connected with the output of the backbone network and the output of the regional level attention extraction module; the input of the cascade layer is respectively connected with the output of the global average pooling layer and the output of the global maximum pooling layer; the output of the cascade layer is connected with the input of the full-connection bottleneck layer, and the output of the full-connection bottleneck layer is respectively connected with the input of the residual error characteristic correction module and the input of the full-connection classification layer; the input of the full-connection classification layer is also connected with the output of the residual characteristic correction module, and the output of the full-connection classification layer is the classification result of the original image;
in the basic model, each domain feature extractor is also connected with a corresponding domain discriminator, and the domain discriminator judges the probability that the sub-region belongs to the source domain according to the space attention value of the sub-region extracted by the corresponding domain feature extractor;
in the training process, if the original image comes from the source domain, the classification result of the original image on the source domain is obtained from the low-dimensional vector A1; that is, the low-dimensional vector A1 output by the full-connection bottleneck layer is denoted as As1, and the full-connection classification layer obtains the classification category Ys of the original image on the source domain from As1;
if the original image comes from the target domain, the classification result of the original image on the target domain is obtained from the vector A3; that is, the low-dimensional vector A1 output by the full-connection bottleneck layer is denoted as At1, the residual characteristic correction module performs residual correction on At1 and outputs the correction vector At2; the element-wise sum of At1 and At2 along the channel dimension is denoted as At3, and the full-connection classification layer obtains the classification category Yt of the original image on the target domain from At3.
Specifically, the full-connection classification layer acquires the class probability distribution of the original image, and then outputs the class corresponding to the maximum probability as the classification result of the original image.
S2, selecting m1 labeled source domain samples (xs, ys) from the source domain data set Xs to form a source domain training set, and selecting m2 unlabeled target domain samples xt from the target domain data set Xt to form a target domain training set; the basic model performs machine learning on the source domain training set and the target domain training set, and its parameters are updated by back-propagation of the set loss function L;
s3, judging whether the basic model converges or not; if not, returning to the step S2; and if so, extracting a big data cross-domain image classification model from the converged basic model.
L = L(cls) + α×L(mmd) + β×L(ent) − λ×L(att)
where α, β and λ are manually set trade-off factors;
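As a sketch, this combination is a one-liner; the individual loss terms are sketched after their definitions below:

    def total_loss(l_cls, l_mmd, l_ent, l_att, alpha, beta, lam):
        # L = L(cls) + alpha*L(mmd) + beta*L(ent) - lambda*L(att)
        return l_cls + alpha * l_mmd + beta * l_ent - lam * l_att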
L(mmd) is the domain difference loss:
L(mmd) = Σ_{k=1}^{|Cs|} [w(k) × L(mmd,k)]
Cs denotes the source domain class set, ys ∈ Cs; |Cs| denotes the number of source domain classes, w(k) the weight of the k-th source domain class, and L(mmd,k) the domain difference loss of the k-th source domain class.
w(k) = Σ_{u=1}^{Nt} pt(u,k)
pt(u,k) denotes the probability, predicted by the basic model, that the u-th original image xt in the target domain training set belongs to the k-th source domain class; Nt denotes the number of unlabeled target domain samples xt in the target domain training set.
L(mmd,k) = ||ε1 − ε2||^2
ε1 = Σ_{uk=1}^{N(k,s)} [ψ(A(1,k,s,uk)) / N(k,s)]
ε2 = Σ_{uk=1}^{N(k,t)} [pt(uk,k) × ψ(A(1,k,t,uk)) / N(k,t)]
wherein ε1 and ε2 are transition terms; let X(k,s) denote the set of labeled source domain samples (xs, ys) in the source domain training set that belong to the k-th source domain class, N(k,s) the number of samples in X(k,s), and A(1,k,s,uk) the dimension-reduced vector A1 corresponding to the uk-th sample xs in X(k,s); let X(k,t) denote the set of unlabeled target domain samples in the target domain training set that the basic model judges to be of the k-th source domain class, N(k,t) the number of samples in X(k,t), pt(uk,k) the probability of the k-th source domain class in the class probability distribution output by the basic model for the uk-th sample xt in X(k,t), and A(1,k,t,uk) the dimension-reduced vector A1 corresponding to the uk-th sample xt in X(k,t); ||#|| denotes the 2-norm of #; ψ(#) denotes the kernel mapping of #, and specifically ψ may be set as a Gaussian kernel mapping; that is, in this embodiment ψ(A(1,k,s,uk)) denotes the Gaussian kernel mapping of A(1,k,s,uk), and ψ(A(1,k,t,uk)) denotes the Gaussian kernel mapping of A(1,k,t,uk).
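Because a Gaussian kernel's feature map ψ has no finite explicit form, a sketch of L(mmd) can instead expand ||ε1 − ε2||^2 with the standard kernel trick; the helpers below are such a reading, with the per-class sample grouping assumed to be done by the caller:

    def gaussian_kernel(x, y, sigma=1.0):
        return torch.exp(-torch.cdist(x, y).pow(2) / (2 * sigma ** 2))

    def class_mmd(xs, xt, pt_k, sigma=1.0):
        """||eps1 - eps2||^2 via the kernel trick: eps1 is the plain mean
        embedding of the source A1 vectors xs of class k, eps2 the
        pt(uk,k)-weighted mean embedding of the target A1 vectors xt."""
        a = torch.full((xs.shape[0],), 1.0 / xs.shape[0])
        b = pt_k / xt.shape[0]
        return (a @ gaussian_kernel(xs, xs, sigma) @ a
                - 2 * a @ gaussian_kernel(xs, xt, sigma) @ b
                + b @ gaussian_kernel(xt, xt, sigma) @ b)

    def domain_difference_loss(src_by_class, tgt_by_class, ptk_by_class, w):
        # L(mmd) = sum_k w(k) * L(mmd, k), skipping classes with no samples
        loss = 0.0
        for k, (xs, xt, ptk) in enumerate(zip(src_by_class, tgt_by_class, ptk_by_class)):
            if len(xs) and len(xt):
                loss = loss + w[k] * class_mmd(xs, xt, ptk)
        return loss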
L(ent) denotes the target entropy regularization loss:
L(ent) = [1/Nt] × Σ_{u=1}^{Nt} H(pt(u,max))
pt(u,max) denotes the probability corresponding to the classification result of the u-th sample xt in the target domain training set, i.e., the maximum probability in the class probability distribution output by the basic model for that image; H(pt(u,max)) denotes the information entropy of pt(u,max), computed by Shannon's formula; Nt denotes the number of samples in the target domain training set.
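A small sketch of this regularizer follows, reading "Shannon's formula applied to the single value pt(u,max)" as the binary entropy of that probability (one possible interpretation):

    def entropy_loss(tgt_probs, eps=1e-8):
        p = tgt_probs.max(dim=1).values                                # pt(u, max)
        h = -p * torch.log(p + eps) - (1 - p) * torch.log(1 - p + eps)
        return h.mean()                                                # [1/Nt] * sum_u H(pt(u,max))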
L (att) represents the attention loss:
L(att) = [1/(Q×N0)] × Σ_{q=1}^{Q} Σ_{u=1}^{N0} Ld[G2(q,u), d(u)]
let the union of the target domain training sample set and the source domain training sample set be the total training sample set; G2(q,u) denotes the probability, output by the q-th domain discriminator, that the corresponding sub-region of the u-th original image in the total training sample set belongs to the source domain; N0 is the number of samples in the total training sample set, N0 = Ns + Nt; Q is the number of domain discriminators in the regional level attention extraction module;
Ld[G2(q,u), d(u)] denotes the binary cross-entropy loss of G2(q,u) and d(u); d(u) is a binary number: d(u) = 0 if the u-th original image in the total training sample set belongs to the source domain training samples, and d(u) = 1 if it belongs to the target domain training samples.
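A sketch of L(att), assuming the Q regional discriminator outputs for a batch are stacked into an (N0, Q) matrix and compared against the domain labels d(u) defined above (note the text's convention that d(u) = 0 marks source samples):

    def attention_loss(domain_probs, d):
        # domain_probs: (N0, Q) values G2(q, u); d: (N0,) with 0 = source, 1 = target
        target = d.float().unsqueeze(1).expand_as(domain_probs)
        return F.binary_cross_entropy(domain_probs, target)   # mean over the Q x N0 terms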
L (cls) represents source domain classification loss:
L(cls) = [1/Ns] × Σ_{us=1}^{Ns} Ly(Ys(us), ys(us))
Ys(us) denotes the |Cs|-dimensional one-hot vector corresponding to the basic model's classification result for the us-th sample xs in the source domain training set; ys(us) denotes the |Cs|-dimensional one-hot vector corresponding to the true class ys of the us-th sample xs in the source domain training set; Ly(Ys(us), ys(us)) denotes the cross-entropy loss of Ys(us) and ys(us); Ns is the number of samples in the source domain training set.
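Pulling the pieces together, steps S2 and S3 can be sketched as the loop below. The basic model interface (returning the class logits, the Q regional domain probabilities and the bottleneck vector A1), the helpers group_by_class / group_by_prediction that build the X(k,s) / X(k,t) sets, and the fixed epoch budget standing in for the convergence test are all assumptions:

    def train(model, src_loader, tgt_loader, optimizer, alpha, beta, lam, epochs=50):
        for _ in range(epochs):                                # S3: iterate until converged
            for (xs, ys), xt in zip(src_loader, tgt_loader):   # S2: paired mini-batches
                logits_s, dom_s, a1_s = model(xs, target_domain=False)
                logits_t, dom_t, a1_t = model(xt, target_domain=True)
                pt = logits_t.softmax(dim=1)                   # target class probabilities
                l_cls = F.cross_entropy(logits_s, ys)          # L(cls)
                tgt_feats, ptk = group_by_prediction(a1_t, pt)      # hypothetical helper
                l_mmd = domain_difference_loss(
                    group_by_class(a1_s, ys),                       # hypothetical helper
                    tgt_feats, ptk, w=pt.sum(dim=0))                # w(k) = sum_u pt(u, k)
                l_ent = entropy_loss(pt)                       # L(ent)
                d = torch.cat([torch.zeros(len(xs)), torch.ones(len(xt))])
                l_att = attention_loss(torch.cat([dom_s, dom_t]), d)   # L(att)
                loss = total_loss(l_cls, l_mmd, l_ent, l_att, alpha, beta, lam)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()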
The image classification method based on the big data cross-domain image classification model provided in this embodiment is used to classify images acquired in a designated scene.
As shown in fig. 4, the image classification method based on the big data cross-domain image classification model specifically includes the following steps:
SA1, acquiring the labeled source domain images and unlabeled target domain images corresponding to a designated application scene; an unlabeled target domain image of the designated application scene is an image collected by the designated device in the designated application scene; a labeled source domain image corresponding to the designated application scene is an image of a known source domain category acquired by a known device in the designated application scene; the source domain categories and the target domain categories partially or completely overlap;
SA2, enabling the labeled source domain image to be used as a labeled source domain sample (xs, ys), and enabling the unlabeled target domain image to be used as an unlabeled target domain sample xt; acquiring a big data cross-domain image classification model by combining the training method of the big data cross-domain image classification model;
SA3, inputting the target domain image to be tested into a big data cross-domain image classification model, and outputting a classification result of the target domain image to be tested on the target domain by the big data cross-domain image classification model.
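Under the same assumed model interface as the training sketch above, step SA3 reduces to a forward pass through the target-domain (residual-corrected) path:

    model.eval()
    with torch.no_grad():
        logits, _, _ = model(image.unsqueeze(0), target_domain=True)  # image: (C, H, W) tensor
        predicted_class = logits.argmax(dim=1).item()   # class index on the target domain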
The classification accuracy of the big data cross-domain image classification model on the target domain is verified below with a specific embodiment.
This embodiment takes X-ray-based lung disease diagnosis as an example and uses experiments on two lung disease X-ray image datasets for verification. The backbone network adopts a ResNet-50 with N = 3, and the image feature data output by the last three stages of the ResNet-50 are taken as the feature maps.
The experimental environment is as follows:
operating system: ubuntu 18.04;
memory: 128GB;
CPU: Intel(R) Xeon(R);
GPU:NVIDIA TITAN X(12GB)。
the algorithm code of this embodiment is implemented in Python using the PyTorch framework.
The present embodiment performs an experiment based on two Kaggle contest public data sets, which are respectively denoted as site1 data set and site2 data set.
The site1 dataset contains 9690 X-ray image samples divided into 5 categories; the site2 dataset contains 6071 X-ray image samples divided into 4 categories; the site1 and site2 datasets share 3 category labels. Details of the two datasets are given in Table 1.
Table 1. Dataset information
Dataset  Samples  Categories  Category labels
site1    9690     5           normal, one-type lung disease, two-type lung disease, three-type lung disease, pulmonary tuberculosis
site2    6071     4           normal, one-type lung disease, two-type lung disease, lung shadow
As can be seen from Table 1, "normal", "one-type lung disease" and "two-type lung disease" are the shared class labels of the site1 and site2 datasets; "three-type lung disease" and "pulmonary tuberculosis" are class labels specific to the site1 dataset, and "lung shadow" is a class label specific to the site2 dataset.
In this embodiment, two sets of experiments were performed: one uses the site1 dataset as the source domain and the site2 dataset as the target domain, denoted site1-site2; the other uses the site2 dataset as the source domain and the site1 dataset as the target domain, denoted site2-site1.
In this embodiment, for each set of experiments, five algorithms are used to obtain the image classification model of the target domain, and then the performance of the image classification model is tested on the target domain, and specifically, accuracy (accuracies), sensitivity (sensitivity), specificity (specificity) and F1 score are used as performance indexes of the image classification model.
The five algorithms are the proposed big data cross-domain image classification model training method (referred to as the method of the invention), the ResNet-50 network, the partial domain adaptation methods ETN and DRCN, and the spatial-attention-based method TADA.
The performance index of the model obtained by the five algorithms in the two sets of experiments is shown with reference to fig. 5 (a) to (d), respectively.
From FIGS. 5 (a)-(d) it can be seen that the method of the invention outperforms the other four comparison methods on all four indices. Taking experiment site1-site2 as an example, compared with the second-best method, the spatial-attention-based TADA, the method of the invention improves the four evaluation indices by 5.4%, 3.8%, 2.0% and 3.6%, respectively. These results show that the method of the invention can fully exploit partial domain adaptation and spatial attention mechanisms for big data cross-domain image analysis.
FIGS. 6 (a)-(b) show the category weight assignments of the spatial-attention-based method TADA and of the method of the invention, respectively, in experiment site1-site2. As the figures show, TADA cannot mitigate the harmful semantic negative transfer produced by outlier classes, which ultimately leaves its performance suboptimal. The method of the invention, in contrast, assigns far greater weights to all shared categories than to the source-specific categories, whose weights are below 0.1. This result demonstrates that the target semantic feature residual correction automatically down-weights outlier data and focuses more on the cross-domain distribution difference, thereby achieving higher-accuracy image classification on the target domain through partial domain adaptation.
Through the experiments on the site1 and site2 datasets, this embodiment verifies the feasibility and superiority of the proposed big data cross-domain image classification method in terms of both the four evaluation indices and the sample weight distribution.
The above embodiments are merely preferred embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A training method for a big data cross-domain image classification model is characterized by comprising the following steps:
s1, constructing a basic model, a source domain data set Xs containing labeled source domain samples (Xs, ys) and a target domain data set Xt containing unlabeled target domain samples Xt; xs is the original image from the source domain, ys is the category to which xs belongs in the source domain; xt is the original image from the unknown class of the target domain;
the basic model comprises a backbone network, a regional level attention extraction module, a global average pooling layer, a global maximum pooling layer, a cascade layer, a full-connection bottleneck layer, a residual error characteristic correction module and a full-connection classification layer;
the backbone network is used for obtaining feature maps of the original image at a plurality of different sizes and combining all the feature maps into a target feature map G(1), G(1) ∈ R^{H(1,min)×W(1,min)×C(1,max)}, where H(1,min) is the minimum length among the feature maps, W(1,min) is the minimum width among the feature maps, and C(1,max) is the maximum feature dimension among the feature maps;
the regional level attention extraction module extracts the spatial attention values on the H(1,min)×W(1,min) sub-regions from the target feature map G(1), and outputs the spatial attention vector G(2) formed by combining the spatial attention values of all sub-regions, G(2) ∈ R^{H(1,min)×W(1,min)×C(1,max)};
Let the vector of the target feature map G (1) and the spatial attention vector G (2) added along the channel dimension element be denoted as G (3); the global average pooling layer is used for carrying out average pooling processing on the characteristic values of the subareas in G (3) so as to output an average spatial attention vector A (avg); the global maximum pooling layer is used for carrying out maximum pooling processing on the characteristic values of the subareas in G (3) so as to output a maximum space attention vector A (max);
the cascade layer splices the average spatial attention vector A (avg) and the maximum spatial attention vector A (max) along channel dimension elements and then outputs a spliced vector A; the full-connection bottleneck layer performs dimension reduction processing on the spliced vector A to obtain a low-dimension vector A1; the residual characteristic correction module performs residual correction on the input low-dimensional vector A1 and outputs a correction vector A2; adding the elements A1 and A2 along the channel dimension to obtain a target vector A3;
the full-connection classification layer judges the category of the original image according to the low-dimensional vector A1 aiming at the source domain; the full-connection classification layer judges the category of the original image according to the target vector A3 aiming at the target domain;
s2, selecting m1 labeled source domain samples (xs, ys) from the source domain data set Xs to form a source domain training set, and selecting m2 unlabeled target domain samples xt from the target domain data set Xt to form a target domain training set; the basic model performs machine learning on the source domain training set and the target domain training set, and its parameters are updated by back-propagation of the set loss function L;
s3, judging whether the basic model converges or not; if not, returning to the step S2; if yes, extracting a big data cross-domain image classification model from the converged basic model; the big data cross-domain image classification model is used for image classification over the target domain.
2. The big data cross-domain image classification model training method of claim 1, wherein the loss function L is a weighted summation of a plurality of feature losses; the feature loss includes a loss of attention L (att);
the regional level attention extraction module consists of Q domain feature extractors, Q = H(1,min)×W(1,min); the domain feature extractors correspond one-to-one to the sub-regions in the target feature map G(1); each domain feature extractor is also connected to a corresponding domain discriminator, and the domain discriminator judges, from the spatial attention value of the sub-region extracted by the corresponding domain feature extractor, the probability that the sub-region belongs to the source domain;
L(att) = [1/(Q×N0)] × Σ_{q=1}^{Q} Σ_{u=1}^{N0} Ld[G2(q,u), d(u)]
let the union of the target domain training sample set and the source domain training sample set be the total training sample set; G2(q,u) denotes the probability, output by the q-th domain discriminator, that the corresponding sub-region of the u-th original image in the total training sample set belongs to the source domain, 1 ≤ q ≤ Q; N0 is the number of samples in the total training sample set;
d (u) is a binary number, if the u-th original image in the total training sample set belongs to the source domain training sample, d (u) =0; d (u) =1 if the u-th original image in the total training sample set belongs to the target domain training sample;
Ld[G2(q,u), d(u)] denotes the binary cross-entropy loss of G2(q,u) and d(u).
3. The big data cross-domain image classification model training method of claim 2, wherein the feature loss further comprises a domain difference loss L (mmd):
L(mmd) = Σ_{k=1}^{|Cs|} [w(k) × L(mmd,k)]
Cs denotes the source domain class set, |Cs| the number of source domain classes, w(k) the weight of the k-th source domain class, and L(mmd,k) the domain difference loss of the k-th source domain class;
w(k) = Σ_{u=1}^{Nt} pt(u,k)
pt(u,k) denotes the probability, predicted by the basic model, that the u-th unlabeled target domain sample xt in the target domain training set belongs to the k-th source domain class; Nt denotes the number of unlabeled target domain samples xt in the target domain training set;
L(mmd,k) = ||ε1 − ε2||^2
ε1 = Σ_{uk=1}^{N(k,s)} [ψ(A(1,k,s,uk)) / N(k,s)]
ε2 = Σ_{uk=1}^{N(k,t)} [pt(uk,k) × ψ(A(1,k,t,uk)) / N(k,t)]
wherein ε1 and ε2 are transition terms; let X(k,s) denote the set of labeled source domain samples (xs, ys) in the source domain training set that belong to the k-th source domain class, N(k,s) the number of samples in X(k,s), and A(1,k,s,uk) the dimension-reduced vector A1 corresponding to the uk-th sample xs in X(k,s); let X(k,t) denote the set of unlabeled target domain samples in the target domain training set that the basic model judges to be of the k-th source domain class, N(k,t) the number of samples in X(k,t), pt(uk,k) the probability of the k-th source domain class in the class probability distribution output by the basic model for the uk-th sample xt in X(k,t), and A(1,k,t,uk) the dimension-reduced vector A1 corresponding to the uk-th sample xt in X(k,t); ||#|| denotes the 2-norm of #; ψ(#) denotes the kernel mapping of #.
4. The big data cross-domain image classification model training method of claim 2, wherein the feature loss further comprises a source domain classification loss L (cls):
L(cls) = [1/Ns] × Σ_{us=1}^{Ns} Ly(Ys(us), ys(us))
Ys(us) denotes the |Cs|-dimensional one-hot vector corresponding to the basic model's classification result for the us-th sample xs in the source domain training set; ys(us) denotes the |Cs|-dimensional one-hot vector corresponding to the true class ys of the us-th sample xs in the source domain training set; Ly(Ys(us), ys(us)) denotes the cross-entropy loss of Ys(us) and ys(us); Ns is the number of samples in the source domain training set.
5. The big data cross-domain image classification model training method of claim 2, wherein the feature loss further comprises a target entropy regularization loss L (ent):
L(ent) = [1/Nt] × Σ_{u=1}^{Nt} H(pt(u,max))
pt (u, max) represents the maximum probability in the class probability distribution of the u-th sample xt in the target domain training set; h (pt (u, max)) represents the information entropy of pt (u, max); nt represents the number of samples in the target domain training set.
6. The big data cross-domain image classification model training method according to claim 1, wherein the backbone network adopts a ResNet-50 network; when obtaining the feature maps of the original image at a plurality of different sizes, the ResNet-50 takes the image feature data output by its last three stages as the feature maps, and all the feature maps are then combined to obtain the target feature map G(1).
7. A big data cross-domain image classification method, characterized by comprising the following steps:
SA1, acquiring the labeled source domain images and unlabeled target domain images corresponding to a designated application scene; an unlabeled target domain image is an image collected by the designated device in the designated application scene; a labeled source domain image is an image of a known source domain category acquired by a known device in the designated application scene; the source domain categories and the target domain categories partially or completely overlap;
SA2, enabling the labeled source domain image to be used as a labeled source domain sample (xs, ys), and enabling the unlabeled target domain image to be used as an unlabeled target domain sample xt; acquiring a big data cross-domain image classification model by combining the training method of the big data cross-domain image classification model according to any one of claims 1-6;
SA3, inputting the target domain image to be tested into a big data cross-domain image classification model, and outputting a classification result of the target domain image to be tested on the target domain by the big data cross-domain image classification model.
8. A big data cross-domain image classification system comprising a memory having stored therein a big data cross-domain image classification model and a computer program which when executed is adapted to implement the big data cross-domain image classification method of claim 7.
9. The big data cross-domain image classification system of claim 8, further comprising a processor coupled to the memory, the processor for executing the computer program to implement the big data cross-domain image classification method of claim 7.
10. A big data cross-domain image classification system, characterized by carrying a computer program which, when executed, implements the training method of the big data cross-domain image classification model according to any one of claims 1-6, so as to obtain a big data cross-domain image classification model for classifying target domain images.
CN202310644725.3A 2023-06-02 2023-06-02 Big data cross-domain image classification model training method, image classification method and system Active CN116385808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310644725.3A CN116385808B (en) 2023-06-02 2023-06-02 Big data cross-domain image classification model training method, image classification method and system

Publications (2)

Publication Number Publication Date
CN116385808A true CN116385808A (en) 2023-07-04
CN116385808B CN116385808B (en) 2023-08-01

Family

ID=86971472

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310644725.3A Active CN116385808B (en) 2023-06-02 2023-06-02 Big data cross-domain image classification model training method, image classification method and system

Country Status (1)

Country Link
CN (1) CN116385808B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210110306A1 (en) * 2019-10-14 2021-04-15 Visa International Service Association Meta-transfer learning via contextual invariants for cross-domain recommendation
WO2021097774A1 (en) * 2019-11-21 2021-05-27 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for multi-source domain adaptation for semantic segmentation
CN110929670A (en) * 2019-12-02 2020-03-27 合肥城市云数据中心股份有限公司 Muck truck cleanliness video identification and analysis method based on yolo3 technology
CN112733788A (en) * 2021-01-20 2021-04-30 武汉大学 Cross-sensor migration-based high-resolution remote sensing image impervious surface extraction method
CN113344044A (en) * 2021-05-21 2021-09-03 北京工业大学 Cross-species medical image classification method based on domain self-adaptation
CN114092964A (en) * 2021-10-19 2022-02-25 杭州电子科技大学 Cross-domain pedestrian re-identification method based on attention guidance and multi-scale label generation
CN114973350A (en) * 2022-03-24 2022-08-30 西北工业大学 Cross-domain facial expression recognition method irrelevant to source domain data
CN115080699A (en) * 2022-07-04 2022-09-20 福州大学 Cross-modal retrieval method based on modal specific adaptive scaling and attention network
CN115578593A (en) * 2022-10-19 2023-01-06 北京建筑大学 Domain adaptation method using residual attention module
CN116070696A (en) * 2023-01-10 2023-05-05 中国兵器装备集团自动化研究所有限公司 Cross-domain data deep migration method, device, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DONGBO XI et al.: "Domain Adaptation with Category Attention Network for Deep Sentiment Analysis", https://arxiv.org/abs/2112.15290 *
KANG Jie; LI Jiawei; YANG Sili: "Facial Expression Recognition Based on Domain-Adaptive Convolutional Neural Network", Computer Engineering, no. 12 *
ZHU Lei et al.: "Gait Recognition Algorithm Based on Residual Network and Attention Mechanism", Peking University Core Journals (北大核心) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116777896A (en) * 2023-07-07 2023-09-19 浙江大学 Negative migration inhibition method for cross-domain classification and identification of apparent defects
CN116777896B (en) * 2023-07-07 2024-03-19 浙江大学 Negative migration inhibition method for cross-domain classification and identification of apparent defects

Also Published As

Publication number Publication date
CN116385808B (en) 2023-08-01

Similar Documents

Publication Publication Date Title
CN110956185B (en) Method for detecting image salient object
Liang et al. Instance segmentation in 3d scenes using semantic superpoint tree networks
Chuang et al. A feature learning and object recognition framework for underwater fish images
Lalitha et al. A survey on image segmentation through clustering algorithm
CN108629783B (en) Image segmentation method, system and medium based on image feature density peak search
CN116385808B (en) Big data cross-domain image classification model training method, image classification method and system
Hara et al. Attentional network for visual object detection
CN108764244B (en) Potential target area detection method based on convolutional neural network and conditional random field
CN114694038A (en) High-resolution remote sensing image classification method and system based on deep learning
Kim et al. Improving discrimination ability of convolutional neural networks by hybrid learning
JP4567660B2 (en) A method for determining a segment of an object in an electronic image.
Li et al. Learning to learn cropping models for different aspect ratio requirements
Sahu et al. Dynamic routing using inter capsule routing protocol between capsules
CN116129286A (en) Method for classifying graphic neural network remote sensing images based on knowledge graph
Yadav et al. Image segmentation techniques: a survey
Liu et al. 3d-queryis: A query-based framework for 3d instance segmentation
Cai et al. IOS-Net: An inside-to-outside supervision network for scale robust text detection in the wild
CN111462132A (en) Video object segmentation method and system based on deep learning
CN110516638B (en) Sign language recognition method based on track and random forest
Costa et al. Cluster analysis using self-organizing maps and image processing techniques
Falcão et al. The role of optimum connectivity in image segmentation: Can the algorithm learn object information during the process?
JP2017117025A (en) Pattern identification method, device thereof, and program thereof
Jiang et al. An optimized higher order CRF for automated labeling and segmentation of video objects
Wang et al. Self-supervised learning for high-resolution remote sensing images change detection with variational information bottleneck
CN114187440A (en) Small sample target detection system and method based on dynamic classifier

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant