CN116385808A - Big data cross-domain image classification model training method, image classification method and system - Google Patents

Big data cross-domain image classification model training method, image classification method and system

Info

Publication number
CN116385808A
Authority
CN
China
Prior art keywords
domain
target
image
big data
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310644725.3A
Other languages
Chinese (zh)
Other versions
CN116385808B (en)
Inventor
谢贻富
范武松
刘宇
刘文涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei City Cloud Data Center Co ltd
Original Assignee
Hefei City Cloud Data Center Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei City Cloud Data Center Co ltd filed Critical Hefei City Cloud Data Center Co ltd
Priority to CN202310644725.3A priority Critical patent/CN116385808B/en
Publication of CN116385808A publication Critical patent/CN116385808A/en
Application granted granted Critical
Publication of CN116385808B publication Critical patent/CN116385808B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/84 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using probabilistic graphical models from image or video features, e.g. Markov models or Bayesian networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to the technical field of domain adaptation within computer and deep learning applications, in particular to a training method, an image classification method and a system for a big data cross-domain image classification model. The basic model constructed by the invention comprises a backbone network, a regional level attention extraction module, a global average pooling layer, a global maximum pooling layer, a cascade layer, a full-connection bottleneck layer, a residual characteristic correction module and a full-connection classification layer. By inserting the residual characteristic correction module after the task-specific layer, the invention explicitly learns the cross-domain difference; by minimizing the proposed domain difference loss, larger weights are assigned to the most relevant source domain categories, giving excellent cross-domain transfer performance while effectively mitigating semantic negative transfer. When the source domain has more label categories than the target domain, the residual characteristic correction block helps reduce semantic negative transfer and ensures the accuracy of the resulting big data cross-domain image classification model on the target domain.

Description

Big data cross-domain image classification model training method, image classification method and system
Technical Field
The invention relates to the technical field of domain adaptation within computer and deep learning applications, in particular to a training method, an image classification method and a system for a big data cross-domain image classification model.
Background
In recent years, big data and deep learning have been widely used for tasks such as image analysis, natural language processing and image generation. However, most state-of-the-art models rely mainly on large amounts of labeled data, which is time-consuming and costly to obtain, and in some special scenarios (e.g., medical image analysis) the data may not be accessible at all. Models therefore need excellent generalization capability to transfer knowledge learned from limited, accessible datasets to new fields.
Driven by the rapid development of deep learning and computer vision, Artificial Intelligence (AI) techniques continue to improve the accuracy and efficiency of image classification, image segmentation and text generation. However, deep neural networks tend to fit the visual patterns extracted from abundant labeled data and are susceptible to inter-domain heterogeneity caused by different devices, scanning protocols and background interference, a phenomenon known as domain shift.
A commonly used approach to this problem is domain adaptation, but existing unsupervised domain adaptation methods still have the following limitations when applied to cross-domain image analysis. 1) Spatial negative transfer: existing domain adaptation methods mostly treat the image as a whole and align the representations extracted by a convolutional neural network across domains, without considering the complex distributions of different regions. Clearly, different regions of an image have different transferability. Certain regions (e.g., the background) may contribute little to domain adaptation, and forcing them into alignment can cause negative transfer of irrelevant knowledge. 2) Semantic negative transfer: most current unsupervised domain adaptation methods only reduce the marginal distribution difference and do not consider the semantic (i.e., label distribution) diversity of the two domains. When the source domain label space is a superset of the target domain label space, the domain adaptation model cannot determine which semantic features belong to the shared label space, where a shared label is one that belongs to both the source domain and the target domain.
Therefore, how to overcome spatial negative transfer and semantic negative transfer simultaneously and achieve effective and feasible unsupervised domain adaptation has become an urgent technical problem.
Disclosure of Invention
To overcome the spatial negative transfer and semantic negative transfer defects of domain adaptation in the prior art, the invention provides a big data cross-domain image classification model training method that addresses the large cross-domain distribution differences of multi-center image data, poor model generalization and susceptibility to negative transfer.
The invention provides a big data cross-domain image classification model training method, which comprises the following steps:
s1, constructing a basic model, a source domain data set Xs containing labeled source domain samples (Xs, ys) and a target domain data set Xt containing unlabeled target domain samples Xt; xs is the original image from the source domain, ys is the category to which xs belongs in the source domain; xt is the original image from the unknown class of the target domain;
the basic model comprises a backbone network, a regional level attention extraction module, a global average pooling layer, a global maximum pooling layer, a cascade layer, a full-connection bottleneck layer, a residual error characteristic correction module and a full-connection classification layer;
the backbone network is used for obtaining feature maps of the original image at a plurality of different sizes and combining all the feature maps into a target feature map G(1), G(1) ∈ R^{H(1,min)×W(1,min)×C(1,max)}, where H(1,min) is the minimum length among the feature maps, W(1,min) is the minimum width among the feature maps, and C(1,max) is the maximum feature dimension among the feature maps;
the regional level attention extraction module extracts the spatial attention values on the H(1,min)×W(1,min) sub-regions from the target feature map G(1), and outputs the spatial attention vector G(2) formed by combining the spatial attention values of all sub-regions, G(2) ∈ R^{H(1,min)×W(1,min)×C(1,max)};
let G(3) denote the element-wise sum of the target feature map G(1) and the spatial attention vector G(2) along the channel dimension; the global average pooling layer performs average pooling on the feature values of the sub-regions in G(3) to output the average spatial attention vector A(avg); the global maximum pooling layer performs maximum pooling on the feature values of the sub-regions in G(3) to output the maximum spatial attention vector A(max);
the cascade layer concatenates the average spatial attention vector A(avg) and the maximum spatial attention vector A(max) along the channel dimension and outputs the concatenated vector A; the full-connection bottleneck layer reduces the dimension of the concatenated vector A to obtain the low-dimensional vector A1; the residual characteristic correction module performs residual correction on the input low-dimensional vector A1 and outputs the correction vector A2; A1 and A2 are added element-wise along the channel dimension to obtain the target vector A3;
for the source domain, the full-connection classification layer judges the category of the original image from the low-dimensional vector A1; for the target domain, the full-connection classification layer judges the category of the original image from the target vector A3;
s2, selecting m1 labeled source domain samples (xs, ys) from the source domain data set Xs to form a source domain training set, and selecting m2 unlabeled target domain samples xt from the target domain data set Xt to form a target domain training set; the basic model performs machine learning on the source domain training set and the target domain training set, and its parameters are updated by back-propagation of the set loss function L;
s3, judging whether the basic model converges or not; if not, returning to the step S2; if yes, extracting a big data cross-domain image classification model from the converged basic model; the big data cross-domain image classification model is used for image classification over the target domain.
Preferably, the loss function L is a weighted sum of a plurality of feature losses; the feature losses include an attention loss L(att);
the regional level attention extraction module consists of Q domain feature extractors, Q = H(1,min)×W(1,min); the domain feature extractors correspond one-to-one to the sub-regions in the target feature map G(1); each domain feature extractor is also connected to a corresponding domain discriminator, and the domain discriminator judges, from the spatial attention value of the sub-region extracted by the corresponding domain feature extractor, the probability that the sub-region belongs to the source domain;
L(att) = [1/(Q×N0)] × Σ_{q=1}^{Q} Σ_{u=1}^{N0} Ld[G2(q,u), d(u)]
let the union of the target domain training sample set and the source domain training sample set be the total training sample set; G2(q,u) denotes the probability, output by the q-th domain discriminator, that the corresponding sub-region of the u-th original image in the total training sample set belongs to the source domain, 1 ≤ q ≤ Q; N0 is the number of samples in the total training sample set;
d (u) is a binary number, if the u-th original image in the total training sample set belongs to the source domain training sample, d (u) =0; d (u) =1 if the u-th original image in the total training sample set belongs to the target domain training sample;
Ld[G2(q,u), d(u)] denotes the binary cross-entropy loss of G2(q,u) and d(u).
Preferably, the feature loss further includes a domain difference loss L (mmd):
L(mmd) = Σ_{k=1}^{|Cs|} [w(k) × L(mmd,k)]
Cs denotes the source domain class set, |Cs| the number of source domain classes, w(k) the weight of the k-th source domain class, and L(mmd,k) the domain difference loss of the k-th source domain class;
w(k) = Σ_{u=1}^{Nt} pt(u,k)
pt(u,k) denotes the probability, predicted by the basic model, that the u-th unlabeled target domain sample xt in the target domain training set belongs to the k-th source domain class; Nt denotes the number of unlabeled target domain samples xt in the target domain training set;
L(mmd,k) = ||ε1 − ε2||^2
ε1 = Σ_{uk=1}^{N(k,s)} [ψ(A(1,k,s,uk)) / N(k,s)]
ε2 = Σ_{uk=1}^{N(k,t)} [pt(uk,k) × ψ(A(1,k,t,uk)) / N(k,t)]
wherein ε1 and ε2 are transition terms; let X(k,s) denote the set of labeled source domain samples (xs, ys) in the source domain training set that belong to the k-th source domain class, N(k,s) the number of samples in X(k,s), and A(1,k,s,uk) the dimension-reduced vector A1 corresponding to the uk-th sample xs in X(k,s); let X(k,t) denote the set of unlabeled target domain samples in the target domain training set that the basic model judges to be of the k-th source domain class, N(k,t) the number of samples in X(k,t), pt(uk,k) the probability of the k-th source domain class in the class probability distribution output by the basic model for the uk-th sample xt in X(k,t), and A(1,k,t,uk) the dimension-reduced vector A1 corresponding to the uk-th sample xt in X(k,t); ||#|| denotes the 2-norm of #; ψ(#) denotes the kernel mapping of #.
Preferably, the feature loss further comprises a source domain classification loss L (cls):
L(cls) = [1/Ns] × Σ_{us=1}^{Ns} Ly(Ys(us), ys(us))
Ys(us) denotes the |Cs|-dimensional one-hot vector corresponding to the basic model's classification result for the us-th sample xs in the source domain training set; ys(us) denotes the |Cs|-dimensional one-hot vector corresponding to the true class ys of the us-th sample xs in the source domain training set; Ly(Ys(us), ys(us)) denotes the cross-entropy loss of Ys(us) and ys(us); Ns is the number of samples in the source domain training set.
Preferably, the feature losses further include a target entropy regularization loss L(ent):
L(ent) = [1/Nt] × Σ_{u=1}^{Nt} H(pt(u,max))
pt (u, max) represents the maximum probability in the class probability distribution of the u-th sample xt in the target domain training set; h (pt (u, max)) represents the information entropy of pt (u, max); nt represents the number of samples in the target domain training set.
Preferably, the backbone network adopts a ResNet-50 network; when obtaining the feature maps of the original image at a plurality of different sizes, the ResNet-50 takes the image feature data output by its last three stages as the feature maps, and all the feature maps are then combined to obtain the target feature map G(1).
The invention also provides a big data cross-domain image classification method, which achieves high-accuracy classification on the target domain and comprises the following steps:
SA1, acquiring the labeled source domain images and unlabeled target domain images corresponding to a designated application scene; an unlabeled target domain image is an image collected by the designated device in the designated application scene; a labeled source domain image is an image of a known source domain category acquired by a known device in the designated application scene; the source domain categories and the target domain categories partially or completely overlap;
SA2, enabling the labeled source domain image to be used as a labeled source domain sample (xs, ys), and enabling the unlabeled target domain image to be used as an unlabeled target domain sample xt; acquiring a big data cross-domain image classification model by combining the training method of the big data cross-domain image classification model;
SA3, inputting the target domain image to be tested into a big data cross-domain image classification model, and outputting a classification result of the target domain image to be tested on the target domain by the big data cross-domain image classification model.
The invention also provides a big data cross-domain image classification system for carrying out the above big data cross-domain image classification method; it comprises a memory storing a big data cross-domain image classification model and a computer program which, when executed, implements the big data cross-domain image classification method.
Preferably, the method further comprises a processor, wherein the processor is connected with the memory, and the processor is used for executing the computer program to realize the big data cross-domain image classification method.
The invention also provides another big data cross-domain image classification system for carrying out the above big data cross-domain image classification model training method; it carries a computer program which, when executed, implements the big data cross-domain image classification model training method so as to obtain the big data cross-domain image classification model for classifying target domain images.
The invention has the advantages that:
1. In the training method for the big data cross-domain image classification model of the invention, fine-grained transferability is explored through multiple local-level domain discriminators, so the model can focus more on the foreground regions of the image, effectively mitigating spatial negative transfer.
2. By inserting the residual characteristic correction module after the task-specific layer, the invention explicitly learns the cross-domain difference, and by minimizing the proposed domain difference loss it assigns larger weights to the most relevant source domain categories, achieving excellent cross-domain transfer performance while effectively mitigating semantic negative transfer. When the source domain has more label categories than the target domain, the residual characteristic correction block helps reduce semantic negative transfer and ensures the accuracy of the resulting big data cross-domain image classification model on the target domain.
3. The big data cross-domain image classification method of the invention is an attention-guided partial domain adaptation method for the partial domain adaptation scenario; classifying target domain images with the proposed big data cross-domain image classification model achieves high classification accuracy.
4. The method first performs image preprocessing and feature extraction through the backbone network, then generates region-level cross-domain transferability via multi-adversarial learning, corrects the target semantic features by residual learning, and finally completes parameter optimization and result generation end to end. Experiments on two public datasets verify the feasibility and superiority of the method in terms of four evaluation indices and sample weight distribution.
Drawings
FIG. 1 is a topological diagram of a big data cross-domain image classification model;
FIG. 2 is a flow chart of a training method of a big data cross-domain image classification model;
FIG. 3 is a basic model topology;
FIG. 4 is a flow chart of the big data cross-domain image classification method;
FIG. 5 (a) is a comparison of accuracy under the X-ray image dataset of the embodiment;
FIG. 5 (b) is a comparison of sensitivity under an X-ray image dataset of an embodiment;
FIG. 5 (c) is a comparison of specificity under the X-ray image dataset of the example;
FIG. 5 (d) is a F1 score comparison result under the X-ray image dataset of the example;
FIG. 6 (a) is a graph of TADA method class weight assignment for site1-site2 tasks;
FIG. 6 (b) is a chart of the class weight assignment of the method of the present invention for site1-site2 tasks.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, the big data cross-domain image classification model provided in this embodiment includes: the system comprises a backbone network, a regional level attention extraction module, a global average pooling layer, a global maximum pooling layer, a cascade layer, a full-connection bottleneck layer, a residual error characteristic correction module and a full-connection classification layer;
the backbone network is used for acquiring feature images of the original image under a plurality of different sizes, and acquiring a target feature image by combining all the feature images;
specifically, the backbone network transforms all feature maps of the original image into feature maps with uniform structures through spatial downsampling and channel dimension upsampling, and then adds the feature maps with uniform structures along channel dimension elements to obtain a target feature map G (1).
Specifically, the backbone network is input as an original image, and output as N feature graphs G (1, 1), G (1, 2), …, G (1, N), …, G (1, N); g (1, N) represents an nth characteristic diagram, wherein N is more than or equal to 1 and less than or equal to N; g (1, n) ∈R H(1,n)×W(1,n)×C(1,n) That is, G (1, n) is used to describe the characteristics of an image of length H (1, n) and width W (1, n) in C (1, n) dimensions;
the backbone network performs space dimension conversion on the feature map, and the feature map after the feature map G (1, n) conversion is marked as G (1, n, a);
G(1,n,a)∈R H(1,min)×W(1,min)×C(1,max)
H(1,min)=min{H(1,1)、H(1,2)、…、H(1,n)、…、H(1,N)}
W(1,min)=min{W(1,1)、W(1,2)、…、W(1,n)、…、W(1,N)}
C(1,max)=max{C(1,1)、C(1,2)、…、C(1,n)、…、C(1,N)}
R H(1,min)×W(1,min)×C(1,max) features of an image of length H (1, min) and width W (1, min) in C (1, max) dimensions are represented; min represents a minimum value and max represents a maximum value;
adding the converted feature graphs { G (1, N, a) |1 is more than or equal to N and less than or equal to N } along channel dimension elements by a backbone network to obtain a target feature graph G (1); g (1) ∈R H(1,min)×W(1,min)×C(1,max)
Specific:
G(1,n,a)={g(i,j,n,a)|1≤i≤H(1,min);1≤j≤W(1,min)}
g (i, j, n, a) represents feature data of a sub-region of the converted feature map G (1, n, a) whose image coordinates are (i, j);
g(i,j,n,a)={g(i,j,n,c1)|1≤c1≤C(1,max)}
G(1)={g1(i,j,c1)|1≤i≤H(1,min);1≤j≤W(1,min);1≤c1≤C(1,max)}
g1(i,j,c1) = Σ_{n=1}^{N} g(i,j,n,c1)
g (i, j, n, c 1) represents data in the c1 st dimension in g (i, j, n, a); g1 (i, j, c 1) represents data in the c1 st dimension among the feature data of the sub-region of the target feature map G (1) whose image coordinates are (i, j).
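For illustration only, the following is a minimal PyTorch sketch of this unification step (the embodiment states the framework is PyTorch, but the concrete operators are not fixed by the text; the 1×1 convolutions used here for the channel up-mapping and the adaptive average pooling used for the spatial down-sampling are assumptions):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FeatureUnifier(nn.Module):
        """Maps N multi-scale feature maps G(1,n) onto the common shape
        H(1,min) x W(1,min) x C(1,max) and sums them into G(1)."""
        def __init__(self, in_channels, out_channels):
            super().__init__()
            # one assumed 1x1 projection per scale for the channel up-mapping
            self.projs = nn.ModuleList(
                [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels])

        def forward(self, feats):                # feats: list of (B, C_n, H_n, W_n)
            h = min(f.shape[2] for f in feats)   # H(1,min)
            w = min(f.shape[3] for f in feats)   # W(1,min)
            unified = [F.adaptive_avg_pool2d(p(f), (h, w))   # spatial down-sampling
                       for f, p in zip(feats, self.projs)]
            return torch.stack(unified).sum(dim=0)           # element-wise sum -> G(1)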
The regional level attention extraction module consists of Q domain feature extractors, Q = H(1,min)×W(1,min); the domain feature extractors correspond one-to-one to the sub-regions in the target feature map G(1).
The input of the regional level attention extraction module is connected with the output of the backbone network and is used for acquiring a target feature graph G (1); the domain feature extractor is used for acquiring the space attention value of the corresponding subarea in G (1); the output of the regional level attention extraction module is a spatial attention vector G (2) consisting of the spatial attention values of all the subregions in G (1);
G(2)={g2(i,j,c1)|1≤i≤H(1,min);1≤j≤W(1,min);1≤c1≤C1}
g2(i,j) denotes the spatial attention value of the sub-region of G(1) with image coordinates (i,j), and g2(i,j,c1) denotes the value in the c1-th dimension of g2(i,j); C1 denotes the total number of dimensions of G(2), C1 = C(1,max).
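A minimal sketch of this module follows, reusing the imports of the previous sketch and assuming each of the Q sub-regions owns a small linear domain feature extractor plus a one-output domain discriminator (the exact layer shapes are not fixed by the text):

    class RegionAttention(nn.Module):
        """Produces the spatial attention vector G(2) plus, per sub-region,
        the probability that the region belongs to the source domain."""
        def __init__(self, channels, height, width):
            super().__init__()
            self.h, self.w = height, width       # H(1,min), W(1,min)
            q = height * width                   # Q domain feature extractors
            self.extractors = nn.ModuleList(
                [nn.Linear(channels, channels) for _ in range(q)])
            self.discriminators = nn.ModuleList(
                [nn.Linear(channels, 1) for _ in range(q)])

        def forward(self, g1):                   # g1: (B, C, H, W)
            attn, probs = torch.zeros_like(g1), []
            for idx in range(self.h * self.w):
                i, j = divmod(idx, self.w)
                feat = self.extractors[idx](g1[:, :, i, j])  # per-region attention vector
                attn[:, :, i, j] = feat
                probs.append(torch.sigmoid(self.discriminators[idx](feat)))
            return attn, torch.cat(probs, dim=1)  # G(2) and the Q source-domain probabilities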
The input of the global average pooling layer and the input of the global maximum pooling layer are vectors G (3) obtained by adding G (1) and G (2) along channel dimension elements;
G(3)={g3(i,j,c1)|1≤i≤H(1,min);1≤j≤W(1,min);1≤c1≤C1}
g3(i,j,c1)=g1(i,j,c1)+g2(i,j,c1)
where G3 (i, j, c 1) represents the feature value of the sub-region of the stitching feature vector G (3) with the image coordinate (i, j).
The global average pooling layer is used for carrying out average pooling processing on the characteristic values of the subareas in G (3) so as to output an average spatial attention vector A (avg); the global maximum pooling layer is used for carrying out maximum pooling processing on the characteristic values of the subareas in G (3) so as to output a maximum space attention vector A (max);
A(avg) = {a(avg,c1) | 1 ≤ c1 ≤ C1};  a(avg,c1) = Σ_{i=1}^{H(1,min)} Σ_{j=1}^{W(1,min)} g3(i,j,c1) / Q
A(max) = {a(max,c1) | 1 ≤ c1 ≤ C1};  a(max,c1) = max{g3(i,j,c1) | 1 ≤ i ≤ H(1,min); 1 ≤ j ≤ W(1,min)}
where a(avg,c1) denotes the value in the c1-th dimension of A(avg), and a(max,c1) denotes the value in the c1-th dimension of A(max).
The input of the cascade layer is connected to the output of the global average pooling layer and the output of the global maximum pooling layer; the cascade layer concatenates the average spatial attention vector A(avg) and the maximum spatial attention vector A(max) along the channel dimension and outputs the concatenated vector A, where A = A(avg) ‖ A(max); that is, the dimension of A is 2×C1.
The input of the full-connection bottleneck layer is connected with the output of the cascade layer, and the full-connection bottleneck layer performs dimension reduction processing on the spliced vector A to obtain a low-dimension vector A1 and outputs the low-dimension vector A1.
In practice, C1 may be set to 2048, i.e., a (avg) and a (max) are both 2048-dimensional vectors, and a is a 4096-dimensional vector. A1 may be specifically set to a 256-dimensional vector.
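These pooling, concatenation and bottleneck steps can be sketched as follows, a non-authoritative reading using the 2048/256 dimensions given above:

    class PoolBottleneck(nn.Module):
        def __init__(self, channels=2048, bottleneck_dim=256):
            super().__init__()
            self.bottleneck = nn.Linear(2 * channels, bottleneck_dim)

        def forward(self, g1, g2):
            g3 = g1 + g2                                      # G(3): element-wise sum
            a_avg = F.adaptive_avg_pool2d(g3, 1).flatten(1)   # A(avg): (B, C1)
            a_max = F.adaptive_max_pool2d(g3, 1).flatten(1)   # A(max): (B, C1)
            a = torch.cat([a_avg, a_max], dim=1)              # A: (B, 2 x C1)
            return self.bottleneck(a)                         # A1: (B, 256)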
The input of the residual characteristic correction module is connected with the output of the full-connection bottleneck layer, and the residual characteristic correction module carries out residual correction on the input low-dimensional vector A1 and outputs a correction vector A2.
The input of the full-connection classification layer is a target vector A3 obtained by adding the elements A1 and A2 along the channel dimension, and the full-connection classification layer judges and outputs the classification result of the original image in the target domain according to the target vector A3.
A={a0(c2)|1≤c2≤2×C1}
A1={a1(c3)|1≤c3≤C3}
A2={a2(c3)|1≤c3≤C3}
A3={a3(c3)|1≤c3≤C3}
a3(c3)=a1(c3)+a2(c3)
a0 (C2) represents data in the C2-th dimension of the stitching vector a, A1 (C3) represents data in the C3-th dimension of the low-dimensional vector A1, A2 (C3) represents data in the C3-th dimension of the correction vector A2, A3 (C3) represents data in the C3-th dimension of the vector A3, and C3 represents the number of dimensions of the full-connection bottleneck layer output data.
As shown in fig. 2 and 3, the training method of the big data cross-domain image classification model includes the following steps:
s1, constructing a basic model, a source domain data set Xs containing labeled source domain samples (Xs, ys) and a target domain data set Xt containing unlabeled target domain samples Xt; xs is the original image from the source domain, ys is the category to which xs belongs in the source domain; xt is the original image from the unknown class of the target domain;
the basic model comprises a backbone network, a regional level attention extraction module, a global average pooling layer, a global maximum pooling layer, a cascade layer, a full-connection bottleneck layer, a residual error characteristic correction module and a full-connection classification layer; the input of the backbone network is the input of the basic model, namely the original image, and the output of the backbone network is connected with the input of the regional level attention extraction module; the input of the global average pooling layer and the input of the global maximum pooling layer are respectively connected with the output of the backbone network and the output of the regional level attention extraction module; the input of the cascade layer is respectively connected with the output of the global average pooling layer and the output of the global maximum pooling layer; the output of the cascade layer is connected with the input of the full-connection bottleneck layer, and the output of the full-connection bottleneck layer is respectively connected with the input of the residual error characteristic correction module and the input of the full-connection classification layer; the input of the full-connection classification layer is also connected with the output of the residual characteristic correction module, and the output of the full-connection classification layer is the classification result of the original image;
in the basic model, each domain feature extractor is also connected with a corresponding domain discriminator, and the domain discriminator judges the probability that the sub-region belongs to the source domain according to the space attention value of the sub-region extracted by the corresponding domain feature extractor;
in the training process, if the original image comes from the source domain, the classification result of the original image on the source domain is obtained from the low-dimensional vector A1; that is, the low-dimensional vector A1 output by the full-connection bottleneck layer is denoted as As1, and the full-connection classification layer obtains the classification category Ys of the original image on the source domain from As1;
if the original image comes from the target domain, the classification result of the original image on the target domain is obtained from the vector A3; that is, the low-dimensional vector A1 output by the full-connection bottleneck layer is denoted as At1, the residual characteristic correction module performs residual correction on At1 and outputs the correction vector At2; the element-wise sum of At1 and At2 along the channel dimension is denoted as At3, and the full-connection classification layer obtains the classification category Yt of the original image on the target domain from At3.
Specifically, the full-connection classification layer acquires the class probability distribution of the original image, and then outputs the class corresponding to the maximum probability as the classification result of the original image.
S2, selecting m1 labeled source domain samples (xs, ys) from the source domain data set Xs to form a source domain training set, and selecting m2 unlabeled target domain samples xt from the target domain data set Xt to form a target domain training set; the basic model performs machine learning on the source domain training set and the target domain training set, and its parameters are updated by back-propagation of the set loss function L;
s3, judging whether the basic model converges or not; if not, returning to the step S2; and if so, extracting a big data cross-domain image classification model from the converged basic model.
L = L(cls) + α×L(mmd) + β×L(ent) − λ×L(att)
where α, β and λ are manually set trade-off factors;
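As a sketch, this combination is a one-liner; the individual loss terms are sketched after their definitions below:

    def total_loss(l_cls, l_mmd, l_ent, l_att, alpha, beta, lam):
        # L = L(cls) + alpha*L(mmd) + beta*L(ent) - lambda*L(att)
        return l_cls + alpha * l_mmd + beta * l_ent - lam * l_att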
L(mmd) is the domain difference loss:
L(mmd) = Σ_{k=1}^{|Cs|} [w(k) × L(mmd,k)]
Cs denotes the source domain class set, ys ∈ Cs; |Cs| denotes the number of source domain classes, w(k) the weight of the k-th source domain class, and L(mmd,k) the domain difference loss of the k-th source domain class.
w(k) = Σ_{u=1}^{Nt} pt(u,k)
pt(u,k) denotes the probability, predicted by the basic model, that the u-th original image xt in the target domain training set belongs to the k-th source domain class; Nt denotes the number of unlabeled target domain samples xt in the target domain training set.
L(mmd,k) = ||ε1 − ε2||^2
ε1 = Σ_{uk=1}^{N(k,s)} [ψ(A(1,k,s,uk)) / N(k,s)]
ε2 = Σ_{uk=1}^{N(k,t)} [pt(uk,k) × ψ(A(1,k,t,uk)) / N(k,t)]
wherein ε1 and ε2 are transition terms; let X(k,s) denote the set of labeled source domain samples (xs, ys) in the source domain training set that belong to the k-th source domain class, N(k,s) the number of samples in X(k,s), and A(1,k,s,uk) the dimension-reduced vector A1 corresponding to the uk-th sample xs in X(k,s); let X(k,t) denote the set of unlabeled target domain samples in the target domain training set that the basic model judges to be of the k-th source domain class, N(k,t) the number of samples in X(k,t), pt(uk,k) the probability of the k-th source domain class in the class probability distribution output by the basic model for the uk-th sample xt in X(k,t), and A(1,k,t,uk) the dimension-reduced vector A1 corresponding to the uk-th sample xt in X(k,t); ||#|| denotes the 2-norm of #; ψ(#) denotes the kernel mapping of #, and specifically ψ may be set as a Gaussian kernel mapping; that is, in this embodiment ψ(A(1,k,s,uk)) denotes the Gaussian kernel mapping of A(1,k,s,uk), and ψ(A(1,k,t,uk)) denotes the Gaussian kernel mapping of A(1,k,t,uk).
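Because a Gaussian kernel's feature map ψ has no finite explicit form, a sketch of L(mmd) can instead expand ||ε1 − ε2||^2 with the standard kernel trick; the helpers below are such a reading, with the per-class sample grouping assumed to be done by the caller:

    def gaussian_kernel(x, y, sigma=1.0):
        return torch.exp(-torch.cdist(x, y).pow(2) / (2 * sigma ** 2))

    def class_mmd(xs, xt, pt_k, sigma=1.0):
        """||eps1 - eps2||^2 via the kernel trick: eps1 is the plain mean
        embedding of the source A1 vectors xs of class k, eps2 the
        pt(uk,k)-weighted mean embedding of the target A1 vectors xt."""
        a = torch.full((xs.shape[0],), 1.0 / xs.shape[0])
        b = pt_k / xt.shape[0]
        return (a @ gaussian_kernel(xs, xs, sigma) @ a
                - 2 * a @ gaussian_kernel(xs, xt, sigma) @ b
                + b @ gaussian_kernel(xt, xt, sigma) @ b)

    def domain_difference_loss(src_by_class, tgt_by_class, ptk_by_class, w):
        # L(mmd) = sum_k w(k) * L(mmd, k), skipping classes with no samples
        loss = 0.0
        for k, (xs, xt, ptk) in enumerate(zip(src_by_class, tgt_by_class, ptk_by_class)):
            if len(xs) and len(xt):
                loss = loss + w[k] * class_mmd(xs, xt, ptk)
        return loss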
L(ent) denotes the target entropy regularization loss:
L(ent) = [1/Nt] × Σ_{u=1}^{Nt} H(pt(u,max))
pt(u,max) denotes the probability corresponding to the classification result of the u-th sample xt in the target domain training set, i.e., the maximum probability in the class probability distribution output by the basic model for that image; H(pt(u,max)) denotes the information entropy of pt(u,max), computed by Shannon's formula; Nt denotes the number of samples in the target domain training set.
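A small sketch of this regularizer follows, reading "Shannon's formula applied to the single value pt(u,max)" as the binary entropy of that probability (one possible interpretation):

    def entropy_loss(tgt_probs, eps=1e-8):
        p = tgt_probs.max(dim=1).values                                # pt(u, max)
        h = -p * torch.log(p + eps) - (1 - p) * torch.log(1 - p + eps)
        return h.mean()                                                # [1/Nt] * sum_u H(pt(u,max))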
L (att) represents the attention loss:
L(att) = [1/(Q×N0)] × Σ_{q=1}^{Q} Σ_{u=1}^{N0} Ld[G2(q,u), d(u)]
let the union of the target domain training sample set and the source domain training sample set be the total training sample set; G2(q,u) denotes the probability, output by the q-th domain discriminator, that the corresponding sub-region of the u-th original image in the total training sample set belongs to the source domain; N0 is the number of samples in the total training sample set, N0 = Ns + Nt; Q is the number of domain discriminators in the regional level attention extraction module;
Ld[G2(q,u), d(u)] denotes the binary cross-entropy loss of G2(q,u) and d(u); d(u) is a binary number: d(u) = 0 if the u-th original image in the total training sample set belongs to the source domain training samples, and d(u) = 1 if it belongs to the target domain training samples.
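A sketch of L(att), assuming the Q regional discriminator outputs for a batch are stacked into an (N0, Q) matrix and compared against the domain labels d(u) defined above (note the text's convention that d(u) = 0 marks source samples):

    def attention_loss(domain_probs, d):
        # domain_probs: (N0, Q) values G2(q, u); d: (N0,) with 0 = source, 1 = target
        target = d.float().unsqueeze(1).expand_as(domain_probs)
        return F.binary_cross_entropy(domain_probs, target)   # mean over the Q x N0 terms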
L (cls) represents source domain classification loss:
L(cls) = [1/Ns] × Σ_{us=1}^{Ns} Ly(Ys(us), ys(us))
Ys(us) denotes the |Cs|-dimensional one-hot vector corresponding to the basic model's classification result for the us-th sample xs in the source domain training set; ys(us) denotes the |Cs|-dimensional one-hot vector corresponding to the true class ys of the us-th sample xs in the source domain training set; Ly(Ys(us), ys(us)) denotes the cross-entropy loss of Ys(us) and ys(us); Ns is the number of samples in the source domain training set.
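Pulling the pieces together, steps S2 and S3 can be sketched as the loop below. The basic model interface (returning the class logits, the Q regional domain probabilities and the bottleneck vector A1), the helpers group_by_class / group_by_prediction that build the X(k,s) / X(k,t) sets, and the fixed epoch budget standing in for the convergence test are all assumptions:

    def train(model, src_loader, tgt_loader, optimizer, alpha, beta, lam, epochs=50):
        for _ in range(epochs):                                # S3: iterate until converged
            for (xs, ys), xt in zip(src_loader, tgt_loader):   # S2: paired mini-batches
                logits_s, dom_s, a1_s = model(xs, target_domain=False)
                logits_t, dom_t, a1_t = model(xt, target_domain=True)
                pt = logits_t.softmax(dim=1)                   # target class probabilities
                l_cls = F.cross_entropy(logits_s, ys)          # L(cls)
                tgt_feats, ptk = group_by_prediction(a1_t, pt)      # hypothetical helper
                l_mmd = domain_difference_loss(
                    group_by_class(a1_s, ys),                       # hypothetical helper
                    tgt_feats, ptk, w=pt.sum(dim=0))                # w(k) = sum_u pt(u, k)
                l_ent = entropy_loss(pt)                       # L(ent)
                d = torch.cat([torch.zeros(len(xs)), torch.ones(len(xt))])
                l_att = attention_loss(torch.cat([dom_s, dom_t]), d)   # L(att)
                loss = total_loss(l_cls, l_mmd, l_ent, l_att, alpha, beta, lam)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()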
The image classification method based on the big data cross-domain image classification model provided in this embodiment is used to classify images acquired in a designated scene.
As shown in fig. 4, the image classification method based on the big data cross-domain image classification model specifically includes the following steps:
SA1, acquiring the labeled source domain images and unlabeled target domain images corresponding to a designated application scene; an unlabeled target domain image of the designated application scene is an image collected by the designated device in the designated application scene; a labeled source domain image corresponding to the designated application scene is an image of a known source domain category acquired by a known device in the designated application scene; the source domain categories and the target domain categories partially or completely overlap;
SA2, enabling the labeled source domain image to be used as a labeled source domain sample (xs, ys), and enabling the unlabeled target domain image to be used as an unlabeled target domain sample xt; acquiring a big data cross-domain image classification model by combining the training method of the big data cross-domain image classification model;
SA3, inputting the target domain image to be tested into a big data cross-domain image classification model, and outputting a classification result of the target domain image to be tested on the target domain by the big data cross-domain image classification model.
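Under the same assumed model interface as the training sketch above, step SA3 reduces to a forward pass through the target-domain (residual-corrected) path:

    model.eval()
    with torch.no_grad():
        logits, _, _ = model(image.unsqueeze(0), target_domain=True)  # image: (C, H, W) tensor
        predicted_class = logits.argmax(dim=1).item()   # class index on the target domain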
The classification accuracy of the big data cross-domain image classification model on the target domain is verified below with a specific embodiment.
This embodiment takes X-ray-based lung disease diagnosis as an example and uses experiments on two lung disease X-ray image datasets for verification. The backbone network adopts a ResNet-50 with N = 3, and the image feature data output by the last three stages of the ResNet-50 are taken as the feature maps.
The experimental environment is as follows:
operating system: ubuntu 18.04;
memory: 128GB;
CPU: Intel(R) Xeon(R);
GPU:NVIDIA TITAN X(12GB)。
the algorithm code of this embodiment is implemented in Python using the PyTorch framework.
The present embodiment performs an experiment based on two Kaggle contest public data sets, which are respectively denoted as site1 data set and site2 data set.
The site1 dataset contains 9690 X-ray image samples divided into 5 categories; the site2 dataset contains 6071 X-ray image samples divided into 4 categories; the site1 and site2 datasets share 3 category labels. Details of the two datasets are given in Table 1.
Table 1. Dataset information
Dataset  Samples  Categories  Category labels
site1    9690     5           normal, one-type lung disease, two-type lung disease, three-type lung disease, pulmonary tuberculosis
site2    6071     4           normal, one-type lung disease, two-type lung disease, lung shadow
As can be seen from Table 1, "normal", "one-type lung disease" and "two-type lung disease" are the shared class labels of the site1 and site2 datasets; "three-type lung disease" and "pulmonary tuberculosis" are class labels specific to the site1 dataset, and "lung shadow" is a class label specific to the site2 dataset.
In this embodiment, two sets of experiments were performed: one uses the site1 dataset as the source domain and the site2 dataset as the target domain, denoted site1-site2; the other uses the site2 dataset as the source domain and the site1 dataset as the target domain, denoted site2-site1.
In this embodiment, for each set of experiments, five algorithms are used to obtain the image classification model of the target domain, and then the performance of the image classification model is tested on the target domain, and specifically, accuracy (accuracies), sensitivity (sensitivity), specificity (specificity) and F1 score are used as performance indexes of the image classification model.
The five algorithms are the proposed big data cross-domain image classification model training method (referred to as the method of the invention), the ResNet-50 network, the partial domain adaptation methods ETN and DRCN, and the spatial-attention-based method TADA.
The performance index of the model obtained by the five algorithms in the two sets of experiments is shown with reference to fig. 5 (a) to (d), respectively.
From FIGS. 5 (a)-(d) it can be seen that the method of the invention outperforms the other four comparison methods on all four indices. Taking experiment site1-site2 as an example, compared with the second-best method, the spatial-attention-based TADA, the method of the invention improves the four evaluation indices by 5.4%, 3.8%, 2.0% and 3.6%, respectively. These results show that the method of the invention can fully exploit partial domain adaptation and spatial attention mechanisms for big data cross-domain image analysis.
FIGS. 6 (a)-(b) show the category weight assignments of the spatial-attention-based method TADA and of the method of the invention, respectively, in experiment site1-site2. As the figures show, TADA cannot mitigate the harmful semantic negative transfer produced by outlier classes, which ultimately leaves its performance suboptimal. The method of the invention, in contrast, assigns far greater weights to all shared categories than to the source-specific categories, whose weights are below 0.1. This result demonstrates that the target semantic feature residual correction automatically down-weights outlier data and focuses more on the cross-domain distribution difference, thereby achieving higher-accuracy image classification on the target domain through partial domain adaptation.
Through the experiments on the site1 and site2 datasets, this embodiment verifies the feasibility and superiority of the proposed big data cross-domain image classification method in terms of both the four evaluation indices and the sample weight distribution.
The above embodiments are merely preferred embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A training method for a big data cross-domain image classification model is characterized by comprising the following steps:
s1, constructing a basic model, a source domain data set Xs containing labeled source domain samples (Xs, ys) and a target domain data set Xt containing unlabeled target domain samples Xt; xs is the original image from the source domain, ys is the category to which xs belongs in the source domain; xt is the original image from the unknown class of the target domain;
the basic model comprises a backbone network, a regional level attention extraction module, a global average pooling layer, a global maximum pooling layer, a cascade layer, a full-connection bottleneck layer, a residual error characteristic correction module and a full-connection classification layer;
the backbone network is used for obtaining feature maps of the original image at a plurality of different sizes and combining all the feature maps into a target feature map G(1), G(1) ∈ R^{H(1,min)×W(1,min)×C(1,max)}, where H(1,min) is the minimum length among the feature maps, W(1,min) is the minimum width among the feature maps, and C(1,max) is the maximum feature dimension among the feature maps;
the regional level attention extraction module extracts the spatial attention values on the H(1,min)×W(1,min) sub-regions from the target feature map G(1), and outputs the spatial attention vector G(2) formed by combining the spatial attention values of all sub-regions, G(2) ∈ R^{H(1,min)×W(1,min)×C(1,max)};
Let the vector of the target feature map G (1) and the spatial attention vector G (2) added along the channel dimension element be denoted as G (3); the global average pooling layer is used for carrying out average pooling processing on the characteristic values of the subareas in G (3) so as to output an average spatial attention vector A (avg); the global maximum pooling layer is used for carrying out maximum pooling processing on the characteristic values of the subareas in G (3) so as to output a maximum space attention vector A (max);
the cascade layer splices the average spatial attention vector A (avg) and the maximum spatial attention vector A (max) along channel dimension elements and then outputs a spliced vector A; the full-connection bottleneck layer performs dimension reduction processing on the spliced vector A to obtain a low-dimension vector A1; the residual characteristic correction module performs residual correction on the input low-dimensional vector A1 and outputs a correction vector A2; adding the elements A1 and A2 along the channel dimension to obtain a target vector A3;
the full-connection classification layer judges the category of the original image according to the low-dimensional vector A1 aiming at the source domain; the full-connection classification layer judges the category of the original image according to the target vector A3 aiming at the target domain;
s2, selecting m1 labeled source domain samples (xs, ys) from the source domain data set Xs to form a source domain training set, and selecting m2 unlabeled target domain samples xt from the target domain data set Xt to form a target domain training set; the basic model performs machine learning on the source domain training set and the target domain training set, and its parameters are updated by back-propagation of the set loss function L;
s3, judging whether the basic model converges or not; if not, returning to the step S2; if yes, extracting a big data cross-domain image classification model from the converged basic model; the big data cross-domain image classification model is used for image classification over the target domain.
2. The big data cross-domain image classification model training method of claim 1, wherein the loss function L is a weighted summation of a plurality of feature losses; the feature loss includes a loss of attention L (att);
the regional level attention extraction module consists of Q domain feature extractors, Q = H(1,min)×W(1,min); the domain feature extractors correspond one-to-one to the sub-regions in the target feature map G(1); each domain feature extractor is also connected to a corresponding domain discriminator, and the domain discriminator judges, from the spatial attention value of the sub-region extracted by the corresponding domain feature extractor, the probability that the sub-region belongs to the source domain;
L(att) = [1/(Q×N0)] × Σ_{q=1}^{Q} Σ_{u=1}^{N0} Ld[G2(q,u), d(u)]
let the union of the target domain training sample set and the source domain training sample set be the total training sample set; G2(q,u) denotes the probability, output by the q-th domain discriminator, that the corresponding sub-region of the u-th original image in the total training sample set belongs to the source domain, 1 ≤ q ≤ Q; N0 is the number of samples in the total training sample set;
d (u) is a binary number, if the u-th original image in the total training sample set belongs to the source domain training sample, d (u) =0; d (u) =1 if the u-th original image in the total training sample set belongs to the target domain training sample;
Ld[G2(q,u), d(u)] denotes the binary cross-entropy loss of G2(q,u) and d(u).
3. The big data cross-domain image classification model training method of claim 2, wherein the feature loss further comprises a domain difference loss L (mmd):
L(mmd) = Σ_{k=1}^{|Cs|} [w(k) × L(mmd,k)]
Cs denotes the source domain class set, |Cs| the number of source domain classes, w(k) the weight of the k-th source domain class, and L(mmd,k) the domain difference loss of the k-th source domain class;
w(k) = Σ_{u=1}^{Nt} pt(u,k)
pt(u,k) denotes the probability, predicted by the basic model, that the u-th unlabeled target domain sample xt in the target domain training set belongs to the k-th source domain class; Nt denotes the number of unlabeled target domain samples xt in the target domain training set;
L(mmd,k) = ||ε1 − ε2||^2
ε1 = Σ_{uk=1}^{N(k,s)} [ψ(A(1,k,s,uk)) / N(k,s)]
ε2 = Σ_{uk=1}^{N(k,t)} [pt(uk,k) × ψ(A(1,k,t,uk)) / N(k,t)]
wherein ε1 and ε2 are transition terms; let X(k,s) denote the set of labeled source domain samples (xs, ys) in the source domain training set that belong to the k-th source domain class, N(k,s) the number of samples in X(k,s), and A(1,k,s,uk) the dimension-reduced vector A1 corresponding to the uk-th sample xs in X(k,s); let X(k,t) denote the set of unlabeled target domain samples in the target domain training set that the basic model judges to be of the k-th source domain class, N(k,t) the number of samples in X(k,t), pt(uk,k) the probability of the k-th source domain class in the class probability distribution output by the basic model for the uk-th sample xt in X(k,t), and A(1,k,t,uk) the dimension-reduced vector A1 corresponding to the uk-th sample xt in X(k,t); ||#|| denotes the 2-norm of #; ψ(#) denotes the kernel mapping of #.
4. The big data cross-domain image classification model training method of claim 2, wherein the feature loss further comprises a source domain classification loss L (cls):
L(cls) = [1/Ns] × Σ_{us=1}^{Ns} Ly(Ys(us), ys(us))
Ys(us) denotes the |Cs|-dimensional one-hot vector corresponding to the basic model's classification result for the us-th sample xs in the source domain training set; ys(us) denotes the |Cs|-dimensional one-hot vector corresponding to the true class ys of the us-th sample xs in the source domain training set; Ly(Ys(us), ys(us)) denotes the cross-entropy loss of Ys(us) and ys(us); Ns is the number of samples in the source domain training set.
5. The big data cross-domain image classification model training method of claim 2, wherein the feature loss further comprises a target entropy regularization loss L (ent):
L(ent) = [1/Nt] × Σ_{u=1}^{Nt} H(pt(u,max))
pt (u, max) represents the maximum probability in the class probability distribution of the u-th sample xt in the target domain training set; h (pt (u, max)) represents the information entropy of pt (u, max); nt represents the number of samples in the target domain training set.
6. The big data cross-domain image classification model training method according to claim 1, wherein the backbone network adopts a ResNet-50 network; when obtaining the feature maps of the original image at a plurality of different sizes, the ResNet-50 takes the image feature data output by its last three stages as the feature maps, and all the feature maps are then combined to obtain the target feature map G(1).
7. A big data cross-domain image classification method, characterized by comprising the following steps:
SA1, acquiring the labeled source domain images and unlabeled target domain images corresponding to a designated application scene; an unlabeled target domain image is an image collected by the designated device in the designated application scene; a labeled source domain image is an image of a known source domain category acquired by a known device in the designated application scene; the source domain categories and the target domain categories partially or completely overlap;
SA2, enabling the labeled source domain image to be used as a labeled source domain sample (xs, ys), and enabling the unlabeled target domain image to be used as an unlabeled target domain sample xt; acquiring a big data cross-domain image classification model by combining the training method of the big data cross-domain image classification model according to any one of claims 1-6;
SA3, inputting the target domain image to be tested into a big data cross-domain image classification model, and outputting a classification result of the target domain image to be tested on the target domain by the big data cross-domain image classification model.
8. A big data cross-domain image classification system comprising a memory having stored therein a big data cross-domain image classification model and a computer program which when executed is adapted to implement the big data cross-domain image classification method of claim 7.
9. The big data cross-domain image classification system of claim 8, further comprising a processor coupled to the memory, the processor for executing the computer program to implement the big data cross-domain image classification method of claim 7.
10. A big data cross-domain image classification system, characterized by carrying a computer program which, when executed, implements the training method of the big data cross-domain image classification model according to any one of claims 1-6, so as to obtain a big data cross-domain image classification model for classifying target domain images.
CN202310644725.3A 2023-06-02 2023-06-02 Big data cross-domain image classification model training method, image classification method and system Active CN116385808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310644725.3A CN116385808B (en) 2023-06-02 2023-06-02 Big data cross-domain image classification model training method, image classification method and system

Publications (2)

Publication Number Publication Date
CN116385808A true CN116385808A (en) 2023-07-04
CN116385808B CN116385808B (en) 2023-08-01

Family

ID=86971472

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310644725.3A Active CN116385808B (en) 2023-06-02 2023-06-02 Big data cross-domain image classification model training method, image classification method and system

Country Status (1)

Country Link
CN (1) CN116385808B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210110306A1 (en) * 2019-10-14 2021-04-15 Visa International Service Association Meta-transfer learning via contextual invariants for cross-domain recommendation
WO2021097774A1 (en) * 2019-11-21 2021-05-27 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for multi-source domain adaptation for semantic segmentation
CN110929670A (en) * 2019-12-02 2020-03-27 合肥城市云数据中心股份有限公司 Muck truck cleanliness video identification and analysis method based on yolo3 technology
CN112733788A (en) * 2021-01-20 2021-04-30 武汉大学 Cross-sensor migration-based high-resolution remote sensing image impervious surface extraction method
CN113344044A (en) * 2021-05-21 2021-09-03 北京工业大学 Cross-species medical image classification method based on domain self-adaptation
CN114092964A (en) * 2021-10-19 2022-02-25 杭州电子科技大学 Cross-domain pedestrian re-identification method based on attention guidance and multi-scale label generation
CN114973350A (en) * 2022-03-24 2022-08-30 西北工业大学 Cross-domain facial expression recognition method irrelevant to source domain data
CN115080699A (en) * 2022-07-04 2022-09-20 福州大学 Cross-modal retrieval method based on modal specific adaptive scaling and attention network
CN115578593A (en) * 2022-10-19 2023-01-06 北京建筑大学 Domain adaptation method using residual attention module
CN116070696A (en) * 2023-01-10 2023-05-05 中国兵器装备集团自动化研究所有限公司 Cross-domain data deep migration method, device, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DONGBO XI et al.: "Domain Adaptation with Category Attention Network for Deep Sentiment Analysis", https://arxiv.org/abs/2112.15290 *
KANG Jie; LI Jiawei; YANG Sili: "Facial Expression Recognition Based on Domain-Adaptive Convolutional Neural Network", Computer Engineering, no. 12 *
ZHU Lei et al.: "Gait Recognition Algorithm Based on Residual Network and Attention Mechanism", Peking University Core Journals (北大核心) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116777896A (en) * 2023-07-07 2023-09-19 浙江大学 Negative migration inhibition method for cross-domain classification and identification of apparent defects
CN116777896B (en) * 2023-07-07 2024-03-19 浙江大学 Negative migration inhibition method for cross-domain classification and identification of apparent defects

Also Published As

Publication number Publication date
CN116385808B (en) 2023-08-01

Similar Documents

Publication Publication Date Title
CN110956185B (en) Method for detecting image salient object
Liang et al. Instance segmentation in 3d scenes using semantic superpoint tree networks
Chuang et al. A feature learning and object recognition framework for underwater fish images
Lalitha et al. A survey on image segmentation through clustering algorithm
CN108629783B (en) Image segmentation method, system and medium based on image feature density peak search
CN116385808B (en) Big data cross-domain image classification model training method, image classification method and system
Hara et al. Attentional network for visual object detection
CN108764244B (en) Potential target area detection method based on convolutional neural network and conditional random field
CN114694038A (en) High-resolution remote sensing image classification method and system based on deep learning
Kim et al. Improving discrimination ability of convolutional neural networks by hybrid learning
JP4567660B2 (en) A method for determining a segment of an object in an electronic image.
Li et al. Learning to learn cropping models for different aspect ratio requirements
Sahu et al. Dynamic routing using inter capsule routing protocol between capsules
CN116129286A (en) Method for classifying graphic neural network remote sensing images based on knowledge graph
Yadav et al. Image segmentation techniques: a survey
Liu et al. 3d-queryis: A query-based framework for 3d instance segmentation
Cai et al. IOS-Net: An inside-to-outside supervision network for scale robust text detection in the wild
CN111462132A (en) Video object segmentation method and system based on deep learning
CN110516638B (en) Sign language recognition method based on track and random forest
Costa et al. Cluster analysis using self-organizing maps and image processing techniques
Falcão et al. The role of optimum connectivity in image segmentation: Can the algorithm learn object information during the process?
JP2017117025A (en) Pattern identification method, device thereof, and program thereof
Jiang et al. An optimized higher order CRF for automated labeling and segmentation of video objects
Wang et al. Self-supervised learning for high-resolution remote sensing images change detection with variational information bottleneck
CN114187440A (en) Small sample target detection system and method based on dynamic classifier

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant