CN113936143A - Image identification generalization method based on attention mechanism and generation countermeasure network - Google Patents

Image identification generalization method based on attention mechanism and generation countermeasure network

Info

Publication number
CN113936143A
CN113936143A
Authority
CN
China
Prior art keywords
domain
virtual
data
network
image recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111061764.8A
Other languages
Chinese (zh)
Other versions
CN113936143B (en)
Inventor
谭志 (Tan Zhi)
滕昭飞 (Teng Zhaofei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Civil Engineering and Architecture
Original Assignee
Beijing University of Civil Engineering and Architecture
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Civil Engineering and Architecture filed Critical Beijing University of Civil Engineering and Architecture
Priority to CN202111061764.8A priority Critical patent/CN113936143B/en
Publication of CN113936143A publication Critical patent/CN113936143A/en
Application granted granted Critical
Publication of CN113936143B publication Critical patent/CN113936143B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F 18/253 — Fusion techniques of extracted features
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/045 — Combinations of networks
    • G06N 3/047 — Probabilistic or stochastic networks
    • G06N 3/048 — Activation functions
    • G06N 3/08 — Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image recognition generalization method based on an attention mechanism and a generative adversarial network. A multi-layer parallel attention mechanism model is designed to capture the detail features of an image, a divergent data fusion algorithm is proposed to improve classifier performance, and a perturbation regularization is designed to achieve maximum domain transfer. Combining these parts with the problems to be solved, the invention proposes an image recognition domain generalization method based on an attention mechanism and a generative adversarial network and establishes a network model with image recognition generalization capability. The method mitigates the "domain shift" phenomenon of image recognition generalization models and improves recognition performance on data sets with unknown data distributions.

Description

Image identification generalization method based on attention mechanism and generation countermeasure network
Technical Field
The invention relates to the technical field of computer vision, and in particular to an image recognition generalization method based on an attention mechanism and a generative adversarial network.
Background
Image recognition is a technique that uses a computer to acquire, pre-process and extract features from an image in order to analyze and understand different targets. It is widely applied in the field of artificial intelligence, for example in flower and plant recognition, face recognition and pedestrian re-identification. Image recognition classifies images within a single data set based on their main features: for example, C is an unclosed shape, O is a singly closed shape, B is a doubly closed shape, and so on. For data sets of the same category but different origins, the recognizable features often differ. One data set may consist of black-and-white single-channel images while another consists of RGB color images, so the channel counts differ; differences in pixel resolution likewise change the features that are extracted, and the features selected for a 128 x 128 image differ from those for a 192 x 192 image. Therefore, in the image recognition process, no matter which domain an image comes from, the computer needs to extract features repeatedly, so that more distinct features can be extracted and the final recognition and classification function realized.
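The channel-count and resolution mismatches described above (grayscale vs. RGB, 128 x 128 vs. 192 x 192) can be removed by a simple unification step before recognition. A minimal NumPy sketch follows; the function name and the nearest-neighbour resize are illustrative choices, not part of the patent, which would normally use a library resize:

```python
import numpy as np

def unify_image_spec(img: np.ndarray, size: int = 128) -> np.ndarray:
    """Bring an image to a common (size, size, 3) RGB specification.

    Grayscale (H, W) inputs are replicated across three channels;
    resizing uses nearest-neighbour index sampling for simplicity.
    """
    if img.ndim == 2:                      # single-channel -> pseudo-RGB
        img = np.stack([img] * 3, axis=-1)
    h, w, _ = img.shape
    rows = np.arange(size) * h // size     # nearest-neighbour row indices
    cols = np.arange(size) * w // size     # nearest-neighbour column indices
    return img[rows][:, cols]

gray = np.random.randint(0, 256, (192, 192), dtype=np.uint8)    # 192x192 grayscale
color = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)  # 128x128 RGB
assert unify_image_spec(gray).shape == unify_image_spec(color).shape == (128, 128, 3)
```

After this step both images share one specification, so the same feature extractor can be applied to either data set.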
As deep learning is applied in more and more scenarios, the cost of labeling data has risen sharply, so transfer learning has attracted increasing attention. Domain generalization, a sub-field of transfer learning, studies how to generalize a model trained on source-domain data to a target domain with unknown data distribution. Research has found that for a single-domain data set, a trained model suffices to extract features and realize the final recognition function. However, when such a model is used on data sets of the same category but with different features, its recognition performance drops markedly. This is because image features differ between data sets, and the trained model lacks the ability to discriminate on new data sets; this is called the "domain shift" problem.
To address the domain shift problem, Qiao et al. proposed a meta-learning adversarial method for virtual-domain augmentation, constructing a generalization model that maximizes what a single domain can provide and thereby improving model transferability. Considering a worst-case formulation of the data distribution near the source domain in feature space, the input data are used to expand virtual augmented domains, the expanded data are fed into the model for training, and the generalization ability of the model is ultimately improved. The procedure is as follows:
1. Pre-process the data before training: check whether the pixel size and channel count of the source-domain training images match those of the target-domain test images, and unify the specifications if they differ.
2. Feed the source-domain data to a Wasserstein auto-encoder model and use the maximum mean discrepancy to generate more virtual domains that can simulate the target-domain data distribution.
3. Input the generated virtual-domain data into the Wasserstein auto-encoder to retrain the model, updating its parameters by gradient descent.
4. Compute the recognition loss of the source-domain data, input the source-domain data into the neural-network recognition model, and optimize the training model by gradient descent.
5. Train on the generated virtual domains with the updated parameters of the image recognition network and evaluate the virtual-domain loss.
6. Update the parameters of the image recognition model for the current virtual domain by combining the losses of steps 4 and 5.
7. Execute steps 2 to 6 in a loop 3 times to generate different virtual domains.
8. Run the whole procedure in a loop 10000 times, continuously updating the model parameters by gradient descent to obtain the final model parameters.
The above method has the following disadvantages:
1. The network model constructed in the whole process is not refined enough, so feature extraction from the image is insufficient.
2. The data-augmentation type is too simple, which compromises the performance of the classifier.
3. Although domain transfer can be achieved at the image sample level, semantic consistency is broken.
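The control flow of the cited scheme (an inner loop generating virtual domains inside an outer training loop) can be sketched as follows. Every function here is a dummy stand-in under loud assumptions: `wae_generate` replaces the Wasserstein auto-encoder with additive noise and `train_step` replaces the real gradient computation, since the sketch only illustrates the loop structure, not the actual models:

```python
import numpy as np

rng = np.random.default_rng(0)

def wae_generate(domain: np.ndarray) -> np.ndarray:
    """Stand-in for the Wasserstein auto-encoder: perturbed 'virtual' samples."""
    return domain + rng.normal(0.0, 0.1, domain.shape)

def train_step(params: np.ndarray, data: np.ndarray, lr: float = 0.01) -> np.ndarray:
    """Stand-in gradient-descent update (dummy gradient, illustrative only)."""
    grad = np.sign(params) * data.mean()
    return params - lr * grad

source = rng.normal(size=(32, 8))      # toy source-domain batch
params = np.ones(8)                    # toy model parameters
virtual_domains = []

for _ in range(3):                     # step 7: three virtual domains per cycle
    virtual = wae_generate(source)     # step 2: expand a virtual domain
    params = train_step(params, virtual)   # steps 3-6 collapsed: retrain + update
    virtual_domains.append(virtual)

assert len(virtual_domains) == 3 and params.shape == (8,)
```

The real method wraps this inner loop in the 10000-iteration outer loop of step 8; the sketch runs one outer iteration only.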
Disclosure of Invention
The embodiment of the invention provides an image recognition generalization method based on an attention mechanism and a generative adversarial network, which is used for solving the above problems in the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme.
An image recognition generalization method based on an attention mechanism and a generative adversarial network comprises the following steps:
S1: establish an initial merged domain based on the source domain of the feature map to be processed;
S2: input the initial merged domain into the generative adversarial network for processing through the divergent data fusion algorithm to obtain a first virtual augmented domain, and fuse the data of the first virtual augmented domain with the data of the source domain to obtain a target merged domain;
S3: construct a network model with image recognition generalization capability, and input the target merged domain into the generative adversarial network, combined with a perturbation regularization method, to generate a second virtual augmented domain whose sample level and feature level both preserve semantic consistency;
S4: input the second virtual augmented domain into the image recognition network with the multi-layer attention mechanism in the network model for training; input the target merged domain into the generative adversarial network in the network model for training, compute the model loss of that generative adversarial network, and optimize the parameters of the network model based on the model loss combined with gradient descent;
S5: repeat step S4 to complete the construction of the network model with image recognition generalization capability;
S6: establish a total virtual augmented domain based on the second virtual augmented domain data;
S7: combine the source domain of the feature map to be processed with the total virtual augmented domain, randomly sample data from it, input the samples into the network model for training, and compute the model loss of the generative adversarial network and the parameters for optimizing the model;
S8: repeat steps S1 to S7, continuously updating the learned weights of the network model by gradient descent;
S9: repeat steps S2 to S8 to obtain the weights learned by the finally trained network model, i.e. the recognized image features.
Preferably, step S1 comprises:
S11: construct an initial merged domain S_cbk based on the source domain S of the feature map to be processed. Each batch of data X_bsk of the initial merged domain S_cbk is obtained by

X_bsk = {X_s1, …, X_sj}   (1)

where k denotes the index of the current virtual augmented domain and the iteration count (initial value 0), bs denotes the batch size, and j ∈ [1, bs].
Step S2 comprises:
S21: merge the initial merged domain S_cbk with the source domain S of the feature map to be processed to obtain the first virtual augmented domain S_advk.
S22: repeat substep S21 to increase the number of first virtual augmented domains S_advk. Each batch of data X_bs of the first virtual augmented domain S_advk is obtained by

X_bs = {X_s1, …, X_sj, X_advk1, …, X_advkm}   (2)

where m ∈ [1, bs]; X_sj denotes the j images sampled from the source domain S in each batch, and X_advkm denotes the m images sampled from the current first virtual augmented domain S_advk in each batch.
S23: compute the per-batch amounts of data of the target merged domain C_advk by

C_s = bs/(k+1)   (3)   and   C_advk = bs − C_s   (4).
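Under the assumption that bs/(k+1) in formula (3) is meant as integer division (the patent does not state the rounding rule), the per-batch quotas can be sketched as:

```python
def batch_quota(bs: int, k: int) -> tuple[int, int]:
    """Per-batch sample counts per formulas (3)-(4): C_s source images and
    C_advk virtual-augmented images, for k generated virtual domains.
    Integer division is an assumption; the rounding rule is not specified."""
    c_s = bs // (k + 1)      # formula (3): C_s = bs / (k + 1)
    c_advk = bs - c_s        # formula (4): C_advk = bs - C_s
    return c_s, c_advk

# Before any virtual domain exists (k = 0) the batch is purely source data:
assert batch_quota(64, 0) == (64, 0)
# After k = 3 virtual domains, source data keeps a 1/(k+1) = 1/4 share:
assert batch_quota(64, 3) == (16, 48)
```

Note how the source-domain share shrinks as 1/(k+1) while the virtual-domain share grows, matching the intent that dependence on source data decreases as more virtual domains are generated.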
Preferably, step S3 comprises:
computing, by

L_r = min_W ||x − W(x)||_2   (5)

the data relationship between the feature map to be processed and the first virtual augmented domain, obtaining the sample-level loss L_r;
performing iterative adversarial training of the generative adversarial network to obtain the feature-level loss L_con, defined by formula (6) [which appears only as an image in the original] as a Wasserstein-distance term between Z_cbk and Z_advk; where W is the generative adversarial network, x denotes the input image, Z denotes the extracted features, Z_cbk denotes the features of the merged domain, and Z_advk denotes the features of the virtual augmented domain;
optimizing the generative adversarial network through the sample-level loss L_r and the feature-level loss L_con to obtain a second virtual augmented domain S_advk whose sample level and feature level both preserve semantic consistency.
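The two consistency terms can be sketched numerically. L_r follows formula (5) directly; for L_con the exact Wasserstein form of formula (6) is only an image in the original, so an L2 distance between the feature sets is used here as a loud stand-in:

```python
import numpy as np

def sample_level_loss(x: np.ndarray, wx: np.ndarray) -> float:
    """L_r of formula (5): distance between input x and reconstruction W(x)."""
    return float(np.linalg.norm(x - wx))

def feature_level_loss(z_cbk: np.ndarray, z_advk: np.ndarray) -> float:
    """Feature-level consistency term L_con: a distance between merged-domain
    features Z_cbk and virtual-domain features Z_advk. The exact Wasserstein
    form of formula (6) is unavailable; L2 is a simplified stand-in."""
    return float(np.linalg.norm(z_cbk - z_advk))

x = np.ones((4, 4))
assert sample_level_loss(x, x) == 0.0          # perfect reconstruction -> zero loss
z = np.zeros(8)
assert feature_level_loss(z, z + 1.0) > 0.0    # diverging features are penalised
```

Minimizing L_r keeps the generated sample close to its source image, while keeping L_con bounded keeps the virtual-domain features from drifting arbitrarily far from the merged-domain features, which is the stated role of the perturbation regularization.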
Preferably, in step S4, inputting the second virtual augmented domain into the image recognition network with the multi-layer attention mechanism in the network model with image recognition generalization capability for training comprises:
S31: perform a 1 x 1 convolution on the feature map M of the second virtual augmented domain S_advk, reducing the channel dimension C of M by 1/2, to obtain feature maps of dimensions M_A ∈ R^(C/2×H×W), M_B ∈ R^(C/2×H×W) and M_C ∈ R^(C/2×H×W);
S32: perform a Reshape operation on the feature maps M_A and M_B to obtain M_A1 ∈ R^(H·W×C/2) and M_B1 ∈ R^(H·W×C/2);
S33: compute the pixel-correlation feature map M_AB by formula (7) [which appears only as an image in the original]; where i ∈ {1, 2, …, N} and M_AB ∈ R^(H·W×H·W);
S34: apply Softmax normalization to M_AB, matrix-multiply the result with M_C ∈ R^(C/2×H×W), and normalize via formula (8) [which appears only as an image in the original] to obtain the feature map M_ABC ∈ R^(C/2×H×W);
S35: perform a 1 x 1 convolution on the feature map M_ABC to obtain the feature map M_F1 ∈ R^(C×H×W);
S36: perform a global pooling operation on the merged domain to obtain the compressed feature map M_D ∈ R^(C×1×1), then apply, through two fully connected layers in turn, a dimension-reduction operation and an upsampling operation to M_D to obtain the feature map M_F2 ∈ R^(C×H×W);
S37: perform weighted fusion of the feature map M of the second virtual augmented domain S_advk, the feature map M_F1 and the feature map M_F2 by

M_F = α_F·M_F1 + β_F·M_F2 + γ_F·M   (9)

to obtain the predicted feature map M_F.
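Steps S31-S37 can be sketched in NumPy. This is a simplified stand-in, not the patented layer: the learned 1 x 1 convolutions are replaced by fixed random projection matrices, the normalizations of formulas (7)-(8) are folded into a single softmax, and α_F, β_F, γ_F are plain scalars rather than learned weights:

```python
import numpy as np

def softmax(a: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def parallel_attention(M: np.ndarray, alpha=1.0, beta=1.0, gamma=1.0) -> np.ndarray:
    """Sketch of steps S31-S37 on a feature map M of shape (C, H, W)."""
    C, H, W = M.shape
    rng = np.random.default_rng(0)
    P = rng.normal(size=(3, C // 2, C)) / C          # three 1x1 "convolutions" (S31)
    flat = M.reshape(C, H * W)                       # (C, N) with N = H*W
    MA, MB, MC = (p @ flat for p in P)               # each (C/2, N)

    MAB = softmax(MA.T @ MB)                         # (N, N) pixel-correlation map (S33-S34)
    MABC = MC @ MAB.T                                # recombined channels, (C/2, N)
    MF1 = (rng.normal(size=(C, C // 2)) / C) @ MABC  # 1x1 conv back to C channels (S35)

    MD = flat.mean(axis=1, keepdims=True)            # global pooling, (C, 1) (S36)
    MF2 = np.repeat(MD, H * W, axis=1)               # upsample back to (C, N)

    MF = alpha * MF1 + beta * MF2 + gamma * flat     # formula (9) weighted fusion (S37)
    return MF.reshape(C, H, W)

M = np.random.default_rng(1).normal(size=(8, 5, 5))
assert parallel_attention(M).shape == M.shape        # output keeps input dimensions
```

The key structural point survives the simplification: the branch through M_AB mixes information across all H·W positions, the pooled branch M_F2 carries channel-wise global context, and formula (9) fuses both with the original map without changing its shape.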
Preferably, the process in step S4 of inputting the target merged domain into the generative adversarial network in the network model with image recognition generalization capability comprises:
generating an adversarial sample via formula (10) [which appears only as an image in the original];
training the generative adversarial network in the network model based on the adversarial sample combined with the perturbation regularization method, computing the model loss of that generative adversarial network, and optimizing the parameters of the network model based on the model loss combined with gradient descent.
Preferably, before step S1, the method further comprises:
processing the original pictures of the MNIST data set to obtain the feature map to be processed, whose pixel size is 32 x 32 and whose channels are three RGB channels.
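MNIST digits are natively 28 x 28 single-channel, so the preprocessing above amounts to upsampling and channel replication. A minimal sketch, assuming nearest-neighbour upsampling (a library resize would normally be used; the patent does not specify the interpolation):

```python
import numpy as np

def to_32x32_rgb(img28: np.ndarray) -> np.ndarray:
    """Turn a 28x28 single-channel MNIST-style digit into the 32x32
    three-channel feature map described above."""
    idx = np.arange(32) * 28 // 32           # nearest-neighbour index map
    up = img28[idx][:, idx]                  # upsampled to (32, 32)
    return np.stack([up] * 3, axis=-1)       # replicated to (32, 32, 3)

digit = np.random.default_rng(2).integers(0, 256, (28, 28))
out = to_32x32_rgb(digit)
assert out.shape == (32, 32, 3)
assert (out[..., 0] == out[..., 1]).all()    # channels are replicated copies
```

After this step the MNIST source domain matches the 32 x 32 RGB specification shared with the color target-domain data sets (e.g. the digit-recognition series compared in FIG. 5).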
According to the technical scheme provided by the embodiment of the invention, the image recognition generalization method based on an attention mechanism and a generative adversarial network designs a multi-layer parallel attention mechanism model to capture the detail features of an image, proposes a divergent data fusion algorithm to improve classifier performance, and designs a perturbation regularization to achieve maximum domain transfer. Combining these parts with the problems to be solved, it proposes an image recognition domain generalization method based on an attention mechanism and a generative adversarial network and establishes a network model with image recognition generalization capability. The method mitigates the "domain shift" phenomenon of image recognition generalization models and improves recognition performance on data sets with unknown data distributions.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained by those skilled in the art based on these drawings without creative effort.
FIG. 1 is a process flow diagram of the image recognition generalization method based on an attention mechanism and a generative adversarial network provided by the present invention;
FIG. 2 is a block diagram of the image recognition generalization model of the method provided by the present invention;
FIG. 3 is a block diagram of the multi-layer parallel attention mechanism of the method provided by the present invention;
FIG. 4 is a flow chart of a preferred embodiment of the method provided by the present invention;
FIG. 5 is a diagram comparing the performance of a preferred embodiment of the method on the digit-recognition series of data sets;
FIG. 6 is a diagram comparing the performance of a preferred embodiment of the method on the CIFAR-10 series of data sets.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For the convenience of understanding the embodiments of the present invention, the following description will be further explained by taking several specific embodiments as examples in conjunction with the drawings, and the embodiments are not to be construed as limiting the embodiments of the present invention.
The invention provides an image recognition generalization method based on an attention mechanism and a generative adversarial network, which addresses the following technical problems:
1. It improves the network structure to fully capture the detail features of the image, performs feature fusion, and establishes relations between pieces of feature information.
2. It improves the data-augmentation method, training further on the source-domain data to achieve data divergence and improve the performance of the classifier.
3. It realizes semantic consistency at both the sample level and the feature level, achieving maximum domain transfer.
Referring to FIG. 1, the method provided by the present invention comprises the following steps:
S1: establish an initial merged domain based on the source domain of the feature map to be processed, and input the initial merged domain into the generative adversarial network for processing to obtain a first virtual augmented domain;
S2: fuse the data of the first virtual augmented domain with the data of the source domain through the divergent data fusion algorithm to obtain a target merged domain;
S3: input the target merged domain into the generative adversarial network, combined with the perturbation regularization method, to generate a second virtual augmented domain whose sample level and feature level both preserve semantic consistency;
S4: construct a network model with image recognition generalization capability, and input the second virtual augmented domain into the image recognition network with the multi-layer attention mechanism in that model for training, optimizing the classification performance of the model; input the target merged domain into the generative adversarial network in the model for training, compute the model loss of that generative adversarial network, and optimize the parameters of the model based on the loss combined with gradient descent;
S5: repeat step S4 to complete the construction of the network model with image recognition generalization capability;
S6: establish a total virtual augmented domain based on the second virtual augmented domain data;
S7: combine the source domain of the feature map to be processed with the total virtual augmented domain, randomly sample data, input the samples into the network model for training, and compute the model loss of the generative adversarial network and the parameters for optimizing the model;
S8: repeat step S7, continuously updating the learned weights of the network model by gradient descent;
S9: repeat steps S2 to S8 to obtain the weights learned by the finally trained network model with image recognition generalization capability.
In the preferred embodiment of the present invention, steps S1 and S2 and the divergent data fusion algorithm are specified as follows:
S11: construct a domain S_cb identical to the source domain S of the feature map to be processed as the initial merged domain S_cbk, so that before any virtual augmented domain is generated, the data X_cb sampled from the merged domain S_cb are exactly the data X_s of the source domain S. Assume the source-domain data set is divided into several batches, where bs denotes the batch size and k denotes the index of the current virtual augmented domain with initial value 0. Each batch of data X_bs of S_cb is then given by formula (1):

X_bs = {X_s1, …, X_sj}   (1);

where j ∈ [1, bs].
S21: merge the domain S_cb with the source domain S of the feature map to be processed to obtain the virtual augmented domain S_advk.
S22: repeat substep S21, generating more first virtual augmented domains S_advk as the iterations proceed; the data of the source domain and of the virtual augmented domains must be fused to form the merged domain S_cb. At this point, each batch of data X_cb sampled from the merged domain S_cb comes from the source domain S and from the sets of first virtual augmented domains S_advk generated in the current cycle. Thus, after a virtual augmented domain has been generated iteratively, each batch X_bs of the merged domain S_cb becomes as shown in formula (2):

X_bs = {X_s1, …, X_sj, X_advk1, …, X_advkm}   (2);

where m ∈ [1, bs]; X_sj and X_advkm denote, respectively, the j images sampled from the source domain S and the m images sampled from the current virtual augmented domain S_advk in each batch. When virtual augmented domains are generated iteratively, the source-domain data distribution should not diverge away from the source domain too quickly, so a certain quota of source-domain data must be kept in the input; as the number of virtual augmented domains grows, the augmented-domain data distribution gradually simulates the distributions of more unknown domains and the dependence on source-domain data decreases. Let k be the number of virtual augmented domains generated so far; the numbers of data selected per batch from the source domain S and from the virtual augmented domains S_advk to compose the target merged domain C_advk are given by formulas (3) and (4):

C_s = bs/(k+1)   (3)   and   C_advk = bs − C_s   (4).
The invention designs a multi-layer parallel attention mechanism model to capture the detail features of an image, proposes a divergent data fusion algorithm to improve classifier performance and a perturbation regularization to achieve maximum domain transfer, and, combining these parts with the problems to be solved, proposes an image recognition domain generalization method based on an attention mechanism and a generative adversarial network, establishing a network model with image recognition generalization capability. The structure of this network model is shown in FIG. 2, where W is the generative adversarial network, x denotes the input image, and z denotes the extracted features. During model training, the feature mapping of the source-domain data X_s is completed first, and the category mapping of the source domain is then realized. Virtual augmented domains S_advk are generated gradually in an iterative manner and feature-mapped; the merged-domain training model judges whether the feature distribution of a generated virtual augmented domain lies outside the source-domain distribution, the parameters of the model are updated continuously, and the goal of maximizing domain transfer is finally achieved.
Further, the perturbation regularization in step S3 is to generate a virtual enhancement domain meeting the requirements of the sample level and the feature level. Calculating the data relationship between the input image and the reconstructed image (i.e. between the feature map to be processed and the virtual enhancement domain) by taking the Wasserstein distance as a metric at a sample level, and ensuring the semantic consistency of the images, as shown in formula (5):
L_r = min_W ||x - W(x)||_2 (5).
Through iterative adversarial training, the Wasserstein distance is used as the distribution distance between the virtual augmented domain features Z_advk and the merged-domain data features Z_cbk, ensuring consistency at the feature level; the specific form is shown in formula (6):
L_con = W_d(Z_cbk, Z_advk) (6),

where W_d denotes the Wasserstein distance.
After the sample-level loss L_r and the feature-level loss L_con are obtained, the generative adversarial network can be optimized by controlling or reducing their values with conventional methods; the iteration continues until a second virtual augmented domain S_advk is finally obtained in which both the sample level and the feature level guarantee semantic consistency.
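As an illustrative sketch only (not the patent's implementation), the two consistency losses can be computed as follows with numpy; the sorted-sample estimator for the 1-D Wasserstein distance is an assumption, since the patent does not specify how the distance is estimated:

```python
import numpy as np

def sample_level_loss(x, wx):
    """L_r of formula (5): L2 distance between the input image x and
    its reconstruction W(x), enforcing sample-level consistency."""
    return np.linalg.norm(x - wx)

def wasserstein_1d(z_a, z_b):
    """Empirical 1-D Wasserstein-1 distance between two equally sized
    feature samples, standing in for the feature-level loss L_con of
    formula (6). The sorted-coupling estimator is an assumption."""
    return np.mean(np.abs(np.sort(z_a) - np.sort(z_b)))

x = np.array([0.2, 0.5, 0.9])
wx = np.array([0.2, 0.5, 0.9])        # a perfect reconstruction
print(sample_level_loss(x, wx))       # 0.0: identical images incur no loss

z_cbk = np.array([0.0, 1.0, 2.0])     # merged-domain features Z_cbk
z_advk = np.array([1.0, 2.0, 3.0])    # augmented-domain features Z_advk
print(wasserstein_1d(z_cbk, z_advk))  # 1.0: every sorted sample is shifted by 1
```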
Further, as shown in fig. 3, the multi-layer parallel attention mechanism model in step S4 is composed of a plurality of branches, and its input is a feature map output by a convolutional-layer convolution operation. The input feature map is processed by multiple branches in parallel, and the feature vectors of the different branches are finally fused to obtain the output predicted feature map. Assume the feature map of the input second virtual augmented domain S_advk is M ∈ R^(C*H*W), where C, H and W represent the number of channels, the height and the width of the input feature map M, respectively.
In the three branches A(M), B(M) and C(M), a 1 × 1 convolution is first applied to M and the channel dimension C is reduced to 1/2, giving new feature maps of dimensions M_A ∈ R^(C/2*H*W), M_B ∈ R^(C/2*H*W) and M_C ∈ R^(C/2*H*W). Next, a Reshape operation is applied to the feature maps M_A and M_B to obtain M_A1 ∈ R^(H*W*C/2) and M_B1 ∈ R^(H*W*C/2); M_A1 is then transposed and matrix-multiplied with M_B1 to obtain the final pixel-correlation feature map M_AB, where M_AB is given by formula (7):
M_AB = M_B1 × (M_A1)^T (7).
where i ∈ {1, 2, …, N} and M_AB ∈ R^(H*W*H*W). M_AB is subjected to Softmax normalization and matrix-multiplied with M_C ∈ R^(C/2*H*W); the result is normalized, finally yielding the feature map M_ABC ∈ R^(C/2*H*W). The specific calculation of M_ABC is shown in formula (8):
M_ABC = M_C × Softmax(M_AB) (8).
A 1 × 1 convolution is applied to the feature map M_ABC to recover the initial channel number C, giving the output feature map of the three branches A(M), B(M) and C(M): M_F1 ∈ R^(C*H*W). In the branch D(M), a global pooling operation is first applied to the input feature map M, compressing it to M_D ∈ R^(C*1*1); the channel number C is reduced to C/16 by a fully connected dimension-reduction layer, and nonlinear processing is applied with a ReLU activation function. To ensure that the output feature map has the same size as the input, it is up-sampled by a fully connected layer and its channel number is restored to the original size C, so the dimension of the final feature map of this branch is M_F2 ∈ R^(C*H*W).
Through the above operations, the feature maps of the branches are weighted and fused, so the predicted feature map output by the multi-layer parallel attention mechanism model has the form shown in formula (9):
M_F = α_F·M_F1 + β_F·M_F2 + γ_F·M (9)
where α_F, β_F and γ_F are the weight coefficients of the features, gradually updated as the model continuously learns, so that the multi-layer parallel attention mechanism model associates more feature information by establishing weight dependencies; the structure of the model is shown in fig. 3.
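The four branches and the fusion of formulas (7)-(9) can be sketched in numpy as follows. This is a hedged illustration, not the patent's code: the 1 × 1 convolutions are modelled as random projection matrices, the fusion weights α_F, β_F, γ_F are fixed rather than learned, and the softmax axis is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def parallel_attention(M, alpha=0.4, beta=0.3, gamma=0.3):
    """Sketch of the four-branch attention of formulas (7)-(9)."""
    C, H, W = M.shape
    N = H * W
    # Branches A(M), B(M), C(M): a 1x1 convolution halving the channels,
    # modelled here as a random (C/2, C) projection, then flattening.
    proj = lambda: rng.standard_normal((C // 2, C)) / np.sqrt(C)
    MA = np.einsum('oc,chw->ohw', proj(), M).reshape(C // 2, N)
    MB = np.einsum('oc,chw->ohw', proj(), M).reshape(C // 2, N)
    MC = np.einsum('oc,chw->ohw', proj(), M).reshape(C // 2, N)
    # Formula (7): pixel-correlation map M_AB = M_B1 * M_A1^T, shape (N, N).
    MAB = MB.T @ MA
    # Formula (8): softmax-normalised attention applied to M_C.
    MABC = MC @ softmax(MAB)                    # (C/2, N)
    # A second 1x1 convolution restores the original channel count C.
    up = rng.standard_normal((C, C // 2)) / np.sqrt(C // 2)
    MF1 = (up @ MABC).reshape(C, H, W)
    # Branch D(M): global pooling, FC down to C/16, ReLU, FC back to C,
    # then channel-wise gating of the input (squeeze-and-excitation style).
    d = M.mean(axis=(1, 2))                     # global pooling -> (C,)
    r = max(C // 16, 1)
    w1 = rng.standard_normal((r, C)) / np.sqrt(C)
    w2 = rng.standard_normal((C, r))
    gate = w2 @ np.maximum(w1 @ d, 0.0)
    MF2 = gate[:, None, None] * M               # broadcast back to (C, H, W)
    # Formula (9): weighted fusion of the branch outputs with the input.
    return alpha * MF1 + beta * MF2 + gamma * M

M = rng.standard_normal((16, 4, 4))             # a toy C=16, H=W=4 feature map
out = parallel_attention(M)
print(out.shape)                                # (16, 4, 4)
```

The output retains the input dimensions C × H × W, as required for the residual-style fusion term γ_F·M in formula (9).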
The invention also provides an embodiment to show the effect of using the method of the invention.
As shown in fig. 4, the execution process of the present embodiment includes:
In the first step, the pixels and the number of channels of the data sets used by the training and testing models are unified: the MNIST data set is selected as the source-domain training set, the images in the data set are resized to 32 × 32 pixels, and the number of channels is the three RGB channels.
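A minimal sketch of this preprocessing step (zero-padding for the 28 × 28 → 32 × 32 resize is an assumption; the patent only states the target format):

```python
import numpy as np

def preprocess(img28):
    """Unify an MNIST digit to the 32 x 32, three-channel RGB input
    format described above. Zero-padding is used for the resize here;
    the patent does not state which resize method is used."""
    padded = np.zeros((32, 32), dtype=img28.dtype)
    padded[2:30, 2:30] = img28       # centre the 28 x 28 digit
    return np.stack([padded] * 3)    # replicate into a (3, 32, 32) RGB tensor

digit = np.ones((28, 28), dtype=np.float32)  # a dummy all-ink digit
out = preprocess(digit)
print(out.shape)  # (3, 32, 32)
```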
In the second step, initialization is performed: the number k of virtual augmented domains is 0, and the source domain S is taken as the initial merged domain S_cbk, whose data composition is shown in formula (1).
In the third step, once a first virtual augmented domain S_advk has been generated (i.e. when k is larger than 0), the data X_advk of the generated virtual augmented domain S_advk and the data X_s of the source domain S undergo divergent data fusion: the per-batch composition of the merged domain S_cbk is calculated by formulas (3) and (4), and the batches are combined into the target merged domain C_advk.
In the fourth step, the data of the target merged domain C_advk is fed into the generative adversarial network, and a virtual augmented domain S_advk guaranteeing semantic consistency at both the sample level and the feature level is generated in combination with perturbation regularization.
In the fifth step, the data X_advk of the virtual augmented domain S_advk is input into the image recognition network containing the multi-layer attention mechanism, the recognition network is trained, and the classification performance is optimized. At the same time, the data X_cbk of the target merged domain C_advk is input into the generative adversarial network, adversarial samples are generated in combination with perturbation regularization, adversarial training is carried out, the loss of the generative adversarial network model is calculated, and the parameters are optimized by a gradient descent method.
In the sixth step, the fifth step is cycled 25 times. To generate more augmented data with different distributions, the image recognition capability of the classifier is continuously optimized, and a model that can expand the data distribution and simulate more unknown domains is further obtained.
In the seventh step, the data X_advk of the currently generated virtual augmented domain S_advk is added to the total virtual augmented domain S_adv.
In the eighth step, the source domain S and the total virtual augmented domain S_adv are merged, data is sampled from the merged set, the image recognition network is trained (by the same process as the fifth step), the loss is continuously calculated, and the parameters of the model are optimized.
In the ninth step, the eighth step is cycled, and k is increased by 1.
In the tenth step, the third through ninth steps are cycled 3 times, where 3 is the threshold number of data sets.
In the eleventh step, the tenth step is cycled 10000 times.
In the twelfth step, after all cycles are finished, the weights learned by the finally trained recognition model are output.
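The loop schedule of the third through twelfth steps can be sketched as the following skeleton. The function name and log entries are illustrative stand-ins for the real training calls; merged_steps and the omitted 10000 outer cycles of the eleventh step are reduced purely for demonstration:

```python
def training_schedule(k_max=3, gan_steps=25, merged_steps=2):
    """Skeleton of the nested schedule of steps three to twelve.
    Real training code would replace the log entries with forward and
    backward passes; iteration counts other than k_max=3 and
    gan_steps=25 are illustrative assumptions."""
    log = []
    total_virtual = []                      # the total virtual augmented domain S_adv
    for k in range(k_max):                  # tenth step: up to the threshold of 3 domains
        for _ in range(gan_steps):          # fifth/sixth steps: 25 GAN + classifier updates
            log.append(('gan+classifier', k))
        total_virtual.append(f'S_adv{k}')   # seventh step: accumulate the new domain
        for _ in range(merged_steps):       # eighth/ninth steps: train on S merged with S_adv
            log.append(('merged', k))
    return log, total_virtual

log, domains = training_schedule()
print(len(log), domains)  # 81 ['S_adv0', 'S_adv1', 'S_adv2']
```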
According to the method, the network structure is first improved by integrating a multi-layer parallel attention mechanism, which avoids the problem that the model cannot fully extract image features during training, allows key features to be continuously extracted in the feature-extraction process, and establishes more information connections. Secondly, to improve the performance of the classifier and the data-enhancement method, during divergent data fusion the proportion of data in each batch of the merged domain changes continuously as more virtual augmented domains are generated, realizing the divergence of the data. Finally, semantic consistency of the data at the sample level and the feature level is satisfied simultaneously, and maximum domain transfer is realized.
The invention obviously improves the portability of the image recognition model, and the experimental results are shown in fig. 5 and 6.
In FIG. 5, the MNIST data set is selected as the source-domain training set, and four digital data sets, SVHN, MNIST-M, SYN and USPS, are selected as test sets. FIG. 5 shows that the test results of the invention on the SVHN, MNIST-M and SYN data sets are significantly higher than those of other methods, while the result on the USPS data set is lower than that of the d-SNE method. This is because d-SNE greatly improves recognition accuracy only on the USPS data set and performs poorly in portability on the other three data sets; taking the average result over the four test sets, the invention is significantly higher than the other methods.
In fig. 6, the average recognition accuracy of the model over the four data sets SVHN, MNIST-M, SYN and USPS is used as the evaluation index of the transplantation performance of the updated model; it is evident that the performance of the invention is greatly improved compared with the average image recognition accuracy of previous models.
In summary, the image recognition generalization method based on an attention mechanism and a generative adversarial network provided by the invention designs a multi-layer parallel attention mechanism model to capture the detail features of an image, proposes a divergent data fusion algorithm to improve classifier performance, and designs a perturbation regularization to realize maximum domain transfer. Combining these parts with the problems to be solved, an image recognition domain generalization method based on an attention mechanism and a generative adversarial network is proposed, and a network model with image recognition generalization capability is established. The proposed method alleviates the "domain shift" phenomenon of image recognition generalization models and improves the performance of recognizing images from data sets with unknown data distributions.
Those of ordinary skill in the art will understand that: the figures are merely schematic representations of one embodiment, and the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The embodiments in the present specification are described in a progressive manner; the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, for apparatus or system embodiments, since they are substantially similar to the method embodiments, they are described relatively simply, and reference may be made to the partial description of the method embodiments. The above-described embodiments of the apparatus and system are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. Those of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (6)

1. An image recognition generalization method based on an attention mechanism and a generative adversarial network, characterized by comprising the following steps:
S1, establishing an initial merged domain based on the source domain of a feature map to be processed;
S2, inputting the initial merged domain into a generative adversarial network for processing through a divergent data fusion algorithm to obtain a first virtual augmented domain, and fusing the data of the first virtual augmented domain with the data of the source domain to obtain a target merged domain;
S3, constructing a network model with image recognition generalization capability, and inputting the target merged domain into the generative adversarial network, combined with a perturbation regularization method, to generate a second virtual augmented domain in which both the sample level and the feature level guarantee semantic consistency;
S4, inputting the second virtual augmented domain into the image recognition network with a multi-layer attention mechanism in the network model with image recognition generalization capability for training; inputting the target merged domain into the generative adversarial network in the network model with image recognition generalization capability for training, calculating the model loss of the generative adversarial network in the network model with image recognition generalization capability, and optimizing the parameters of the network model with image recognition generalization capability based on the model loss in combination with a gradient descent method;
S5, repeatedly executing step S4 to complete the construction of the network model with image recognition generalization capability;
S6, establishing a total virtual augmented domain based on the data of the second virtual augmented domain;
S7, combining the source domain of the feature map to be processed with the total virtual augmented domain, randomly sampling data, inputting the sampled data into the network model with image recognition generalization capability for training, calculating the model loss of the generative adversarial network, and optimizing the parameters of the model;
S8, repeatedly executing steps S1-S7, and continuously updating the learned weights of the network model with image recognition generalization capability by a gradient descent method;
S9, repeatedly executing steps S2-S8 to obtain the weights learned by the finally trained network model with image recognition generalization capability.
2. The method according to claim 1, wherein step S1 comprises:
S11, constructing the initial merged domain S_cbk based on the source domain S of the feature map to be processed, the data X_bsk of each batch of the initial merged domain S_cbk being obtained through formula
X_bsk = {X_s1, …, X_sj} (1);
in the formula, k represents the number of current virtual augmented domains and the number of iterations, with an initial value of 0; bs represents the batch size; and j ∈ [1, bs];
and wherein step S2 comprises:
S21, merging the initial merged domain S_cbk with the source domain S of the feature map to be processed to obtain the first virtual augmented domain S_advk;
S22, repeating substep S21 to increase the number of first virtual augmented domains S_advk, the data X_bs of each batch of the first virtual augmented domain S_advk being obtained through formula
X_bs = {X_s1, …, X_sj, X_advk1, …, X_advkm} (2);
wherein m ∈ [1, bs], X_sj indicates that each batch of data of the first virtual augmented domain S_advk samples j images from the source domain S, and X_advkm indicates that each batch of data samples m images from the current first virtual augmented domain S_advk;
S23, calculating the amount of data per batch in the target merged domain C_advk through formulas
C_s = bs/(k+1) (3) and C_advk = bs - C_s (4).
3. The method according to claim 2, wherein step S3 comprises:
calculating the data relationship between the feature map to be processed and the first virtual augmented domain through formula
L_r = min_W ||x - W(x)||_2 (5)
to obtain the sample-level loss L_r;
performing iterative adversarial training on the generative adversarial network through formula
L_con = W_d(Z_cbk, Z_advk) (6)
to obtain the feature-level loss L_con, wherein W is the generative adversarial network, W_d denotes the Wasserstein distance, x represents the input image, z represents the extracted features, Z_cbk represents the features of the merged domain, and Z_advk represents the features of the virtual augmented domain;
and optimizing the generative adversarial network through the sample-level loss L_r and the feature-level loss L_con to obtain the second virtual augmented domain S_advk in which both the sample level and the feature level guarantee semantic consistency.
4. The method according to claim 3, wherein inputting the second virtual augmented domain into the image recognition network with a multi-layer attention mechanism in the network model with image recognition generalization capability in step S4 comprises:
S31, performing a 1 × 1 convolution operation on the feature map M of the second virtual augmented domain S_advk, reducing the channel dimension C of the feature map M by 1/2, and obtaining feature maps of dimensions M_A ∈ R^(C/2*H*W), M_B ∈ R^(C/2*H*W) and M_C ∈ R^(C/2*H*W);
S32, performing a Reshape operation on the feature maps M_A and M_B to obtain M_A1 ∈ R^(H*W*C/2) and M_B1 ∈ R^(H*W*C/2);
S33, calculating the pixel-correlation feature map through formula
M_AB = M_B1 × (M_A1)^T (7);
wherein i ∈ {1, 2, …, N} and M_AB ∈ R^(H*W*H*W);
S34, performing Softmax normalization on M_AB and matrix-multiplying the result with M_C ∈ R^(C/2*H*W), the normalization operation being performed through formula
M_ABC = M_C × Softmax(M_AB) (8)
to obtain the feature map M_ABC, M_ABC ∈ R^(C/2*H*W);
S35, performing a 1 × 1 convolution operation on the feature map M_ABC to obtain the feature map M_F1 ∈ R^(C*H*W);
S36, performing a global pooling operation on the merged domain to obtain a compressed feature map M_D ∈ R^(C*1*1), and sequentially performing a dimension-reduction operation and an up-sampling operation on the compressed feature map M_D through two fully connected layers to obtain the feature map M_F2 ∈ R^(C*H*W);
S37, performing weighted fusion on the feature maps M_F1 ∈ R^(C*H*W) and M_F2 ∈ R^(C*H*W) of the second virtual augmented domain S_advk through formula
M_F = α_F·M_F1 + β_F·M_F2 + γ_F·M (9)
to obtain the predicted feature map M_F.
5. The method according to claim 3, wherein inputting the target merged domain into the generative adversarial network in the network model with image recognition generalization capability in step S4 comprises:
generating adversarial samples through the above process, training the generative adversarial network in the network model with image recognition generalization capability based on the adversarial samples in combination with a perturbation regularization method, calculating the model loss of the generative adversarial network in the network model with image recognition generalization capability, and optimizing the parameters of the network model with image recognition generalization capability based on the model loss in combination with a gradient descent method.
6. The method according to any one of claims 1 to 5, further comprising, prior to step S1:
processing the original picture through the MNIST data set to obtain the feature map to be processed, wherein the pixels of the feature map are 32 × 32 and the channels are the three RGB channels.
CN202111061764.8A 2021-09-10 2021-09-10 Image identification generalization method based on attention mechanism and generation countermeasure network Active CN113936143B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111061764.8A CN113936143B (en) 2021-09-10 2021-09-10 Image identification generalization method based on attention mechanism and generation countermeasure network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111061764.8A CN113936143B (en) 2021-09-10 2021-09-10 Image identification generalization method based on attention mechanism and generation countermeasure network

Publications (2)

Publication Number Publication Date
CN113936143A true CN113936143A (en) 2022-01-14
CN113936143B CN113936143B (en) 2022-07-01

Family

ID=79275382

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111061764.8A Active CN113936143B (en) 2021-09-10 2021-09-10 Image identification generalization method based on attention mechanism and generation countermeasure network

Country Status (1)

Country Link
CN (1) CN113936143B (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110728729A (en) * 2019-09-29 2020-01-24 天津大学 Unsupervised CT projection domain data recovery method based on attention mechanism
CN111126386A (en) * 2019-12-20 2020-05-08 复旦大学 Sequence field adaptation method based on counterstudy in scene text recognition
CN111340819A (en) * 2020-02-10 2020-06-26 腾讯科技(深圳)有限公司 Image segmentation method, device and storage medium
CN111461239A (en) * 2020-04-03 2020-07-28 成都考拉悠然科技有限公司 White box attack method of CTC scene character recognition model
CN112150442A (en) * 2020-09-25 2020-12-29 帝工(杭州)科技产业有限公司 New crown diagnosis system based on deep convolutional neural network and multi-instance learning
CN112347850A (en) * 2020-09-30 2021-02-09 新大陆数字技术股份有限公司 Infrared image conversion method, living body detection method, device and readable storage medium
CN112766079A (en) * 2020-12-31 2021-05-07 北京航空航天大学 Unsupervised image-to-image translation method based on content style separation


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHIHONG CHEN ET AL.: "Deep joint two-stream Wasserstein auto-encoder and selective attention alignment for unsupervised domain adaptation", Neural Computing and Applications *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114511737A (en) * 2022-01-24 2022-05-17 北京建筑大学 Training method of image recognition domain generalization model
CN116883681A (en) * 2023-08-09 2023-10-13 北京航空航天大学 Domain generalization target detection method based on countermeasure generation network
CN116883681B (en) * 2023-08-09 2024-01-30 北京航空航天大学 Domain generalization target detection method based on countermeasure generation network

Also Published As

Publication number Publication date
CN113936143B (en) 2022-07-01

Similar Documents

Publication Publication Date Title
CN107945118B (en) Face image restoration method based on generating type confrontation network
CN109711426B (en) Pathological image classification device and method based on GAN and transfer learning
CN111242841B (en) Image background style migration method based on semantic segmentation and deep learning
CN111798369B (en) Face aging image synthesis method for generating confrontation network based on circulation condition
WO2022252272A1 (en) Transfer learning-based method for improved vgg16 network pig identity recognition
CN108230278B (en) Image raindrop removing method based on generation countermeasure network
CN113936143B (en) Image identification generalization method based on attention mechanism and generation countermeasure network
Huang et al. Deep hyperspectral image fusion network with iterative spatio-spectral regularization
CN110570363A (en) Image defogging method based on Cycle-GAN with pyramid pooling and multi-scale discriminator
CN112396607A (en) Streetscape image semantic segmentation method for deformable convolution fusion enhancement
CN112183637A (en) Single-light-source scene illumination re-rendering method and system based on neural network
CN110717953A (en) Black-white picture coloring method and system based on CNN-LSTM combined model
Ji et al. ColorFormer: Image colorization via color memory assisted hybrid-attention transformer
CN111626926A (en) Intelligent texture image synthesis method based on GAN
Goel et al. Gray level enhancement to emphasize less dynamic region within image using genetic algorithm
Xiong et al. Joint intensity–gradient guided generative modeling for colorization
Althbaity et al. Colorization Of Grayscale Images Using Deep Learning
CN117011515A (en) Interactive image segmentation model based on attention mechanism and segmentation method thereof
CN109583406B (en) Facial expression recognition method based on feature attention mechanism
Xu et al. Attention‐based multi‐channel feature fusion enhancement network to process low‐light images
Wang et al. Face super-resolution via hierarchical multi-scale residual fusion network
CN115620342A (en) Cross-modal pedestrian re-identification method, system and computer
CN114219960A (en) Space target ISAR image classification method under small sample condition of XGboost based on multi-learner optimization
Xu et al. AS 3 ITransUNet: Spatial-Spectral Interactive Transformer U-Net with Alternating Sampling for Hyperspectral Image Super-Resolution
Nasrin et al. PColorNet: investigating the impact of different color spaces for pathological image classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant