CN113936143A - Image identification generalization method based on attention mechanism and generation countermeasure network - Google Patents

Image identification generalization method based on attention mechanism and generation countermeasure network

Info

Publication number
CN113936143A
CN113936143A
Authority
CN
China
Prior art keywords
domain
virtual
data
network
image recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111061764.8A
Other languages
Chinese (zh)
Other versions
CN113936143B (en)
Inventor
谭志 (Tan Zhi)
滕昭飞 (Teng Zhaofei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Civil Engineering and Architecture
Original Assignee
Beijing University of Civil Engineering and Architecture
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Civil Engineering and Architecture filed Critical Beijing University of Civil Engineering and Architecture
Priority to CN202111061764.8A priority Critical patent/CN113936143B/en
Publication of CN113936143A publication Critical patent/CN113936143A/en
Application granted granted Critical
Publication of CN113936143B publication Critical patent/CN113936143B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F 18/253 — Fusion techniques of extracted features
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/045 — Combinations of networks
    • G06N 3/047 — Probabilistic or stochastic networks
    • G06N 3/048 — Activation functions
    • G06N 3/08 — Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image recognition generalization method based on an attention mechanism and a generative adversarial network. A multi-layer parallel attention mechanism model is designed to capture the detail features of an image, a divergent data fusion algorithm is proposed to improve classifier performance, and a perturbation regularization is designed to achieve maximum domain transfer. Combining these parts with the problems to be solved, the invention proposes an image recognition domain generalization method based on an attention mechanism and a generative adversarial network and establishes a network model with image recognition generalization capability. The method mitigates the "domain shift" phenomenon of image recognition generalization models and improves recognition performance on data sets with unknown data distributions.

Description

Image identification generalization method based on attention mechanism and generation countermeasure network
Technical Field
The invention relates to the technical field of computer vision, and in particular to an image recognition generalization method based on an attention mechanism and a generative adversarial network.
Background
Image recognition is a technique that uses a computer to acquire, pre-process and extract features from an image in order to analyze and understand different targets. It is widely applied in the field of artificial intelligence, for example in flower and plant recognition, face recognition and pedestrian re-identification. Image recognition classifies images within a single data set based on their main features: for example, C is an unclosed shape, O is a singly closed shape, B is a doubly closed shape, and so on. For data sets of the same category but different origins, the recognizable features often differ. One data set may consist of black-and-white single-channel images while another consists of RGB color images, so the channel counts differ; differences in pixel resolution likewise change the features that are extracted, and the features selected for a 128 x 128 image differ from those for a 192 x 192 image. Therefore, in the image recognition process, no matter which domain an image comes from, the computer needs to extract features repeatedly, so that more distinct features can be extracted and the final recognition and classification function realized.
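The channel-count and resolution mismatches described above (grayscale vs. RGB, 128 x 128 vs. 192 x 192) can be removed by a simple unification step before recognition. A minimal NumPy sketch follows; the function name and the nearest-neighbour resize are illustrative choices, not part of the patent, which would normally use a library resize:

```python
import numpy as np

def unify_image_spec(img: np.ndarray, size: int = 128) -> np.ndarray:
    """Bring an image to a common (size, size, 3) RGB specification.

    Grayscale (H, W) inputs are replicated across three channels;
    resizing uses nearest-neighbour index sampling for simplicity.
    """
    if img.ndim == 2:                      # single-channel -> pseudo-RGB
        img = np.stack([img] * 3, axis=-1)
    h, w, _ = img.shape
    rows = np.arange(size) * h // size     # nearest-neighbour row indices
    cols = np.arange(size) * w // size     # nearest-neighbour column indices
    return img[rows][:, cols]

gray = np.random.randint(0, 256, (192, 192), dtype=np.uint8)    # 192x192 grayscale
color = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)  # 128x128 RGB
assert unify_image_spec(gray).shape == unify_image_spec(color).shape == (128, 128, 3)
```

After this step both images share one specification, so the same feature extractor can be applied to either data set.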
As deep learning is applied in more and more scenarios, the cost of labeling data has risen sharply, so transfer learning has attracted increasing attention. Domain generalization, a sub-field of transfer learning, studies how to generalize a model trained on source-domain data to a target domain with unknown data distribution. Research has found that for a single-domain data set, a trained model suffices to extract features and realize the final recognition function. However, when such a model is used on data sets of the same category but with different features, its recognition performance drops markedly. This is because image features differ between data sets, and the trained model lacks the ability to discriminate on new data sets; this is called the "domain shift" problem.
To address the domain shift problem, Qiao et al. proposed a meta-learning adversarial method for virtual-domain augmentation, constructing a generalization model that maximizes what a single domain can provide and thereby improving model transferability. Considering a worst-case formulation of the data distribution near the source domain in feature space, the input data are used to expand virtual augmented domains, the expanded data are fed into the model for training, and the generalization ability of the model is ultimately improved. The procedure is as follows:
1. Pre-process the data before training: check whether the pixel size and channel count of the source-domain training images match those of the target-domain test images, and unify the specifications if they differ.
2. Feed the source-domain data to a Wasserstein auto-encoder model and use the maximum mean discrepancy to generate more virtual domains that can simulate the target-domain data distribution.
3. Input the generated virtual-domain data into the Wasserstein auto-encoder to retrain the model, updating its parameters by gradient descent.
4. Compute the recognition loss of the source-domain data, input the source-domain data into the neural-network recognition model, and optimize the training model by gradient descent.
5. Train on the generated virtual domains with the updated parameters of the image recognition network and evaluate the virtual-domain loss.
6. Update the parameters of the image recognition model for the current virtual domain by combining the losses of steps 4 and 5.
7. Execute steps 2 to 6 in a loop 3 times to generate different virtual domains.
8. Run the whole procedure in a loop 10000 times, continuously updating the model parameters by gradient descent to obtain the final model parameters.
The above method has the following disadvantages:
1. The network model constructed in the whole process is not refined enough, so feature extraction from the image is insufficient.
2. The data-augmentation type is too simple, which compromises the performance of the classifier.
3. Although domain transfer can be achieved at the image sample level, semantic consistency is broken.
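The control flow of the cited scheme (an inner loop generating virtual domains inside an outer training loop) can be sketched as follows. Every function here is a dummy stand-in under loud assumptions: `wae_generate` replaces the Wasserstein auto-encoder with additive noise and `train_step` replaces the real gradient computation, since the sketch only illustrates the loop structure, not the actual models:

```python
import numpy as np

rng = np.random.default_rng(0)

def wae_generate(domain: np.ndarray) -> np.ndarray:
    """Stand-in for the Wasserstein auto-encoder: perturbed 'virtual' samples."""
    return domain + rng.normal(0.0, 0.1, domain.shape)

def train_step(params: np.ndarray, data: np.ndarray, lr: float = 0.01) -> np.ndarray:
    """Stand-in gradient-descent update (dummy gradient, illustrative only)."""
    grad = np.sign(params) * data.mean()
    return params - lr * grad

source = rng.normal(size=(32, 8))      # toy source-domain batch
params = np.ones(8)                    # toy model parameters
virtual_domains = []

for _ in range(3):                     # step 7: three virtual domains per cycle
    virtual = wae_generate(source)     # step 2: expand a virtual domain
    params = train_step(params, virtual)   # steps 3-6 collapsed: retrain + update
    virtual_domains.append(virtual)

assert len(virtual_domains) == 3 and params.shape == (8,)
```

The real method wraps this inner loop in the 10000-iteration outer loop of step 8; the sketch runs one outer iteration only.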
Disclosure of Invention
The embodiment of the invention provides an image recognition generalization method based on an attention mechanism and a generative adversarial network, which is used for solving the above problems in the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme.
An image recognition generalization method based on an attention mechanism and a generative adversarial network comprises the following steps:
S1: establish an initial merged domain based on the source domain of the feature map to be processed;
S2: input the initial merged domain into the generative adversarial network for processing through the divergent data fusion algorithm to obtain a first virtual augmented domain, and fuse the data of the first virtual augmented domain with the data of the source domain to obtain a target merged domain;
S3: construct a network model with image recognition generalization capability, and input the target merged domain into the generative adversarial network, combined with a perturbation regularization method, to generate a second virtual augmented domain whose sample level and feature level both preserve semantic consistency;
S4: input the second virtual augmented domain into the image recognition network with the multi-layer attention mechanism in the network model for training; input the target merged domain into the generative adversarial network in the network model for training, compute the model loss of that generative adversarial network, and optimize the parameters of the network model based on the model loss combined with gradient descent;
S5: repeat step S4 to complete the construction of the network model with image recognition generalization capability;
S6: establish a total virtual augmented domain based on the second virtual augmented domain data;
S7: combine the source domain of the feature map to be processed with the total virtual augmented domain, randomly sample data from it, input the samples into the network model for training, and compute the model loss of the generative adversarial network and the parameters for optimizing the model;
S8: repeat steps S1 to S7, continuously updating the learned weights of the network model by gradient descent;
S9: repeat steps S2 to S8 to obtain the weights learned by the finally trained network model, i.e. the recognized image features.
Preferably, step S1 comprises:
S11: construct an initial merged domain S_cbk based on the source domain S of the feature map to be processed. Each batch of data X_bsk of the initial merged domain S_cbk is obtained by

X_bsk = {X_s1, …, X_sj}   (1)

where k denotes the index of the current virtual augmented domain and the iteration count (initial value 0), bs denotes the batch size, and j ∈ [1, bs].
Step S2 comprises:
S21: merge the initial merged domain S_cbk with the source domain S of the feature map to be processed to obtain the first virtual augmented domain S_advk.
S22: repeat substep S21 to increase the number of first virtual augmented domains S_advk. Each batch of data X_bs of the first virtual augmented domain S_advk is obtained by

X_bs = {X_s1, …, X_sj, X_advk1, …, X_advkm}   (2)

where m ∈ [1, bs]; X_sj denotes the j images sampled from the source domain S in each batch, and X_advkm denotes the m images sampled from the current first virtual augmented domain S_advk in each batch.
S23: compute the per-batch amounts of data of the target merged domain C_advk by

C_s = bs/(k+1)   (3)   and   C_advk = bs − C_s   (4).
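Under the assumption that bs/(k+1) in formula (3) is meant as integer division (the patent does not state the rounding rule), the per-batch quotas can be sketched as:

```python
def batch_quota(bs: int, k: int) -> tuple[int, int]:
    """Per-batch sample counts per formulas (3)-(4): C_s source images and
    C_advk virtual-augmented images, for k generated virtual domains.
    Integer division is an assumption; the rounding rule is not specified."""
    c_s = bs // (k + 1)      # formula (3): C_s = bs / (k + 1)
    c_advk = bs - c_s        # formula (4): C_advk = bs - C_s
    return c_s, c_advk

# Before any virtual domain exists (k = 0) the batch is purely source data:
assert batch_quota(64, 0) == (64, 0)
# After k = 3 virtual domains, source data keeps a 1/(k+1) = 1/4 share:
assert batch_quota(64, 3) == (16, 48)
```

Note how the source-domain share shrinks as 1/(k+1) while the virtual-domain share grows, matching the intent that dependence on source data decreases as more virtual domains are generated.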
Preferably, step S3 comprises:
computing, by

L_r = min_W ||x − W(x)||_2   (5)

the data relationship between the feature map to be processed and the first virtual augmented domain, obtaining the sample-level loss L_r;
performing iterative adversarial training of the generative adversarial network to obtain the feature-level loss L_con, defined by formula (6) [which appears only as an image in the original] as a Wasserstein-distance term between Z_cbk and Z_advk; where W is the generative adversarial network, x denotes the input image, Z denotes the extracted features, Z_cbk denotes the features of the merged domain, and Z_advk denotes the features of the virtual augmented domain;
optimizing the generative adversarial network through the sample-level loss L_r and the feature-level loss L_con to obtain a second virtual augmented domain S_advk whose sample level and feature level both preserve semantic consistency.
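The two consistency terms can be sketched numerically. L_r follows formula (5) directly; for L_con the exact Wasserstein form of formula (6) is only an image in the original, so an L2 distance between the feature sets is used here as a loud stand-in:

```python
import numpy as np

def sample_level_loss(x: np.ndarray, wx: np.ndarray) -> float:
    """L_r of formula (5): distance between input x and reconstruction W(x)."""
    return float(np.linalg.norm(x - wx))

def feature_level_loss(z_cbk: np.ndarray, z_advk: np.ndarray) -> float:
    """Feature-level consistency term L_con: a distance between merged-domain
    features Z_cbk and virtual-domain features Z_advk. The exact Wasserstein
    form of formula (6) is unavailable; L2 is a simplified stand-in."""
    return float(np.linalg.norm(z_cbk - z_advk))

x = np.ones((4, 4))
assert sample_level_loss(x, x) == 0.0          # perfect reconstruction -> zero loss
z = np.zeros(8)
assert feature_level_loss(z, z + 1.0) > 0.0    # diverging features are penalised
```

Minimizing L_r keeps the generated sample close to its source image, while keeping L_con bounded keeps the virtual-domain features from drifting arbitrarily far from the merged-domain features, which is the stated role of the perturbation regularization.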
Preferably, in step S4, inputting the second virtual augmented domain into the image recognition network with the multi-layer attention mechanism in the network model with image recognition generalization capability for training comprises:
S31: perform a 1 x 1 convolution on the feature map M of the second virtual augmented domain S_advk, reducing the channel dimension C of M by 1/2, to obtain feature maps of dimensions M_A ∈ R^(C/2×H×W), M_B ∈ R^(C/2×H×W) and M_C ∈ R^(C/2×H×W);
S32: perform a Reshape operation on the feature maps M_A and M_B to obtain M_A1 ∈ R^(H·W×C/2) and M_B1 ∈ R^(H·W×C/2);
S33: compute the pixel-correlation feature map M_AB by formula (7) [which appears only as an image in the original]; where i ∈ {1, 2, …, N} and M_AB ∈ R^(H·W×H·W);
S34: apply Softmax normalization to M_AB, matrix-multiply the result with M_C ∈ R^(C/2×H×W), and normalize via formula (8) [which appears only as an image in the original] to obtain the feature map M_ABC ∈ R^(C/2×H×W);
S35: perform a 1 x 1 convolution on the feature map M_ABC to obtain the feature map M_F1 ∈ R^(C×H×W);
S36: perform a global pooling operation on the merged domain to obtain the compressed feature map M_D ∈ R^(C×1×1), then apply, through two fully connected layers in turn, a dimension-reduction operation and an upsampling operation to M_D to obtain the feature map M_F2 ∈ R^(C×H×W);
S37: perform weighted fusion of the feature map M of the second virtual augmented domain S_advk, the feature map M_F1 and the feature map M_F2 by

M_F = α_F·M_F1 + β_F·M_F2 + γ_F·M   (9)

to obtain the predicted feature map M_F.
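Steps S31-S37 can be sketched in NumPy. This is a simplified stand-in, not the patented layer: the learned 1 x 1 convolutions are replaced by fixed random projection matrices, the normalizations of formulas (7)-(8) are folded into a single softmax, and α_F, β_F, γ_F are plain scalars rather than learned weights:

```python
import numpy as np

def softmax(a: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def parallel_attention(M: np.ndarray, alpha=1.0, beta=1.0, gamma=1.0) -> np.ndarray:
    """Sketch of steps S31-S37 on a feature map M of shape (C, H, W)."""
    C, H, W = M.shape
    rng = np.random.default_rng(0)
    P = rng.normal(size=(3, C // 2, C)) / C          # three 1x1 "convolutions" (S31)
    flat = M.reshape(C, H * W)                       # (C, N) with N = H*W
    MA, MB, MC = (p @ flat for p in P)               # each (C/2, N)

    MAB = softmax(MA.T @ MB)                         # (N, N) pixel-correlation map (S33-S34)
    MABC = MC @ MAB.T                                # recombined channels, (C/2, N)
    MF1 = (rng.normal(size=(C, C // 2)) / C) @ MABC  # 1x1 conv back to C channels (S35)

    MD = flat.mean(axis=1, keepdims=True)            # global pooling, (C, 1) (S36)
    MF2 = np.repeat(MD, H * W, axis=1)               # upsample back to (C, N)

    MF = alpha * MF1 + beta * MF2 + gamma * flat     # formula (9) weighted fusion (S37)
    return MF.reshape(C, H, W)

M = np.random.default_rng(1).normal(size=(8, 5, 5))
assert parallel_attention(M).shape == M.shape        # output keeps input dimensions
```

The key structural point survives the simplification: the branch through M_AB mixes information across all H·W positions, the pooled branch M_F2 carries channel-wise global context, and formula (9) fuses both with the original map without changing its shape.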
Preferably, the process in step S4 of inputting the target merged domain into the generative adversarial network in the network model with image recognition generalization capability comprises:
generating an adversarial sample via formula (10) [which appears only as an image in the original];
training the generative adversarial network in the network model based on the adversarial sample combined with the perturbation regularization method, computing the model loss of that generative adversarial network, and optimizing the parameters of the network model based on the model loss combined with gradient descent.
Preferably, before step S1, the method further comprises:
processing the original pictures of the MNIST data set to obtain the feature map to be processed, whose pixel size is 32 x 32 and whose channels are three RGB channels.
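MNIST digits are natively 28 x 28 single-channel, so the preprocessing above amounts to upsampling and channel replication. A minimal sketch, assuming nearest-neighbour upsampling (a library resize would normally be used; the patent does not specify the interpolation):

```python
import numpy as np

def to_32x32_rgb(img28: np.ndarray) -> np.ndarray:
    """Turn a 28x28 single-channel MNIST-style digit into the 32x32
    three-channel feature map described above."""
    idx = np.arange(32) * 28 // 32           # nearest-neighbour index map
    up = img28[idx][:, idx]                  # upsampled to (32, 32)
    return np.stack([up] * 3, axis=-1)       # replicated to (32, 32, 3)

digit = np.random.default_rng(2).integers(0, 256, (28, 28))
out = to_32x32_rgb(digit)
assert out.shape == (32, 32, 3)
assert (out[..., 0] == out[..., 1]).all()    # channels are replicated copies
```

After this step the MNIST source domain matches the 32 x 32 RGB specification shared with the color target-domain data sets (e.g. the digit-recognition series compared in FIG. 5).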
According to the technical scheme provided by the embodiment of the invention, the image recognition generalization method based on an attention mechanism and a generative adversarial network designs a multi-layer parallel attention mechanism model to capture the detail features of an image, proposes a divergent data fusion algorithm to improve classifier performance, and designs a perturbation regularization to achieve maximum domain transfer. Combining these parts with the problems to be solved, it proposes an image recognition domain generalization method based on an attention mechanism and a generative adversarial network and establishes a network model with image recognition generalization capability. The method mitigates the "domain shift" phenomenon of image recognition generalization models and improves recognition performance on data sets with unknown data distributions.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained by those skilled in the art based on these drawings without creative effort.
FIG. 1 is a process flow diagram of the image recognition generalization method based on an attention mechanism and a generative adversarial network provided by the present invention;
FIG. 2 is a block diagram of the image recognition generalization model of the method provided by the present invention;
FIG. 3 is a block diagram of the multi-layer parallel attention mechanism of the method provided by the present invention;
FIG. 4 is a flow chart of a preferred embodiment of the method provided by the present invention;
FIG. 5 is a diagram comparing the performance of a preferred embodiment of the method on the digit-recognition series of data sets;
FIG. 6 is a diagram comparing the performance of a preferred embodiment of the method on the CIFAR-10 series of data sets.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For the convenience of understanding the embodiments of the present invention, the following description will be further explained by taking several specific embodiments as examples in conjunction with the drawings, and the embodiments are not to be construed as limiting the embodiments of the present invention.
The invention provides an image recognition generalization method based on an attention mechanism and a generative adversarial network, which addresses the following technical problems:
1. It improves the network structure to fully capture the detail features of the image, performs feature fusion, and establishes relations between pieces of feature information.
2. It improves the data-augmentation method, training further on the source-domain data to achieve data divergence and improve the performance of the classifier.
3. It realizes semantic consistency at both the sample level and the feature level, achieving maximum domain transfer.
Referring to FIG. 1, the method provided by the present invention comprises the following steps:
S1: establish an initial merged domain based on the source domain of the feature map to be processed, and input the initial merged domain into the generative adversarial network for processing to obtain a first virtual augmented domain;
S2: fuse the data of the first virtual augmented domain with the data of the source domain through the divergent data fusion algorithm to obtain a target merged domain;
S3: input the target merged domain into the generative adversarial network, combined with the perturbation regularization method, to generate a second virtual augmented domain whose sample level and feature level both preserve semantic consistency;
S4: construct a network model with image recognition generalization capability, and input the second virtual augmented domain into the image recognition network with the multi-layer attention mechanism in that model for training, optimizing the classification performance of the model; input the target merged domain into the generative adversarial network in the model for training, compute the model loss of that generative adversarial network, and optimize the parameters of the model based on the loss combined with gradient descent;
S5: repeat step S4 to complete the construction of the network model with image recognition generalization capability;
S6: establish a total virtual augmented domain based on the second virtual augmented domain data;
S7: combine the source domain of the feature map to be processed with the total virtual augmented domain, randomly sample data, input the samples into the network model for training, and compute the model loss of the generative adversarial network and the parameters for optimizing the model;
S8: repeat step S7, continuously updating the learned weights of the network model by gradient descent;
S9: repeat steps S2 to S8 to obtain the weights learned by the finally trained network model with image recognition generalization capability.
In the preferred embodiment of the present invention, steps S1 and S2 and the divergent data fusion algorithm are specified as follows:
S11: construct a domain S_cb identical to the source domain S of the feature map to be processed as the initial merged domain S_cbk, so that before any virtual augmented domain is generated, the data X_cb sampled from the merged domain S_cb are exactly the data X_s of the source domain S. Assume the source-domain data set is divided into several batches, where bs denotes the batch size and k denotes the index of the current virtual augmented domain with initial value 0. Each batch of data X_bs of S_cb is then given by formula (1):

X_bs = {X_s1, …, X_sj}   (1);

where j ∈ [1, bs].
S21: merge the domain S_cb with the source domain S of the feature map to be processed to obtain the virtual augmented domain S_advk.
S22: repeat substep S21, generating more first virtual augmented domains S_advk as the iterations proceed; the data of the source domain and of the virtual augmented domains must be fused to form the merged domain S_cb. At this point, each batch of data X_cb sampled from the merged domain S_cb comes from the source domain S and from the sets of first virtual augmented domains S_advk generated in the current cycle. Thus, after a virtual augmented domain has been generated iteratively, each batch X_bs of the merged domain S_cb becomes as shown in formula (2):

X_bs = {X_s1, …, X_sj, X_advk1, …, X_advkm}   (2);

where m ∈ [1, bs]; X_sj and X_advkm denote, respectively, the j images sampled from the source domain S and the m images sampled from the current virtual augmented domain S_advk in each batch. When virtual augmented domains are generated iteratively, the source-domain data distribution should not diverge away from the source domain too quickly, so a certain quota of source-domain data must be kept in the input; as the number of virtual augmented domains grows, the augmented-domain data distribution gradually simulates the distributions of more unknown domains and the dependence on source-domain data decreases. Let k be the number of virtual augmented domains generated so far; the numbers of data selected per batch from the source domain S and from the virtual augmented domains S_advk to compose the target merged domain C_advk are given by formulas (3) and (4):

C_s = bs/(k+1)   (3)   and   C_advk = bs − C_s   (4).
The invention designs a multi-layer parallel attention mechanism model to capture the detail features of an image, proposes a divergent data fusion algorithm to improve classifier performance and a perturbation regularization to achieve maximum domain transfer, and, combining these parts with the problems to be solved, proposes an image recognition domain generalization method based on an attention mechanism and a generative adversarial network, establishing a network model with image recognition generalization capability. The structure of this network model is shown in FIG. 2, where W is the generative adversarial network, x denotes the input image, and z denotes the extracted features. During model training, the feature mapping of the source-domain data X_s is completed first, and the category mapping of the source domain is then realized. Virtual augmented domains S_advk are generated gradually in an iterative manner and feature-mapped; the merged-domain training model judges whether the feature distribution of a generated virtual augmented domain lies outside the source-domain distribution, the parameters of the model are updated continuously, and the goal of maximizing domain transfer is finally achieved.
Further, the perturbation regularization in step S3 is to generate a virtual enhancement domain meeting the requirements of the sample level and the feature level. Calculating the data relationship between the input image and the reconstructed image (i.e. between the feature map to be processed and the virtual enhancement domain) by taking the Wasserstein distance as a metric at a sample level, and ensuring the semantic consistency of the images, as shown in formula (5):
L_r = min_W ||x - W(x)||_2 (5).
Through iterative adversarial training, the Wasserstein distance is used as the distribution distance between the virtual augmented domain features Z_advk and the merged-domain data features Z_cbk, ensuring consistency at the feature level; the specific form is shown in formula (6):
L_con = W_d(Z_cbk, Z_advk) (6),

where W_d denotes the Wasserstein distance.
After the sample-level loss L_r and the feature-level loss L_con are obtained, the generative adversarial network can be optimized by controlling or reducing their values with conventional methods; the iteration continues until a second virtual augmented domain S_advk is finally obtained in which both the sample level and the feature level guarantee semantic consistency.
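As an illustrative sketch only (not the patent's implementation), the two consistency losses can be computed as follows with numpy; the sorted-sample estimator for the 1-D Wasserstein distance is an assumption, since the patent does not specify how the distance is estimated:

```python
import numpy as np

def sample_level_loss(x, wx):
    """L_r of formula (5): L2 distance between the input image x and
    its reconstruction W(x), enforcing sample-level consistency."""
    return np.linalg.norm(x - wx)

def wasserstein_1d(z_a, z_b):
    """Empirical 1-D Wasserstein-1 distance between two equally sized
    feature samples, standing in for the feature-level loss L_con of
    formula (6). The sorted-coupling estimator is an assumption."""
    return np.mean(np.abs(np.sort(z_a) - np.sort(z_b)))

x = np.array([0.2, 0.5, 0.9])
wx = np.array([0.2, 0.5, 0.9])        # a perfect reconstruction
print(sample_level_loss(x, wx))       # 0.0: identical images incur no loss

z_cbk = np.array([0.0, 1.0, 2.0])     # merged-domain features Z_cbk
z_advk = np.array([1.0, 2.0, 3.0])    # augmented-domain features Z_advk
print(wasserstein_1d(z_cbk, z_advk))  # 1.0: every sorted sample is shifted by 1
```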
Further, as shown in fig. 3, the multi-layer parallel attention mechanism model in step S4 is composed of a plurality of branches, and its input is a feature map output by a convolutional-layer convolution operation. The input feature map is processed by multiple branches in parallel, and the feature vectors of the different branches are finally fused to obtain the output predicted feature map. Assume the feature map of the input second virtual augmented domain S_advk is M ∈ R^(C*H*W), where C, H and W represent the number of channels, the height and the width of the input feature map M, respectively.
In the three branches A(M), B(M) and C(M), a 1 × 1 convolution is first applied to M and the channel dimension C is reduced to 1/2, giving new feature maps of dimensions M_A ∈ R^(C/2*H*W), M_B ∈ R^(C/2*H*W) and M_C ∈ R^(C/2*H*W). Next, a Reshape operation is applied to the feature maps M_A and M_B to obtain M_A1 ∈ R^(H*W*C/2) and M_B1 ∈ R^(H*W*C/2); M_A1 is then transposed and matrix-multiplied with M_B1 to obtain the final pixel-correlation feature map M_AB, where M_AB is given by formula (7):
M_AB = M_B1 × (M_A1)^T (7).
where i ∈ {1, 2, …, N} and M_AB ∈ R^(H*W*H*W). M_AB is subjected to Softmax normalization and matrix-multiplied with M_C ∈ R^(C/2*H*W); the result is normalized, finally yielding the feature map M_ABC ∈ R^(C/2*H*W). The specific calculation of M_ABC is shown in formula (8):
M_ABC = M_C × Softmax(M_AB) (8).
A 1 × 1 convolution is applied to the feature map M_ABC to recover the initial channel number C, giving the output feature map of the three branches A(M), B(M) and C(M): M_F1 ∈ R^(C*H*W). In the branch D(M), a global pooling operation is first applied to the input feature map M, compressing it to M_D ∈ R^(C*1*1); the channel number C is reduced to C/16 by a fully connected dimension-reduction layer, and nonlinear processing is applied with a ReLU activation function. To ensure that the output feature map has the same size as the input, it is up-sampled by a fully connected layer and its channel number is restored to the original size C, so the dimension of the final feature map of this branch is M_F2 ∈ R^(C*H*W).
Through the above operations, the feature maps of the branches are weighted and fused, so the predicted feature map output by the multi-layer parallel attention mechanism model has the form shown in formula (9):
M_F = α_F·M_F1 + β_F·M_F2 + γ_F·M (9)
where α_F, β_F and γ_F are the weight coefficients of the features, gradually updated as the model continuously learns, so that the multi-layer parallel attention mechanism model associates more feature information by establishing weight dependencies; the structure of the model is shown in fig. 3.
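The four branches and the fusion of formulas (7)-(9) can be sketched in numpy as follows. This is a hedged illustration, not the patent's code: the 1 × 1 convolutions are modelled as random projection matrices, the fusion weights α_F, β_F, γ_F are fixed rather than learned, and the softmax axis is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def parallel_attention(M, alpha=0.4, beta=0.3, gamma=0.3):
    """Sketch of the four-branch attention of formulas (7)-(9)."""
    C, H, W = M.shape
    N = H * W
    # Branches A(M), B(M), C(M): a 1x1 convolution halving the channels,
    # modelled here as a random (C/2, C) projection, then flattening.
    proj = lambda: rng.standard_normal((C // 2, C)) / np.sqrt(C)
    MA = np.einsum('oc,chw->ohw', proj(), M).reshape(C // 2, N)
    MB = np.einsum('oc,chw->ohw', proj(), M).reshape(C // 2, N)
    MC = np.einsum('oc,chw->ohw', proj(), M).reshape(C // 2, N)
    # Formula (7): pixel-correlation map M_AB = M_B1 * M_A1^T, shape (N, N).
    MAB = MB.T @ MA
    # Formula (8): softmax-normalised attention applied to M_C.
    MABC = MC @ softmax(MAB)                    # (C/2, N)
    # A second 1x1 convolution restores the original channel count C.
    up = rng.standard_normal((C, C // 2)) / np.sqrt(C // 2)
    MF1 = (up @ MABC).reshape(C, H, W)
    # Branch D(M): global pooling, FC down to C/16, ReLU, FC back to C,
    # then channel-wise gating of the input (squeeze-and-excitation style).
    d = M.mean(axis=(1, 2))                     # global pooling -> (C,)
    r = max(C // 16, 1)
    w1 = rng.standard_normal((r, C)) / np.sqrt(C)
    w2 = rng.standard_normal((C, r))
    gate = w2 @ np.maximum(w1 @ d, 0.0)
    MF2 = gate[:, None, None] * M               # broadcast back to (C, H, W)
    # Formula (9): weighted fusion of the branch outputs with the input.
    return alpha * MF1 + beta * MF2 + gamma * M

M = rng.standard_normal((16, 4, 4))             # a toy C=16, H=W=4 feature map
out = parallel_attention(M)
print(out.shape)                                # (16, 4, 4)
```

The output retains the input dimensions C × H × W, as required for the residual-style fusion term γ_F·M in formula (9).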
The invention also provides an embodiment to show the effect of using the method of the invention.
As shown in fig. 4, the execution process of the present embodiment includes:
In the first step, the pixels and the number of channels of the data sets used by the training and testing models are unified: the MNIST data set is selected as the source-domain training set, the images in the data set are resized to 32 × 32 pixels, and the number of channels is the three RGB channels.
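A minimal sketch of this preprocessing step (zero-padding for the 28 × 28 → 32 × 32 resize is an assumption; the patent only states the target format):

```python
import numpy as np

def preprocess(img28):
    """Unify an MNIST digit to the 32 x 32, three-channel RGB input
    format described above. Zero-padding is used for the resize here;
    the patent does not state which resize method is used."""
    padded = np.zeros((32, 32), dtype=img28.dtype)
    padded[2:30, 2:30] = img28       # centre the 28 x 28 digit
    return np.stack([padded] * 3)    # replicate into a (3, 32, 32) RGB tensor

digit = np.ones((28, 28), dtype=np.float32)  # a dummy all-ink digit
out = preprocess(digit)
print(out.shape)  # (3, 32, 32)
```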
In the second step, initialization is performed: the number k of virtual augmented domains is 0, and the source domain S is taken as the initial merged domain S_cbk, whose data composition is shown in formula (1).
In the third step, once a first virtual augmented domain S_advk has been generated (i.e. when k is larger than 0), the data X_advk of the generated virtual augmented domain S_advk and the data X_s of the source domain S undergo divergent data fusion: the per-batch composition of the merged domain S_cbk is calculated by formulas (3) and (4), and the batches are combined into the target merged domain C_advk.
In the fourth step, the data of the target merged domain C_advk is fed into the generative adversarial network, and a virtual augmented domain S_advk guaranteeing semantic consistency at both the sample level and the feature level is generated in combination with perturbation regularization.
In the fifth step, the data X_advk of the virtual augmented domain S_advk is input into the image recognition network containing the multi-layer attention mechanism, the recognition network is trained, and the classification performance is optimized. At the same time, the data X_cbk of the target merged domain C_advk is input into the generative adversarial network, adversarial samples are generated in combination with perturbation regularization, adversarial training is carried out, the loss of the generative adversarial network model is calculated, and the parameters are optimized by a gradient descent method.
In the sixth step, the fifth step is cycled 25 times. To generate more augmented data with different distributions, the image recognition capability of the classifier is continuously optimized, and a model that can expand the data distribution and simulate more unknown domains is further obtained.
In the seventh step, the data X_advk of the currently generated virtual augmented domain S_advk is added to the total virtual augmented domain S_adv.
In the eighth step, the source domain S and the total virtual augmented domain S_adv are merged, data is sampled from the merged set, the image recognition network is trained (by the same process as the fifth step), the loss is continuously calculated, and the parameters of the model are optimized.
In the ninth step, the eighth step is cycled, and k is increased by 1.
In the tenth step, the third through ninth steps are cycled 3 times, where 3 is the threshold number of data sets.
In the eleventh step, the tenth step is cycled 10000 times.
In the twelfth step, after all cycles are finished, the weights learned by the finally trained recognition model are output.
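The loop schedule of the third through twelfth steps can be sketched as the following skeleton. The function name and log entries are illustrative stand-ins for the real training calls; merged_steps and the omitted 10000 outer cycles of the eleventh step are reduced purely for demonstration:

```python
def training_schedule(k_max=3, gan_steps=25, merged_steps=2):
    """Skeleton of the nested schedule of steps three to twelve.
    Real training code would replace the log entries with forward and
    backward passes; iteration counts other than k_max=3 and
    gan_steps=25 are illustrative assumptions."""
    log = []
    total_virtual = []                      # the total virtual augmented domain S_adv
    for k in range(k_max):                  # tenth step: up to the threshold of 3 domains
        for _ in range(gan_steps):          # fifth/sixth steps: 25 GAN + classifier updates
            log.append(('gan+classifier', k))
        total_virtual.append(f'S_adv{k}')   # seventh step: accumulate the new domain
        for _ in range(merged_steps):       # eighth/ninth steps: train on S merged with S_adv
            log.append(('merged', k))
    return log, total_virtual

log, domains = training_schedule()
print(len(log), domains)  # 81 ['S_adv0', 'S_adv1', 'S_adv2']
```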
According to the method, the network structure is first improved by integrating a multi-layer parallel attention mechanism, which avoids the problem that the model cannot fully extract image features during training, allows key features to be continuously extracted in the feature-extraction process, and establishes more information connections. Secondly, to improve the performance of the classifier and the data-enhancement method, during divergent data fusion the proportion of data in each batch of the merged domain changes continuously as more virtual augmented domains are generated, realizing the divergence of the data. Finally, semantic consistency of the data at the sample level and the feature level is satisfied simultaneously, and maximum domain transfer is realized.
The invention obviously improves the portability of the image recognition model, and the experimental results are shown in fig. 5 and 6.
In FIG. 5, the MNIST data set is selected as the source-domain training set, and four digital data sets, SVHN, MNIST-M, SYN and USPS, are selected as test sets. FIG. 5 shows that the test results of the invention on the SVHN, MNIST-M and SYN data sets are significantly higher than those of other methods, while the result on the USPS data set is lower than that of the d-SNE method. This is because d-SNE greatly improves recognition accuracy only on the USPS data set and performs poorly in portability on the other three data sets; taking the average result over the four test sets, the invention is significantly higher than the other methods.
In fig. 6, the average recognition accuracy of the model over the four data sets SVHN, MNIST-M, SYN and USPS is used as the evaluation index of the transplantation performance of the updated model; it is evident that the performance of the invention is greatly improved compared with the average image recognition accuracy of previous models.
In summary, the image recognition generalization method based on an attention mechanism and a generative adversarial network provided by the invention designs a multi-layer parallel attention mechanism model to capture the detail features of an image, proposes a divergent data fusion algorithm to improve classifier performance, and designs a perturbation regularization to realize maximum domain transfer. Combining these parts with the problems to be solved, an image recognition domain generalization method based on an attention mechanism and a generative adversarial network is proposed, and a network model with image recognition generalization capability is established. The proposed method alleviates the "domain shift" phenomenon of image recognition generalization models and improves the performance of recognizing images from data sets with unknown data distributions.
Those of ordinary skill in the art will understand that: the figures are merely schematic representations of one embodiment, and the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The embodiments in the present specification are described in a progressive manner; the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, for apparatus or system embodiments, since they are substantially similar to the method embodiments, they are described relatively simply, and reference may be made to the partial description of the method embodiments. The above-described embodiments of the apparatus and system are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. Those of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (6)

1. An image recognition generalization method based on an attention mechanism and a generative adversarial network, characterized by comprising the following steps:
S1, establishing an initial merged domain based on the source domain of a feature map to be processed;
S2, inputting the initial merged domain into a generative adversarial network for processing through a divergent data fusion algorithm to obtain a first virtual augmented domain, and fusing the data of the first virtual augmented domain with the data of the source domain to obtain a target merged domain;
S3, constructing a network model with image recognition generalization capability, and inputting the target merged domain into the generative adversarial network, combined with a perturbation regularization method, to generate a second virtual augmented domain in which both the sample level and the feature level guarantee semantic consistency;
S4, inputting the second virtual augmented domain into the image recognition network with a multi-layer attention mechanism in the network model with image recognition generalization capability for training; inputting the target merged domain into the generative adversarial network in the network model with image recognition generalization capability for training, calculating the model loss of the generative adversarial network in the network model with image recognition generalization capability, and optimizing the parameters of the network model with image recognition generalization capability based on the model loss in combination with a gradient descent method;
S5, repeatedly executing step S4 to complete the construction of the network model with image recognition generalization capability;
S6, establishing a total virtual augmented domain based on the data of the second virtual augmented domain;
S7, combining the source domain of the feature map to be processed with the total virtual augmented domain, randomly sampling data, inputting the sampled data into the network model with image recognition generalization capability for training, calculating the model loss of the generative adversarial network, and optimizing the parameters of the model;
S8, repeatedly executing steps S1-S7, and continuously updating the learned weights of the network model with image recognition generalization capability by a gradient descent method;
S9, repeatedly executing steps S2-S8 to obtain the weights learned by the finally trained network model with image recognition generalization capability.
2. The method according to claim 1, wherein step S1 comprises:
S11, constructing the initial merged domain S_cbk based on the source domain S of the feature map to be processed, the data X_bsk of each batch of the initial merged domain S_cbk being obtained through formula
X_bsk = {X_s1, …, X_sj} (1);
in the formula, k represents the number of current virtual augmented domains and the number of iterations, with an initial value of 0; bs represents the batch size; and j ∈ [1, bs];
and wherein step S2 comprises:
S21, merging the initial merged domain S_cbk with the source domain S of the feature map to be processed to obtain the first virtual augmented domain S_advk;
S22, repeating substep S21 to increase the number of first virtual augmented domains S_advk, the data X_bs of each batch of the first virtual augmented domain S_advk being obtained through formula
X_bs = {X_s1, …, X_sj, X_advk1, …, X_advkm} (2);
wherein m ∈ [1, bs], X_sj indicates that each batch of data of the first virtual augmented domain S_advk samples j images from the source domain S, and X_advkm indicates that each batch of data samples m images from the current first virtual augmented domain S_advk;
S23, calculating the amount of data per batch in the target merged domain C_advk through formulas
C_s = bs/(k+1) (3) and C_advk = bs - C_s (4).
3. The method according to claim 2, wherein step S3 comprises:
calculating the data relationship between the feature map to be processed and the first virtual augmented domain through formula
L_r = min_W ||x - W(x)||_2 (5)
to obtain the sample-level loss L_r;
performing iterative adversarial training on the generative adversarial network through formula
L_con = W_d(Z_cbk, Z_advk) (6)
to obtain the feature-level loss L_con, wherein W is the generative adversarial network, W_d denotes the Wasserstein distance, x represents the input image, z represents the extracted features, Z_cbk represents the features of the merged domain, and Z_advk represents the features of the virtual augmented domain;
and optimizing the generative adversarial network through the sample-level loss L_r and the feature-level loss L_con to obtain the second virtual augmented domain S_advk in which both the sample level and the feature level guarantee semantic consistency.
4. The method according to claim 3, wherein inputting the second virtual augmented domain into the image recognition network with a multi-layer attention mechanism in the network model with image recognition generalization capability in step S4 comprises:
S31, performing a 1 × 1 convolution operation on the feature map M of the second virtual augmented domain S_advk, reducing the channel dimension C of the feature map M by 1/2, and obtaining feature maps of dimensions M_A ∈ R^(C/2*H*W), M_B ∈ R^(C/2*H*W) and M_C ∈ R^(C/2*H*W);
S32, performing a Reshape operation on the feature maps M_A and M_B to obtain M_A1 ∈ R^(H*W*C/2) and M_B1 ∈ R^(H*W*C/2);
S33, calculating the pixel-correlation feature map through formula
M_AB = M_B1 × (M_A1)^T (7);
wherein i ∈ {1, 2, …, N} and M_AB ∈ R^(H*W*H*W);
S34, performing Softmax normalization on M_AB and matrix-multiplying the result with M_C ∈ R^(C/2*H*W), the normalization operation being performed through formula
M_ABC = M_C × Softmax(M_AB) (8)
to obtain the feature map M_ABC, M_ABC ∈ R^(C/2*H*W);
S35, performing a 1 × 1 convolution operation on the feature map M_ABC to obtain the feature map M_F1 ∈ R^(C*H*W);
S36, performing a global pooling operation on the merged domain to obtain a compressed feature map M_D ∈ R^(C*1*1), and sequentially performing a dimension-reduction operation and an up-sampling operation on the compressed feature map M_D through two fully connected layers to obtain the feature map M_F2 ∈ R^(C*H*W);
S37, performing weighted fusion on the feature maps M_F1 ∈ R^(C*H*W) and M_F2 ∈ R^(C*H*W) of the second virtual augmented domain S_advk through formula
M_F = α_F·M_F1 + β_F·M_F2 + γ_F·M (9)
to obtain the predicted feature map M_F.
5. The method according to claim 3, wherein inputting the target merged domain into the generative adversarial network in the network model with image recognition generalization capability in step S4 comprises:
generating adversarial samples through the above process, training the generative adversarial network in the network model with image recognition generalization capability based on the adversarial samples in combination with a perturbation regularization method, calculating the model loss of the generative adversarial network in the network model with image recognition generalization capability, and optimizing the parameters of the network model with image recognition generalization capability based on the model loss in combination with a gradient descent method.
6. The method according to any one of claims 1 to 5, further comprising, prior to step S1:
processing the original picture through the MNIST data set to obtain the feature map to be processed, wherein the pixels of the feature map are 32 × 32 and the channels are the three RGB channels.
CN202111061764.8A 2021-09-10 2021-09-10 Image identification generalization method based on attention mechanism and generation countermeasure network Active CN113936143B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111061764.8A CN113936143B (en) 2021-09-10 2021-09-10 Image identification generalization method based on attention mechanism and generation countermeasure network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111061764.8A CN113936143B (en) 2021-09-10 2021-09-10 Image identification generalization method based on attention mechanism and generation countermeasure network

Publications (2)

Publication Number Publication Date
CN113936143A true CN113936143A (en) 2022-01-14
CN113936143B CN113936143B (en) 2022-07-01

Family

ID=79275382

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111061764.8A Active CN113936143B (en) 2021-09-10 2021-09-10 Image identification generalization method based on attention mechanism and generation countermeasure network

Country Status (1)

Country Link
CN (1) CN113936143B (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110728729A (en) * 2019-09-29 2020-01-24 天津大学 Unsupervised CT projection domain data recovery method based on attention mechanism
CN111126386A (en) * 2019-12-20 2020-05-08 复旦大学 Sequence field adaptation method based on counterstudy in scene text recognition
CN111340819A (en) * 2020-02-10 2020-06-26 腾讯科技(深圳)有限公司 Image segmentation method, device and storage medium
CN111461239A (en) * 2020-04-03 2020-07-28 成都考拉悠然科技有限公司 White box attack method of CTC scene character recognition model
CN112150442A (en) * 2020-09-25 2020-12-29 帝工(杭州)科技产业有限公司 New crown diagnosis system based on deep convolutional neural network and multi-instance learning
CN112347850A (en) * 2020-09-30 2021-02-09 新大陆数字技术股份有限公司 Infrared image conversion method, living body detection method, device and readable storage medium
CN112766079A (en) * 2020-12-31 2021-05-07 北京航空航天大学 Unsupervised image-to-image translation method based on content style separation


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHIHONG CHEN ET AL.: "Deep joint two-stream Wasserstein auto-encoder and selective attention alignment for unsupervised domain adaptation", Neural Computing and Applications *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114511737A (en) * 2022-01-24 2022-05-17 北京建筑大学 Training method of image recognition domain generalization model
CN116883681A (en) * 2023-08-09 2023-10-13 北京航空航天大学 Domain generalization target detection method based on countermeasure generation network
CN116883681B (en) * 2023-08-09 2024-01-30 北京航空航天大学 Domain generalization target detection method based on countermeasure generation network

Also Published As

Publication number Publication date
CN113936143B (en) 2022-07-01

Similar Documents

Publication Publication Date Title
CN107945118B (en) Face image restoration method based on generating type confrontation network
CN109711426B (en) Pathological image classification device and method based on GAN and transfer learning
CN111242841B (en) Image background style migration method based on semantic segmentation and deep learning
CN111798369B (en) Face aging image synthesis method for generating confrontation network based on circulation condition
WO2022252272A1 (en) Transfer learning-based method for improved vgg16 network pig identity recognition
CN108230278B (en) Image raindrop removing method based on generation countermeasure network
CN113936143B (en) Image identification generalization method based on attention mechanism and generation countermeasure network
Huang et al. Deep hyperspectral image fusion network with iterative spatio-spectral regularization
CN110570363A (en) Image defogging method based on Cycle-GAN with pyramid pooling and multi-scale discriminator
CN112396607A (en) Streetscape image semantic segmentation method for deformable convolution fusion enhancement
CN112183637A (en) Single-light-source scene illumination re-rendering method and system based on neural network
CN110717953A (en) Black-white picture coloring method and system based on CNN-LSTM combined model
Ji et al. ColorFormer: Image colorization via color memory assisted hybrid-attention transformer
CN111626926A (en) Intelligent texture image synthesis method based on GAN
Goel et al. Gray level enhancement to emphasize less dynamic region within image using genetic algorithm
Xiong et al. Joint intensity–gradient guided generative modeling for colorization
Althbaity et al. Colorization Of Grayscale Images Using Deep Learning
CN117011515A (en) Interactive image segmentation model based on attention mechanism and segmentation method thereof
CN109583406B (en) Facial expression recognition method based on feature attention mechanism
Xu et al. Attention‐based multi‐channel feature fusion enhancement network to process low‐light images
Wang et al. Face super-resolution via hierarchical multi-scale residual fusion network
CN115620342A (en) Cross-modal pedestrian re-identification method, system and computer
CN114219960A (en) Space target ISAR image classification method under small sample condition of XGboost based on multi-learner optimization
Xu et al. AS 3 ITransUNet: Spatial-Spectral Interactive Transformer U-Net with Alternating Sampling for Hyperspectral Image Super-Resolution
Nasrin et al. PColorNet: investigating the impact of different color spaces for pathological image classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant