CN112001398B - Domain adaptation method, device, apparatus, image processing method, and storage medium - Google Patents

Domain adaptation method, device, apparatus, image processing method, and storage medium

Info

Publication number
CN112001398B
CN112001398B
Authority
CN
China
Prior art keywords
image
training
training image
fusion
source domain
Prior art date
Legal status
Active
Application number
CN202010869777.7A
Other languages
Chinese (zh)
Other versions
CN112001398A
Inventor
陆磊
吴子扬
Current Assignee
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN202010869777.7A
Publication of CN112001398A
Application granted
Publication of CN112001398B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/14 - Image acquisition
    • G06V30/148 - Segmentation of character regions
    • G06V30/153 - Segmentation of character regions using recognition of characters or words
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The domain adaptation method acquires a source domain training image and the corresponding target domain training image obtained after converting the source domain training image into the target domain, and then fuses the source domain training image and the corresponding target domain training image in pixel space to obtain a fusion training image. Through image fusion, the pixel information of the source domain and that of the target domain are effectively fused together, achieving the effect of aligning the pixels of the source domain and the target domain. Further, when the image processing model is trained, the fusion training image is used as a training sample and the reconstruction loss of the image processing model on the fusion training image is used as a loss function, so that the features learned by the image processing model do not distinguish between the source domain and the target domain and adapt well to both domains.

Description

Domain adaptation method, device, apparatus, image processing method, and storage medium
Technical Field
The present invention relates to the field of domain adaptation technologies, and in particular, to a domain adaptation method, apparatus, device, image processing method, and storage medium.
Background
In real life, humans have the ability to transfer knowledge between domains. For example, someone who can play table tennis can learn tennis by analogy; likewise, someone who has played Chinese chess can learn chess by analogy. It is desirable for machines to have this knowledge transfer capability as well. Domain adaptation is one form of such knowledge transfer.
In machine learning, domain adaptation methods typically involve two (or more) domains, referred to as a source domain and a target domain. Typically, the data is labeled in the source domain but unlabeled in the target domain, and the label sets of the source domain and the target domain are shared. The hope is that a model trained in the source domain, with the help of a domain adaptation method, also works well in the target domain. For example, if the source domain consists of printed digital characters and the target domain of handwritten digital characters, we hope that a character recognition model trained on the source domain performs well on the target domain after the domain adaptation method is applied.
Disclosure of Invention
In view of the foregoing, the present application provides a domain adaptation method, apparatus, device, image processing method, and storage medium, so that an image processing model trained in a source domain can, through domain adaptation processing, also achieve a good recognition effect in a target domain. The specific scheme is as follows:
A domain adaptation method, comprising:
acquiring a source domain training image and a target domain training image corresponding to the source domain training image after the source domain training image is converted into a target domain;
fusing the source domain training image and the corresponding target domain training image in a pixel space to obtain a fused training image;
and training the image processing model by taking the fusion training image as a training sample and taking the reconstruction loss of the image processing model on the fusion training image as a loss function.
Preferably, the acquiring process of the target domain training image includes:
and converting the source domain training image into the target domain by utilizing the pre-trained image conversion model to obtain a corresponding target domain training image after converting the source domain training image into the target domain.
Preferably, the fusing the source domain training image and the corresponding target domain training image in the pixel space to obtain a fused training image includes:
cutting out a first image block of a target area in the source domain training image;
and acquiring a second image block of the target area in the target domain training image, and filling the target area in the source domain training image with the second image block to obtain a fusion training image.
Preferably, the cutting out of the first image block of the target area in the source domain training image includes:
setting the pixel value of the image of the target area in the source domain training image to be zero to obtain a processed source domain training image;
the obtaining a second image block of the target area in the target domain training image, filling the target area in the source domain training image with the second image block, and obtaining a fusion training image includes:
setting the pixel values of the images except the target area in the target domain training image to zero to obtain a processed target domain training image;
and superposing the processed source domain training image and the processed target domain training image to obtain a fusion training image.
Preferably, the training the image processing model with the fused training image as a training sample and the reconstruction loss of the image processing model on the fused training image as a loss function includes:
taking the fusion training image as a training sample, coding the fusion training image by using a coding layer of an image processing model to obtain a coding result, and decoding the coding result by using a decoding layer to obtain a reconstructed image;
Determining a reconstruction loss of the fusion training image based on the reconstruction image and the fusion training image;
and training the image processing model by taking the reconstruction loss as a loss function.
Preferably, the loss function further comprises: a first classification penalty for the fused training image;
training the image processing model with the fused training image as a training sample, with a reconstruction penalty of the image processing model on the fused training image, and a first classification penalty of the fused training image as a penalty function, comprising:
taking the fusion training image as a training sample, coding the fusion training image by using a coding layer of an image processing model to obtain a coding result, and decoding the coding result by using a decoding layer to obtain a reconstructed image;
determining a reconstruction loss of the fusion training image based on the reconstruction image and the fusion training image;
using a first classification layer of the image processing model to classify the image based on the coding result to obtain a first classification result;
determining a first classification penalty for the fused training image based on the first classification result;
And training the image processing model by taking the reconstruction loss and the first classification loss as loss functions.
Preferably, the determining the first classification loss of the fusion training image based on the first classification result includes:
and determining a first classification loss of the fusion training image with label smoothing (Label smooth) based on the first classification result and the classification label of the source domain training image corresponding to the fusion training image.
Preferably, the training sample further comprises: the source domain training image; the loss function further includes: a second classification penalty for the source domain training image;
training the image processing model with the fused training image and the source domain training image as training samples, with a reconstruction loss of the fused training image by the image processing model, and a second classification loss of the source domain training image as a loss function, comprising:
encoding the fusion training image and the source domain training image by using an encoding layer of an image processing model to obtain respective encoding results, and decoding the encoding results of the fusion training image by using a decoding layer to obtain a reconstructed image;
Determining a reconstruction loss of the fusion training image based on the reconstruction image and the fusion training image;
performing image classification based on the coding result of the source domain training image by using a second classification layer of the image processing model to obtain a second classification result;
determining a second classification loss of the source domain training image based on the second classification result;
and training the image processing model by taking the reconstruction loss and the second classification loss as loss functions.
An image processing method, comprising:
acquiring an image to be processed;
and processing the image to be processed by using the image processing model trained by the domain adaptation method to obtain an image processing result.
Preferably, the process of processing the image to be processed by using the image processing model includes:
and coding the image to be processed by using the image processing model to obtain hidden layer characterization features corresponding to the image to be processed.
A domain adaptation device, comprising:
the image acquisition unit is used for acquiring a source domain training image and a target domain training image corresponding to the source domain training image after the source domain training image is converted into a target domain;
The image fusion unit is used for fusing the source domain training image and the corresponding target domain training image in a pixel space to obtain a fused training image;
the model training unit is used for taking the fusion training image as a training sample and taking the reconstruction loss of the image processing model on the fusion training image as a loss function to train the image processing model.
A domain adaptation device, comprising: a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of the domain adaptation method as described above.
A storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the domain adaptation method as described above.
By means of the above technical scheme, the domain adaptation method acquires a source domain training image and the corresponding target domain training image obtained after converting the source domain training image into the target domain, and then adopts an image fusion strategy to fuse the source domain training image and the corresponding target domain training image in pixel space to obtain a fusion training image. Through image fusion, the pixel information of the source domain and that of the target domain are effectively fused together, achieving the effect of aligning the pixels of the source domain and the target domain. Further, when the image processing model is trained, the fusion training image is used as a training sample and the reconstruction loss of the image processing model on the fusion training image is used as a loss function, so that the features learned by the image processing model do not distinguish between the source domain and the target domain. This solves the problem of aligning source-domain and target-domain features, and the features have good adaptability to both the source domain and the target domain; that is, the image processing model also has a good recognition effect in the target domain.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a flow chart of a domain adaptation method according to an embodiment of the present disclosure;
FIG. 2 illustrates a schematic diagram of an image conversion process;
FIG. 3 illustrates a schematic diagram of an image fusion process;
FIG. 4 illustrates a schematic diagram of a training process for an image processing model;
FIG. 5 illustrates a schematic diagram of a training process for another image processing model;
FIG. 6 illustrates a schematic diagram of a training process for yet another image processing model;
FIG. 7 illustrates a schematic diagram of a training process for yet another image processing model;
fig. 8 is a schematic structural diagram of a domain adaptation device according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a domain adaptation device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The application provides a domain adaptation scheme that can be used for any computer vision domain adaptation problem, such as face recognition tasks and image classification tasks. Through this domain adaptation scheme, an image processing model with a good recognition effect in the target domain can be trained.
The scheme can be implemented on a terminal with data processing capability, such as a mobile phone, a computer, a server, a cloud platform, and the like.
Next, as described in connection with fig. 1, the domain adaptation method of the present application may include the following steps:
Step S100, acquiring a source domain training image and a target domain training image corresponding to the source domain training image after conversion into the target domain.
Specifically, domain adaptation methods generally involve two or more domains, referred to as a source domain and a target domain, where the training images of the source domain are referred to herein simply as source domain training images. The source domain training images typically carry annotation information, while the target domain does not.
The source domain training image I is converted into the target domain to obtain the corresponding target domain training image F(I). In this step, the source domain training image and the target domain training image are acquired simultaneously.
The target domain training image may be a target domain training image corresponding to a source domain training image obtained by converting the source domain training image into the target domain by using a pre-trained image conversion model. In addition, the source domain training images and corresponding target domain training images may be existing, such as directly acquired through a set of published training images, or the like.
For the above-mentioned image conversion model, an image conversion algorithm model such as CycleGAN may be used. In order to give the converted image higher definition, relay supervision can be added on top of CycleGAN: feature maps of different intermediate layers of the CycleGAN model are used to generate conversion images at different resolutions, and the conversion image with the highest resolution is then selected as the converted image, thereby obtaining a conversion image with higher definition.
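For illustration only, the following is a minimal sketch of how such relay supervision could be wired into a CycleGAN-style generator in PyTorch; the backbone stages, the tapped layers, and the 1x1 output heads are assumptions made for this sketch, not the concrete implementation of this application.

```python
import torch
import torch.nn as nn

class RelaySupervisedGenerator(nn.Module):
    """Hypothetical CycleGAN-style generator whose intermediate feature maps
    are each decoded into a conversion image, at increasing resolution."""
    def __init__(self, backbone_blocks, head_channels):
        super().__init__()
        self.blocks = nn.ModuleList(backbone_blocks)  # assumed generator stages
        # one 1x1 conv head per stage, mapping features to a 3-channel image
        self.heads = nn.ModuleList(nn.Conv2d(c, 3, kernel_size=1) for c in head_channels)
        assert len(self.blocks) == len(self.heads)  # one head per tapped stage

    def forward(self, x):
        outputs = []
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            outputs.append(torch.tanh(head(x)))  # conversion image at this stage's resolution
        # during training, relay supervision would attach a loss to every output;
        # at inference, the highest-resolution output is kept as F(I)
        return max(outputs, key=lambda img: img.shape[-1])
```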
Referring to fig. 2, taking a heterogeneous face recognition task as an example in which the source domain consists of color RGB images and the target domain of infrared NIR images, a source domain training image can be converted into a target domain training image through image conversion.
It should be noted that, the source domain training images and the target domain training images are in one-to-one correspondence, that is, for any source domain training image, there is one target domain training image corresponding to any source domain training image.
And step S110, fusing the source domain training image and the corresponding target domain training image in a pixel space to obtain a fused training image.
Specifically, in the embodiment of the present application, it is considered that the quality of the converted target domain training image may not be high, and when the target domain training image is directly used to train the image processing model, the model training effect may be poor, and the pixel information of the source domain training image cannot be utilized. Therefore, the embodiment of the application provides an image fusion strategy, namely, the source domain training image and the corresponding target domain training image are fused in a pixel space to obtain a fused training image. Through image fusion, the obtained fusion training image effectively fuses the pixel information of the source domain and the target domain together, and plays a role in aligning the pixels of the source domain and the target domain.
And step S120, training the image processing model by taking the fusion training image as a training sample and taking the reconstruction loss of the image processing model on the fusion training image as a loss function.
The image processing model can be a model that has already been pre-trained with source domain training images, or an untrained original model. The image processing model has the function of extracting hidden layer characterization features of an input training sample, and the performance of an image processing model is judged mainly by whether the extracted hidden layer characterization features are expressive enough for the training sample. After the hidden layer characterization features of a training sample are extracted, they can be input into different task modules to obtain corresponding task processing results; for example, they can be input into a classification module to obtain an image classification result of the training sample. In addition, the hidden layer characterization features can also be input into the decoding layer, so that a reconstructed image and the like can be obtained through decoding.
In the step, the fusion training image obtained in the previous step is used as a training sample to be input into an image processing model, and the image processing model is trained by taking the reconstruction loss of the image processing model on the fusion training image as a loss function.
The image processing model extracts hidden layer characterization features from the fusion training image, decodes the extracted hidden layer characterization features to obtain a reconstructed image, then calculates the difference between the reconstructed image and the fusion training image, uses this difference as the reconstruction loss of the fusion training image, and trains the image processing model accordingly.
According to the domain adaptation method above, a source domain training image and the corresponding target domain training image obtained after converting it into the target domain are acquired, and an image fusion strategy is then adopted to fuse the source domain training image and the corresponding target domain training image in pixel space to obtain a fusion training image. Through image fusion, the pixel information of the source domain and that of the target domain are effectively fused together, achieving the effect of aligning the pixels of the source domain and the target domain. Further, when the image processing model is trained, the fusion training image is used as a training sample and the reconstruction loss of the image processing model on the fusion training image is used as a loss function, so that the features learned by the image processing model do not distinguish between the source domain and the target domain. This solves the problem of aligning source-domain and target-domain features, and the features have good adaptability to both domains; that is, the image processing model also has a good recognition effect in the target domain.
In some embodiments of the present application, the process in step S110 of fusing the source domain training image and the corresponding target domain training image in pixel space to obtain the fused training image is further described.
In an alternative implementation manner, the image fusion process in the embodiment of the present application may be implemented as follows:
s1, the first image block of the target area in the source domain training image I is scratched.
The target area may be a preset area, such as a 10 x 10 area in the upper right corner of the image, or another area. Alternatively, the target area may be a randomly determined area that occupies a set proportion of the total image area, such as a randomly determined area occupying 20% of the total image area (a sketch of such random region selection follows step S2 below).
That is, the first image block corresponding to the target area is cut out of the source domain training image.
S2, acquiring a second image block of the target area in the target domain training image, and filling the target area in the source domain training image by using the second image block to obtain a fusion training image.
The size of the target domain training image obtained after the source domain training image is converted into the target domain is consistent with that of the source domain training image. Therefore, in this step, the second image block corresponding to the target area in the target domain training image can be obtained, the target area in the source domain training image is then filled with the second image block, and the filled source domain training image is used as the fusion training image.
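As noted above, a sketch of randomly selecting a target area occupying a set proportion of the image; the rectangular shape and the ratio value are assumptions for this sketch. The returned mask follows the convention used in the fusion formula below: 0 inside the target area and 1 elsewhere.

```python
import torch

def random_region_mask(h, w, area_ratio=0.2):
    """Return an (h, w) mask that is 0 inside a randomly placed rectangular
    target area covering roughly `area_ratio` of the image, 1 elsewhere."""
    side = area_ratio ** 0.5                    # scale each side by sqrt(ratio)
    region_h = max(1, min(h, int(h * side)))
    region_w = max(1, min(w, int(w * side)))
    top = torch.randint(0, h - region_h + 1, (1,)).item()
    left = torch.randint(0, w - region_w + 1, (1,)).item()
    mask = torch.ones(h, w)
    mask[top:top + region_h, left:left + region_w] = 0.0
    return mask
```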
Referring to fig. 3, a schematic diagram of a fused training image obtained by fusing a source domain training image and a target domain training image is illustrated.
As can be seen from fig. 3, the target area may be an area of the middle part of the right side in the image, and the pixel information of the source domain training image and the target domain training image may be fused into one image through the image fusion process.
Further optionally, for S1 above, the process of cutting out the first image block of the target area in the source domain training image may specifically include:
s11, setting the image pixel value of the target area in the source domain training image to be zero, and obtaining the processed source domain training image.
Specifically, an occlusion value mask=0 can be assigned to the target area in the source domain training image I, and mask=1 to the remaining areas of the source domain training image I; that is, the image pixel values of the target area are set to zero.
On this basis, the step S2 of obtaining a second image block of the target region in the target region training image, and filling the target region in the source region training image with the second image block to obtain a fused training image may specifically include:
S21, setting the pixel values of the images except the target area in the target area training image to be zero, and obtaining the processed target area training image.
Specifically, the mask=1 may be added to the target area in the target area training image F (I), and the mask=0 may be added to the remaining areas, so as to obtain a processed target area training image.
S22, superposing the processed source domain training image and the processed target domain training image to obtain a fusion training image.
Specifically, the formula can be expressed as:
I_fuse = I * mask + F(I) * (1 - mask)
where I_fuse denotes the fusion training image, I denotes the source domain training image, and F(I) denotes the corresponding target domain training image after the source domain training image I is converted into the target domain. The mask takes two kinds of values: the mask value at the target area position is 0, and the mask value at all positions other than the target area is 1.
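A minimal sketch of this fusion formula, assuming I and F(I) are tensors of identical shape (3, H, W) and the mask comes from the sketch above (it broadcasts over the channel dimension):

```python
def fuse_images(source_img, target_img, mask):
    """Pixel-space fusion: I_fuse = I * mask + F(I) * (1 - mask).
    source_img, target_img: (3, H, W) tensors; mask: (H, W) with value 0
    in the target area and 1 elsewhere, broadcast over channels."""
    return source_img * mask + target_img * (1.0 - mask)

# usage sketch: fill a random 20% region of the source image with F(I)
# fused = fuse_images(I, F_I, random_region_mask(I.shape[1], I.shape[2]))
```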
In some embodiments of the present application, the process of training the image processing model with the fused training image as a training sample and the reconstruction loss of the fused training image by the image processing model as a loss function in the step S120 is further described.
In connection with fig. 4, a schematic diagram of a training process for an image processing model is illustrated.
The implementation process of the step S120 may include the following steps:
s11, taking the fusion training image as a training sample, coding the fusion training image by using a coding layer of an image processing model to obtain a coding result, and decoding the coding result by using a decoding layer to obtain a reconstructed image.
The coding layer codes the fusion training image, and coded hidden layer characterization features can be obtained and used as a coding result.
The coding layer may use a ResNet50 structure. The decoding layer may use a stack of multiple Deconvolution + InstanceNorm + ReLU blocks. Of course, the specific network structures of the coding layer and the decoding layer may also take other forms, which are not strictly limited in this application.
With a fusion training image I_fuse of dimension 3 x 112 x 112 as input, a 2048-dimensional hidden layer characterization feature vector is obtained after processing by the encoding layer; this 2048-dimensional hidden layer characterization feature vector can then be input into the multi-layer deconvolution structure of the decoding layer to reconstruct a 3 x 112 x 112 reconstructed image.
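By way of a hedged sketch, an encoder/decoder pair matching these dimensions could look as follows in PyTorch; the intermediate channel widths, kernel sizes, and the final Tanh are assumptions chosen so that a 3 x 112 x 112 input maps to a 2048-dimensional vector and back, not a verified reproduction of this application's network.

```python
import torch
import torch.nn as nn
import torchvision

class DomainAdaptAutoencoder(nn.Module):
    """Sketch: ResNet50 encoder to a 2048-d hidden characterization vector,
    plus a Deconvolution + InstanceNorm + ReLU decoder back to 3x112x112."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        resnet = torchvision.models.resnet50(weights=None)
        self.encoder = nn.Sequential(*list(resnet.children())[:-1])  # drop fc head

        def up(cin, cout):  # one Deconvolution + InstanceNorm + ReLU block
            return nn.Sequential(
                nn.ConvTranspose2d(cin, cout, kernel_size=4, stride=2, padding=1),
                nn.InstanceNorm2d(cout),
                nn.ReLU(inplace=True),
            )

        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat_dim, 512, kernel_size=7),  # 1x1 -> 7x7
            nn.InstanceNorm2d(512),
            nn.ReLU(inplace=True),
            up(512, 256),   # 7 -> 14
            up(256, 128),   # 14 -> 28
            up(128, 64),    # 28 -> 56
            nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),  # -> 112
            nn.Tanh(),
        )

    def forward(self, x):                          # x: (B, 3, 112, 112)
        z = self.encoder(x).flatten(1)             # (B, 2048) hidden features
        recon = self.decoder(z[:, :, None, None])  # treat z as a 1x1 feature map
        return z, recon
```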
S12, determining the reconstruction loss of the fusion training image based on the reconstruction image and the fusion training image.
Specifically, the reconstruction loss of the fusion training image can be denoted L_con; for example, as the mean squared pixel difference it can be expressed as:
L_con = (1/n) * Σ_{i=1}^{n} (x_i - y_i)^2
where x_i denotes the i-th pixel value in the fusion training image, y_i denotes the i-th pixel value in the reconstructed image, and n is the total number of pixel values contained in the image.
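Under the mean-squared-error reading of L_con above (itself an assumption about the exact norm), the loss is a one-liner against the sketched model; `model` and `fused_img` are names from the earlier sketches.

```python
import torch.nn.functional as F

z, recon = model(fused_img)            # model: DomainAdaptAutoencoder sketch above
l_con = F.mse_loss(recon, fused_img)   # L_con: mean squared pixel difference
```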
S13, training the image processing model by taking the reconstruction loss as a loss function.
By training the image processing model with the reconstruction loss as the loss function, the features learned by the image processing model do not distinguish the source domain from the target domain, which solves the problem of aligning source-domain and target-domain features, and the features have good adaptability to both domains; that is, the image processing model can have a good recognition effect in the target domain.
On the basis of the above embodiment, a further training manner of the image processing model is described in this embodiment. Compared with the previous embodiment, the loss function here additionally includes the first classification loss L_fuse of the fusion training image, while the training sample remains the fusion training image. The training process is to train the image processing model with the fusion training image as a training sample, and with the reconstruction loss of the image processing model on the fusion training image and the first classification loss of the fusion training image as the loss function.
In connection with fig. 5, a schematic diagram of a training process of another image processing model is illustrated.
The training process of the image processing model may include:
s21, taking the fusion training image as a training sample, coding the fusion training image by using a coding layer of an image processing model to obtain a coding result, and decoding the coding result by using a decoding layer to obtain a reconstructed image.
S22, determining the reconstruction loss of the fusion training image based on the reconstruction image and the fusion training image.
The steps S21 to S22 are identical to the steps S11 to S12, and are described in detail with reference to the foregoing, and are not repeated here.
S23, performing image classification based on the coding result by using a first classification layer of the image processing model to obtain a first classification result.
Specifically, the encoding result is input into a first classification layer, and the first classification layer classifies the fusion training image to obtain a first classification result.
S24, determining first classification loss of the fusion training image based on the first classification result.
Specifically, the classification label of the source domain training image can be used as the classification label of the corresponding fusion training image. Thus, the first classification loss L_fuse of the fusion training image can be determined based on the first classification result and the classification label of the fusion training image.
The first classification loss L_fuse can be expressed as the cross entropy:
L_fuse = -Σ_j q_j * log(p_j)
where j denotes the image category, p_j denotes the probability that the training sample is predicted to belong to the j-th class, and q_j denotes the predetermined probability of belonging to the j-th class.
Alternatively, q_j may be 1 when a training sample belongs to the j-th class, and otherwise q_j is 0.
Alternatively, considering that the fusion training image is the fusion result of the source domain training image and the target domain training image and is not completely identical to the source domain training image, the reliability of the classification label of the source domain training image cannot be fully guaranteed when it is used as the classification label of the fusion training image. Therefore, a first classification loss L_fuse with label smoothing (Label smooth) can optionally be used in this embodiment. That is, q_j is not simply taken as 1 or 0, but as follows:
q_j = 1 - ε, when the training sample belongs to the j-th class
q_j = ε / (K - 1), otherwise
where ε is the label smoothing coefficient, which may generally be set to 0.1 or another small value between 0 and 1, and K is the total number of classes of the classified images.
From this, it can be seen that when a training sample belongs to the j-th class, q_j takes 1 - 0.1 = 0.9 instead of directly taking 1, and when a training sample does not belong to the j-th class, q_j takes 0.1/(K - 1) instead of directly taking 0. That is, a small penalty is applied even when the classification is correct, so as to suit the case where the classification label of the fusion training image is not completely trustworthy.
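A sketch of this label-smoothed cross entropy consistent with the q_j formula above; the function name and tensor layout are assumptions.

```python
import torch
import torch.nn.functional as F

def label_smoothed_ce(logits, labels, eps=0.1):
    """Cross entropy against smoothed targets: the true class receives
    q = 1 - eps; every other class receives q = eps / (K - 1)."""
    k = logits.size(1)                              # K: total number of classes
    log_p = F.log_softmax(logits, dim=1)
    q = torch.full_like(log_p, eps / (k - 1))       # off-class probability
    q.scatter_(1, labels.unsqueeze(1), 1.0 - eps)   # true-class probability
    return -(q * log_p).sum(dim=1).mean()           # L_fuse = -sum_j q_j log p_j
```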
S25, training the image processing model by taking the reconstruction loss and the first classification loss as loss functions.
Specifically, in this embodiment, the loss function L includes two terms, namely L_con and L_fuse:
L = λL_con + βL_fuse
where λ and β are hyperparameters weighting the different losses and can be set according to the actual training situation.
Compared with the previous embodiment, the loss function in this embodiment adds the first classification loss of the fusion training image, so that the ability of the features learned by the image processing model to distinguish between different source-domain and target-domain images can be ensured.
On the basis of the above embodiments, yet another training manner of the image processing model is described in this embodiment. Compared with the previous embodiment, the training samples here additionally include the source domain training image, and the loss function additionally includes the second classification loss L_CE of the source domain training image.
On this basis, the training process of the image processing model can comprise two kinds of training processes, namely:
and the first training model is trained by taking the fusion training image and the source domain training image as training samples, taking the reconstruction loss of the image processing model on the fusion training image and the second classification loss of the source domain training image as a loss function.
Second, the image processing model is trained by taking the fusion training image and the source domain training image as training samples, and taking the reconstruction loss of the image processing model on the fusion training image, the first classification loss of the fusion training image, and the second classification loss of the source domain training image as the loss function.
By comparison, the two training modes differ in the loss function: compared with the first mode, the loss function of the second mode additionally includes the first classification loss of the fusion training image.
Next, the two training modes are described separately.
For the first training mode, fig. 6 may be combined, which illustrates a schematic diagram of a training process of a further image processing model.
The training process of the image processing model may include:
s31, coding the fusion training image and the source domain training image by using a coding layer of an image processing model to obtain respective coding results, and decoding the coding results of the fusion training image by using a decoding layer to obtain a reconstructed image.
The coding layer used for coding the fusion training image and the coding layer used for coding the source domain training image may be the same coding layer, or two coding layers may be used, provided the two coding layers share network parameters. The process of encoding the fusion training image and decoding to obtain the reconstructed image can be found in the foregoing related description and is not repeated here.
S32, determining the reconstruction loss of the fusion training image based on the reconstruction image and the fusion training image.
Specifically, the process of determining the reconstruction loss of the fusion training image may be described with reference to the foregoing related description, which is not repeated herein.
S33, performing image classification based on the coding result of the source domain training image by using a second classification layer of the image processing model to obtain a second classification result.
Specifically, the encoding result of the source domain training image is input into the second classification layer, and the second classification layer classifies the source domain training image to obtain the second classification result.
S34, determining second classification loss of the source domain training image based on the second classification result.
Specifically, the source domain training image carries a classification label, so the second classification loss L_CE of the source domain training image can be determined based on the second classification result and the classification label of the source domain training image.
The second classification loss L_CE can be expressed as the cross entropy:
L_CE = -Σ_j y_j * log(p_j)
where j denotes the image category, p_j denotes the probability that the source domain training image is predicted to belong to the j-th class, and y_j is an indicator vector: y_j takes the value 1 when the source domain training image belongs to the j-th class, and 0 otherwise.
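This is the standard cross entropy; in PyTorch it reduces to a single call (variable names are assumptions):

```python
l_ce = F.cross_entropy(source_logits, source_labels)   # L_CE over source-domain labels
```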
S35, training the image processing model with the reconstruction loss L_con and the second classification loss L_CE as the loss function.
Specifically, in this embodiment, the loss function L includes two terms, namely L_con and L_CE:
L = αL_CE + λL_con
where α and λ are hyperparameters weighting the different losses and can be set according to the actual training situation.
Compared with the previous embodiment, the loss function in this embodiment adds the second classification loss of the source domain training image, so that the ability of the features learned by the image processing model to distinguish between different source domain training images can be ensured.
Further to the second training approach, fig. 7 may be combined, which illustrates a training process schematic of yet another image processing model.
The training process of the image processing model may include:
s41, respectively encoding the fusion training image and the source domain training image by using an encoding layer of an image processing model to obtain respective encoding results, and decoding the encoding results of the fusion training image by using a decoding layer to obtain a reconstructed image.
S42, determining the reconstruction loss of the fusion training image based on the reconstruction image and the fusion training image.
Specifically, the above-mentioned S41-S42 are in one-to-one correspondence with the above-mentioned S31-S32, and detailed description thereof is omitted herein.
S43, performing image classification based on the coding result of the fusion training image by using a first classification layer of the image processing model to obtain a first classification result.
Specifically, the present step S43 corresponds to the previous step S23, and is described in detail with reference to the previous description, which is not repeated here.
S44, performing image classification based on the coding result of the source domain training image by using a second classification layer of the image processing model to obtain a second classification result.
Specifically, the present step S44 corresponds to the previous step S33, and the detailed description thereof is referred to above, and will not be repeated here.
S45, determining a first classification loss of the fusion training image based on the first classification result.
Specifically, the step S45 corresponds to the step S24, and is described in detail with reference to the foregoing descriptions, which are not repeated here.
S46, determining second classification loss of the source domain training image based on the second classification result.
Specifically, the present step S46 corresponds to the previous step S34, and the detailed description is referred to above in the related description, which is not repeated here.
S47, training the image processing model with the reconstruction loss L_con, the first classification loss L_fuse, and the second classification loss L_CE as the loss function.
Specifically, in this embodiment, the loss function L includes three terms, namely L_con, L_fuse, and L_CE:
L = αL_CE + λL_con + βL_fuse
where α, λ, and β are hyperparameters weighting the different losses and can be set according to the actual training situation.
Compared with the previous embodiment, the loss function in this embodiment includes the reconstruction loss of the fusion training image, the first classification loss of the fusion training image, and the second classification loss of the source domain training image, and by using the reconstruction loss, the features learned by the image processing model do not distinguish between the source domain and the target domain, so that the problem of alignment of the features of the source domain and the target domain is solved, and the features have better adaptability to both the source domain and the target domain, that is, the image processing model also has a good recognition effect in the target domain. Through fusing the first classification loss of the training image, the distinguishing capability of the features learned by the image processing model on images of different source domains and target domains can be ensured. The second classification loss of the source domain training image further improves the distinguishing capability of the features learned by the image processing model on different source domain images.
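Tying the earlier sketches together, one training step under this three-term loss might look as follows; the classification heads, the weight values, and the optimizer wiring are assumptions, not the application's verified procedure, and `label_smoothed_ce` and `model` come from the sketches above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 10                            # assumed class count for illustration
cls_fuse = nn.Linear(2048, num_classes)     # first classification layer (fused images)
cls_src = nn.Linear(2048, num_classes)      # second classification layer (source images)

def train_step(model, optimizer, src_img, fused_img, labels,
               alpha=1.0, lam=1.0, beta=1.0, eps=0.1):
    z_fuse, recon = model(fused_img)        # encode + decode the fused image
    z_src, _ = model(src_img)               # encode the source image (decoder output unused)
    l_con = F.mse_loss(recon, fused_img)                        # reconstruction loss
    l_fuse = label_smoothed_ce(cls_fuse(z_fuse), labels, eps)   # smoothed CE on fused image
    l_ce = F.cross_entropy(cls_src(z_src), labels)              # CE on source image
    loss = alpha * l_ce + lam * l_con + beta * l_fuse           # L = aL_CE + lL_con + bL_fuse
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that the optimizer is assumed to cover the parameters of the model and both classification heads.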
In some embodiments of the present application, an image processing method is further disclosed, where the image processing method uses the image processing model trained by the domain adaptation method of the foregoing embodiments to process the image to be processed, so as to obtain an image processing result.
The above has explained that the image processing model has the function of extracting hidden layer characterization features of an input image, and that the performance of an image processing model is judged mainly by whether the extracted hidden layer characterization features are expressive enough for the input image. After the hidden layer characterization features of the input image are extracted, they can be input into different task modules to obtain corresponding task processing results; for example, they can be input into a classification module to obtain an image classification result of the input image. In addition, the hidden layer characterization features can also be input into the decoding layer, so that a reconstructed image and the like can be obtained through decoding.
Therefore, in this embodiment, the image to be processed may be encoded by using the image processing model, so as to obtain the hidden layer characterization feature corresponding to the image to be processed, where the hidden layer characterization feature may be used as the image processing result. In addition, the image processing model can further utilize the hidden layer characterization feature to input different task models so as to obtain corresponding task processing results, and the corresponding task processing results can be used as image processing results, such as the hidden layer characterization feature is input into a classification module so as to obtain a classification result of an image to be processed.
In this embodiment, heterogeneous face recognition will be described as an example.
Heterogeneous face recognition refers to the process of matching (1:1) or retrieving (1:N) face images of two different domains (source and target).
For any two images from the source domain and the target domain, the images can be encoded by the image processing model to obtain the encoded hidden layer characterization features, and the similarity of the two images can be measured by the distance between the two hidden layer characterization features, such as the Euclidean distance. For a 1:1 matching task, the distance can be compared directly with a set threshold: when the distance is above the threshold, the two images are considered not to be the same person; otherwise they are considered to be the same person. For a 1:N retrieval task, i.e., retrieving the N images most similar to a target image from an image set in the source domain or the target domain, the distance between the target image and each image in the set can be computed, and the N images with the smallest distances are selected, in order of distance, as the N images similar to the target image.
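A hedged sketch of both tasks on top of the encoded hidden layer features; the threshold value and helper names are assumptions, and `model` is the autoencoder sketch from earlier.

```python
import torch

@torch.no_grad()
def encode(model, img):
    """Hidden-layer characterization feature of one image (adds a batch dim)."""
    z, _ = model(img.unsqueeze(0))
    return z.squeeze(0)

def match_1v1(model, img_a, img_b, threshold=1.0):
    """1:1 matching: same person iff the Euclidean feature distance is below threshold."""
    dist = torch.dist(encode(model, img_a), encode(model, img_b))
    return dist.item() < threshold

def retrieve_1vN(model, query_img, gallery_imgs, n=5):
    """1:N retrieval: indices of the n gallery images closest to the query."""
    q = encode(model, query_img)
    dists = torch.stack([torch.dist(q, encode(model, g)) for g in gallery_imgs])
    return torch.topk(dists, k=min(n, len(gallery_imgs)), largest=False).indices.tolist()
```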
The domain adaptation device provided in the embodiments of the present application will be described below, and the domain adaptation device described below and the domain adaptation method described above may be referred to correspondingly to each other.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a domain adaptation device according to an embodiment of the present application.
As shown in fig. 8, the apparatus may include:
an image obtaining unit 11, configured to obtain a source domain training image and a target domain training image corresponding to the source domain training image after the source domain training image is converted to a target domain;
the image fusion unit 12 is configured to fuse the source domain training image and the corresponding target domain training image in a pixel space to obtain a fused training image;
the model training unit 13 is configured to train the image processing model by using the fused training image as a training sample and using a reconstruction loss of the image processing model on the fused training image as a loss function.
Optionally, the process of acquiring the target domain training image by the image acquisition unit may include:
and converting the source domain training image into the target domain by utilizing the pre-trained image conversion model to obtain a corresponding target domain training image after converting the source domain training image into the target domain.
Optionally, the process of fusing the source domain training image and the corresponding target domain training image in the pixel space by the image fusion unit to obtain a fused training image may include:
cutting out the first image block of the target area in the source domain training image;
and acquiring a second image block of the target area in the target domain training image, and filling the target area in the source domain training image with the second image block to obtain a fusion training image.
Optionally, the process by which the image fusion unit cuts out the first image block of the target area in the source domain training image may include:
setting the pixel value of the image of the target area in the source domain training image to be zero to obtain a processed source domain training image;
on this basis, the process by which the image fusion unit obtains the second image block of the target area in the target domain training image and fills the target area in the source domain training image with the second image block to obtain the fusion training image may include:
setting the pixel values of the images except the target area in the target domain training image to zero to obtain a processed target domain training image;
and superposing the processed source domain training image and the processed target domain training image to obtain a fusion training image.
Optionally, the process of training the image processing model by using the fused training image as a training sample and using a loss of reconstruction of the fused training image by the image processing model as a loss function by the model training unit may include:
Taking the fusion training image as a training sample, coding the fusion training image by using a coding layer of an image processing model to obtain a coding result, and decoding the coding result by using a decoding layer to obtain a reconstructed image;
determining a reconstruction loss of the fusion training image based on the reconstruction image and the fusion training image;
and training the image processing model by taking the reconstruction loss as a loss function.
Optionally, the foregoing loss function may further include: a first classification penalty for the fused training image; on this basis, the model training unit may train the image processing model with the fused training image as a training sample, with a reconstruction loss of the fused training image by the image processing model, and a first classification loss of the fused training image as a loss function, and the training process may include:
taking the fusion training image as a training sample, coding the fusion training image by using a coding layer of an image processing model to obtain a coding result, and decoding the coding result by using a decoding layer to obtain a reconstructed image;
determining a reconstruction loss of the fusion training image based on the reconstruction image and the fusion training image;
Using a first classification layer of the image processing model to classify the image based on the coding result to obtain a first classification result;
determining a first classification penalty for the fused training image based on the first classification result;
and training the image processing model by taking the reconstruction loss and the first classification loss as loss functions.
Optionally, the process of determining the first classification loss of the fusion training image by the model training unit based on the first classification result may include:
and determining a first classification loss of the fusion training image with label smoothing (Label smooth) based on the first classification result and the classification label of the source domain training image corresponding to the fusion training image.
Optionally, the training sample may further include: the source domain training image; the loss function may further include: the second classification of the source domain training image is lost. On this basis, the model training unit may train the image processing model by using the fused training image and the source domain training image as training samples, using a reconstruction loss of the image processing model on the fused training image, and a second classification loss of the source domain training image as a loss function, and the training process may include:
Encoding the fusion training image and the source domain training image by using an encoding layer of an image processing model to obtain respective encoding results, and decoding the encoding results of the fusion training image by using a decoding layer to obtain a reconstructed image;
determining a reconstruction loss of the fusion training image based on the reconstruction image and the fusion training image;
performing image classification based on the coding result of the source domain training image by using a second classification layer of the image processing model to obtain a second classification result;
determining a second classification loss of the source domain training image based on the second classification result;
and training the image processing model by taking the reconstruction loss and the second classification loss as loss functions.
The domain adaptation device provided by the embodiment of the application can be applied to domain adaptation equipment, such as a terminal: cell phones, computers, etc. Alternatively, fig. 9 shows a block diagram of a hardware structure of the domain adaptation device, and referring to fig. 9, the hardware structure of the domain adaptation device may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4;
in the embodiment of the application, the number of the processor 1, the communication interface 2, the memory 3 and the communication bus 4 is at least one, and the processor 1, the communication interface 2 and the memory 3 complete communication with each other through the communication bus 4;
the processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention;
the memory 3 may comprise a high-speed RAM memory, and may also comprise a non-volatile memory, such as at least one magnetic disk memory;
wherein the memory stores a program, the processor is operable to invoke the program stored in the memory, the program operable to:
acquiring a source domain training image and a target domain training image corresponding to the source domain training image after the source domain training image is converted into a target domain;
fusing the source domain training image and the corresponding target domain training image in a pixel space to obtain a fused training image;
and training the image processing model by taking the fusion training image as a training sample and taking the reconstruction loss of the image processing model on the fusion training image as a loss function.
Alternatively, the refinement function and the extension function of the program may be described with reference to the above.
The embodiment of the application also provides a storage medium, which may store a program adapted to be executed by a processor, the program being configured to:
Acquiring a source domain training image and a target domain training image corresponding to the source domain training image after the source domain training image is converted into a target domain;
fusing the source domain training image and the corresponding target domain training image in a pixel space to obtain a fused training image;
and training the image processing model by taking the fusion training image as a training sample and taking the reconstruction loss of the image processing model on the fusion training image as a loss function.
Optionally, for refined and extended functions of the program, reference may be made to the corresponding description above.
Finally, it should further be noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, the embodiments may be combined as needed, and for identical or similar parts, reference may be made between the embodiments.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (13)

1. A domain adaptation method, comprising:
acquiring a source domain training image and a target domain training image corresponding to the source domain training image after the source domain training image is converted into a target domain;
fusing the source domain training image and the corresponding target domain training image in a pixel space to obtain a fused training image;
training the image processing model by taking the fusion training image as a training sample and taking the reconstruction loss of the image processing model on the fusion training image as a loss function;
the training samples further comprise: the source domain training image; the loss function further comprises: a second classification loss of the source domain training image;
and training the image processing model by taking the fusion training image and the source domain training image as training samples and taking the reconstruction loss of the image processing model on the fusion training image and the second classification loss of the source domain training image as a loss function.
2. The method of claim 1, wherein the process of acquiring the target domain training image comprises:
converting the source domain training image into the target domain by using a pre-trained image conversion model to obtain the corresponding target domain training image.
3. The method of claim 1, wherein fusing the source domain training image and the corresponding target domain training image in pixel space to obtain a fused training image comprises:
matting out a first image block of a target region in the source domain training image;
and acquiring a second image block of the target region in the target domain training image, and filling the target region in the source domain training image with the second image block to obtain a fusion training image.
4. The method of claim 3, wherein the matting out of the first image block of the target region in the source domain training image comprises:
setting the pixel values of the target region in the source domain training image to zero to obtain a processed source domain training image;
the acquiring of the second image block of the target region in the target domain training image and the filling of the target region in the source domain training image with the second image block to obtain the fusion training image comprises:
setting the pixel values outside the target region in the target domain training image to zero to obtain a processed target domain training image;
and superposing the processed source domain training image and the processed target domain training image to obtain the fusion training image.
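To make the fusion of claims 3 and 4 concrete, here is a NumPy sketch under stated assumptions: images are (H, W, C) arrays, and the target region is supplied as a boolean mask, which the claims do not specify how to choose.

    import numpy as np

    def fuse_in_pixel_space(source_img, target_img, region_mask):
        # region_mask: (H, W) boolean array marking the target region.
        mask = region_mask[..., None]                     # broadcast over channels
        processed_source = np.where(mask, 0, source_img)  # matte out the first image block
        processed_target = np.where(mask, target_img, 0)  # keep only the second image block
        return processed_source + processed_target        # superpose into the fusion image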
5. The method of claim 1, wherein training the image processing model by taking the fusion training image as a training sample and taking the reconstruction loss of the image processing model on the fusion training image as the loss function comprises:
taking the fusion training image as a training sample, encoding the fusion training image with the encoding layer of the image processing model to obtain an encoding result, and decoding the encoding result with the decoding layer to obtain a reconstructed image;
determining the reconstruction loss of the fusion training image based on the reconstructed image and the fusion training image;
and training the image processing model by taking the reconstruction loss as the loss function.
6. The method of claim 1, wherein the loss function further comprises: a first classification loss of the fusion training image;
training the image processing model by taking the fusion training image as a training sample and taking the reconstruction loss of the image processing model on the fusion training image and the first classification loss of the fusion training image as the loss function comprises:
taking the fusion training image as a training sample, encoding the fusion training image with the encoding layer of the image processing model to obtain an encoding result, and decoding the encoding result with the decoding layer to obtain a reconstructed image;
determining the reconstruction loss of the fusion training image based on the reconstructed image and the fusion training image;
performing image classification based on the encoding result with a first classification layer of the image processing model to obtain a first classification result;
determining the first classification loss of the fusion training image based on the first classification result;
and training the image processing model by taking the reconstruction loss and the first classification loss as the loss function.
7. The method of claim 6, wherein the determining of the first classification loss of the fusion training image based on the first classification result comprises:
and determining, based on the first classification result and the classification label of the source domain training image corresponding to the fusion training image, the first classification loss of the fusion training image with label smoothing (label smooth).
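The following sketch shows one common form of the label-smoothed classification loss of claim 7, computed against the classification label of the corresponding source domain training image. The smoothing factor eps and the uniform spread of the smoothing mass over the non-target classes are assumptions, as the claim fixes neither.

    import torch
    import torch.nn.functional as F

    def label_smoothed_loss(logits, source_label, eps=0.1):
        n_classes = logits.size(-1)
        log_probs = F.log_softmax(logits, dim=-1)
        # Soft target: 1 - eps on the source image's class label,
        # eps spread uniformly over the remaining classes.
        target = torch.full_like(log_probs, eps / (n_classes - 1))
        target.scatter_(-1, source_label.unsqueeze(-1), 1.0 - eps)
        return -(target * log_probs).sum(dim=-1).mean()

In recent PyTorch versions a similar effect is available directly via F.cross_entropy(logits, source_label, label_smoothing=eps), up to the exact distribution of the smoothing mass.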
8. The method of claim 1, wherein training the image processing model by taking the fusion training image and the source domain training image as training samples and taking the reconstruction loss of the image processing model on the fusion training image and the second classification loss of the source domain training image as the loss function comprises:
encoding the fusion training image and the source domain training image with the encoding layer of the image processing model to obtain their respective encoding results, and decoding the encoding result of the fusion training image with the decoding layer to obtain a reconstructed image;
determining the reconstruction loss of the fusion training image based on the reconstructed image and the fusion training image;
performing image classification based on the encoding result of the source domain training image with the second classification layer of the image processing model to obtain a second classification result;
determining the second classification loss of the source domain training image based on the second classification result;
and training the image processing model by taking the reconstruction loss and the second classification loss as the loss function.
9. An image processing method, comprising:
acquiring an image to be processed;
processing the image to be processed by using the image processing model trained by the domain adaptation method according to any one of claims 1-8 to obtain an image processing result.
10. The method of claim 9, wherein the processing of the image to be processed using the image processing model comprises:
and encoding the image to be processed with the image processing model to obtain hidden layer characterization features corresponding to the image to be processed.
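Illustratively, the inference path of claim 10 only needs the trained encoding layer. The names encoder and preprocess below are assumptions of this sketch, not terms from the patent.

    import torch

    @torch.no_grad()
    def extract_hidden_features(encoder, preprocess, image):
        encoder.eval()
        x = preprocess(image)                # e.g. resize / normalize into a tensor
        features = encoder(x.unsqueeze(0))   # hidden layer characterization features
        return features.squeeze(0)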
11. A domain adaptation device, comprising:
the image acquisition unit is used for acquiring a source domain training image and a target domain training image corresponding to the source domain training image after the source domain training image is converted into a target domain;
the image fusion unit is used for fusing the source domain training image and the corresponding target domain training image in a pixel space to obtain a fusion training image;
the model training unit is used for taking the fusion training image as a training sample, taking the reconstruction loss of the image processing model on the fusion training image as a loss function, and training the image processing model;
the training samples further comprise: the source domain training image; the loss function further comprises: a second classification loss of the source domain training image;
the model training unit takes the fusion training image and the source domain training image as training samples, and takes the reconstruction loss of the image processing model on the fusion training image and the second classification loss of the source domain training image as a loss function to train the image processing model.
12. A domain adaptation device, comprising: a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of the domain adaptation method according to any one of claims 1 to 8.
13. A storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the domain adaptation method of any of claims 1 to 8.
CN202010869777.7A 2020-08-26 2020-08-26 Domain adaptation method, device, apparatus, image processing method, and storage medium Active CN112001398B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010869777.7A CN112001398B (en) 2020-08-26 2020-08-26 Domain adaptation method, device, apparatus, image processing method, and storage medium

Publications (2)

Publication Number Publication Date
CN112001398A CN112001398A (en) 2020-11-27
CN112001398B true CN112001398B (en) 2024-04-12

Family

ID=73471725

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010869777.7A Active CN112001398B (en) 2020-08-26 2020-08-26 Domain adaptation method, device, apparatus, image processing method, and storage medium

Country Status (1)

Country Link
CN (1) CN112001398B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255823B (en) * 2021-06-15 2021-11-05 中国人民解放军国防科技大学 Unsupervised domain adaptation method and unsupervised domain adaptation device
CN113642684A (en) * 2021-10-18 2021-11-12 青岛根尖智能科技有限公司 Robust cross-domain attitude estimation method and system based on image mixing mechanism

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200075344A (en) * 2018-12-18 2020-06-26 삼성전자주식회사 Detector, method of object detection, learning apparatus, and learning method for domain transformation

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108334904A (en) * 2018-02-07 2018-07-27 深圳市唯特视科技有限公司 A kind of multiple domain image conversion techniques based on unified generation confrontation network
WO2019228358A1 (en) * 2018-05-31 2019-12-05 华为技术有限公司 Deep neural network training method and apparatus
CN109389174A (en) * 2018-10-23 2019-02-26 四川大学 A kind of crowd massing Sensitive Image Detection Method
WO2020082748A1 (en) * 2018-10-25 2020-04-30 腾讯科技(深圳)有限公司 Detection model training method and apparatus and terminal device
CN109800677A (en) * 2018-12-29 2019-05-24 西安交通大学 A kind of cross-platform palm grain identification method
CN110197226A (en) * 2019-05-30 2019-09-03 厦门大学 A kind of unsupervised image interpretation method and system
CN110210561A (en) * 2019-05-31 2019-09-06 北京市商汤科技开发有限公司 Training method, object detection method and device, the storage medium of neural network
CN110414462A (en) * 2019-08-02 2019-11-05 中科人工智能创新技术研究院(青岛)有限公司 A kind of unsupervised cross-domain pedestrian recognition methods and system again
CN110516201A (en) * 2019-08-20 2019-11-29 Oppo广东移动通信有限公司 Image processing method, device, electronic equipment and storage medium
CN110634101A (en) * 2019-09-06 2019-12-31 温州大学 Unsupervised image-to-image conversion method based on random reconstruction
CN111179277A (en) * 2019-12-11 2020-05-19 中国科学院深圳先进技术研究院 Unsupervised self-adaptive mammary gland lesion segmentation method
CN111444765A (en) * 2020-02-24 2020-07-24 北京市商汤科技开发有限公司 Image re-recognition method, training method of related model, related device and equipment
CN111353545A (en) * 2020-03-09 2020-06-30 大连理工大学 Plant disease and insect pest identification method based on sparse network migration
CN111583201A (en) * 2020-04-26 2020-08-25 浙江大学 Transfer learning method for constructing super-resolution pathology microscope
CN111476805A (en) * 2020-05-22 2020-07-31 南京大学 Cross-source unsupervised domain adaptive segmentation model based on multiple constraints
CN111476216A (en) * 2020-05-26 2020-07-31 上海眼控科技股份有限公司 Face recognition method and device, computer equipment and readable storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Deep Reconstruction-Classification Networks for Unsupervised Domain Adaptation;Muhammad Ghifary等;《European Conference on Computer Vision》;597-613 *
Image to Image Translation for Domain Adaptation;Zak Murez等;《Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition》;4500-4509 *
Research on Domain-Adaptive Facial Expression Recognition; Hu Yuhan; China Master's Theses Full-text Database, Information Science and Technology Series; Vol. 2020, No. 7; I138-890 *
Image Semantic Segmentation Based on Deep Learning and Domain Adaptation; Zhang Yiyue; China Master's Theses Full-text Database, Information Science and Technology Series; Vol. 2020, No. 7; I138-1275 *

Also Published As

Publication number Publication date
CN112001398A (en) 2020-11-27

Similar Documents

Publication Publication Date Title
CN110458282B (en) Multi-angle multi-mode fused image description generation method and system
Zhang et al. Cosaliency detection based on intrasaliency prior transfer and deep intersaliency mining
US8605957B2 (en) Face clustering device, face clustering method, and program
CN112001398B (en) Domain adaptation method, device, apparatus, image processing method, and storage medium
CN108734210B (en) Object detection method based on cross-modal multi-scale feature fusion
CN111382555B (en) Data processing method, medium, device and computing equipment
CN106874826A (en) Face key point-tracking method and device
CN109598231A (en) A kind of recognition methods of video watermark, device, equipment and storage medium
Liu et al. Deep active learning with contaminated tags for image aesthetics assessment
CN111666922A (en) Video matching method and device, computer equipment and storage medium
CN110415184B (en) Multi-modal image enhancement method based on orthogonal element space
CN112036260A (en) Expression recognition method and system for multi-scale sub-block aggregation in natural environment
CN112200041A (en) Video motion recognition method and device, storage medium and electronic equipment
Han et al. Class-aware feature aggregation network for video object detection
WO2022166840A1 (en) Face attribute editing model training method, face attribute editing method and device
CN113949646B (en) Web service QoS prediction method based on deep learning
CN114926742A (en) Loop detection and optimization method based on second-order attention mechanism
CN114168773A (en) Semi-supervised sketch image retrieval method based on pseudo label and reordering
CN113128526B (en) Image recognition method and device, electronic equipment and computer-readable storage medium
CN112528077B (en) Video face retrieval method and system based on video embedding
CN112101135A (en) Moving target detection method and device and terminal equipment
You et al. Pixel exclusion: Uncertainty-aware boundary discovery for active cross-domain semantic segmentation
CN110659679A (en) Image source identification method based on adaptive filtering and coupling coding
CN115905838A (en) Audio-visual auxiliary fine-grained tactile signal reconstruction method
CN114445662A (en) Robust image classification method and system based on label embedding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant