CN115909002A - Image translation method based on contrast learning - Google Patents
- Publication number
- CN115909002A CN115909002A CN202211232833.1A CN202211232833A CN115909002A CN 115909002 A CN115909002 A CN 115909002A CN 202211232833 A CN202211232833 A CN 202211232833A CN 115909002 A CN115909002 A CN 115909002A
- Authority
- CN
- China
- Prior art keywords
- image
- network
- loss
- output
- contrast
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses an image translation method based on contrast learning, which comprises the following steps: inputting an input image into a generator; inputting the image generated by the generator and a real image of the target domain into a discriminator; calculating the generative adversarial network loss; feeding the input image and the generator's output image back into the encoder inside the generator, passing the encoded vectors of the input and output images through a mapping network to obtain their feature vectors in the same feature space, and calculating the contrast loss between these feature vectors; optimizing the contrast loss with a focal loss; and back-propagating the adversarial network loss and the optimized contrast loss to optimize the network. By training the model with the contrast loss, the invention greatly reduces memory occupation and training time, while achieving an image conversion effect with more distinct details than unidirectional and bidirectional image conversion.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an image translation method based on contrast learning.
Background
Image translation is a widely applied technology in computer vision. It aims to learn a mapping relation that realizes the conversion from a source-domain image to a target-domain image. Generative adversarial networks, by virtue of the strong expressive power of neural networks, have a strong image generation capability and are the mainstream technology for image translation.
Nowadays, with the increasing popularization of the internet, application scenarios for image translation technology based on generative adversarial networks are becoming more common, including image coloring, image super-resolution conversion, and image editing. In the field of automatic driving, a high-definition city scene image is converted into a semantic label map, which is then input into a recognition system for further analysis. In short-video applications, various video conversion effects need to be added to videos, which requires the technical support of image translation. Meanwhile, artistic-style images generated from real photos also provide creative references for designers. All of these represent the wide application value and huge commercial value of image translation technology.
Image translation often employs a cycle-consistency loss or a predefined content-aware loss to ensure correlation between domains. However, the cycle-consistency loss requires an additional symmetric network, which makes the model large and difficult to train; the content-aware loss must be predefined and its measurement is biased, which limits the generating capability of the generator.
Disclosure of Invention
The invention aims to: overcome the defects in the prior art by providing an image translation method based on contrast learning, which can effectively measure the content relevance before and after image conversion, solves the problems that image translation models are large and that measuring content consistency has large deviation, and achieves a better generation effect than traditional models.
The technical scheme is as follows: in order to achieve the above object, the present invention provides an image translation method based on contrast learning, comprising the steps of:
s1: inputting an input image into a generator, the generator comprising an encoder and a decoder; the encoder is mainly used for encoding the characteristics of an input image into a characteristic vector, and the decoder is mainly used for decoding the characteristic vector into an image in a target domain;
s2: inputting the image generated by the generator and the real image of the target domain into a discriminator to obtain the output of the discriminator, namely the generated-image prediction probability and the real-image prediction probability;
s3: calculating the generative adversarial network loss according to the generated-image prediction probability and the real-image prediction probability;
s4: re-inputting the input image and the output image of the generator into an encoder in the generator to obtain the encoded output of the input image and the output image, inputting the encoded vectors of the input image and the output image into a mapping network to obtain the feature vectors of the input image and the output image in the same feature space, defining a contrast learning method in image translation according to a definition concept of contrast learning, sampling image blocks on the input image and the output image, dividing positive and negative samples, proposing contrast loss, and calculating the contrast loss between the feature vectors of the input image and the output image;
s5: optimizing the contrast loss by using the focal loss to address the problem of uneven sampling of positive and negative samples;
s6: back-propagating the adversarial network loss and the optimized contrast loss, optimizing the network, and using the optimized network to realize image translation.
Further, the structure of the generator in step S1 is as follows: a U-Net network structure is adopted, comprising an encoder G_enc and a decoder G_dec, wherein the encoder G_enc consists of 3 convolution layers; a transformation module is arranged between the encoder and the decoder for image conversion between the two domains; the decoder consists of n upsampling layers; the encoder and the decoder are connected by skip connections between corresponding convolution layers, which effectively prevents the feature information of the input image from being lost after convolution calculation and improves the efficiency of information transmission.
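For illustration only, a minimal PyTorch sketch of such a generator is given below: a 3-layer convolutional encoder G_enc, a residual transformation module, and an upsampling decoder G_dec with skip connections to the corresponding encoder layers. The channel widths, kernel sizes and number of residual blocks are assumptions made for this sketch and are not values specified by the patent.

import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block used inside the transformation module."""
    def __init__(self, ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(ch, ch, 3, 1, 1), nn.InstanceNorm2d(ch), nn.ReLU(True),
            nn.Conv2d(ch, ch, 3, 1, 1), nn.InstanceNorm2d(ch))
    def forward(self, x):
        return x + self.block(x)

class UNetGenerator(nn.Module):
    """Sketch of the step-S1 generator: a 3-layer encoder G_enc, a transformation
    module, and an upsampling decoder G_dec with skip connections."""
    def __init__(self, in_ch=3, out_ch=3, base=64, n_res=6):
        super().__init__()
        # Encoder G_enc: 3 convolution layers
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 7, 1, 3), nn.InstanceNorm2d(base), nn.ReLU(True))
        self.enc2 = nn.Sequential(nn.Conv2d(base, base * 2, 3, 2, 1), nn.InstanceNorm2d(base * 2), nn.ReLU(True))
        self.enc3 = nn.Sequential(nn.Conv2d(base * 2, base * 4, 3, 2, 1), nn.InstanceNorm2d(base * 4), nn.ReLU(True))
        # Transformation module between encoder and decoder
        self.transform = nn.Sequential(*[ResBlock(base * 4) for _ in range(n_res)])
        # Decoder G_dec: upsampling layers; skip connections double the input channels
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(base * 8, base * 2, 3, 2, 1, output_padding=1),
                                  nn.InstanceNorm2d(base * 2), nn.ReLU(True))
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(base * 4, base, 3, 2, 1, output_padding=1),
                                  nn.InstanceNorm2d(base), nn.ReLU(True))
        self.out = nn.Sequential(nn.Conv2d(base * 2, out_ch, 7, 1, 3), nn.Tanh())

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        t = self.transform(e3)
        d1 = self.dec1(torch.cat([t, e3], dim=1))    # skip connection
        d2 = self.dec2(torch.cat([d1, e2], dim=1))   # skip connection
        return self.out(torch.cat([d2, e1], dim=1))  # skip connection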
Further, the network structure of the discriminator in step S2 is as follows: a discrimination network structure with an attention module is constructed on the basis of the CycleGAN discriminator. After the generator is improved and its expressive capability in the generative network is enhanced, the capability of the original discriminator structure becomes too weak, and the quality of the generated image suffers. Therefore, in order to achieve a better balance between the two, the discrimination network PatchGAN in the original CycleGAN is improved: the size of the receptive field of the original network is maintained, while a dense residual block and an attention mechanism are added, so that the judgment capability of the discrimination network is obviously enhanced.
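A possible PyTorch sketch of such a discriminator is shown below: a PatchGAN-style network augmented with a dense residual block and a spatial attention module. The exact attention module, layer sizes and block placement are assumptions made for illustration; the patent does not specify them.

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Simple spatial attention: weights each location using pooled channel statistics."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)
    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn

class DenseResBlock(nn.Module):
    """Dense residual block: each conv sees all earlier feature maps, plus an outer skip."""
    def __init__(self, ch, growth=32, n_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        c = ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(nn.Conv2d(c, growth, 3, 1, 1), nn.LeakyReLU(0.2, True)))
            c += growth
        self.fuse = nn.Conv2d(c, ch, 1)
    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return x + self.fuse(torch.cat(feats, dim=1))

class AttentionPatchDiscriminator(nn.Module):
    """PatchGAN-style discriminator with a dense residual block and spatial attention."""
    def __init__(self, in_ch=3, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(base, base * 2, 4, 2, 1), nn.InstanceNorm2d(base * 2), nn.LeakyReLU(0.2, True),
            DenseResBlock(base * 2),
            SpatialAttention(),
            nn.Conv2d(base * 2, base * 4, 4, 2, 1), nn.InstanceNorm2d(base * 4), nn.LeakyReLU(0.2, True),
            nn.Conv2d(base * 4, 1, 4, 1, 1))  # per-patch real/fake logits
    def forward(self, x):
        return self.net(x)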
Further, in step S4 of the present invention: first, the relevant definitions are introduced. The optimization objective is to make an image x from the input domain X, after conversion by the model, similar in appearance to images from the target domain Y. Given unpaired example images x ∈ X and y ∈ Y, the goal is to learn an image translation model G that converts X-domain images into Y-domain images, ŷ = G(x).
The idea of contrast learning is to associate two signals: a query instance and its positive example, in contrast to other points in the dataset (called negative examples). The query, the positive example and the N negative examples are mapped to K-dimensional vectors v, v+ and v-_n respectively, where v-_n denotes the n-th negative example. An (N+1)-way classification problem is thereby created, in which the scaled dot products between the query and the other examples are used as logits. The probability that the positive example is chosen is expressed by the cross-entropy loss:

l(v, v+, v-) = -log[ exp(v·v+/τ) / ( exp(v·v+/τ) + Σ_{n=1}^{N} exp(v·v-_n/τ) ) ]

where v, v+ and v-_n denote the query vector, the positive example vector and the negative example vectors respectively, and τ is a temperature parameter.
The goal is to correlate input and output data. In contrast learning, a query refers to an output image block. Positive and negative examples are corresponding and non-corresponding input image blocks.
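The (N+1)-way classification described above can be illustrated with the following PyTorch sketch; the temperature value (0.07) and the L2 normalization are common conventions assumed for the example rather than details taken from the patent.

import torch
import torch.nn.functional as F

def contrastive_loss(query, positive, negatives, tau=0.07):
    """(N+1)-way contrastive cross-entropy: the query should be matched to its
    positive example against N negatives.
    query:     (B, K)    feature of an output image block
    positive:  (B, K)    feature of the corresponding input image block
    negatives: (B, N, K) features of non-corresponding input image blocks"""
    query = F.normalize(query, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    l_pos = (query * positive).sum(dim=-1, keepdim=True)           # (B, 1)
    l_neg = torch.bmm(negatives, query.unsqueeze(-1)).squeeze(-1)  # (B, N)
    logits = torch.cat([l_pos, l_neg], dim=1) / tau                # (B, N+1)
    # The positive is class 0 of the (N+1)-way softmax; cross-entropy selects it.
    labels = torch.zeros(query.size(0), dtype=torch.long, device=query.device)
    return F.cross_entropy(logits, labels)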
Contrast loss is introduced into image translation. In an unsupervised learning setting, the whole image should share content, and corresponding characteristics should exist between input and output image blocks. Given an image block showing the head of the output zebra, it should be associated more strongly with the image block of the input horse head than with other image blocks of the horse image. Even at the pixel level, the color of the zebra body correlates more closely with the horse body color than with the background tone of the grass. Thus, for the input and output images, image blocks at the same location correspond to each other and form positive examples, while image blocks at other locations are non-corresponding negative examples. Meanwhile, each image block corresponds to a point on a feature map, and the smaller the feature map, the larger the corresponding image block. Positive and negative samples are therefore sampled as multi-scale image blocks, so that the learning objective is based on image blocks and multi-layer feature maps.
The specific process of the encoding output in the step S4 is as follows:
the encoder G_enc in the generator G is used to extract high-order semantic information of the image; each spatial position on an intermediate feature map of G_enc represents an image block of the input image, with deeper layers corresponding to larger image blocks; drawing on the SimCLR model (Chen T, Kornblith S, Norouzi M, et al. A simple framework for contrastive learning of visual representations [C]. PMLR, 2020), L intermediate layers are selected and their feature maps are passed through small two-layer MLP networks H_l, producing a set of features {z_l} = {H_l(G_enc^l(x))}, where G_enc^l(x) denotes the output of the l-th selected layer, l ∈ {1,2,…,L} and s ∈ {1,2,…,S_l}, where S_l is the number of spatial positions in each layer; the corresponding feature (positive example) is denoted z_l^s ∈ R^{C_l}, and the other non-corresponding features (negative examples) are denoted z_l^{S\s} ∈ R^{(S_l-1)×C_l}, where C_l is the number of channels of each layer;
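The following PyTorch sketch illustrates this multi-layer feature extraction: encoder feature maps from L selected layers are sampled at a set of spatial locations and projected by small two-layer MLP heads into one embedding space. The embedding dimension, the number of sampled locations and the random sampling strategy are assumptions made for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchFeatureExtractor(nn.Module):
    """Projects sampled locations of L encoder feature maps through per-layer
    two-layer MLP heads H_l into a shared embedding space."""
    def __init__(self, feat_channels, embed_dim=256, num_patches=256):
        super().__init__()
        self.num_patches = num_patches
        self.mlps = nn.ModuleList([
            nn.Sequential(nn.Linear(c, embed_dim), nn.ReLU(True), nn.Linear(embed_dim, embed_dim))
            for c in feat_channels])

    def forward(self, feats, sample_ids=None):
        """feats: list of encoder feature maps [(B, C_l, H_l, W_l), ...].
        Returns per-layer features (B, S_l, embed_dim) and the sampled locations,
        so the input and output images can be sampled at identical positions."""
        out_feats, out_ids = [], []
        for l, fmap in enumerate(feats):
            B, C, H, W = fmap.shape
            flat = fmap.flatten(2).permute(0, 2, 1)                 # (B, H*W, C)
            if sample_ids is None:
                ids = torch.randperm(H * W, device=fmap.device)[: self.num_patches]
            else:
                ids = sample_ids[l]
            z = self.mlps[l](flat[:, ids, :])                       # (B, S_l, embed_dim)
            out_feats.append(F.normalize(z, dim=-1))
            out_ids.append(ids)
        return out_feats, out_ids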
The calculation of the contrast loss in the step S4 is specifically:
the optimization goal is to match corresponding input-output image blocks at specific positions; the other image blocks in the same input serve as negative samples, and the resulting loss is named NCEIT Loss (NCE Loss for Image Translation). The contrast loss is expressed as follows:

L_NCEIT(G, H, X) = E_{x~X} Σ_{l=1}^{L} Σ_{s=1}^{S_l} l( ẑ_l^s , z_l^s , z_l^{S\s} )

where H is the two-layer MLP network, G is the generator, X is the source domain, S_l denotes the number of feature points on a given feature map, corresponding to the number of sampled image blocks, L denotes the number of intermediate layers, and ẑ_l^s denotes the feature of the output image ŷ = G(x) at layer l and position s.
It is noted that the invention may also use image blocks of other images in the dataset as negative examples. A random negative image from the dataset is encoded, and external encoding is used, in which an auxiliary moving-average encoder maintains a large and consistent dictionary of negative samples; following MoCo (He K, Fan H, Wu Y, et al. Momentum contrast for unsupervised visual representation learning [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 9729-9738), pictures can be sampled from a longer history, which is more efficient than end-to-end updates and memory banks. This contrast loss is expressed as follows:

L_external(G, H, X, Z) = E_{x~X, z̃~Z} Σ_{l=1}^{L} Σ_{s=1}^{S_l} l( ẑ_l^s , z_l^s , z̃_l^- )

where the dataset negatives z̃_l^- are drawn from an external dictionary Z^- built over the source domain, computed with a moving-average encoder Ĝ_enc and a moving-average MLP Ĥ_l. For ease of calculation, image blocks on the same input are taken as negative examples in this invention.
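A sketch of the NCEIT loss with image-internal negatives (the variant adopted above for ease of calculation) could look as follows; the temperature value and the batched diagonal-label formulation are assumptions of the sketch, not details from the patent.

import torch
import torch.nn.functional as F

def nceit_loss(feats_out, feats_in, tau=0.07):
    """For each layer l and sampled location s, the output-image feature is the
    query, the input-image feature at the same location is the positive, and
    features at the other locations of the same input are negatives.
    feats_out, feats_in: lists of (B, S_l, K) tensors from the mapping network H."""
    total = 0.0
    for z_out, z_in in zip(feats_out, feats_in):
        B, S, K = z_out.shape
        logits = torch.bmm(z_out, z_in.transpose(1, 2)) / tau      # (B, S, S)
        # Diagonal entries are query/positive pairs; off-diagonal ones are negatives.
        labels = torch.arange(S, device=z_out.device).repeat(B)
        total = total + F.cross_entropy(logits.reshape(B * S, S), labels)
    return total / len(feats_out)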
Further, in step S5: contrast learning usually suffers from unbalanced sampling, which reduces the ability of the mapping network H to distinguish positive from negative samples; the large number of negative samples makes it difficult for H to learn the characteristics of the positive samples, which is not conducive to training the generator and the discriminator. To alleviate this imbalance between negative and positive samples, the method of the invention optimizes the target loss using the focal loss.
Focal Loss (FL) is an improved version of Cross-Entropy Loss (CE) that addresses the class imbalance problem by assigning more weight to the hard-to-classify or easy-to-misclassify instance (i.e., background with noise texture or partial object or object of interest), and down-weights the simple instance (i.e., background object).
As stated above, the optimization goal is to match the corresponding input-output image block at a specific location, with the other image blocks in the same input taken as negative samples (NCEIT Loss, NCE Loss for Image Translation), where S_l denotes the number of feature points on a given feature map, corresponding to the number of sampled image blocks, and L denotes the number of intermediate layers. The NCEIT Loss is computed over multiple feature maps: because the semantic information represented by different feature maps, and the sizes of the input-image blocks they correspond to, are different, computing the noise-contrastive estimation loss over multiple feature maps helps the H network learn more information. The H network maps the input and output image blocks into the same embedding space, mapping related image blocks to nearby positions and unrelated image blocks to positions that are farther apart. For each output image block, only the input image block at the same position is the relevant (positive) signal, while image blocks at other positions are negative signals. For feature maps whose size is tens of times that of the output image block, the number of positive samples is far smaller than the number of negative samples, and the gradient information of the large number of negative samples overwhelms the unique gradient information of the positive samples; the focal loss is therefore introduced to solve this problem. Specifically:
the focal modulating factor is applied to the contrast loss of each sampled location:

l_FL(v, v+, v-) = -(1 - p_t)^γ · log(p_t),  with  p_t = exp(v·v+/τ) / ( exp(v·v+/τ) + Σ_{n=1}^{N} exp(v·v-_n/τ) )

wherein γ is the weight decay rate for simple samples;

the resulting contrast loss, NCEIT Loss, is then:

L_NCEIT(G, H, X) = E_{x~X} Σ_{l=1}^{L} Σ_{s=1}^{S_l} l_FL( ẑ_l^s , z_l^s , z_l^{S\s} )

wherein G denotes the generator, H the mapping network, X the source domain, L the number of intermediate layers, and S_l the number of feature points of each layer; ẑ_l^s denotes the feature vector of an output image block after the encoder network and the mapping network; z_l^s denotes the feature vector of the input image block related to (at the same position as) the output image block after the encoder network and the mapping network; and z_l^{S\s} denotes the feature vectors of the input image blocks unrelated to (at different positions from) the output image block after the encoder network and the mapping network.
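A focal-weighted variant of the same computation might be sketched as follows; the values of γ and the temperature τ are illustrative assumptions.

import torch
import torch.nn.functional as F

def focal_nceit_loss(feats_out, feats_in, gamma=2.0, tau=0.07):
    """Focal-weighted NCEIT loss of step S5: the (1 - p_t)**gamma modulating factor
    down-weights easy samples so the rarer positives carry more of the gradient.
    feats_out, feats_in: lists of (B, S_l, K) projected feature tensors."""
    total = 0.0
    for z_out, z_in in zip(feats_out, feats_in):
        B, S, K = z_out.shape
        logits = torch.bmm(z_out, z_in.transpose(1, 2)) / tau      # (B, S, S)
        labels = torch.arange(S, device=z_out.device).repeat(B)
        log_p = F.log_softmax(logits.reshape(B * S, S), dim=1)
        log_pt = log_p.gather(1, labels.unsqueeze(1)).squeeze(1)   # log-probability of the positive
        pt = log_pt.exp()
        total = total + (-(1.0 - pt) ** gamma * log_pt).mean()
    return total / len(feats_out)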
Further, in step S6:
back Propagation (BP) is a short term for "error back propagation" and is a common method used in conjunction with optimization methods (such as gradient descent) to train artificial neural networks. The method calculates the gradient of the loss function for all weights in the network. This gradient is fed back to the optimization method for updating the weights to minimize the loss function.
Back propagation requires a known desired output for each input value in order to compute the gradient of the loss function. It is therefore generally considered a supervised learning approach, although it is also used in some unsupervised networks (e.g. autoencoders). It is a generalization of the delta rule to multi-layer feed-forward networks, and uses the chain rule to compute the gradient layer by layer in each iteration. Back propagation requires that the excitation (activation) function of the artificial neurons ("nodes") be differentiable.
The back propagation algorithm (BP algorithm) is mainly composed of two phases: excitation propagation and weight updating.
Stage 1: propagation of excitation
The propagation link in each iteration comprises two steps: (forward propagation phase) sending training inputs into the network to obtain an excitation response; and (in a back propagation stage), the difference between the excitation response and the target output corresponding to the training input is obtained, so that the response errors of the hidden layer and the output layer are obtained.
And 2, stage: weight update
For the weight on each synapse, the update proceeds as follows: the input excitation is multiplied by the response error to obtain the gradient of the weight; this gradient is multiplied by a proportion, negated, and added to the weight.
This proportion (percentage) affects the speed and effectiveness of the training process and is therefore called the "training factor" (learning rate). The direction of the gradient indicates the direction in which the error grows, so it must be negated when updating the weights in order to reduce the error caused by the weights.
Stages 1 and 2 may be iteratively iterated until the network's response to the input reaches a satisfactory predetermined target range.
In image translation, contrast learning samples multiple feature maps and multiple feature points on each feature map. When one positive sample is drawn together with n negative samples at each location, most of the negatives are simple samples: in the parameter updating process they have almost no influence on the model, yet keeping them in memory greatly increases the memory occupation during training, and computing their losses in back propagation further increases the training burden. Therefore, in order to accelerate training, the mapping network H is updated using an architecture similar to OHEM;
a feature vector of a point in a feature map of an intermediate layer of the encoder is obtained; for a point on a feature map of the output image after the encoder, assume that n negative samples and 1 positive sample are obtained after sampling; these n+1 samples are taken as a batch and passed into the mapping network H, and the losses of the n+1 samples are obtained through forward propagation; the n+1 losses are then sorted from large to small; the (n+1)/γ samples with the largest losses are selected and fed into H_copy, a copy of the mapping network, for forward and backward propagation, and the gradient of H_copy is copied to the mapping network H; finally, the mapping network H updates its parameters. In order to reduce oscillation during training, H_copy is updated N times in a gradient-accumulation manner, and the accumulated gradient is divided by N before being transmitted to the network H.
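The update scheme described above might be sketched as follows; the helper sample_losses_fn (assumed to return one loss value per sample), the value of gamma and the way the gradient is copied are assumptions made for illustration, not the patent's exact implementation.

import torch

def ohem_update(H, H_copy, optimizer, batches, sample_losses_fn, gamma=4):
    """OHEM-style update of the mapping network H.
    batches: a list of N feature batches (each a (n+1, C) tensor of one positive and
    n negatives); optimizer is assumed to hold H's parameters."""
    for feats in batches:
        with torch.no_grad():
            losses = sample_losses_fn(H_copy, feats)             # one loss per sample
        k = max(1, losses.numel() // gamma)
        hard_idx = torch.topk(losses, k).indices                 # hardest (n+1)//gamma samples
        hard_loss = sample_losses_fn(H_copy, feats[hard_idx]).mean()
        (hard_loss / len(batches)).backward()                    # gradient accumulation in H_copy
    # copy the averaged gradients from H_copy onto H, then update H's parameters
    for p, p_copy in zip(H.parameters(), H_copy.parameters()):
        p.grad = None if p_copy.grad is None else p_copy.grad.clone()
    optimizer.step()
    optimizer.zero_grad()
    for p_copy in H_copy.parameters():
        p_copy.grad = None
    H_copy.load_state_dict(H.state_dict())                       # keep the copy in sync with H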
The back propagation process of optimizing the contrast loss uses an online difficult sample mining method, which specifically comprises the following steps:
two copies of the ROI network are stored in memory, including a read-only ROI network that allocates memory only for forward transfers of all ROIs and a standard ROI network that allocates memory for forward and backward transfers. For SGD iteration, given a certain convolution feature map, the read-only ROI network performs forward transfer and calculates the loss of all input ROIs; then, the ROI sampling module sequences the ROIs according to the loss values, selects the ROI samples with the first K loss values being the largest, and inputs the ROI samples into a conventional ROI network; the network provides only forward and backward delivery of the selected ROI samples, the gradient values generated by the network are delivered to the read-only ROI network, and finally the read-only ROI network performs parameter update according to the gradient values, and all the ROIs of the N images are recorded as R, so that the effective batch size of the read-only ROI network is R, while the effective batch size of the conventional ROI network is K.
Firstly, contrast learning is introduced into image translation: image blocks at the same or different positions in the source-domain and target-domain images are selected as the positive and negative samples of contrast learning, the encoder in the generator extracts high-order semantic features, an auxiliary mapping network maps the feature vectors into the same projection space, and the contrast loss is then calculated in this projection space. The proposed model needs neither dual generators nor dual discriminators, so the memory occupied during model training and the training time are greatly reduced. Meanwhile, the auxiliary network measures the degree of similarity between the source-domain and target-domain images, which helps the generator learn more general cross-domain information during training. Secondly, contrast learning is optimized with online difficult sample mining and the focal loss. The positive and negative samples of contrast learning are sampled on multi-layer feature maps; when the feature map is large, sampling is uneven, which limits the representation capability of the mapping network. To improve this, the method first uses online difficult sample mining to preferentially back-propagate the difficult samples; in addition, the loss is improved with the focal loss, so that the loss of the fewer positive samples receives a larger weight. Finally, a CycleGAN network structure based on an attention mechanism is provided: a spatial attention module is added to the generator to learn the class weights between feature maps, and when a spatial attention module is added to the discriminator, a dense residual block is introduced for skip connection to improve the transmission efficiency of input image features.
Beneficial effects: compared with the prior art, the method introduces the specific steps of contrast learning into image translation and shows how to determine the positive and negative samples of contrast learning; it improves on CycleGAN by adding an MLP mapping network H that maps related examples to nearby positions in feature space and unrelated examples to positions farther apart, so that the intrinsic relevance between the source domain and the target domain can be effectively measured. Secondly, the proposed contrast loss is optimized: when the feature map of the generated image is large, the number of positive samples obtained is far smaller than the number of negative samples, so the contrast loss is improved with the online difficult sample mining and focal loss techniques. In difficult sample mining, the losses of the difficult samples are back-propagated preferentially; in the focal loss, the weights of the difficult and simple sample losses are changed so that the loss weight of the difficult samples becomes larger. The model trained with the contrast loss greatly reduces training memory occupation and training time, achieves an image conversion effect with more distinct details than unidirectional and bidirectional image conversion, and solves the problems that existing image translation models are large, hard to train, and use inaccurate domain-correlation metrics.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a schematic diagram of a generator structure in the present invention;
FIG. 3 is a schematic diagram of the structure of the discriminator in the present invention;
FIG. 4 is a schematic diagram of the structure of the generation of a countermeasure network in the present invention;
FIG. 5 is a back propagation schematic of the on-line difficult sample mining of the present invention.
Detailed Description
The present invention is further illustrated by the following figures and specific examples, which are to be understood as illustrative only and not as limiting the scope of the invention, which is to be given the full breadth of the appended claims and any and all equivalent modifications thereof which may occur to those skilled in the art upon reading the present specification.
The invention provides an image translation method based on contrast learning, which comprises the following steps as shown in figure 1:
s1: inputting an input image into a generator, the generator comprising an encoder and a decoder; the encoder is mainly used for encoding the characteristics of an input image into a characteristic vector, and the decoder is mainly used for decoding the characteristic vector into an image in a target domain;
s2: inputting the image generated by the generator and the real image of the target domain into a discriminator to obtain the output of the discriminator, namely the generated-image prediction probability and the real-image prediction probability;
s3: calculating the generative adversarial network loss according to the generated-image prediction probability and the real-image prediction probability;
s4: re-inputting the input image and the output image of the generator into an encoder in the generator to obtain the encoded output of the input image and the output image, inputting the encoded vectors of the input image and the output image into a mapping network to obtain the feature vectors of the input image and the output image in the same feature space, defining a contrast learning method in image translation according to a definition concept of contrast learning, sampling image blocks on the input image and the output image, dividing positive and negative samples, proposing contrast loss, and calculating the contrast loss between the feature vectors of the input image and the output image;
s5: optimizing the contrast loss by using the focal loss to address the problem of uneven sampling of positive and negative samples;
s6: back-propagating the adversarial network loss and the optimized contrast loss, optimizing the network, and using the optimized network to realize image translation.
As shown in fig. 2, the structure of the generator in step S1 is: a U-Net network structure is adopted, comprising an encoder G_enc and a decoder G_dec, wherein the encoder G_enc consists of 3 convolution layers; a transformation module is arranged between the encoder and the decoder for image conversion between the two domains; the decoder consists of n upsampling layers; the encoder and the decoder are connected by skip connections between corresponding convolution layers, which effectively prevents the feature information of the input image from being lost after convolution calculation and improves the efficiency of information transmission.
As shown in fig. 3, the network structure of the discriminator in step S2 is: a discrimination network structure with an attention module is constructed on the basis of the CycleGAN discriminator. After the generator is improved and its expressive capability in the generative network is enhanced, the capability of the original discriminator structure becomes too weak, and the quality of the generated image suffers. Therefore, in order to achieve a better balance between the two, the discrimination network PatchGAN in the original CycleGAN is improved: the size of the receptive field of the original network is maintained, while a dense residual block and an attention mechanism are added, so that the judgment capability of the discrimination network is obviously enhanced.
In step S3, the generative adversarial network loss is calculated from the generated-image prediction probability and the real-image prediction probability; the generative adversarial loss is expressed as follows:

L_GAN(G, D, X, Y) = E_{y~Y}[ log D(y) ] + E_{x~X}[ log(1 - D(G(x))) ]
where D is the discriminator, G is the generator, X represents the source domain, and Y represents the target domain. x denotes the source domain image and y denotes the target domain image. D (y) is the prediction probability for the real image, and D (G (x)) is the prediction probability for the generated image G (x). This loss is expressed as a cross-entropy loss between the prediction probability and the true probability.
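Assuming the discriminator outputs prediction probabilities in [0, 1] (e.g. after a sigmoid), the loss above can be illustrated with the following sketch; names are illustrative.

import torch
import torch.nn.functional as F

def gan_losses(D, G, x, y):
    """Adversarial losses of step S3. x: source-domain batch, y: target-domain batch."""
    fake = G(x)
    pred_real = D(y)               # prediction probability for the real image
    pred_fake = D(fake.detach())   # prediction probability for the generated image
    # Discriminator: real images should be predicted 1, generated images 0.
    d_loss = F.binary_cross_entropy(pred_real, torch.ones_like(pred_real)) + \
             F.binary_cross_entropy(pred_fake, torch.zeros_like(pred_fake))
    # Generator: generated images should be predicted as real by D.
    pred_fake_for_g = D(fake)
    g_loss = F.binary_cross_entropy(pred_fake_for_g, torch.ones_like(pred_fake_for_g))
    return d_loss, g_loss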
In step S4: first, the relevant definitions are introduced. The optimization objective is to make an image x from the input domain X, after conversion by the model, similar in appearance to images from the target domain Y. Given unpaired example images x ∈ X and y ∈ Y, the goal is to learn an image translation model G that converts X-domain images into Y-domain images, ŷ = G(x).
The idea of contrast learning is to associate two signals: a query instance and its positive example, in contrast to other points in the dataset (called negative examples). The query, the positive example and the N negative examples are mapped to K-dimensional vectors v, v+ and v-_n respectively, where v-_n denotes the n-th negative example. An (N+1)-way classification problem is thereby created, in which the scaled dot products between the query and the other examples are used as logits. The probability that the positive example is chosen is expressed by the cross-entropy loss:

l(v, v+, v-) = -log[ exp(v·v+/τ) / ( exp(v·v+/τ) + Σ_{n=1}^{N} exp(v·v-_n/τ) ) ]

where v, v+ and v-_n denote the query vector, the positive example vector and the negative example vectors respectively, and τ is a temperature parameter.
The goal is to correlate input and output data. In contrast learning, a query refers to an output image block. Positive and negative examples are corresponding and non-corresponding input image blocks.
The model uses the encoder G_enc in the generator G to extract high-order semantic information of the image; each spatial position on an intermediate feature map of G_enc represents an image block of the input image, with deeper layers corresponding to larger image blocks; drawing on the SimCLR model (Chen T, Kornblith S, Norouzi M, et al. A simple framework for contrastive learning of visual representations [C]. PMLR, 2020), L intermediate layers are selected and their feature maps are passed through small two-layer MLP networks H_l, producing a set of features {z_l} = {H_l(G_enc^l(x))}, where G_enc^l(x) denotes the output of the l-th selected layer, l ∈ {1,2,…,L} and s ∈ {1,2,…,S_l}, where S_l is the number of spatial positions in each layer; the corresponding feature (positive example) is denoted z_l^s ∈ R^{C_l}, and the other non-corresponding features (negative examples) are denoted z_l^{S\s} ∈ R^{(S_l-1)×C_l}, where C_l is the number of channels of each layer;
The optimization goal is to match corresponding input-output image blocks at specific positions; the other image blocks in the same input serve as negative samples, and the resulting loss is named NCEIT Loss (NCE Loss for Image Translation). The contrast loss is expressed as follows:

L_NCEIT(G, H, X) = E_{x~X} Σ_{l=1}^{L} Σ_{s=1}^{S_l} l( ẑ_l^s , z_l^s , z_l^{S\s} )

where H is the two-layer MLP network, G is the generator, X is the source domain, S_l denotes the number of feature points on a given feature map, corresponding to the number of sampled image blocks, L denotes the number of intermediate layers, and ẑ_l^s denotes the feature of the output image ŷ = G(x) at layer l and position s.
It is noted that the invention may also use image blocks of other images in the dataset as negative examples. A random negative image from the dataset is encoded, and external encoding is used, in which an auxiliary moving-average encoder maintains a large and consistent dictionary of negative samples; following MoCo (He K, Fan H, Wu Y, et al. Momentum contrast for unsupervised visual representation learning [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 9729-9738), pictures can be sampled from a longer history, which is more efficient than end-to-end updates and memory banks. This contrast loss is expressed as follows:

L_external(G, H, X, Z) = E_{x~X, z̃~Z} Σ_{l=1}^{L} Σ_{s=1}^{S_l} l( ẑ_l^s , z_l^s , z̃_l^- )

where the dataset negatives z̃_l^- are drawn from an external dictionary Z^- built over the source domain, computed with a moving-average encoder Ĝ_enc and a moving-average MLP Ĥ_l. For simplicity of calculation, image blocks on the same input are taken as negative examples in this invention.
In step S5:
As stated above, the optimization goal is to match the corresponding input-output image block at a specific location, with the other image blocks in the same input taken as negative samples (NCEIT Loss, NCE Loss for Image Translation), where S_l denotes the number of feature points on a given feature map, corresponding to the number of sampled image blocks, and L denotes the number of intermediate layers. The NCEIT Loss is computed over multiple feature maps: because the semantic information represented by different feature maps, and the sizes of the input-image blocks they correspond to, are different, computing the noise-contrastive estimation loss over multiple feature maps helps the H network learn more information. The H network maps the input and output image blocks into the same embedding space, mapping related image blocks to nearby positions and unrelated image blocks to positions that are farther apart. For each output image block, only the input image block at the same position is the relevant (positive) signal, while image blocks at other positions are negative signals. For feature maps whose size is tens of times that of the output image block, the number of positive samples is far smaller than the number of negative samples, and the gradient information of the large number of negative samples overwhelms the unique gradient information of the positive samples; the focal loss is therefore introduced to solve this problem. Specifically:
the focal modulating factor is applied to the contrast loss of each sampled location:

l_FL(v, v+, v-) = -(1 - p_t)^γ · log(p_t),  with  p_t = exp(v·v+/τ) / ( exp(v·v+/τ) + Σ_{n=1}^{N} exp(v·v-_n/τ) )

wherein γ is the weight decay rate for simple samples;

the resulting contrast loss, NCEIT Loss, is then:

L_NCEIT(G, H, X) = E_{x~X} Σ_{l=1}^{L} Σ_{s=1}^{S_l} l_FL( ẑ_l^s , z_l^s , z_l^{S\s} )

wherein G denotes the generator, H the mapping network, X the source domain, L the number of intermediate layers, and S_l the number of feature points of each layer; ẑ_l^s denotes the feature vector of an output image block after the encoder network and the mapping network; z_l^s denotes the feature vector of the input image block related to (at the same position as) the output image block after the encoder network and the mapping network; and z_l^{S\s} denotes the feature vectors of the input image blocks unrelated to (at different positions from) the output image block after the encoder network and the mapping network.
In step S6:
In image translation, contrast learning samples multiple feature maps and multiple feature points on each feature map. When one positive sample is drawn together with n negative samples at each location, most of the negatives are simple samples: in the parameter updating process they have almost no influence on the model, yet keeping them in memory greatly increases the memory occupation during training, and computing their losses in back propagation further increases the training burden. Therefore, in order to accelerate training, the mapping network H is updated using an architecture similar to OHEM;
a feature vector of a point in a feature map of an intermediate layer of the encoder is obtained; for a point on a feature map of the output image after the encoder, assume that n negative samples and 1 positive sample are obtained after sampling; these n+1 samples are taken as a batch and passed into the mapping network H, and the losses of the n+1 samples are obtained through forward propagation; the n+1 losses are then sorted from large to small; the (n+1)/γ samples with the largest losses are selected and fed into H_copy, a copy of the mapping network, for forward and backward propagation, and the gradient of H_copy is copied to the mapping network H; finally, the mapping network H updates its parameters. In order to reduce oscillation during training, H_copy is updated N times in a gradient-accumulation manner, and the accumulated gradient is divided by N before being transmitted to the network H.
Based on the above, the present invention performs example application and analysis on the above scheme, specifically as follows:
1) Instance normalization and the residual structure in the network
Instance normalization is a normalization method commonly used in image style transfer. Concretely, on the basis of BN, the image is normalized at the channel level and then "de-normalized" using the mean and standard deviation of the corresponding channel of the target-style picture, so as to obtain the style of the target picture.
The residual block is used to improve the learning ability of the network; a skip connection is used between its input and output. In mathematical statistics, a residual is the difference between an actual observed value and an estimated (fitted) value. Residuals carry important information about the basic assumptions of a model: if the regression model is correct, a residual can be regarded as an observation of the error, and should conform to the assumptions of the model and share some properties of the error. Residual analysis refers to using the information provided by the residuals to examine the reasonableness of the model assumptions and the reliability of the data.
2) Principle of the generative adversarial network and definition of its loss function
The generative adversarial network is based on game theory, the two players of the game being a generator and a discriminator. A GAN comprises two modules, a Generator (G) and a Discriminator (D). The task of G is to randomly generate synthetic data that passes for real. To satisfy randomness, several random numbers are usually used as the input of G; for example, 100 random numbers sampled from a standard normal distribution are denoted as random noise z, and the output is a picture with the same resolution as the real pictures. The task of D is to distinguish real from fake, i.e. to judge whether a picture is a real picture or a fake picture synthesized by G. Thus the input of D is a picture and the output is a score: the higher the score, the more real the input picture. During training, the generator tries to generate fake samples that confuse the discriminator, while the discriminator tries to judge correctly, outputting high scores for all real pictures and low scores for all fake pictures. As this is repeated and optimization proceeds, the discrimination capability of D and the generation capability of G both become stronger; the two play against each other and advance together. In the ideal case, G can eventually generate fake pictures that are indistinguishable from real pictures.
The principle of the generative adversarial network is shown in fig. 4. The Generator and the Discriminator are both constructed from multi-layer neural networks and are differentiable functions. The generator maps the random noise vector into an image; the generated image should be as similar as possible to a real image, so that the judgment of the discriminator fails, i.e. for the input G(z) the output of the discriminator D is "real". The optimization goal of the discriminator is to decide as correctly as possible for its input: a real image input should be judged real, and a generated image input should be judged fake. The two confront each other continuously during training and improve together, so that the discriminator learns the essential characteristics of real images and the generator generates fake samples that are almost identical to real samples.
The optimization process of the generative adversarial network is as follows: in each iteration, a batch of real pictures x is randomly selected, a batch of z is randomly generated, and z is input into G to obtain a batch of fake pictures x' = G(z). The loss function of D covers two aspects: first, the score D(x) for x should be relatively high, i.e. as close as possible to 1; second, the score D(x') for x' should be relatively low, i.e. as close as possible to 0. The parameters of D are adjusted in the direction that decreases the loss function, completing the optimization of D; after optimization, the gap between the scores D outputs for real and fake pictures becomes larger. For G, the goal is to make D mistake x' for a real picture, so the loss function of G can be the difference between D(x') and 1; the smaller this difference, the more real x' appears from D's point of view. The parameters of G are adjusted in the direction that decreases the loss function, completing the optimization of G; after optimization, the fake pictures synthesized by G look more real under the judgment of D.
The adversarial game of a GAN can be expressed as the minimax optimization of an objective function between the discriminator D and the generator G:

min_G max_D V(D, G) = E_{x~p_data(x)}[ log D(x) ] + E_{z~p_z(z)}[ log(1 - D(G(z))) ]

where x represents real sample data drawn from the distribution p_data(x), z is a random sample drawn from the prior distribution p_z(z), and G(z) represents the generated sample data. The generator G converts random samples z into generated samples G(z); the discriminator D attempts to distinguish them from the training samples drawn from p_data(x), while G attempts to make the distribution of the generated samples similar to that of the training samples. Intuitively, for a given generator G, the discriminator D is optimized to discriminate the generated samples G(z), attempting to assign higher values to samples from the distribution p_data(x) and lower values to the generated samples G(z). Conversely, for a given discriminator D, G is optimized so that the discriminator D classifies incorrectly, assigning higher values to the generated samples G(z).
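The alternating optimization of this minimax objective can be illustrated with the following sketch, assuming the discriminator outputs raw logits; all names are illustrative.

import torch
import torch.nn as nn

def train_step(G, D, opt_G, opt_D, x_real, z):
    """One alternating optimization step of the minimax game."""
    bce = nn.BCEWithLogitsLoss()
    # --- update D: score real pictures high, generated pictures low ---
    x_fake = G(z).detach()
    logits_real, logits_fake = D(x_real), D(x_fake)
    d_loss = bce(logits_real, torch.ones_like(logits_real)) + \
             bce(logits_fake, torch.zeros_like(logits_fake))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()
    # --- update G: make D score the generated pictures as real ---
    logits_fake = D(G(z))
    g_loss = bce(logits_fake, torch.ones_like(logits_fake))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()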
3) Contrast learning and loss function
Self-supervised learning is a machine learning method that trains a network by mining the inherent characteristics of the training data, without relying on manually annotated labels. It aims to learn a generic feature representation for use in downstream tasks. Contrast learning is a typical form of self-supervised learning; its basic idea is as follows: a feature representation (mapping network) is learned by constructing pairs of positive and negative samples, such that positive pairs are relatively close in the projection space and negative pairs are as far apart as possible. This can be expressed by the following formula:
distance(f(x), f(x+)) << distance(f(x), f(x-))
where x+ is a positive sample, x- is a negative sample, and f is the encoder network to be learned.
In order to optimize the encoder network, a softmax-based cross entropy is constructed as the loss function:

L = -E[ log( exp(f(x)·f(x+)/τ) / ( exp(f(x)·f(x+)/τ) + Σ_{k=1}^{K} exp(f(x)·f(x-_k)/τ) ) ) ]

This loss function is also called InfoNCE (Info Noise-Contrastive Estimation); with one positive example and K negative examples per sample, it can essentially be regarded as a (K+1)-class classification problem.
4) Focal loss and its definition
The focal loss was proposed to solve the problem of the imbalance between positive and negative samples in single-stage object detection. In that scenario, there is an extreme imbalance between the foreground and background classes during training (the proportion of foreground samples can be extremely small).
First, the Cross Entropy (CE) loss for binary classification is defined:

CE(p, y) = -log(p)      if y = +1
CE(p, y) = -log(1 - p)  if y = -1

In the above, y ∈ {±1} specifies the ground-truth class, with -1 denoting the negative class label and +1 the positive class label, and p ∈ [0, 1] is the probability estimated by the model that the input is positive.

Defining p_t as the predicted probability of the correct class, i.e. p_t = p when y = +1 and p_t = 1 - p otherwise (so that p_t is close to 1 when the prediction is correct), the loss can be written as:

CE(p, y) = CE(p_t) = -log(p_t)
Even for easily classified samples (i.e. those with predicted p_t close to 1), the CE loss is still non-negligible, and the accumulated gradient of a large number of simple samples can exceed the gradient of a hard sample.
To solve the above problem, the focal loss (Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection [C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 2980-2988) adds a modulating factor (1 - p_t)^γ to the cross-entropy loss, where γ is an adjustable hyper-parameter. The focal loss is defined as:
FL(p_t) = -(1 - p_t)^γ · log(p_t)
The focal loss has two properties: (1) when a sample is misclassified, i.e. y = +1 and p_t is very small, the modulating factor is close to 1 and the loss is hardly affected; as p_t approaches 1, the factor (1 - p_t)^γ goes to 0, so the loss weight decreases as the example becomes well classified. (2) The hyper-parameter γ smoothly adjusts the rate at which simple samples are down-weighted. When γ = 0, FL is equivalent to CE; as γ increases, the influence of the modulating factor (1 - p_t)^γ also increases.
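A direct translation of this definition into code might look as follows; γ = 2.0 is only a commonly used default, not a value taken from this document.

import torch

def focal_loss(p, y, gamma=2.0):
    """Binary focal loss FL(p_t) = -(1 - p_t)^gamma * log(p_t).
    p: predicted probability of the positive class; y: labels in {+1, -1}."""
    p_t = torch.where(y == 1, p, 1.0 - p)   # probability assigned to the correct class
    return (-(1.0 - p_t) ** gamma * torch.log(p_t.clamp_min(1e-8))).mean()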
5) On-line difficult sample mining (OHEM) and network architecture thereof
The online difficult sample mining algorithm (Online Hard Example Mining, OHEM), based on the back propagation algorithm, is commonly used in the field of object detection to alleviate the imbalance between positive and negative samples, and evolved from hard example mining. The algorithm flow is as follows: for the input image of the t-th iteration of the SGD (Stochastic Gradient Descent) optimization method, a convolution feature map is first computed using a convolution network. The ROI network then uses this feature map and all input ROIs (Regions of Interest) for a forward pass and a backward update. This step only involves ROI pooling, several fc layers and the loss calculation of each ROI. The loss reflects the performance of the current network on each ROI: the larger the loss, the worse the performance. The input ROIs are then sorted by loss, and the K ROIs on which the current network performs worst are chosen as difficult examples. Most forward computation is shared between ROIs through the convolution feature map, so the extra computation needed to evaluate all ROIs is relatively small. Furthermore, since only a small number of ROIs are selected to update the model, rather than back-propagating the losses of all ROIs, the cost of the backward pass is not higher than before.
In the Fast R-CNN family of object detectors, there are several ways to implement OHEM, such as modifying the loss layer: the loss layer computes the loss of all ROIs, sorts them, selects the difficult ROIs (those with larger losses) according to the ranking, and finally sets the losses of the non-difficult ROIs to 0. This method is simple but inefficient, because even if the loss of most ROIs is set to 0, the ROI network still allocates memory for all ROIs and performs a backward pass, which seriously affects the training efficiency of the model.
To overcome this problem, OHEM (Shrivastava A, Gupta A, Girshick R. Training region-based object detectors with online hard example mining [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 761-769) proposes the architecture shown in FIG. 5. Two copies of the ROI network are maintained in memory: a read-only ROI network, which allocates memory only for the forward pass of all ROIs, and a standard ROI network, which allocates memory for both forward and backward passes. For an SGD iteration, given a convolution feature map, the read-only ROI network performs a forward pass and computes the loss of all input ROIs (the green arrows in the figure). The ROI sampling module then sorts the ROIs by loss value, selects the ROI samples with the K largest losses, and inputs them into the standard ROI network (the red arrows in the figure). This network performs forward and backward passes only for the selected ROI samples, and the gradient values it produces are delivered to the read-only ROI network (the grey block at A in the figure), which finally updates its parameters according to these gradient values. Denoting all the ROIs of the N images as R, the effective batch size of the read-only ROI network is R, while that of the standard ROI network is K.
In order to verify the actual effect of the image translation model provided by the invention, it is compared with existing image translation models; the specific data are shown in Table 1:
TABLE 1 Training-related indices
According to the experimental results, with the same batch size, the contrast loss occupies less memory than the cycle-consistency loss, and the training time is only 1/3 of that of the cycle-consistency loss. This is because the model proposed in the present invention only has the generator and discriminator of the target domain, and the added auxiliary mapping network involves little computation.
The results show that the Focal Loss increases the training time, while its memory occupation and model parameter count are similar to those of the original model, and that OHEM effectively reduces the memory occupation and training time of the model. This indicates that OHEM improves the convergence speed of the model during training without degrading the generation effect of the original model.
Claims (10)
1. An image translation method based on contrast learning is characterized by comprising the following steps:
s1: inputting an input image into a generator, the generator comprising an encoder and a decoder;
s2: inputting the image generated by the generator and the real image of the target domain into a discriminator to obtain the output of the discriminator, namely the generated-image prediction probability and the real-image prediction probability;
s3: calculating the generative adversarial network loss according to the generated-image prediction probability and the real-image prediction probability;
s4: re-inputting the input image and the output image of the generator into an encoder in the generator to obtain the encoding output of the input image and the output image, inputting the encoding vectors of the input image and the output image into a mapping network to obtain the characteristic vectors of the input image and the output image in the same characteristic space, and calculating the contrast loss between the characteristic vectors of the input image and the output image;
s5: optimizing contrast loss using focus loss;
s6: and carrying out back propagation on the anti-network loss and the optimized contrast loss, optimizing the network, and using the optimized network to realize image translation.
2. The image translation method based on contrast learning according to claim 1, wherein the generator in step S1 is configured as follows: a U-Net network structure is adopted, comprising an encoder G_enc and a decoder G_dec, wherein the encoder G_enc is composed of 3 convolutional layers; a transformation module is arranged between the encoder and the decoder for image conversion between the two domains; the decoder consists of n upsampling layers; and the encoder and the decoder are connected by skip connections between corresponding convolutional layers.
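The following PyTorch sketch illustrates a generator of this shape; the channel widths, the plain convolutional transformation module, and the transposed-convolution decoder are assumptions for illustration, not the patent's exact architecture:

```python
import torch
import torch.nn as nn

class UNetGenerator(nn.Module):
    """Sketch: 3-layer conv encoder, a transformation module, an upsampling decoder,
    and skip connections between matching encoder/decoder levels."""
    def __init__(self, in_ch=3, out_ch=3, base=64, n_blocks=4):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, 1, 1), nn.ReLU(inplace=True))
        self.enc2 = nn.Sequential(nn.Conv2d(base, base * 2, 3, 2, 1), nn.ReLU(inplace=True))
        self.enc3 = nn.Sequential(nn.Conv2d(base * 2, base * 4, 3, 2, 1), nn.ReLU(inplace=True))
        # Transformation module between encoder and decoder (plain conv blocks here).
        self.transform = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(base * 4, base * 4, 3, 1, 1), nn.ReLU(inplace=True))
            for _ in range(n_blocks)])
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(base * 8, base * 2, 4, 2, 1), nn.ReLU(inplace=True))
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(base * 4, base, 4, 2, 1), nn.ReLU(inplace=True))
        self.out = nn.Conv2d(base * 2, out_ch, 3, 1, 1)

    def encode(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        return e1, e2, e3

    def forward(self, x):
        e1, e2, e3 = self.encode(x)
        t = self.transform(e3)
        d2 = self.dec2(torch.cat([t, e3], dim=1))   # skip connection from enc3
        d1 = self.dec1(torch.cat([d2, e2], dim=1))  # skip connection from enc2
        return torch.tanh(self.out(torch.cat([d1, e1], dim=1)))  # skip from enc1
```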
3. The image translation method based on contrast learning according to claim 1, wherein the network structure of the discriminator in step S2 is as follows: a discrimination network with an attention module is constructed on the basis of the CycleGAN discriminator, improving the PatchGAN discrimination network of the original CycleGAN by adding dense residual blocks and an attention mechanism while keeping the receptive field size of the original network.
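A rough PyTorch sketch of such a discriminator is given below; the squeeze-and-excitation style channel-attention block stands in for the claimed attention mechanism, the dense residual blocks are omitted for brevity, and all layer sizes are assumptions rather than the patent's configuration:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention block."""
    def __init__(self, ch, r=8):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(ch, ch // r), nn.ReLU(inplace=True),
                                nn.Linear(ch // r, ch), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))          # global average pool -> channel weights
        return x * w[:, :, None, None]

class AttentionPatchDiscriminator(nn.Module):
    """PatchGAN-style discriminator with an attention block inserted mid-way."""
    def __init__(self, in_ch=3, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 4, 2, 1), nn.InstanceNorm2d(base * 2), nn.LeakyReLU(0.2, inplace=True),
            ChannelAttention(base * 2),
            nn.Conv2d(base * 2, base * 4, 4, 2, 1), nn.InstanceNorm2d(base * 4), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 4, 1, 4, 1, 1))     # per-patch real/fake logits

    def forward(self, x):
        return self.net(x)
```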
4. The image translation method based on contrast learning according to claim 1, wherein the generative adversarial network loss in step S3 is calculated as follows:
the generative adversarial loss takes the minimax form:
$\mathcal{L}_{GAN}(G,D,X,Y)=\mathbb{E}_{y\sim Y}[\log D(y)]+\mathbb{E}_{x\sim X}[\log(1-D(G(x)))]$, optimized as $\min_G\max_D\mathcal{L}_{GAN}$,
wherein D is the discriminator, G is the generator, X denotes the source domain and Y denotes the target domain; x denotes a source-domain image and y denotes a target-domain image; D(y) is the prediction probability for the real image and D(G(x)) is the prediction probability for the generated image G(x); the loss is the cross-entropy loss between the predicted probabilities and the true labels.
5. The image translation method based on contrast learning according to claim 1, wherein the specific process of the encoding output in step S4 is as follows:
the encoder G_enc in the generator G is used to extract high-order semantic information of the image; each spatial position on one of the intermediate feature maps of G_enc represents an image block of the input image, with deeper layers corresponding to larger image blocks; drawing on the SimCLR model, L intermediate layers are selected and their feature maps are passed through a small two-layer MLP network $H_l$, generating a set of features $\{z_l\}_L=\{H_l(G_{enc}^{\,l}(x))\}_L$, wherein $z_l^{\,s}\in\mathbb{R}^{C_l}$ denotes the output of the l-th selected layer at spatial position s, with $l\in\{1,2,\ldots,L\}$ and $s\in\{1,2,\ldots,S_l\}$, $S_l$ being the number of spatial positions in each layer and $C_l$ the number of channels of each layer; the feature at the corresponding position is denoted $z_l^{\,s}$, and the features at the other, non-corresponding positions are denoted $z_l^{\,S\setminus s}$;
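A sketch of this multilayer feature extraction and projection is shown below; the number of sampled patches, the projection width, and the module names are illustrative assumptions. Sampling the same spatial indices from the input image's and the output image's feature maps yields the corresponding (positive) pairs, while the remaining positions supply the non-corresponding (negative) features:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchProjector(nn.Module):
    """Two-layer MLP (one per selected encoder layer) applied to sampled patch features.
    layer_dims must list the channel counts C_l of the selected encoder layers."""
    def __init__(self, layer_dims, out_dim=256):
        super().__init__()
        self.mlps = nn.ModuleList([
            nn.Sequential(nn.Linear(d, out_dim), nn.ReLU(inplace=True), nn.Linear(out_dim, out_dim))
            for d in layer_dims])

    def forward(self, feat_maps, n_patches=256, sample_ids=None):
        """feat_maps: list of L tensors (B, C_l, H_l, W_l) from the selected encoder layers.
        Returns per-layer (B, n_patches, out_dim) features and the sampled spatial indices,
        so the same positions can be re-used for the second image."""
        feats, ids = [], []
        for i, (mlp, fmap) in enumerate(zip(self.mlps, feat_maps)):
            b, c, h, w = fmap.shape
            flat = fmap.flatten(2).permute(0, 2, 1)              # (B, S_l, C_l), S_l = H_l * W_l
            idx = sample_ids[i] if sample_ids is not None else \
                  torch.randperm(h * w, device=fmap.device)[:n_patches]
            z = F.normalize(mlp(flat[:, idx, :]), dim=-1)        # project to the shared feature space
            feats.append(z)
            ids.append(idx)
        return feats, ids
```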
6. The image translation method based on contrast learning according to claim 5, wherein the calculation of the contrast loss in step S4 is specifically:
the optimization goal is to match corresponding input and output image blocks at specific positions, with the other image blocks within the same input taken as negative samples; this loss is named the NCEIT Loss, and its expression is as follows:
$\mathcal{L}_{NCEIT}(G,H,X)=\mathbb{E}_{x\sim X}\sum_{l=1}^{L}\sum_{s=1}^{S_l}\ell\big(\hat z_l^{\,s},\,z_l^{\,s},\,z_l^{\,S\setminus s}\big)$,
wherein H is the two-layer MLP network, G is the generator, X is the source domain, $S_l$ denotes the number of feature points on the l-th selected feature map, i.e. the number of image blocks, L denotes the number of intermediate layers, and $\ell(\cdot)$ is the cross-entropy (InfoNCE) term contrasting the output-image patch feature $\hat z_l^{\,s}$ against its corresponding positive $z_l^{\,s}$ and the non-corresponding negatives $z_l^{\,S\setminus s}$.
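Assuming the patch features of the previous sketch (matched (B, S, C) tensors for the output and input images, where position s in one corresponds to position s in the other), a minimal single-layer version of this internal-negative term could be written as:

```python
import torch
import torch.nn.functional as F

def patch_nce_loss(z_out, z_in, tau=0.07):
    """Internal-negative patch contrastive loss for one layer.
    z_out: (B, S, C) features of the generated image's patches (queries).
    z_in:  (B, S, C) features of the input image's patches at the same positions.
    The s-th query should match the s-th key; the other S-1 patches of the same
    image act as negatives."""
    b, s, _ = z_out.shape
    logits = torch.bmm(z_out, z_in.transpose(1, 2)) / tau        # (B, S, S) similarities
    targets = torch.arange(s, device=z_out.device).expand(b, s).reshape(-1)
    return F.cross_entropy(logits.reshape(b * s, s), targets)
```

Summing this term over the L selected layers and averaging over the sampled images then gives the layer-wise accumulated loss of this claim.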
7. The image translation method based on contrast learning according to claim 5, wherein the calculation of the contrast loss in step S4 is specifically:
image blocks of other images in the data set are used as negative samples: a random negative image $\tilde x$ in the data set X is encoded as $\tilde z=H(G_{enc}(\tilde x))$, and these external encodings are used as negatives; in this variant, a large and consistent negative-sample dictionary is maintained using an auxiliary moving-average encoder, and the expression for the contrast loss is as follows:
$\mathcal{L}_{external}(G,H,X)=\mathbb{E}_{x\sim X,\;\tilde x\sim X}\sum_{l=1}^{L}\sum_{s=1}^{S_l}\ell\big(\hat z_l^{\,s},\,z_l^{\,s},\,\tilde z_l\big)$.
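A sketch of this external-negative variant is shown below, assuming a MoCo-style feature queue filled by an auxiliary momentum (moving-average) encoder; the queue size, momentum coefficient, and function names are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

class NegativeDictionary:
    """Fixed-size queue of negative patch features produced by the momentum encoder."""
    def __init__(self, dim=256, size=4096):
        self.queue = F.normalize(torch.randn(size, dim), dim=1)
        self.ptr = 0

    @torch.no_grad()
    def push(self, keys):                                   # keys: (N, dim)
        n = keys.shape[0]
        idx = (torch.arange(n) + self.ptr) % self.queue.shape[0]
        self.queue[idx] = F.normalize(keys, dim=1)
        self.ptr = (self.ptr + n) % self.queue.shape[0]

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=0.999):
    """Moving-average update of the auxiliary (key) encoder from the main encoder."""
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)

def external_nce_loss(q, k_pos, dictionary, tau=0.07):
    """q, k_pos: (N, dim) matched query/positive features; negatives come from the queue."""
    q, k_pos = F.normalize(q, dim=1), F.normalize(k_pos, dim=1)
    l_pos = (q * k_pos).sum(dim=1, keepdim=True)            # (N, 1) positive similarities
    l_neg = q @ dictionary.queue.to(q.device).t()           # (N, K) similarities to queue negatives
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    targets = torch.zeros(q.size(0), dtype=torch.long, device=q.device)  # index 0 = positive
    return F.cross_entropy(logits, targets)
```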
8. The image translation method based on contrast learning according to claim 6, wherein the step S5 specifically comprises:
the contrast loss is optimized with the focal loss $FL(p_t)=-(1-p_t)^{\gamma}\log(p_t)$, wherein $p_t$ is the probability assigned to the corresponding (positive) sample and γ is the decay rate of the weight of easy (simple) samples;
the resulting contrast loss, the NCEIT Loss, is then expressed as follows:
$\mathcal{L}_{NCEIT}(G,H,X)=\mathbb{E}_{x\sim X}\sum_{l=1}^{L}\sum_{s=1}^{S_l}\big(1-p_l^{\,s}\big)^{\gamma}\,\ell\big(\hat z_l^{\,s},\,z_l^{\,s},\,z_l^{\,S\setminus s}\big)$, wherein $p_l^{\,s}$ is the probability assigned to the corresponding patch at layer l and position s;
wherein G denotes the generator, H denotes the mapping network, X denotes the source domain, L denotes the number of intermediate layers, $S_l$ denotes the number of feature points in each selected layer, $\hat z_l^{\,s}$ denotes the output feature vector obtained when an output image block is passed through the encoder network and the mapping network, $z_l^{\,s}$ denotes the output feature vector of the input image block corresponding to that output image block after passing through the encoder network and the mapping network, and $z_l^{\,S\setminus s}$ denotes the output feature vectors of the input image blocks not corresponding to that output image block after passing through the encoder network and the mapping network.
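One plausible reading of this focal weighting (an assumption; the patent's exact formula is not reproduced here) is to scale each patch's InfoNCE cross-entropy by $(1-p)^{\gamma}$, where p is the softmax probability assigned to the corresponding patch, so easy, already well-matched patches contribute little and hard patches dominate the gradient. A minimal sketch:

```python
import torch
import torch.nn.functional as F

def focal_patch_nce_loss(z_out, z_in, gamma=2.0, tau=0.07):
    """Focal-weighted patch contrastive loss for one layer (illustrative).
    z_out, z_in: (B, S, C) patch features of the generated and input images."""
    b, s, _ = z_out.shape
    logits = (torch.bmm(z_out, z_in.transpose(1, 2)) / tau).reshape(b * s, s)    # (B*S, S)
    targets = torch.arange(s, device=z_out.device).expand(b, s).reshape(-1)
    ce = F.cross_entropy(logits, targets, reduction="none")                      # per-patch NCE loss
    p_correct = F.softmax(logits, dim=1).gather(1, targets[:, None]).squeeze(1)  # prob. of the positive
    return ((1.0 - p_correct) ** gamma * ce).mean()
```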
9. The image translation method based on contrast learning according to claim 1, wherein the step S6 specifically comprises:
obtaining the feature vector of a point in a feature map of an intermediate layer of the encoder; for a certain point on a certain feature map of the output image after passing through the encoder, assume that n negative samples and 1 positive sample are obtained after sampling; the n negative samples and the 1 positive sample are taken as a batch and fed into the mapping network H, and the losses of the n+1 samples are obtained through forward propagation; the n+1 losses are then sorted from large to small; next, the (n+1)/γ samples with the largest losses are selected and input into a copy H_copy of the mapping network for forward and backward propagation, and the gradients of H_copy are copied to the mapping network H; finally, the mapping network H updates its parameters; in order to reduce oscillation during training, H_copy is updated N times in a gradient-accumulation manner, and the accumulated gradient is divided by N before being transmitted to the network H.
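The following sketch shows one way this hard-sample mining with gradient accumulation could be written in PyTorch; the per-sample loss (a generic cross-entropy standing in for the per-patch contrast loss), the batch format, and the names `H_copy` and `batches` are illustrative assumptions, and `optimizer` is assumed to be built over the parameters of H:

```python
import copy
import torch
import torch.nn.functional as F

def ohem_mapping_net_update(H, optimizer, batches, gamma=4, accum_steps=4):
    """Update the mapping network H on hard samples only, with gradient accumulation.
    Each element of `batches` holds (samples, labels): the n+1 patch features of one
    sampling step (n negatives + 1 positive) and their per-sample targets."""
    H_copy = copy.deepcopy(H)                    # working copy used for forward/backward passes
    H_copy.zero_grad()

    for samples, labels in batches[:accum_steps]:
        # Forward all n+1 samples without gradients and rank them by loss.
        with torch.no_grad():
            per_sample = F.cross_entropy(H(samples), labels, reduction="none")
        k = max(1, per_sample.numel() // gamma)  # keep the (n+1)/gamma hardest samples
        hard = per_sample.topk(k).indices

        # Forward + backward on the hard samples only; gradients accumulate in H_copy.
        loss = F.cross_entropy(H_copy(samples[hard]), labels[hard]) / accum_steps
        loss.backward()

    # Copy the accumulated (averaged) gradients from H_copy back to H, then update H.
    for p, p_copy in zip(H.parameters(), H_copy.parameters()):
        p.grad = None if p_copy.grad is None else p_copy.grad.clone()
    optimizer.step()
    optimizer.zero_grad()
```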
10. The image translation method based on contrast learning according to claim 1, wherein the back-propagation process for optimizing the contrast loss in step S6 uses an online hard example mining method, which is as follows:
two copies of the ROI network are stored in memory: a read-only ROI network that allocates memory only for the forward pass of all ROIs, and a standard ROI network that allocates memory for both the forward and the backward pass; in an SGD iteration, given a convolutional feature map, the read-only ROI network performs a forward pass and computes the loss of all input ROIs; the ROI sampling module then sorts the ROIs by loss value, selects the K ROI samples with the largest losses, and inputs them into the standard ROI network; this network performs the forward and backward passes only for the selected ROI samples, and the gradients it produces are delivered to the read-only ROI network, which finally updates its parameters according to these gradients; if all ROIs of the N images are denoted R, the effective batch size of the read-only ROI network is |R|, while the effective batch size of the standard ROI network is K.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211232833.1A CN115909002A (en) | 2022-10-10 | 2022-10-10 | Image translation method based on contrast learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211232833.1A CN115909002A (en) | 2022-10-10 | 2022-10-10 | Image translation method based on contrast learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115909002A true CN115909002A (en) | 2023-04-04 |
Family
ID=86488681
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211232833.1A Pending CN115909002A (en) | 2022-10-10 | 2022-10-10 | Image translation method based on contrast learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115909002A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116152901A (en) * | 2023-04-24 | 2023-05-23 | 广州趣丸网络科技有限公司 | Training method of image generation model and stylized image generation method |
CN116631566A (en) * | 2023-05-23 | 2023-08-22 | 重庆邮电大学 | Medical image report intelligent generation method based on big data |
CN116631566B (en) * | 2023-05-23 | 2024-05-24 | 广州合昊医疗科技有限公司 | Medical image report intelligent generation method based on big data |
CN116912680A (en) * | 2023-06-25 | 2023-10-20 | 西南交通大学 | SAR ship identification cross-modal domain migration learning and identification method and system |
CN116738911A (en) * | 2023-07-10 | 2023-09-12 | 苏州异格技术有限公司 | Wiring congestion prediction method and device and computer equipment |
CN116738911B (en) * | 2023-07-10 | 2024-04-30 | 苏州异格技术有限公司 | Wiring congestion prediction method and device and computer equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |