CN114742890A - 6D pose estimation dataset migration method based on image content and style decoupling - Google Patents
6D pose estimation dataset migration method based on image content and style decoupling
- Publication number: CN114742890A
- Application number: CN202210261360.1A
- Authority: CN (China)
- Prior art keywords: image, domain, style, content, loss
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
- G06N3/045: Combinations of networks
- G06N3/047: Probabilistic or stochastic networks
- G06N3/08: Learning methods
- G06T3/04: Context-preserving transformations, e.g. by using an importance map
- G06T2207/20081: Training; Learning
- G06T2207/20084: Artificial neural networks [ANN]
Abstract
The invention discloses a 6D pose estimation dataset migration method based on decoupling image content and style, comprising the following steps: step one, training a pseudo-paired image generation network; step two, training an autonomously designed image migration network. The method decouples image content and style representations with encoders, and reconstructs images using a content representation space shared with the source-domain data and a style representation space shared with the target-domain data; during reconstruction, the designed object-structure feature extractor and keypoint attention feature extractor provide strong supervision of the object structure and refinement of the structure around keypoints. The method can effectively bridge the inter-domain gap between real and synthetic data; using the decoupled representation as input effectively mitigates mode collapse, and the method fully exploits unlabeled target-domain data without increasing the complexity of the 6D pose estimation algorithm.
Description
Technical Field
The invention belongs to the field of 6D pose estimation and relates to a 6D pose estimation dataset migration method based on decoupling image content and style, which can effectively bridge the inter-domain gap between real and synthetic data.
Background
The 6D pose estimation task aims to estimate the 6 degrees of freedom of a given object relative to the camera, comprising 3D rotation and 3D translation; it is a fundamental task in computer vision, widely applicable to many real-world tasks such as robotic manipulation, augmented reality, and autonomous driving.
In recent years, with the development of deep neural networks, many 6D pose estimation algorithms based on convolutional neural networks have been proposed and achieve good performance. However, convolutional neural networks are extremely data-hungry, so a large amount of real data with 3D pose labels is usually required for training to obtain good results. In practice, 3D pose labels for real images are extremely difficult to obtain, while 3D pose labels for synthetic images are easily obtained. However, due to the domain gap between real and synthetic data, the performance of a 6D pose estimation network trained on a synthetic dataset can degrade severely when tested on real images. Reducing the inter-domain gap between unlabeled real data and labeled synthetic data has therefore attracted the attention of more and more researchers.
Image data migration methods can be used for 6D pose estimation dataset migration; they divide into paired and unpaired image domain migration. Paired image domain migration performs well in structure preservation and style transfer, but its pairing requirements are too strict: the real and synthetic datasets used for 6D pose estimation cannot satisfy them. Unpaired image domain migration performs well in fields such as object detection and classification, but because it cannot form image pairs and lacks strong supervision of the object structure, it performs poorly on pixel-level tasks such as 6D pose estimation.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a 6D pose estimation dataset migration method based on decoupling image content and style. The method achieves inter-domain migration of unpaired images while strongly supervising the object structure, and a migration network is designed specifically for the 6D pose estimation task, effectively bridging the inter-domain gap between real and synthetic data in 6D pose estimation datasets.
The technical scheme of the invention comprises the following steps:
A 6D pose estimation dataset migration method based on image content and style decoupling, comprising:
Step one, training a pseudo-paired image generation network: the source domain image I_s and the target domain image I_t are fed into the pseudo-paired image generation network for training;
first, encoders extract a domain-invariant content space carrying cross-domain shared information and a domain-specific style space; second, the domain-invariant content spaces are exchanged while the domain-specific style spaces are fixed, and each domain's representation space is fed into its image generator to generate that domain's pseudo-paired image; finally, the pseudo-paired images are decoupled and the domain-invariant content spaces are exchanged again to obtain the reconstructed source domain image Î_s and the reconstructed target domain image Î_t, thereby enabling training on unpaired data.
Step two, training the autonomously designed image migration network: on the basis of the trained pseudo-paired image generation network, the image migration network uses an object-structure feature extractor H_c, a style feature extractor H_s, and a keypoint attention feature extractor H_k to further refine the image generator on the structures near object keypoints.
Optionally, the keypoint attention feature extractor H_k is trained with source domain images I_s; the source domain image I_s is fed into the keypoint attention feature extractor H_k to obtain a keypoint heatmap, which is used to attention-weight the structural loss between the migrated image and the source domain image as processed by the object-structure feature extractor H_c. The keypoint structural loss is defined as:

$$L_{keypoint} = \left\| H_k(I_s) \circ \big( H_c(T_t) - H_c(I_s) \big) \right\|_2^2$$

where L denotes a loss, the subscript keypoint denotes the keypoint term, and the loss is an L2 loss; H_c denotes the structural feature extractor; H_k denotes the keypoint feature extractor; I_s denotes the source domain image; T_t denotes the migration image of the target domain; ∘ denotes the Hadamard product; and the subscript 2 on the double bars denotes the two-norm.
Optionally, the total loss function of the image migration network is:

$$L_{total} = \lambda_{KL} L_{KL} + \lambda_{adv}^{content} L_{adv}^{content} + \lambda_{adv} L_{adv} + \lambda_{recon} L_{recon} + \lambda_{c} L_{c} + \lambda_{s} L_{s} + \lambda_{keypoint} L_{keypoint} + \lambda_{color} L_{color}$$

where L_total is the total loss; each λ is the weight of the loss term with the same subscript; L_KL is the KL loss; L_adv^content is the content adversarial loss; L_adv is the domain adversarial loss; L_recon is the reconstruction loss (the subscript 1 denoting an L1 loss); L_c is the structural loss (the subscript 2 denoting an L2 loss); L_s is the style loss; L_keypoint is the keypoint structural loss; and L_color is the color loss.
Optionally, step two specifically comprises:
(2.1) The source domain image I_s is fed into the source-domain content encoder E_s^c to obtain the source-domain content code z_s^c; the target domain image I_t is fed into the target-domain style encoder E_t^s to obtain the target-domain style code z_t^s;
(2.2) The target-domain style code z_t^s and the source-domain content code z_s^c are fed into the target-domain image generator G_t to generate the migration image T_t of the source domain;
(2.3) The structural feature extractor H_c extracts the object structural feature f_c: H_c uses a pre-trained VGG-19 network; the object part T_object of the migration image T is obtained with the mask image M, T_object is then fed into the pre-trained VGG-19 network, and the conv4_2 layer activations are taken as the object structural feature f_c. The object structural loss is defined as:

$$L_c = \left\| f_c^{T_t} - f_c^{I_s} \right\|_2^2$$

where f_c is the object structural feature, the subscript T_t denotes the migration image, and the subscript I_s denotes the source domain image; the double bars denote a norm, and their subscript 2 denotes the two-norm;
(2.4) The style feature extractor H_s extracts the style feature f_s: H_s uses the same pre-trained VGG-19 network as the structural feature extractor; the migration image T is fed into the pre-trained VGG-19 network, and the Gram matrices of the conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1 layers are computed as the style feature f_s, with layer weights of 1, 0.8, 0.5, 0.3, and 0.1, respectively. The style loss is defined as:

$$L_s = \left\| f_s^{T_t} - f_s^{I_t} \right\|_2^2$$

where f_s denotes the style feature, the subscript T_t denotes the migration image, and the subscript I_t denotes the target domain image; the double bars denote a norm, and their subscript 2 denotes the two-norm;
(2.5) The keypoint attention feature extractor H_k extracts the keypoint heatmap of the image: H_k is trained with source domain images I_s; the source domain image I_s is fed into H_k to obtain the keypoint heatmap, which attention-weights the structural loss of the processed source domain image. The keypoint structural loss is defined as:

$$L_{keypoint} = \left\| H_k(I_s) \circ \big( H_c(T_t) - H_c(I_s) \big) \right\|_2^2$$

where H_c denotes the structural feature extractor, H_k the keypoint feature extractor, I_s the source domain image, and T_t the migration image of the target domain; ∘ denotes the Hadamard product, and the subscript 2 on the double bars denotes the two-norm;
(2.6) Because part of the inter-domain difference is caused by lighting, lighting is decoupled when defining the color loss: ρ converts an image from the RGB color model to the LAB color model, and an L1 loss is applied to the remaining two channels after the lightness channel is removed:

$$L_{color} = \left\| \rho(I_s \circ M_s) - \rho(T_t \circ M_s) \right\|_1$$

where ρ denotes the RGB-to-LAB conversion, I_s the source domain image, M_s the source-domain mask image, and T_t the migration image of the target domain; the subscript 1 on the double bars denotes the one-norm.
Optionally, extracting the keypoint heatmap of an image with the keypoint attention feature extractor H_k specifically comprises:
2.5.1. Extracting a feature map of the input image with a feature pyramid network on ResNet-101;
2.5.2. Feeding the extracted features into the keypoint extractor H_k: the network comprises 4 consecutive 3×3 convolutional layers, each followed by a ReLU activation; the last layer is upsampled to a feature map of the same size as the input picture, and softmax is applied to the extracted features to generate a pixel-level probability map h giving the probability that each pixel is a keypoint.
Optionally, step one specifically comprises:
(1.1) The source domain image I_s is fed into the source-domain style encoder E_s^s and the source-domain content encoder E_s^c to obtain the source-domain style code z_s^s and the source-domain content code z_s^c; the target domain image I_t is fed into the target-domain style encoder E_t^s and the target-domain content encoder E_t^c to obtain the target-domain style code z_t^s and the target-domain content code z_t^c;
(1.2) The source-domain content code z_s^c and the target-domain content code z_t^c are fed into the content discriminator D_c, which distinguishes the content codes of the two domains;
(1.3) The source-domain style code z_s^s and the target-domain content code z_t^c are fed into the source-domain image generator G_s to generate the pseudo-paired image F_s of the target domain; the target-domain style code z_t^s and the source-domain content code z_s^c are fed into the target-domain image generator G_t to generate the pseudo-paired image F_t of the source domain;
(1.4) The pseudo-paired image F_t of the source domain is fed into the target-domain discriminator D_t, which distinguishes real target-domain images from generated ones;
(1.5) The pseudo-paired image F_s of the target domain is fed into the source-domain discriminator D_s, which distinguishes real source-domain images from generated ones;
(1.6) The style code of the target domain's pseudo-paired image F_s and the content code of the source domain's pseudo-paired image F_t are fed into the source-domain image generator G_s to generate the reconstructed source domain image Î_s; the style code of the source domain's pseudo-paired image F_t and the content code of the target domain's pseudo-paired image F_s are fed into the target-domain image generator G_t to generate the reconstructed target domain image Î_t.
Optionally, the last layer of the source-domain content encoder E_s^c and the last layer of the target-domain image generator G_t share weights;
the last layer of the target-domain content encoder E_t^c and the last layer of the source-domain image generator G_s share weights.
Optionally, the method further includes a step three of testing the network:
Step 3.1: The source domain image I_s is fed into the source-domain content encoder E_s^c to obtain the source-domain content code z_s^c; the target domain image I_t is fed into the target-domain style encoder E_t^s to obtain the target-domain style code z_t^s.
Step 3.2: The source-domain content code z_s^c and the target-domain style code z_t^s are fed into the target-domain image generator G_t to obtain the migration image T_t.
Compared with the prior art, the invention has the following advantages:
1. The invention realizes inter-domain migration of unpaired images while strongly supervising the object structure.
2. A migration network is designed specifically for the 6D pose estimation task, effectively bridging the inter-domain gap between real and synthetic data in 6D pose estimation datasets.
3. Using the decoupled representation as input effectively mitigates mode collapse and increases output diversity.
4. The invention addresses the inter-domain gap at the level of data generation: it fully exploits unlabeled target-domain data without increasing the complexity of the 6D pose estimation algorithm.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
Fig. 1 shows an overall overview of the 6D pose estimation dataset migration method based on image content and style decoupling of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to embodiments, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the invention:
the domain-invariant content space refers to a space composed of structural features of objects that do not vary with the image domain.
The domain-specific style space means that each image domain has its own style characteristics, and the style characteristics of different domains constitute different style spaces.
The representation space of the domain is divided into a source domain representation space and a target domain representation space. The representation space of the source domain consists of a content space of the source domain and a style space of the source domain, the content space of the source domain consists of content characteristics of the source domain, and the style space of the source domain consists of style characteristics of the source domain; the representation space of the target domain is composed of a content space of the target domain and a style space of the target domain, the content space of the target domain is composed of content features of the target domain, and the style space of the target domain is composed of style features of the target domain.
The source domain image refers to a synthetic image, and the target domain image refers to a real image.
The meanings of all symbols, superscripts, and subscripts used in the invention are defined uniformly as follows:
Superscripts: c denotes content (structure); s denotes style; k denotes keypoint.
Subscripts: s denotes the source domain; t denotes the target domain.
H generally denotes a feature extractor; G an image generator; E an encoder; Z (or z) a code; I an original image; T a migration image; M a mask image; L a loss; λ a weight. A letter combined with its superscript and subscript denotes the corresponding component; e.g., E_s^s denotes the source-domain style encoder.
With reference to Fig. 1, the 6D pose estimation dataset migration method for RGB images of the present invention comprises:
Step one: training the pseudo-paired image generation network. The source domain image I_s and the target domain image I_t are fed into the pseudo-paired image generation network.
First, the encoders extract from the source domain image I_s and the target domain image I_t a domain-invariant content space carrying cross-domain shared information and a domain-specific style space. Second, the domain-invariant content spaces are exchanged while the domain-specific style spaces are fixed, and each domain's representation space is fed into its own image generator (of the source and target domains, respectively) to generate that domain's pseudo-paired image. Finally, the pseudo-paired images are decoupled and the domain-invariant content spaces are exchanged again to reconstruct the original source domain image Î_s and target domain image Î_t, enabling training on unpaired data. Using the decoupled representation as input effectively mitigates mode collapse and increases output diversity. The method addresses the inter-domain gap at the level of data generation, fully exploiting unlabeled target-domain data without increasing the complexity of the 6D pose estimation algorithm.
Step two: training the autonomously designed image migration network. On the basis of the trained pseudo-paired image generation network, the image migration network adds an object-structure feature extractor H_c, a style feature extractor H_s, and a keypoint attention feature extractor H_k to further tune the image generator; the tuning refines the structures near object keypoints via the keypoint structural loss of step two. The image processed by the object-structure feature extractor H_c and the style feature extractor H_s is the migrated source domain image.
In step one, the pseudo-paired image generation network is trained with the input 6D pose estimation data, comprising the following steps:
Step 101: The source domain image I_s is fed into the source-domain style encoder E_s^s and content encoder E_s^c to obtain the style code z_s^s and content code z_s^c; the target domain image I_t is fed into the target-domain style encoder E_t^s and content encoder E_t^c to obtain the style code z_t^s and content code z_t^c.
Step 102: The source-domain style code z_s^s and the target-domain content code z_t^c are fed into the source-domain image generator G_s to generate the pseudo-paired image F_s of the target domain; the target-domain style code z_t^s and the source-domain content code z_s^c are fed into the target-domain image generator G_t to generate the pseudo-paired image F_t of the source domain.
Step 103: The pseudo-paired image F_s of the target domain is fed into the source-domain style encoder E_s^s and content encoder E_s^c to obtain its style code and content code; the pseudo-paired image F_t of the source domain is fed into the target-domain style encoder E_t^s and content encoder E_t^c to obtain its style code and content code.
Step 104: The style code of F_s and the content code of F_t are fed into the source-domain image generator G_s to generate the reconstructed source domain image Î_s; the style code of F_t and the content code of F_s are fed into the target-domain image generator G_t to generate the reconstructed target domain image Î_t.
The training of the image migration network comprises the following steps:
Step 201: The source domain image I_s is fed into the source-domain content encoder E_s^c to obtain the content code z_s^c; the target domain image I_t is fed into the target-domain style encoder E_t^s to obtain the style code z_t^s.
Step 202: The target-domain style code z_t^s and the source-domain content code z_s^c are fed into the target-domain image generator G_t to generate the migration image T_t of the source domain.
Step 203: The migration image T_t of the source domain, the source domain image I_s, and its mask image M_s (because the migration image T_t has the same content as the source domain image I_s, the mask image of T_t is the mask image M_s of I_s) are fed into the pre-trained structural feature extractor H_c to obtain the object structural feature f_c^{T_t} of T_t and the object structural feature f_c^{I_s} of I_s.
Step 204: The migration image T_t of the source domain and the target domain image I_t are fed into the pre-trained style feature extractor H_s to obtain the style feature f_s^{T_t} of T_t and the style feature f_s^{I_t} of I_t.
Step 205: The source domain image I_s is fed into the pre-trained keypoint attention feature extractor H_k to obtain the keypoint heatmap of I_s.
The invention is further elucidated with reference to the drawing.
(a) Training the pseudo-paired image generation network comprises the following steps:
Step 301: The source domain image I_s is fed into the source-domain style encoder E_s^s and content encoder E_s^c to obtain the style code z_s^s and content code z_s^c; the target domain image I_t is fed into the target-domain style encoder E_t^s and content encoder E_t^c to obtain the style code z_t^s and content code z_t^c.
Step 302: A KL penalty is applied to the style codes to encourage the style representation to stay close to the Gaussian prior:

$$L_{KL} = \mathbb{E}\big[ D_{KL}\big( q(Z^s) \,\|\, \mathcal{N}(0, 1) \big) \big]$$

where, for a true sample distribution p and an estimated sample distribution q, D_KL(p ‖ q) denotes the divergence between p and q; Z^s denotes a style code, and Z generally denotes a code.
Step 303: The source-domain content code z_s^c and the target-domain content code z_t^c are fed into the content discriminator D_c; the content adversarial loss is (in the standard adversarial form):

$$L_{adv}^{content} = \mathbb{E}\big[ \log D_c(z_s^c) \big] + \mathbb{E}\big[ \log\big( 1 - D_c(z_t^c) \big) \big]$$

Step 304: The source-domain style code z_s^s and the target-domain content code z_t^c are fed into the source-domain image generator G_s to generate the pseudo-paired image F_s of the target domain; the target-domain style code z_t^s and the source-domain content code z_s^c are fed into the target-domain image generator G_t to generate the pseudo-paired image F_t of the source domain.
Step 305: The pseudo-paired image F_t of the source domain is fed into the target-domain discriminator D_t; the target-domain adversarial loss is:

$$L_{adv}^{t} = \mathbb{E}\big[ \log D_t(I_t) \big] + \mathbb{E}\big[ \log\big( 1 - D_t(F_t) \big) \big]$$

Step 306: The pseudo-paired image F_s of the target domain is fed into the source-domain discriminator D_s; the source-domain adversarial loss L_adv^s is defined analogously, and the domain adversarial loss is:

$$L_{adv} = L_{adv}^{s} + L_{adv}^{t}$$

Step 307: The style code of the target domain's pseudo-paired image F_s and the content code of the source domain's pseudo-paired image F_t are fed into the source-domain image generator G_s to generate the reconstructed source domain image Î_s; the style code of F_t and the content code of F_s are fed into the target-domain image generator G_t to generate the reconstructed target domain image Î_t. The reconstruction loss is defined as:

$$L_{recon} = \big\| \hat{I}_s - I_s \big\|_1 + \big\| \hat{I}_t - I_t \big\|_1$$
step 308: the overall loss function is:
Ltotal: total loss; λ: weights (corresponding to the same loss function as the trailing corner markers); l isKLLoss of KL;loss of domain antagonism;reconstruction loss, with the loss denoted as L1 loss at subscript 1;structural losses, with the lower subscript 2 indicating losses as L2 losses);loss of style;loss of key point structure;color is lost.
(b) The autonomously designed image migration network: on the basis of the pseudo-paired image generation network, it is composed of the source-domain content encoder E_s^c, the target-domain style encoder E_t^s, and the target-domain image generator G_t trained in the pseudo-paired image generation network, together with the newly designed structural feature extractor H_c, style feature extractor H_s, and keypoint attention feature extractor H_k. The specific steps are as follows:
Step 401: The source domain image I_s is fed into the source-domain content encoder E_s^c to obtain the content code z_s^c; the target domain image I_t is fed into the target-domain style encoder E_t^s to obtain the style code z_t^s.
Step 402: The target-domain style code z_t^s and the source-domain content code z_s^c are fed into the target-domain image generator G_t to generate the migration image T_t of the source domain.
Step 403: The migration image T_t of the source domain, the source domain image I_s, and its mask image M_s are fed into the pre-trained structural feature extractor H_c. H_c uses a pre-trained VGG-19 network: the object part T_object of the migration image T is obtained with the mask image M, T_object is then fed into the pre-trained VGG-19 network, and the conv4_2 layer activations are taken as the object structural feature f_c. The object structural loss is defined as:

$$L_c = \left\| f_c^{T_t} - f_c^{I_s} \right\|_2^2$$

where f is a feature, the superscript c denotes structure, the subscript T_t denotes the migration image, and the subscript I_s denotes the source domain image; the double bars denote a norm, and their subscript 2 denotes the two-norm.
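A minimal sketch of the structural feature extractor H_c of step 403, assuming torchvision's pre-trained VGG-19 (where conv4_2 is layer index 21 of the `features` stack); multiplying the image by its mask to obtain the object part is an assumption consistent with the text:

```python
import torch
import torchvision

class StructureExtractor(torch.nn.Module):
    """H_c: masked VGG-19 conv4_2 features (step 403)."""
    def __init__(self):
        super().__init__()
        vgg = torchvision.models.vgg19(weights="IMAGENET1K_V1").features
        self.slice = vgg[:22].eval()        # up to and including conv4_2 (index 21)
        for p in self.slice.parameters():
            p.requires_grad_(False)         # H_c is pre-trained and frozen

    def forward(self, img, mask):
        # Keep only the object region, as the patent masks T to T_object.
        return self.slice(img * mask)

def structure_loss(H_c, T_t, I_s, M_s):
    # L_c = || f_c(T_t) - f_c(I_s) ||_2^2, as an MSE over the feature maps.
    return torch.nn.functional.mse_loss(H_c(T_t, M_s), H_c(I_s, M_s))
```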
step 403, transferring the image T of the source domaintAnd a target domain image ItSent into a style feature extractor H which is trained in advancesIn (1). HsUsing a pre-trained VGG-19 network which is the same as the structure feature extractor, firstly sending the migration image T into the pre-trained VGG-19 network, and taking out the conv1_1, conv2_1, conv3_1, conv4_1 and conv5_1 layers to calculate a gram matrix as the style feature fsThe weights of each layer are 1, 0.8, 0.5, 0.3 and 0.1, respectively. The style loss is:
f is a feature, the upper corner mark s indicates the style, the lower corner mark TtMigration image representing target Domain image, subscript ItRepresenting a target domain image; the double vertical lines represent norms, and the lower corner mark 2 of the double vertical lines represents a norm;
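A sketch of the style feature extractor H_s of step 404 under the same assumptions; normalizing each Gram matrix by channel and spatial size is a common convention the patent does not specify:

```python
import torch
import torchvision

STYLE_LAYERS = {0: 1.0, 5: 0.8, 10: 0.5, 19: 0.3, 28: 0.1}
# indices of conv1_1 ... conv5_1 in torchvision's VGG-19 'features' stack

def gram(feat):
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)  # normalized Gram matrix

class StyleExtractor(torch.nn.Module):
    """H_s: weighted Gram matrices of five VGG-19 layers (step 404)."""
    def __init__(self):
        super().__init__()
        self.vgg = torchvision.models.vgg19(weights="IMAGENET1K_V1").features.eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)

    def forward(self, img):
        feats, x = [], img
        for i, layer in enumerate(self.vgg):
            x = layer(x)
            if i in STYLE_LAYERS:
                feats.append((STYLE_LAYERS[i], gram(x)))
            if i >= max(STYLE_LAYERS):
                break
        return feats

def style_loss(H_s, T_t, I_t):
    # L_s: layer-weighted MSE between Gram matrices of T_t and I_t.
    return sum(w * torch.nn.functional.mse_loss(g_t, g_i)
               for (w, g_t), (_, g_i) in zip(H_s(T_t), H_s(I_t)))
```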
step 404, the source domain image IsSource domain image I for inputsPre-trained key point attention feature extractor HkIn the above, obtaining a heat map of the key points, and performing attention weighting on the structural loss of the generated image by using the obtained heat map, the structural loss of the key points is:
in the formula: hcA representation structural feature extractor; hkA representation key point feature extractor; i issRepresenting a source domainAn image; t istA migration image representing a target domain; the circles in the formula represent the Hadamard products, i.e., the products of the corresponding elements of the matrix; the double vertical lines represent norms and the subscript 2 of the double vertical lines represents a two-norm.
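The keypoint-weighted structural loss of step 405 can be sketched as follows, reusing the H_c and H_k handles above; collapsing the per-keypoint heatmaps into one attention map and resizing it to the feature-map resolution are assumptions the text does not spell out:

```python
import torch

def keypoint_structure_loss(H_c, H_k, T_t, I_s, M_s):
    """L_keypoint = || H_k(I_s) o (H_c(T_t) - H_c(I_s)) ||_2^2 (step 405)."""
    f_t = H_c(T_t, M_s)                    # structural features of the migrated image
    f_s = H_c(I_s, M_s)                    # structural features of the source image
    heat = H_k(I_s).sum(dim=1, keepdim=True)   # combine keypoint maps (assumption)
    heat = torch.nn.functional.interpolate(
        heat, size=f_t.shape[-2:], mode="bilinear", align_corners=False)
    return ((heat * (f_t - f_s)) ** 2).mean()  # Hadamard-weighted L2 loss
```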
Step 406: Because part of the inter-domain difference is caused by lighting, lighting is decoupled when defining the color loss: ρ converts an image from the RGB color model to the LAB color model, and an L1 loss is applied to the remaining two channels after the lightness channel is removed:

$$L_{color} = \left\| \rho(I_s \circ M_s) - \rho(T_t \circ M_s) \right\|_1$$

where ρ denotes the RGB-to-LAB conversion; I_s denotes the source domain image; M_s denotes the source-domain mask image; T_t denotes the migration image of the target domain; the double bars denote a norm, and their subscript 1 denotes the one-norm (L1 loss).
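A sketch of the color loss of step 406; using kornia's differentiable RGB-to-LAB conversion as ρ is an implementation choice, not the patent's stated library:

```python
import torch
import kornia.color as KC  # kornia provides a differentiable RGB->LAB conversion

def color_loss(I_s, T_t, M_s):
    """L_color (step 406): L1 on the a/b channels of LAB, lightness dropped.
    Inputs are assumed to be RGB tensors in [0, 1]."""
    lab_s = KC.rgb_to_lab(I_s * M_s)[:, 1:, ...]  # drop channel 0 (L = lightness)
    lab_t = KC.rgb_to_lab(T_t * M_s)[:, 1:, ...]
    return torch.nn.functional.l1_loss(lab_t, lab_s)
```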
Step 407: The overall loss function of the image style migration network is:

$$L_{total} = \lambda_{KL} L_{KL} + \lambda_{adv}^{content} L_{adv}^{content} + \lambda_{adv} L_{adv} + \lambda_{recon} L_{recon} + \lambda_{c} L_{c} + \lambda_{s} L_{s} + \lambda_{keypoint} L_{keypoint} + \lambda_{color} L_{color}$$

where L_total is the total loss; each λ is the weight of the loss term with the same subscript (e.g., λ_adv^content is the weight of the content adversarial loss); L_KL is the KL loss; L_adv^content is the content adversarial loss; L_adv is the domain adversarial loss; L_recon is the reconstruction loss (an L1 loss); L_c is the structural loss (an L2 loss); L_s is the style loss; L_keypoint is the keypoint structural loss; and L_color is the color loss.
The heatmap extraction in step 405 proceeds as follows:
Step 501: A feature map of the input image is extracted with a feature pyramid network on ResNet-101.
Step 502: The extracted features are fed into the keypoint extraction network, which comprises 4 consecutive 3×3 convolutional layers, each followed by a ReLU activation; the last layer is upsampled to a feature map of the same size as the input picture, and softmax is applied to the extracted features to generate a pixel-level probability map h giving the probability that each pixel is a keypoint.
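A sketch of the keypoint extraction head of steps 501 and 502; the channel widths and number of keypoints are assumptions, and the FPN-on-ResNet-101 backbone of step 501 is external to this snippet:

```python
import torch
import torch.nn as nn

class KeypointHead(nn.Module):
    """H_k head (steps 501-502): 4 consecutive 3x3 conv + ReLU layers,
    upsampled to image size, with softmax over pixels."""
    def __init__(self, in_ch=256, mid_ch=256, num_keypoints=8):
        super().__init__()
        layers = []
        for i in range(4):  # 4 consecutive 3x3 convolutions, each with ReLU
            layers += [nn.Conv2d(in_ch if i == 0 else mid_ch, mid_ch, 3, padding=1),
                       nn.ReLU(inplace=True)]
        self.convs = nn.Sequential(*layers)
        self.out = nn.Conv2d(mid_ch, num_keypoints, 1)

    def forward(self, feat, image_size):
        x = self.out(self.convs(feat))
        x = nn.functional.interpolate(x, size=image_size, mode="bilinear",
                                      align_corners=False)  # upsample to image size
        b, k, h, w = x.shape
        # softmax over all pixels: per-keypoint probability map h
        return x.reshape(b, k, h * w).softmax(dim=-1).reshape(b, k, h, w)
```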
(c) The test network. The specific steps are:
Step 601: The source domain image I_s is fed into the source-domain content encoder E_s^c to obtain the content code z_s^c; the target domain image I_t is fed into the target-domain style encoder E_t^s to obtain the style code z_t^s.
Step 602: The content code z_s^c and the style code z_t^s are fed into the target-domain image generator G_t to obtain the migration image T_t.
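At test time the whole pipeline reduces to these two steps; a minimal sketch, with the trained modules passed in as handles:

```python
import torch

@torch.no_grad()
def migrate(E_s_content, E_t_style, G_t, I_s, I_t):
    """Test-time migration (steps 601-602): source content + target style
    -> target-domain generator -> migrated image T_t."""
    z_c = E_s_content(I_s)   # content code of the synthetic (source) image
    z_s = E_t_style(I_t)     # style code of a real (target) image
    return G_t(z_s, z_c)     # T_t: source content rendered in target style
```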
Experiments:
To demonstrate the effectiveness of the method, tests were performed on the LINEMOD real dataset and the LINEMOD-PBR synthetic dataset. First, the RGB images of the real dataset and of the synthetic dataset are fed into the network to obtain the migrated RGB images of the synthetic dataset. Then the synthetic images and the migrated images, each together with the synthetic dataset's labels, are fed into a 6D pose estimation network, yielding a 6D pose estimation model trained on the synthetic images and one trained on the migrated images. Finally, the performance of the two 6D pose estimation models is tested on the real dataset.
Because the LINEMOD real dataset is small, only about one thousand images, it is used for testing. The 6D pose estimation network uses HRNet (proposed in 2019 by the University of Science and Technology of China and Microsoft Research Asia) to estimate keypoints, and then computes the object pose with PnP. The ADD metric is evaluated on eight objects (Cat and others); as shown in Table 1 below, the mean ADD of the method is ten percentage points higher than that obtained with the LINEMOD-PBR synthetic dataset, showing that the method effectively bridges the inter-domain gap between real and synthetic data in 6D pose estimation datasets.
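For reference, the ADD metric used in Table 1 averages, over the 3D model points, the distance between the points transformed by the predicted pose and by the ground-truth pose; a pose is commonly counted as correct when this average falls below 10% of the object diameter (the standard LINEMOD convention, which the patent does not restate). A minimal sketch:

```python
import numpy as np

def add_metric(R_pred, t_pred, R_gt, t_gt, model_pts, diameter):
    """ADD: average distance of 3D model points under two poses.
    model_pts: (N, 3) array; returns the mean distance and whether the
    pose is 'correct' under the 0.1 * diameter LINEMOD threshold."""
    p_pred = model_pts @ R_pred.T + t_pred
    p_gt = model_pts @ R_gt.T + t_gt
    dist = np.linalg.norm(p_pred - p_gt, axis=1).mean()
    return dist, dist < 0.1 * diameter
```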
TABLE 1
| Model | PBR | Ours |
| --- | --- | --- |
| Cat | 0.455 | 0.543 |
| Cam | 0.187 | 0.337 |
| Phone | 0.253 | 0.389 |
| Iron | 0.268 | 0.340 |
| Driller | 0.617 | 0.766 |
| Can | 0.592 | 0.727 |
| Glue | 0.211 | 0.255 |
| Duck | 0.139 | 0.164 |
| Mean | 0.340 | 0.440 |
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.
Claims (8)
1. A 6D pose estimation dataset migration method based on image content and style decoupling, characterized by comprising the following steps:
step one, training a pseudo-paired image generation network: the source domain image I_s and the target domain image I_t are fed into the pseudo-paired image generation network for training;
first, encoders extract a domain-invariant content space carrying cross-domain shared information and a domain-specific style space; second, the domain-invariant content spaces are exchanged while the domain-specific style spaces are fixed, and each domain's representation space is fed into its image generator to generate that domain's pseudo-paired image; finally, the pseudo-paired images are decoupled and the domain-invariant content spaces are exchanged again to obtain the reconstructed source domain image Î_s and the reconstructed target domain image Î_t, thereby enabling training on unpaired data;
step two, training an autonomously designed image migration network: on the basis of the trained pseudo-paired image generation network, the image migration network uses an object-structure feature extractor H_c, a style feature extractor H_s, and a keypoint attention feature extractor H_k to further refine the image generator on the structures near object keypoints.
2. The image content and style decoupling based 6D pose estimation dataset migration method of claim 1, characterized in that the keypoint attention feature extractor H_k is trained with source domain images I_s; the source domain image I_s is fed into the keypoint attention feature extractor H_k to obtain a keypoint heatmap, which attention-weights the structural loss between the migrated image and the source domain image as processed by the object-structure feature extractor H_c; the keypoint structural loss is defined as:

$$L_{keypoint} = \left\| H_k(I_s) \circ \big( H_c(T_t) - H_c(I_s) \big) \right\|_2^2$$

where L denotes a loss, the subscript keypoint denotes the keypoint term, and the loss is an L2 loss; H_c denotes the structural feature extractor; H_k denotes the keypoint feature extractor; I_s denotes the source domain image; T_t denotes the migration image of the target domain; ∘ denotes the Hadamard product; and the subscript 2 on the double bars denotes the two-norm.
3. The image content and style decoupling based 6D pose estimation dataset migration method of claim 1 or 2, characterized in that the total loss function of the image migration network is:

$$L_{total} = \lambda_{KL} L_{KL} + \lambda_{adv}^{content} L_{adv}^{content} + \lambda_{adv} L_{adv} + \lambda_{recon} L_{recon} + \lambda_{c} L_{c} + \lambda_{s} L_{s} + \lambda_{keypoint} L_{keypoint} + \lambda_{color} L_{color}$$

where each λ is the weight of the loss term with the same subscript; L_KL is the KL loss; L_adv^content is the content adversarial loss; L_adv is the domain adversarial loss; L_recon is the reconstruction loss (an L1 loss); L_c is the structural loss (an L2 loss); L_s is the style loss; L_keypoint is the keypoint structural loss; and L_color is the color loss.
4. The image content and style decoupling based 6D pose estimation dataset migration method of claim 1 or 2, characterized in that step two specifically comprises:
(2.1) the source domain image I_s is fed into the source-domain content encoder E_s^c to obtain the source-domain content code z_s^c; the target domain image I_t is fed into the target-domain style encoder E_t^s to obtain the target-domain style code z_t^s;
(2.2) the target-domain style code z_t^s and the source-domain content code z_s^c are fed into the target-domain image generator G_t to generate the migration image T_t of the source domain;
(2.3) the structural feature extractor H_c extracts the object structural feature f_c: H_c uses a pre-trained VGG-19 network; the object part T_object of the migration image T is obtained with the mask image M, T_object is then fed into the pre-trained VGG-19 network, and the conv4_2 layer activations are taken as the object structural feature f_c; the object structural loss is defined as:

$$L_c = \left\| f_c^{T_t} - f_c^{I_s} \right\|_2^2$$

where f_c is the object structural feature, the subscript T_t denotes the migration image, and the subscript I_s denotes the source domain image; the double bars denote a norm, and their subscript 2 denotes the two-norm;
(2.4) the style feature extractor H_s extracts the style feature f_s: H_s uses the same pre-trained VGG-19 network as the structural feature extractor; the migration image T is fed into the pre-trained VGG-19 network, and the Gram matrices of the conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1 layers are computed as the style feature f_s, with layer weights of 1, 0.8, 0.5, 0.3, and 0.1, respectively; the style loss is defined as:

$$L_s = \left\| f_s^{T_t} - f_s^{I_t} \right\|_2^2$$

where f_s denotes the style feature, the subscript T_t denotes the migration image, and the subscript I_t denotes the target domain image; the double bars denote a norm, and their subscript 2 denotes the two-norm;
(2.5) the keypoint attention feature extractor H_k extracts the keypoint heatmap of the image: H_k is trained with source domain images I_s; the source domain image I_s is fed into H_k to obtain the keypoint heatmap, which attention-weights the structural loss of the processed source domain image; the keypoint structural loss is defined as:

$$L_{keypoint} = \left\| H_k(I_s) \circ \big( H_c(T_t) - H_c(I_s) \big) \right\|_2^2$$

where H_c denotes the structural feature extractor, H_k the keypoint feature extractor, I_s the source domain image, and T_t the migration image of the target domain; ∘ denotes the Hadamard product, and the subscript 2 on the double bars denotes the two-norm;
(2.6) because part of the inter-domain difference is caused by lighting, lighting is decoupled when defining the color loss: ρ converts an image from the RGB color model to the LAB color model, and an L1 loss is applied to the remaining two channels after the lightness channel is removed:

$$L_{color} = \left\| \rho(I_s \circ M_s) - \rho(T_t \circ M_s) \right\|_1$$

where ρ denotes the RGB-to-LAB conversion, I_s the source domain image, M_s the source-domain mask image, and T_t the migration image of the target domain; the subscript 1 on the double bars denotes the one-norm.
5. The image content and style decoupling based 6D pose estimation dataset migration method of claim 1 or 2, characterized in that extracting the keypoint heatmap of an image with the keypoint attention feature extractor H_k specifically comprises:
2.5.1. extracting a feature map of the input image with a feature pyramid network on ResNet-101;
2.5.2. feeding the extracted features into the keypoint extractor H_k: the network comprises 4 consecutive 3×3 convolutional layers, each followed by a ReLU activation; the last layer is upsampled to a feature map of the same size as the input picture, and softmax is applied to the extracted features to generate a pixel-level probability map h giving the probability that each pixel is a keypoint.
6. The image content and style decoupling based 6D pose estimation dataset migration method of claim 1 or 2, characterized in that step one specifically comprises:
(1.1) the source domain image I_s is fed into the source-domain style encoder E_s^s and the source-domain content encoder E_s^c to obtain the source-domain style code z_s^s and the source-domain content code z_s^c; the target domain image I_t is fed into the target-domain style encoder E_t^s and the target-domain content encoder E_t^c to obtain the target-domain style code z_t^s and the target-domain content code z_t^c;
(1.2) the source-domain content code z_s^c and the target-domain content code z_t^c are fed into the content discriminator D_c, which distinguishes the content codes of the two domains;
(1.3) the source-domain style code z_s^s and the target-domain content code z_t^c are fed into the source-domain image generator G_s to generate the pseudo-paired image F_s of the target domain; the target-domain style code z_t^s and the source-domain content code z_s^c are fed into the target-domain image generator G_t to generate the pseudo-paired image F_t of the source domain;
(1.4) the pseudo-paired image F_t of the source domain is fed into the target-domain discriminator D_t, which distinguishes real target-domain images from generated ones;
(1.5) the pseudo-paired image F_s of the target domain is fed into the source-domain discriminator D_s, which distinguishes real source-domain images from generated ones;
(1.6) the style code of the target domain's pseudo-paired image F_s and the content code of the source domain's pseudo-paired image F_t are fed into the source-domain image generator G_s to generate the reconstructed source domain image Î_s; the style code of the source domain's pseudo-paired image F_t and the content code of the target domain's pseudo-paired image F_s are fed into the target-domain image generator G_t to generate the reconstructed target domain image Î_t.
7. The image content and style decoupling based 6D pose estimation dataset migration method of claim 6, characterized in that the last layer of the source-domain content encoder E_s^c and the last layer of the target-domain image generator G_t share weights; and the last layer of the target-domain content encoder E_t^c and the last layer of the source-domain image generator G_s share weights.
8. The image content and style decoupling based 6D pose estimation dataset migration method of claim 1 or 2, characterized by further comprising a step three of testing the network:
step 3.1: the source domain image I_s is fed into the source-domain content encoder E_s^c to obtain the source-domain content code z_s^c; the target domain image I_t is fed into the target-domain style encoder E_t^s to obtain the target-domain style code z_t^s;
step 3.2: the source-domain content code z_s^c and the target-domain style code z_t^s are fed into the target-domain image generator G_t to obtain the migration image T_t.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202210261360.1A | 2022-03-16 | 2022-03-16 | 6D pose estimation dataset migration method based on image content and style decoupling |

Publications (1)
| Publication Number | Publication Date |
| --- | --- |
| CN114742890A | 2022-07-12 |

Family: ID=82276292

Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| CN115546295A | 2022-08-26 | 2022-12-30 | Northwest University | Target 6D pose estimation model training method and target 6D pose estimation method |
| CN115546295B | 2022-08-26 | 2023-11-07 | Northwest University | Target 6D pose estimation model training method and target 6D pose estimation method |
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination