CN115546295B - Target 6D pose estimation model training method and target 6D pose estimation method - Google Patents

Target 6D pose estimation model training method and target 6D pose estimation method

Info

Publication number
CN115546295B
CN115546295B (application CN202211030694.4A)
Authority
CN
China
Prior art keywords
coordinate
key point
training
target
regressor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211030694.4A
Other languages
Chinese (zh)
Other versions
CN115546295A (en)
Inventor
彭进业
寇希栋
赵万青
张少博
彭先霖
汪霖
张晓丹
Current Assignee
NORTHWEST UNIVERSITY
Original Assignee
NORTHWEST UNIVERSITY
Priority date
Filing date
Publication date
Application filed by NORTHWEST UNIVERSITY
Priority to CN202211030694.4A
Publication of CN115546295A
Application granted
Publication of CN115546295B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75 Determining position or orientation of objects or cameras using feature-based methods involving models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a training method for a target 6D pose estimation model, comprising the following steps: training the target 6D pose estimation model on source-domain training images to update its parameters, obtaining an initially trained model; adding an adversarial regressor to the initially trained model to form a transfer training model; training the transfer training model on source-domain and target-domain training images to update its parameters, obtaining a trained transfer training model; and determining the final trained target 6D pose estimation model from the trained transfer training model. The training method for the target 6D pose estimation model has at least the following beneficial technical effect: by combining basic training with transfer training, and using an adversarial regressor during the transfer training, the trained target 6D pose estimation model produces estimates that are more accurate and reliable.

Description

Target 6D pose estimation model training method and target 6D pose estimation method
Technical Field
The application relates to the field of image processing, and in particular to a target 6D pose estimation model training method and a target 6D pose estimation method.
Background
The purpose of target 6D pose estimation is to detect a target in a given image and estimate its position and orientation. Target 6D pose estimation is widely used in computer vision applications such as augmented reality, virtual reality, and autonomous driving. Existing target 6D pose estimation methods typically train the model on a dataset with real (ground-truth) labels; in practice, however, extensive training on such a labeled dataset can cause the model to overfit the training set, degrading its performance in real application scenarios.
Disclosure of Invention
To overcome at least one defect in the prior art, embodiments of the application provide a target 6D pose estimation model training method and a target 6D pose estimation method.
In a first aspect, an embodiment of the present application provides a training method for a target 6D pose estimation model, comprising the following steps:
training the target 6D pose estimation model on source-domain training images to update its parameters, obtaining an initially trained model; the target 6D pose estimation model comprises a feature extractor, a feature regressor, and a scale regressor;
adding an adversarial regressor to the initially trained model to form a transfer training model;
training the transfer training model on source-domain and target-domain training images to update its parameters, obtaining a trained transfer training model;
and determining the final trained target 6D pose estimation model from the trained transfer training model.
In one embodiment, training the target 6D pose estimation model on source-domain training images to update its parameters and obtain an initially trained model includes:
inputting a source-domain training image into the feature extractor to obtain a feature map;
inputting the feature map into the feature regressor and the scale regressor, respectively, to obtain a coordinate heatmap and a scale heatmap for each of a plurality of keypoints of the source-domain training image;
obtaining a first loss function from the coordinate heatmaps and scale heatmaps of the plurality of keypoints;
and updating the parameters of the feature extractor, the feature regressor, and the scale regressor based on the first loss function, obtaining the initially trained model.
In one embodiment, obtaining the first loss function from the coordinate heatmaps and scale heatmaps of the plurality of keypoints includes:
determining a keypoint coordinate loss for each keypoint from its coordinate heatmap and ground-truth heatmap;
accumulating the keypoint coordinate losses over all keypoints to obtain an accumulated keypoint coordinate loss;
determining a scale factor for each keypoint from its scale heatmap and ground-truth heatmap;
determining a keypoint scale-factor loss from each keypoint's scale factor and its ground-truth scale factor;
and determining the first loss function from the accumulated keypoint coordinate loss and the keypoint scale-factor loss.
In one embodiment, training the transfer training model on source-domain and target-domain training images to update its parameters and obtain a trained transfer training model includes:
inputting source-domain training images into the transfer training model for training, updating the parameters of the feature extractor, scale regressor, feature regressor, and adversarial regressor, to obtain a first transfer-trained model;
inputting target-domain training images into the first transfer-trained model for training, updating the parameters of the adversarial regressor, to obtain a second transfer-trained model;
and inputting target-domain training images into the second transfer-trained model for training, updating the parameters of the feature extractor, to obtain the trained transfer training model.
In one embodiment, inputting source-domain training images into the transfer training model for training and updating the parameters of the feature extractor, scale regressor, feature regressor, and adversarial regressor to obtain a first transfer-trained model includes the following steps:
inputting a source-domain training image into the feature extractor to obtain a feature map;
inputting the feature map into the feature regressor, the scale regressor, and the adversarial regressor, respectively, to obtain a coordinate heatmap, a scale heatmap, and an adversarial coordinate heatmap for each of a plurality of keypoints of the source-domain training image;
obtaining a second loss function from the coordinate heatmaps, scale heatmaps, and adversarial coordinate heatmaps of the plurality of keypoints;
and updating the parameters of the feature extractor, feature regressor, scale regressor, and adversarial regressor based on the second loss function, obtaining the first transfer-trained model.
In one embodiment, deriving the second loss function from the coordinate heatmaps, scale heatmaps, and adversarial coordinate heatmaps of the plurality of keypoints includes:
determining a keypoint coordinate loss for each keypoint from its coordinate heatmap and ground-truth heatmap;
accumulating the keypoint coordinate losses over all keypoints to obtain an accumulated keypoint coordinate loss;
determining a scale factor for each keypoint from its scale heatmap and ground-truth heatmap;
determining a keypoint scale-factor loss from each keypoint's scale factor and its ground-truth scale factor;
determining an adversarial keypoint coordinate loss for each keypoint from its adversarial coordinate heatmap and ground-truth heatmap;
accumulating the adversarial keypoint coordinate losses over all keypoints to obtain an accumulated adversarial keypoint coordinate loss;
and determining the second loss function from the accumulated keypoint coordinate loss, the keypoint scale-factor loss, and the accumulated adversarial keypoint coordinate loss.
In one embodiment, inputting target-domain training images into the first transfer-trained model for training and updating the parameters of the adversarial regressor to obtain a second transfer-trained model includes:
inputting a target-domain training image into the feature extractor to obtain a feature map;
inputting the feature map into the feature regressor and the adversarial regressor, respectively, to obtain a coordinate heatmap and an adversarial coordinate heatmap for each of a plurality of keypoints of the target-domain training image;
obtaining a third loss function from the coordinate heatmaps and adversarial coordinate heatmaps of the plurality of keypoints;
and updating the parameters of the adversarial regressor based on the third loss function, obtaining the second transfer-trained model.
In one embodiment, deriving the third loss function from the coordinate heatmaps and adversarial coordinate heatmaps of the plurality of keypoints includes:
for each keypoint, determining a map from the coordinate heatmaps of the keypoints other than the current keypoint;
determining a keypoint coordinate loss for the current keypoint from this map and the adversarial coordinate heatmap of the current keypoint;
and accumulating the keypoint coordinate losses over all keypoints to obtain the third loss function.
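As a hedged sketch of the third loss described above: for each keypoint, the adversarial regressor's heatmap is compared against a map built from the other keypoints' coordinate heatmaps. The element-wise max fusion and the mean-squared-error comparison below are assumptions for illustration; the patent does not spell out the aggregation.

```python
def third_loss(coord_maps, adv_maps):
    """coord_maps, adv_maps: one flat heatmap (list of floats) per keypoint."""
    total = 0.0
    for k, adv in enumerate(adv_maps):
        # Fuse the coordinate heatmaps of all OTHER keypoints into one target map
        # (element-wise max is an assumed fusion rule).
        others = [m for j, m in enumerate(coord_maps) if j != k]
        target = [max(vals) for vals in zip(*others)]
        # Assumed mean-squared error between the adversarial heatmap and the fused map.
        total += sum((a - t) ** 2 for a, t in zip(adv, target)) / len(adv)
    return total
```

Training the adversarial regressor toward these "wrong" maps is what makes its target-domain predictions deliberately inaccurate.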
In one embodiment, inputting target-domain training images into the second transfer-trained model for training and updating the parameters of the feature extractor to obtain the trained transfer training model includes:
inputting a target-domain training image into the feature extractor to obtain a feature map;
inputting the feature map into the feature regressor and the adversarial regressor, respectively, to obtain a coordinate heatmap and an adversarial coordinate heatmap for each of a plurality of keypoints of the target-domain training image;
obtaining a fourth loss function from the coordinate heatmaps and adversarial coordinate heatmaps of the plurality of keypoints;
and updating the parameters of the feature extractor based on the fourth loss function, obtaining the trained transfer training model.
In a second aspect, an embodiment of the present application provides a target 6D pose estimation method, comprising:
inputting a target image into the feature extractor of a target 6D pose estimation model to obtain a feature map, the target 6D pose estimation model comprising a feature extractor, a feature regressor, and a scale regressor;
inputting the feature map into the feature regressor and the scale regressor, respectively, to obtain a coordinate heatmap and a scale heatmap for each of a plurality of keypoints of the target image;
determining the keypoint coordinates of each keypoint from its coordinate heatmap;
calculating the scale factor of each keypoint from its coordinate heatmap and scale heatmap;
determining the three-dimensional coordinates of each keypoint of the target from its keypoint coordinates and scale factor;
obtaining the 6D pose of the target from the three-dimensional coordinates of the target's keypoints and the three-dimensional coordinates of the corresponding keypoints of the target's three-dimensional model;
wherein the target 6D pose estimation model is obtained by applying the target 6D pose estimation model training method described above.
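The step from 2-D keypoint coordinates plus scale factors to 3-D coordinates can be sketched as a pinhole back-projection. This is an illustrative assumption: the intrinsics fx, fy, cx, cy and the use of the scale factor s as depth are not specified in the text above, and the final 6D pose would then come from a rigid alignment (e.g., a least-squares fit) of these 3-D keypoints against the model's keypoints.

```python
def backproject(u, v, s, fx, fy, cx, cy):
    """Hypothetical back-projection of pixel (u, v) with scale factor s.

    Assumes s plays the role of depth under a pinhole camera with focal
    lengths fx, fy and principal point (cx, cy)."""
    z = s                      # assumed: scale factor encodes depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)

# With fx = fy = 1 and cx = cy = 0, the pixel coordinates pass through:
# backproject(2.0, 3.0, 1.0, 1.0, 1.0, 0.0, 0.0) -> (2.0, 3.0, 1.0)
```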
Compared with the prior art, the application has the following beneficial effects: basic training is combined with transfer training, and an adversarial regressor is used during the transfer training. Transfer-training the model on source-domain training images keeps all three regressors accurate on the source domain, while transfer-training on target-domain training images deliberately makes the adversarial regressor's predictions inaccurate on the target domain; the feature regressor's predictions are then pushed as far as possible from the adversarial regressor's predictions, i.e., toward correct predictions. As a result, the trained target 6D pose estimation model produces estimates that are more accurate and reliable.
Drawings
The application may be better understood by reference to the following detailed description taken in conjunction with the accompanying drawings, which are incorporated in and form a part of this specification. In the drawings:
FIG. 1 shows a block flow diagram of a target 6D pose estimation model training method according to an embodiment of the application;
FIG. 2 shows a block flow diagram of a target 6D pose estimation method according to an embodiment of the application.
Detailed Description
Exemplary embodiments of the present application will be described hereinafter with reference to the accompanying drawings. In the interest of clarity and conciseness, not all features of an actual embodiment are described in the specification. It will of course be appreciated that in the development of any such actual embodiment, numerous implementation-specific decisions may be made to achieve the developers' specific goals, and that these decisions may vary from one implementation to another.
It should be noted here that, to avoid obscuring the present application with unnecessary details, the drawings show only the device structures closely related to the solution according to the present application, and other details of little relevance to the application are omitted.
It is to be understood that the application is not limited to the embodiments described in the following description with reference to the drawings. Where possible, embodiments may be combined with each other, features may be replaced or borrowed between different embodiments, and one or more features may be omitted from an embodiment.
Target 6D pose estimation refers to detecting a target in a given image and estimating its position and orientation; the 6D pose comprises 6 degrees of freedom: 3 degrees of freedom of translation and 3 degrees of freedom of spatial rotation. The application provides a target 6D pose estimation method for RGB images, which estimates the target's 6D pose with a target 6D pose estimation model. The model is first trained, in a process comprising a basic training stage and a transfer training stage; a model obtained with this training process greatly improves the accuracy of target 6D pose estimation.
FIG. 1 shows a block flow diagram of a target 6D pose estimation model training method according to an embodiment of the application. The method starts with step S110: training the target 6D pose estimation model on source-domain training images to update its parameters, obtaining an initially trained model; the target 6D pose estimation model includes a feature extractor, a feature regressor, and a scale regressor. Here, the training images are taken from the Linemod dataset, which contains real images and synthetic images: a real image may be, for example, an image of the target captured with an image-capturing device such as a camera, and a synthetic image may be, for example, an image synthesized with computer software from a three-dimensional model of the target. The source-domain training images may be the synthetic images in the Linemod dataset. In this step, the target 6D pose estimation model is trained on source-domain training images to update the parameters of the entire model, i.e., of the feature extractor, the feature regressor, and the scale regressor. This step is the basic training stage.
Then, in step S120, an adversarial regressor is added to the initially trained model to form a transfer training model. The transfer training model thus comprises the feature extractor, feature regressor, and scale regressor of the initially trained model, plus the newly added adversarial regressor. The feature regressor and the adversarial regressor have identical structures and differ only in their parameters.
Then, in step S130, the transfer training model is trained on source-domain and target-domain training images to update its parameters, obtaining a trained transfer training model. Here, the target-domain training images may be the real images in the Linemod dataset.
Then, in step S140, the final trained target 6D pose estimation model is determined from the trained transfer training model. The trained transfer training model comprises a feature extractor, a feature regressor, a scale regressor, and an adversarial regressor; the final trained target 6D pose estimation model keeps only the feature extractor, feature regressor, and scale regressor of the trained transfer training model.
In this embodiment, the training of the target 6D pose estimation model combines basic training with transfer training and uses an adversarial regressor during the transfer training. Training the model on source-domain training images keeps the three regressors accurate on the source domain; training on target-domain training images deliberately makes the adversarial regressor's predictions inaccurate on the target domain, so that the feature regressor's predictions are pushed as far as possible from the adversarial regressor's predictions, i.e., toward correct predictions. The trained target 6D pose estimation model therefore produces estimates that are more accurate and reliable.
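The overall training schedule of steps S110 to S140 can be sketched as the following skeleton. The function and stage names are illustrative (not from the patent), and each `step` call stands in for a gradient update driven by the corresponding loss; here the callback merely records which stage ran.

```python
def train_schedule(src_images, tgt_images, step):
    """step(stage, image) performs one hypothetical update for that stage."""
    for img in src_images:
        step("update_E_H_Z", img)       # S110: basic training, source domain (loss1)
    # ... the adversarial regressor H' is added here (S120) ...
    for img in src_images:
        step("update_E_H_Z_Hp", img)    # transfer stage 1, source domain (loss2)
    for img in tgt_images:
        step("update_Hp_only", img)     # transfer stage 2, target domain (loss3)
    for img in tgt_images:
        step("update_E_only", img)      # transfer stage 3, target domain (loss4)

log = []
train_schedule(["src0"], ["tgt0"], lambda stage, img: log.append(stage))
# log now lists the four stages in order.
```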
In one embodiment, training the target 6D pose estimation model on source-domain training images to update its parameters and obtain an initially trained model includes:
inputting a source-domain training image into the feature extractor to obtain a feature map;
inputting the feature map into the feature regressor and the scale regressor, respectively, to obtain a coordinate heatmap and a scale heatmap for each of a plurality of keypoints of the source-domain training image. Here, the source-domain training image has a plurality of keypoints, which can be obtained from the Linemod dataset; after the feature map is passed through the feature regressor and the scale regressor, a coordinate heatmap and a scale heatmap are obtained for each keypoint. The value of an element of a coordinate heatmap reflects the likelihood that the current keypoint lies at that element's position, and the position of the element with the largest value is taken as the keypoint's position; the value of an element of a scale heatmap represents the scale factor the current keypoint would have if it were located at that element's position.
Obtaining a first loss function from the coordinate heatmaps and scale heatmaps of the plurality of keypoints. Here, the first loss function is used to train the model and update its parameters; a model trained with the first loss function of this embodiment predicts keypoint positions accurately.
And updating the parameters of the feature extractor, feature regressor, and scale regressor based on the first loss function, obtaining the initially trained model.
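The argmax reading of a coordinate heatmap described above can be sketched as follows. This is a minimal pure-Python illustration with hypothetical values, not the patent's implementation.

```python
def keypoint_from_heatmap(heatmap):
    """heatmap: 2-D list (H x W) of scores; returns (row, col) of the max element."""
    best, best_rc = float("-inf"), (0, 0)
    for r, row in enumerate(heatmap):
        for c, v in enumerate(row):
            if v > best:
                best, best_rc = v, (r, c)
    return best_rc

hm = [[0.1, 0.2, 0.1],
      [0.3, 0.9, 0.2],
      [0.1, 0.2, 0.1]]
# keypoint_from_heatmap(hm) -> (1, 1): the keypoint sits at the largest element.
```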
In one embodiment, obtaining the first loss function from the coordinate heatmaps and scale heatmaps of the plurality of keypoints comprises:
Step S210: determining a keypoint coordinate loss loss_uv' for each keypoint from its coordinate heatmap and ground-truth heatmap.
Here, the ground-truth heatmap for each keypoint may be obtained from the Linemod dataset.
The keypoint coordinate loss loss_uv' for each keypoint can be computed as a mean-squared error over heatmap elements:
loss_uv' = (1/N) · Σᵢ₌₁ᴺ (xᵢ − yᵢ)²
where xᵢ is the value of the i-th element of the keypoint's coordinate heatmap, yᵢ is the value of the i-th element of the keypoint's ground-truth heatmap, and N is the number of elements in the coordinate heatmap (and in the ground-truth heatmap).
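The per-keypoint coordinate loss above, taken as a mean-squared error over flattened heatmap elements, can be sketched as:

```python
def coord_loss(pred, true):
    """pred, true: equal-length flat lists of heatmap element values."""
    n = len(pred)
    return sum((x - y) ** 2 for x, y in zip(pred, true)) / n

# coord_loss([0.0, 1.0], [0.0, 0.0]) -> 0.5
```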
Step S220: accumulating the keypoint coordinate losses over all keypoints to obtain the accumulated keypoint coordinate loss loss_uv.
Step S230: determining a scale factor for each keypoint from its scale heatmap and ground-truth heatmap.
In one implementation, the scale factor of each keypoint may be determined as follows:
determining a probability map from the keypoint's ground-truth heatmap; here, the ground-truth heatmap is passed through softmax to obtain a pixel-level probability map;
then multiplying the value of each element of the keypoint's scale heatmap by the value of the corresponding element of the probability map and summing the products to obtain the keypoint's scale factor.
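The softmax-weighted scale-factor computation just described can be sketched as follows, over flattened heatmaps. This is a minimal pure-Python illustration of the stated steps, not the patent's implementation.

```python
import math

def scale_factor(scale_map, true_map):
    """scale_map, true_map: equal-length flat lists of heatmap element values."""
    # Softmax the ground-truth heatmap into a pixel-level probability map.
    exps = [math.exp(v) for v in true_map]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Probability-weighted sum of the scale heatmap's elements.
    return sum(s * p for s, p in zip(scale_map, probs))

# With a uniform ground-truth map, the result is the mean of the scale map:
# scale_factor([2.0, 4.0], [0.0, 0.0]) -> 3.0
```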
Determining a keypoint scale-factor loss loss_s from each keypoint's scale factor and its ground-truth scale factor; here, the ground-truth scale factor of each keypoint may be obtained from the Linemod dataset.
The keypoint scale-factor loss loss_s can be computed as a mean-squared error over keypoints:
loss_s = (1/M) · Σⱼ₌₁ᴹ (sⱼ − ŝⱼ)²
where sⱼ is the scale factor of the j-th keypoint, ŝⱼ is the ground-truth scale factor of the j-th keypoint, and M is the number of keypoints.
Step S240: determining the first loss function loss1 from the accumulated keypoint coordinate loss loss_uv and the keypoint scale-factor loss loss_s.
Here, the first loss function loss1 may be calculated as:
loss1 = α · loss_uv + loss_s
where α is a constant and can be set to 10.
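Combining the two terms with α = 10 can be sketched as follows, with hypothetical per-keypoint loss values:

```python
ALPHA = 10.0  # the constant α from the first loss function

def loss1(per_keypoint_coord_losses, loss_s):
    """per_keypoint_coord_losses: one loss_uv' value per keypoint."""
    loss_uv = sum(per_keypoint_coord_losses)  # accumulate over keypoints (S220)
    return ALPHA * loss_uv + loss_s           # loss1 = α·loss_uv + loss_s

# loss1([0.25, 0.25], 0.5) -> 10·0.5 + 0.5 = 5.5
```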
In one embodiment, training the transfer training model on source-domain and target-domain training images to update its parameters and obtain a trained transfer training model includes:
Step S310: inputting source-domain training images into the transfer training model for training and updating the parameters of the feature extractor, scale regressor, feature regressor, and adversarial regressor, obtaining a first transfer-trained model. In this step, the parameters of the whole network are updated during training so that the losses of the scale regressor Z, the feature regressor H, and the adversarial regressor H' are as small as possible on the source domain. Training the model on source-domain training images in this step keeps the three regressors accurate on the source domain.
In one implementation, step S310 may specifically include:
inputting a source-domain training image into the feature extractor to obtain a feature map;
inputting the feature map into the feature regressor, the scale regressor, and the adversarial regressor, respectively, to obtain a coordinate heatmap, a scale heatmap, and an adversarial coordinate heatmap for each of a plurality of keypoints of the source-domain training image;
obtaining a second loss function from the coordinate heatmaps, scale heatmaps, and adversarial coordinate heatmaps of the plurality of keypoints;
and updating the parameters of the feature extractor, feature regressor, scale regressor, and adversarial regressor based on the second loss function, obtaining the first transfer-trained model.
Step S320: inputting target-domain training images into the first transfer-trained model for training and updating the parameters of the adversarial regressor, obtaining a second transfer-trained model. In this step, the model is trained on target-domain training images and only the parameters of the adversarial regressor are updated, deliberately making the adversarial regressor inaccurate on the target domain; the feature regressor's predictions are then pushed as far as possible from the adversarial regressor's predictions, i.e., toward correct predictions.
In one implementation, step S320 may specifically include:
inputting a target-domain training image into the feature extractor to obtain a feature map;
inputting the feature map into the feature regressor and the adversarial regressor, respectively, to obtain a coordinate heatmap and an adversarial coordinate heatmap for each of a plurality of keypoints of the target-domain training image;
obtaining a third loss function from the coordinate heatmaps and adversarial coordinate heatmaps of the plurality of keypoints;
and updating the parameters of the adversarial regressor based on the third loss function, obtaining the second transfer-trained model.
Step S330: inputting target-domain training images into the second transfer-trained model for training and updating the parameters of the feature extractor, obtaining the trained transfer training model. In this step, only the feature extractor is updated; target-domain training images are used so that the error effect produced in step S320 is confined to the adversarial regressor and does not affect the feature extractor.
In one implementation, step S330 may specifically include:
inputting the target domain training image into the feature extractor to obtain a feature map;
inputting the feature map into the feature regressor and the adversarial regressor respectively, to obtain the coordinate heatmaps and adversarial coordinate heatmaps corresponding to a plurality of key points of the target domain training image;
obtaining a fourth loss function according to the coordinate heatmaps and the adversarial coordinate heatmaps corresponding to the plurality of key points;
and updating the parameters of the feature extractor based on the fourth loss function to obtain the trained transfer training model.
In one embodiment, deriving the second loss function from the coordinate heatmaps, the scale heatmaps and the adversarial coordinate heatmaps corresponding to the plurality of key points may include:
step S410, determining a key point coordinate loss function loss corresponding to each key point according to the coordinate thermodynamic diagram and the real thermodynamic diagram corresponding to each key point uv1 ′;
Here, the key point coordinate loss function loss corresponding to each key point uv1 ' the following formula can be used:
wherein x is 1i The element value of the ith element in the coordinate thermodynamic diagram corresponding to the key point, y 1i The element value of the ith element in the real thermodynamic diagram corresponding to the key point is represented by NThe corresponding coordinate thermodynamic diagram and the number of elements in the true thermodynamic diagram.
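For illustration, the per-key-point heatmap loss in the MSE form above can be sketched in a few lines (a minimal sketch; the function name and the flattened-list representation of a heatmap are assumptions, not taken from the patent):

```python
def heatmap_loss(pred, target):
    """Mean squared error between a predicted coordinate heatmap and
    the ground-truth heatmap, both flattened to lists of N elements."""
    if len(pred) != len(target):
        raise ValueError("heatmaps must have the same number of elements")
    n = len(pred)
    return sum((x - y) ** 2 for x, y in zip(pred, target)) / n
```

The same helper applies unchanged to the adversarial loss of step S450, which only swaps the predicted heatmap for the adversarial one.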
Step S420, accumulating the key point coordinate loss functions corresponding to all the key points to obtain the accumulated value loss_uv1 of the key point coordinate loss functions.
Step S430, determining the scale factor corresponding to each key point according to the scale heatmap and the ground-truth heatmap corresponding to that key point; the scale factor is determined in the same manner as described above and is not repeated here.
Step S440, determining the key point scale factor loss function according to the scale factor corresponding to each key point and the ground-truth scale factor corresponding to each key point. The key point scale factor loss function loss_s1 can, for example, be determined as:
loss_s1 = (1/M) · Σ_{j=1}^{M} (s_1j − ŝ_1j)²
wherein s_1j is the scale factor corresponding to the j-th key point, ŝ_1j is the ground-truth scale factor corresponding to the j-th key point, and M is the number of key points.
Step S450, determining the adversarial key point coordinate loss function loss_uv1_adv′ corresponding to each key point according to the adversarial coordinate heatmap and the ground-truth heatmap corresponding to that key point. Here, the adversarial key point coordinate loss function loss_uv1_adv′ corresponding to each key point can, for example, be computed as:
loss_uv1_adv′ = (1/N) · Σ_{i=1}^{N} (x_adv1i − y_1i)²
wherein x_adv1i is the value of the i-th element in the adversarial coordinate heatmap corresponding to the key point, y_1i is the value of the i-th element in the ground-truth heatmap corresponding to the key point, and N is the number of elements in each of the adversarial coordinate heatmap and the ground-truth heatmap.
Step S460, accumulating the adversarial key point coordinate loss functions corresponding to all the key points to obtain the accumulated value loss_uv1_adv of the adversarial key point coordinate loss functions.
Step S470, determining the second loss function loss2 according to the accumulated value loss_uv1 of the key point coordinate loss functions, the key point scale factor loss function loss_s1, and the accumulated value loss_uv1_adv of the adversarial key point coordinate loss functions.
Here, the second loss function loss2 may be determined using the following formula:
loss2 = loss_uv1 + β · loss_uv1_adv + γ · loss_s1
where β may be set to 1 and γ may be set to 20.
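The weighted combination of step S470 can be sketched directly; a trivial helper, with the function name illustrative and the defaults taken from the text above:

```python
def second_loss(loss_uv1, loss_uv1_adv, loss_s1, beta=1.0, gamma=20.0):
    """loss2 = loss_uv1 + beta * loss_uv1_adv + gamma * loss_s1,
    with beta = 1 and gamma = 20 as suggested in the text."""
    return loss_uv1 + beta * loss_uv1_adv + gamma * loss_s1
```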
In one embodiment, deriving the third loss function from the coordinate heatmaps and the adversarial coordinate heatmaps corresponding to the plurality of key points includes:
for each key point, determining a map from the coordinate heatmaps corresponding to the key points other than the current key point;
determining the key point coordinate loss function corresponding to the current key point according to the map corresponding to the current key point and the adversarial coordinate heatmap corresponding to the current key point;
here, the key point coordinate loss function loss_uv2′ corresponding to each key point can, for example, be computed as:
loss_uv2′ = (1/N) · Σ_{i=1}^{N} (x_adv2i − y_2i)²
wherein x_adv2i is the value of the i-th element in the adversarial coordinate heatmap corresponding to the key point, y_2i is the value of the i-th element in the map corresponding to the key point, and N is the number of elements in each of the adversarial coordinate heatmap and the map;
and accumulating the key point coordinate loss functions corresponding to all the key points to obtain the third loss function loss3.
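One plausible reading of the "map" built from the other key points' coordinate heatmaps is an element-wise sum renormalised to a distribution; the sketch below (the function name and the normalisation choice are assumptions, not stated in the text) shows how such an adversarial target for the current key point could be formed:

```python
def ground_false_map(coord_heatmaps, current):
    """Sum the coordinate heatmaps of all key points except `current`
    and renormalise so the resulting map sums to 1."""
    n = len(coord_heatmaps[0])
    acc = [0.0] * n
    for j, heatmap in enumerate(coord_heatmaps):
        if j == current:
            continue
        for i, value in enumerate(heatmap):
            acc[i] += value
    total = sum(acc)
    return [v / total for v in acc] if total > 0 else acc
```

Training the adversarial regressor toward such a map, while the feature extractor is later updated to pull predictions away from it, is what makes the two updates adversarial.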
In one embodiment, deriving the fourth loss function from the coordinate heatmaps and the adversarial coordinate heatmaps corresponding to the plurality of key points comprises:
determining the key point coordinate loss function corresponding to each key point according to the coordinate heatmap and the adversarial coordinate heatmap corresponding to that key point;
here, the key point coordinate loss function loss_uv3′ corresponding to each key point can, for example, be computed as:
loss_uv3′ = (1/N) · Σ_{i=1}^{N} (x_2i − y_adv2i)²
wherein x_2i is the value of the i-th element in the coordinate heatmap corresponding to the key point, y_adv2i is the value of the i-th element in the adversarial coordinate heatmap corresponding to the key point, and N is the number of elements in each heatmap;
and accumulating the key point coordinate loss functions corresponding to all the key points to obtain the fourth loss function loss4.
Fig. 2 shows a schematic flowchart of a target 6D pose estimation method according to an embodiment of the application. The target 6D pose estimation method comprises:
Step S510, inputting a target image into the feature extractor of a target 6D pose estimation model to obtain a feature map. Here, the target image may be an image of the target captured by an imaging device such as a camera. The target 6D pose estimation model is obtained by applying the target 6D pose estimation model training method of the above embodiments, and comprises a feature extractor, a feature regressor and a scale regressor.
Step S520, inputting the feature map into the feature regressor and the scale regressor of the target 6D pose estimation model respectively, to obtain the coordinate heatmaps and scale heatmaps corresponding to a plurality of key points of the target image. Here, the key points of the target image are generally chosen as distinctive or landmark points on the target surface; in this case they may be determined from the Linemod dataset.
Step S530, determining the key point coordinates (u, v) corresponding to each key point according to the coordinate heatmap corresponding to that key point. Here, the coordinates of the element with the largest value in the key point's coordinate heatmap may be taken as the key point coordinates.
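Picking the coordinates of the largest heatmap element (step S530) is a simple argmax; a sketch assuming a row-major flattened heatmap of known width (both assumptions for illustration):

```python
def keypoint_coords(heatmap, width):
    """Return (u, v) of the maximum element of a row-major flattened
    coordinate heatmap with `width` columns."""
    index = max(range(len(heatmap)), key=heatmap.__getitem__)
    return index % width, index // width
```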
Step S540, calculating the scale factor s corresponding to each key point according to the coordinate heatmap and the scale heatmap corresponding to that key point.
In one implementation, the scale factor of each key point may be determined as follows:
determining a probability distribution map from the coordinate heatmap corresponding to the key point; here, the coordinate heatmap is passed through a softmax to obtain a pixel-level probability distribution map;
and multiplying the value of each element in the scale heatmap corresponding to the key point by the value of the corresponding element in the probability distribution map, then accumulating the products to obtain the scale factor corresponding to the key point.
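The softmax-then-expectation computation just described can be sketched in pure Python (a minimal illustration; function and variable names are assumptions):

```python
import math

def scale_factor(coord_heatmap, scale_heatmap):
    """Softmax the coordinate heatmap into a pixel-level probability
    map, then return the probability-weighted sum of the scale heatmap."""
    peak = max(coord_heatmap)
    exps = [math.exp(v - peak) for v in coord_heatmap]  # stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    return sum(p * s for p, s in zip(probs, scale_heatmap))
```

Subtracting the peak before exponentiation keeps the softmax numerically stable for heatmaps with large logits.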
Step S550, determining the three-dimensional coordinates of each key point of the target according to the key point coordinates (u, v) and the scale factor s corresponding to each key point.
Here, from the computed u, v and s, the three-dimensional coordinate of the key point in the camera coordinate system is obtained by the formula (x, y, z)ᵀ = s · K⁻¹ · (u, v, 1)ᵀ, where K is the camera intrinsic matrix.
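For a pinhole intrinsic matrix K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]], the back-projection s · K⁻¹ · (u, v, 1)ᵀ has the closed form sketched below (parameter names are illustrative assumptions):

```python
def backproject(u, v, s, fx, fy, cx, cy):
    """Camera-frame 3D point from pixel (u, v) and scale factor s,
    i.e. (x, y, z)^T = s * K^{-1} * (u, v, 1)^T for a pinhole K."""
    x = s * (u - cx) / fx
    y = s * (v - cy) / fy
    z = s
    return (x, y, z)
```

Note that under this formulation the scale factor s plays the role of the key point's depth along the optical axis.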
Step S560, obtaining the 6D pose of the target according to the three-dimensional coordinates of each key point of the target and the three-dimensional coordinates of the corresponding key points of the target three-dimensional model. Here, the three-dimensional coordinates of the key points of the target three-dimensional model may be obtained from the Linemod dataset.
The 6D pose of the target is obtained by aligning the two sets of three-dimensional key point coordinates with a least-squares fit.
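The least-squares alignment of two 3D point sets in step S560 is the classical orthogonal Procrustes (Kabsch) problem; a standard closed-form solution, not spelled out in the text above, is:

```latex
\begin{aligned}
&\min_{R \in SO(3),\; t \in \mathbb{R}^3} \;\sum_{j=1}^{M} \bigl\lVert R\,p_j + t - q_j \bigr\rVert^2,\\
&H = \sum_{j=1}^{M} (p_j - \bar{p})(q_j - \bar{q})^{\top}, \qquad H = U \Sigma V^{\top} \;\text{(SVD)},\\
&R = V \operatorname{diag}\!\bigl(1,\, 1,\, \det(V U^{\top})\bigr)\, U^{\top}, \qquad t = \bar{q} - R\,\bar{p},
\end{aligned}
```

where p_j are the model key points, q_j the estimated camera-frame key points, and p̄, q̄ their centroids; the det(VUᵀ) term guards against reflections.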
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners, and the apparatus embodiments described above are merely illustrative. The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatuses, methods and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code comprising one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in a block may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of such blocks, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is merely illustrative of various embodiments of the present application, and the scope of the present application is not limited thereto. Any variation or substitution that can readily be conceived by a person skilled in the art within the technical scope disclosed herein shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. A method for training a target 6D pose estimation model, comprising:
training a target 6D pose estimation model based on a source domain training image to update parameters in the target 6D pose estimation model, obtaining a once-trained model; the target 6D pose estimation model comprises a feature extractor, a feature regressor and a scale regressor;
adding an adversarial regressor to the once-trained model to form a transfer training model;
training the transfer training model based on the source domain training image and a target domain training image to update parameters in the transfer training model, obtaining a trained transfer training model;
and determining a finally trained target 6D pose estimation model based on the trained transfer training model;
wherein training the target 6D pose estimation model based on the source domain training image to update parameters in the target 6D pose estimation model, obtaining the once-trained model, comprises:
inputting the source domain training image to the feature extractor to obtain a feature map;
inputting the feature map into the feature regressor and the scale regressor respectively, to obtain coordinate heatmaps and scale heatmaps corresponding to a plurality of key points of the source domain training image;
obtaining a first loss function according to the coordinate heatmaps and the scale heatmaps corresponding to the plurality of key points;
and updating the parameters of the feature extractor, the feature regressor and the scale regressor based on the first loss function to obtain the once-trained model.
2. The method of claim 1, wherein obtaining the first loss function according to the coordinate heatmaps and the scale heatmaps corresponding to the plurality of key points comprises:
determining the key point coordinate loss function corresponding to each key point according to the coordinate heatmap and the ground-truth heatmap corresponding to that key point;
accumulating the key point coordinate loss functions corresponding to all the key points to obtain an accumulated value of the key point coordinate loss functions;
determining the scale factor corresponding to each key point according to the scale heatmap and the ground-truth heatmap corresponding to that key point;
determining a key point scale factor loss function according to the scale factor corresponding to each key point and the ground-truth scale factor corresponding to each key point;
and determining the first loss function according to the accumulated value of the key point coordinate loss functions and the key point scale factor loss function.
3. The method of claim 1, wherein training the transfer training model based on the source domain training image and the target domain training image to update parameters in the transfer training model, obtaining the trained transfer training model, comprises:
inputting the source domain training image into the transfer training model for training, and updating the parameters of the feature extractor, the scale regressor, the feature regressor and the adversarial regressor to obtain a primary transfer training model;
inputting the target domain training image into the primary transfer training model for training, and updating the parameters of the adversarial regressor to obtain a secondary transfer training model;
and inputting the target domain training image into the secondary transfer training model for training, and updating the parameters of the feature extractor to obtain the trained transfer training model.
4. The method of claim 3, wherein inputting the source domain training image into the transfer training model for training and updating the parameters of the feature extractor, the scale regressor, the feature regressor and the adversarial regressor to obtain the primary transfer training model comprises:
inputting the source domain training image to the feature extractor to obtain a feature map;
inputting the feature map into the feature regressor, the scale regressor and the adversarial regressor respectively, to obtain coordinate heatmaps, scale heatmaps and adversarial coordinate heatmaps corresponding to a plurality of key points of the source domain training image;
obtaining a second loss function according to the coordinate heatmaps, the scale heatmaps and the adversarial coordinate heatmaps corresponding to the plurality of key points;
and updating the parameters of the feature extractor, the feature regressor, the scale regressor and the adversarial regressor based on the second loss function to obtain the primary transfer training model.
5. The method of claim 4, wherein deriving the second loss function from the coordinate heatmaps, the scale heatmaps and the adversarial coordinate heatmaps corresponding to the plurality of key points comprises:
determining the key point coordinate loss function corresponding to each key point according to the coordinate heatmap and the ground-truth heatmap corresponding to that key point;
accumulating the key point coordinate loss functions corresponding to all the key points to obtain an accumulated value of the key point coordinate loss functions;
determining the scale factor corresponding to each key point according to the scale heatmap and the ground-truth heatmap corresponding to that key point;
determining a key point scale factor loss function according to the scale factor corresponding to each key point and the ground-truth scale factor corresponding to each key point;
determining the adversarial key point coordinate loss function corresponding to each key point according to the adversarial coordinate heatmap and the ground-truth heatmap corresponding to that key point;
accumulating the adversarial key point coordinate loss functions corresponding to all the key points to obtain an accumulated value of the adversarial key point coordinate loss functions;
and determining the second loss function according to the accumulated value of the key point coordinate loss functions, the key point scale factor loss function and the accumulated value of the adversarial key point coordinate loss functions.
6. The method of claim 3, wherein inputting the target domain training image into the primary transfer training model for training and updating the parameters of the adversarial regressor to obtain the secondary transfer training model comprises:
inputting the target domain training image to the feature extractor to obtain a feature map;
inputting the feature map into the feature regressor and the adversarial regressor respectively, to obtain coordinate heatmaps and adversarial coordinate heatmaps corresponding to a plurality of key points of the target domain training image;
obtaining a third loss function according to the coordinate heatmaps and the adversarial coordinate heatmaps corresponding to the plurality of key points;
and updating the parameters of the adversarial regressor based on the third loss function to obtain the secondary transfer training model.
7. The method of claim 6, wherein deriving the third loss function from the coordinate heatmaps and the adversarial coordinate heatmaps corresponding to the plurality of key points comprises:
for each key point, determining a map from the coordinate heatmaps corresponding to the key points other than the current key point;
determining the key point coordinate loss function corresponding to the current key point according to the map corresponding to the current key point and the adversarial coordinate heatmap corresponding to the current key point;
and accumulating the key point coordinate loss functions corresponding to all the key points to obtain the third loss function.
8. The method of claim 3, wherein inputting the target domain training image into the secondary transfer training model for training and updating the parameters of the feature extractor to obtain the trained transfer training model comprises:
inputting the target domain training image to the feature extractor to obtain a feature map;
inputting the feature map into the feature regressor and the adversarial regressor respectively, to obtain coordinate heatmaps and adversarial coordinate heatmaps corresponding to a plurality of key points of the target domain training image;
obtaining a fourth loss function according to the coordinate heatmaps and the adversarial coordinate heatmaps corresponding to the plurality of key points;
and updating the parameters of the feature extractor based on the fourth loss function to obtain the trained transfer training model.
9. A target 6D pose estimation method, comprising:
inputting a target image into a feature extractor of a target 6D pose estimation model to obtain a feature map, the target 6D pose estimation model comprising the feature extractor, a feature regressor and a scale regressor;
inputting the feature map into the feature regressor and the scale regressor respectively, to obtain coordinate heatmaps and scale heatmaps corresponding to a plurality of key points of the target image;
determining the key point coordinates corresponding to each key point according to the coordinate heatmap corresponding to that key point;
calculating the scale factor corresponding to each key point according to the coordinate heatmap and the scale heatmap corresponding to that key point;
determining the three-dimensional coordinates of each key point of the target according to the key point coordinates and the scale factor corresponding to each key point;
and obtaining a 6D pose of the target according to the three-dimensional coordinates of each key point of the target and the three-dimensional coordinates of the key points of a three-dimensional model of the target;
wherein the target 6D pose estimation model is obtained by applying the method of any of claims 1-8.
Publications (2)

Publication Number Publication Date
CN115546295A (en) 2022-12-30
CN115546295B (en) 2023-11-07
