CN115546295B - Target 6D pose estimation model training method and target 6D pose estimation method - Google Patents

Target 6D pose estimation model training method and target 6D pose estimation method

Info

Publication number
CN115546295B
CN115546295B (application CN202211030694.4A)
Authority
CN
China
Prior art keywords
coordinate
key point
training
target
regressor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211030694.4A
Other languages
Chinese (zh)
Other versions
CN115546295A (en)
Inventor
彭进业
寇希栋
赵万青
张少博
彭先霖
汪霖
张晓丹
Current Assignee
NORTHWEST UNIVERSITY
Original Assignee
NORTHWEST UNIVERSITY
Priority date
Filing date
Publication date
Application filed by NORTHWEST UNIVERSITY
Priority to CN202211030694.4A
Publication of CN115546295A
Application granted
Publication of CN115546295B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75 Determining position or orientation of objects or cameras using feature-based methods involving models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a training method for a target 6D pose estimation model, comprising the following steps: training the target 6D pose estimation model on source-domain training images to update its parameters, obtaining an initially trained model; adding an adversarial regressor to the initially trained model to form a transfer training model; training the transfer training model on source-domain and target-domain training images to update its parameters, obtaining a trained transfer training model; and determining the final trained target 6D pose estimation model from the trained transfer training model. The training method for the target 6D pose estimation model has at least the following beneficial technical effect: by combining basic training with transfer training, and using an adversarial regressor during the transfer training, the trained target 6D pose estimation model produces estimates that are more accurate and reliable.

Description

Target 6D pose estimation model training method and target 6D pose estimation method
Technical Field
The application relates to the field of image processing, and in particular to a target 6D pose estimation model training method and a target 6D pose estimation method.
Background
The purpose of target 6D pose estimation is to detect a target in a given image and estimate its position and orientation. Target 6D pose estimation is widely used in computer vision applications such as augmented reality, virtual reality, and autonomous driving. Existing target 6D pose estimation methods typically train the model on a dataset with real (ground-truth) labels; in practice, however, extensive training on such a labeled dataset can cause the model to overfit the training set, degrading its performance in real application scenarios.
Disclosure of Invention
To overcome at least one defect in the prior art, embodiments of the application provide a target 6D pose estimation model training method and a target 6D pose estimation method.
In a first aspect, an embodiment of the present application provides a training method for a target 6D pose estimation model, comprising the following steps:
training the target 6D pose estimation model on source-domain training images to update its parameters, obtaining an initially trained model; the target 6D pose estimation model comprises a feature extractor, a feature regressor, and a scale regressor;
adding an adversarial regressor to the initially trained model to form a transfer training model;
training the transfer training model on source-domain and target-domain training images to update its parameters, obtaining a trained transfer training model;
and determining the final trained target 6D pose estimation model from the trained transfer training model.
In one embodiment, training the target 6D pose estimation model on source-domain training images to update its parameters and obtain an initially trained model includes:
inputting a source-domain training image into the feature extractor to obtain a feature map;
inputting the feature map into the feature regressor and the scale regressor, respectively, to obtain a coordinate heatmap and a scale heatmap for each of a plurality of keypoints of the source-domain training image;
obtaining a first loss function from the coordinate heatmaps and scale heatmaps of the plurality of keypoints;
and updating the parameters of the feature extractor, the feature regressor, and the scale regressor based on the first loss function, obtaining the initially trained model.
In one embodiment, obtaining the first loss function from the coordinate heatmaps and scale heatmaps of the plurality of keypoints includes:
determining a keypoint coordinate loss for each keypoint from its coordinate heatmap and ground-truth heatmap;
accumulating the keypoint coordinate losses over all keypoints to obtain an accumulated keypoint coordinate loss;
determining a scale factor for each keypoint from its scale heatmap and ground-truth heatmap;
determining a keypoint scale-factor loss from each keypoint's scale factor and its ground-truth scale factor;
and determining the first loss function from the accumulated keypoint coordinate loss and the keypoint scale-factor loss.
In one embodiment, training the transfer training model on source-domain and target-domain training images to update its parameters and obtain a trained transfer training model includes:
inputting source-domain training images into the transfer training model for training, updating the parameters of the feature extractor, scale regressor, feature regressor, and adversarial regressor, to obtain a first transfer-trained model;
inputting target-domain training images into the first transfer-trained model for training, updating the parameters of the adversarial regressor, to obtain a second transfer-trained model;
and inputting target-domain training images into the second transfer-trained model for training, updating the parameters of the feature extractor, to obtain the trained transfer training model.
In one embodiment, inputting source-domain training images into the transfer training model for training and updating the parameters of the feature extractor, scale regressor, feature regressor, and adversarial regressor to obtain a first transfer-trained model includes the following steps:
inputting a source-domain training image into the feature extractor to obtain a feature map;
inputting the feature map into the feature regressor, the scale regressor, and the adversarial regressor, respectively, to obtain a coordinate heatmap, a scale heatmap, and an adversarial coordinate heatmap for each of a plurality of keypoints of the source-domain training image;
obtaining a second loss function from the coordinate heatmaps, scale heatmaps, and adversarial coordinate heatmaps of the plurality of keypoints;
and updating the parameters of the feature extractor, feature regressor, scale regressor, and adversarial regressor based on the second loss function, obtaining the first transfer-trained model.
In one embodiment, deriving the second loss function from the coordinate heatmaps, scale heatmaps, and adversarial coordinate heatmaps of the plurality of keypoints includes:
determining a keypoint coordinate loss for each keypoint from its coordinate heatmap and ground-truth heatmap;
accumulating the keypoint coordinate losses over all keypoints to obtain an accumulated keypoint coordinate loss;
determining a scale factor for each keypoint from its scale heatmap and ground-truth heatmap;
determining a keypoint scale-factor loss from each keypoint's scale factor and its ground-truth scale factor;
determining an adversarial keypoint coordinate loss for each keypoint from its adversarial coordinate heatmap and ground-truth heatmap;
accumulating the adversarial keypoint coordinate losses over all keypoints to obtain an accumulated adversarial keypoint coordinate loss;
and determining the second loss function from the accumulated keypoint coordinate loss, the keypoint scale-factor loss, and the accumulated adversarial keypoint coordinate loss.
In one embodiment, inputting target-domain training images into the first transfer-trained model for training and updating the parameters of the adversarial regressor to obtain a second transfer-trained model includes:
inputting a target-domain training image into the feature extractor to obtain a feature map;
inputting the feature map into the feature regressor and the adversarial regressor, respectively, to obtain a coordinate heatmap and an adversarial coordinate heatmap for each of a plurality of keypoints of the target-domain training image;
obtaining a third loss function from the coordinate heatmaps and adversarial coordinate heatmaps of the plurality of keypoints;
and updating the parameters of the adversarial regressor based on the third loss function, obtaining the second transfer-trained model.
In one embodiment, deriving the third loss function from the coordinate heatmaps and adversarial coordinate heatmaps of the plurality of keypoints includes:
for each keypoint, determining a map from the coordinate heatmaps of the keypoints other than the current keypoint;
determining a keypoint coordinate loss for the current keypoint from this map and the adversarial coordinate heatmap of the current keypoint;
and accumulating the keypoint coordinate losses over all keypoints to obtain the third loss function.
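As a hedged sketch of the third loss described above: for each keypoint, the adversarial regressor's heatmap is compared against a map built from the other keypoints' coordinate heatmaps. The element-wise max fusion and the mean-squared-error comparison below are assumptions for illustration; the patent does not spell out the aggregation.

```python
def third_loss(coord_maps, adv_maps):
    """coord_maps, adv_maps: one flat heatmap (list of floats) per keypoint."""
    total = 0.0
    for k, adv in enumerate(adv_maps):
        # Fuse the coordinate heatmaps of all OTHER keypoints into one target map
        # (element-wise max is an assumed fusion rule).
        others = [m for j, m in enumerate(coord_maps) if j != k]
        target = [max(vals) for vals in zip(*others)]
        # Assumed mean-squared error between the adversarial heatmap and the fused map.
        total += sum((a - t) ** 2 for a, t in zip(adv, target)) / len(adv)
    return total
```

Training the adversarial regressor toward these "wrong" maps is what makes its target-domain predictions deliberately inaccurate.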
In one embodiment, inputting target-domain training images into the second transfer-trained model for training and updating the parameters of the feature extractor to obtain the trained transfer training model includes:
inputting a target-domain training image into the feature extractor to obtain a feature map;
inputting the feature map into the feature regressor and the adversarial regressor, respectively, to obtain a coordinate heatmap and an adversarial coordinate heatmap for each of a plurality of keypoints of the target-domain training image;
obtaining a fourth loss function from the coordinate heatmaps and adversarial coordinate heatmaps of the plurality of keypoints;
and updating the parameters of the feature extractor based on the fourth loss function, obtaining the trained transfer training model.
In a second aspect, an embodiment of the present application provides a target 6D pose estimation method, comprising:
inputting a target image into the feature extractor of a target 6D pose estimation model to obtain a feature map, the target 6D pose estimation model comprising a feature extractor, a feature regressor, and a scale regressor;
inputting the feature map into the feature regressor and the scale regressor, respectively, to obtain a coordinate heatmap and a scale heatmap for each of a plurality of keypoints of the target image;
determining the keypoint coordinates of each keypoint from its coordinate heatmap;
calculating the scale factor of each keypoint from its coordinate heatmap and scale heatmap;
determining the three-dimensional coordinates of each keypoint of the target from its keypoint coordinates and scale factor;
obtaining the 6D pose of the target from the three-dimensional coordinates of the target's keypoints and the three-dimensional coordinates of the corresponding keypoints of the target's three-dimensional model;
wherein the target 6D pose estimation model is obtained by applying the target 6D pose estimation model training method described above.
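The step from 2-D keypoint coordinates plus scale factors to 3-D coordinates can be sketched as a pinhole back-projection. This is an illustrative assumption: the intrinsics fx, fy, cx, cy and the use of the scale factor s as depth are not specified in the text above, and the final 6D pose would then come from a rigid alignment (e.g., a least-squares fit) of these 3-D keypoints against the model's keypoints.

```python
def backproject(u, v, s, fx, fy, cx, cy):
    """Hypothetical back-projection of pixel (u, v) with scale factor s.

    Assumes s plays the role of depth under a pinhole camera with focal
    lengths fx, fy and principal point (cx, cy)."""
    z = s                      # assumed: scale factor encodes depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)

# With fx = fy = 1 and cx = cy = 0, the pixel coordinates pass through:
# backproject(2.0, 3.0, 1.0, 1.0, 1.0, 0.0, 0.0) -> (2.0, 3.0, 1.0)
```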
Compared with the prior art, the application has the following beneficial effects: basic training is combined with transfer training, and an adversarial regressor is used during the transfer training. Transfer-training the model on source-domain training images keeps all three regressors accurate on the source domain, while transfer-training on target-domain training images deliberately makes the adversarial regressor's predictions inaccurate on the target domain; the feature regressor's predictions are then pushed as far as possible from the adversarial regressor's predictions, i.e., toward correct predictions. As a result, the trained target 6D pose estimation model produces estimates that are more accurate and reliable.
Drawings
The application may be better understood by reference to the following detailed description taken in conjunction with the accompanying drawings, which are incorporated in and form a part of this specification. In the drawings:
FIG. 1 shows a block flow diagram of a target 6D pose estimation model training method according to an embodiment of the application;
FIG. 2 shows a block flow diagram of a target 6D pose estimation method according to an embodiment of the application.
Detailed Description
Exemplary embodiments of the present application will be described hereinafter with reference to the accompanying drawings. In the interest of clarity and conciseness, not all features of an actual embodiment are described in the specification. It will of course be appreciated that in the development of any such actual embodiment, numerous implementation-specific decisions may be made to achieve the developers' specific goals, and that these decisions may vary from one implementation to another.
It should be noted here that, to avoid obscuring the present application with unnecessary details, the drawings show only the device structures closely related to the solution according to the present application, and other details of little relevance to the application are omitted.
It is to be understood that the application is not limited to the embodiments described in the following description with reference to the drawings. Where possible, embodiments may be combined with each other, features may be replaced or borrowed between different embodiments, and one or more features may be omitted from an embodiment.
Target 6D pose estimation refers to detecting a target in a given image and estimating its position and orientation; the 6D pose comprises 6 degrees of freedom: 3 degrees of freedom of translation and 3 degrees of freedom of spatial rotation. The application provides a target 6D pose estimation method for RGB images, which estimates the target's 6D pose with a target 6D pose estimation model. The model is first trained, in a process comprising a basic training stage and a transfer training stage; a model obtained with this training process greatly improves the accuracy of target 6D pose estimation.
FIG. 1 shows a block flow diagram of a target 6D pose estimation model training method according to an embodiment of the application. The method starts with step S110: training the target 6D pose estimation model on source-domain training images to update its parameters, obtaining an initially trained model; the target 6D pose estimation model includes a feature extractor, a feature regressor, and a scale regressor. Here, the training images are taken from the Linemod dataset, which contains real images and synthetic images: a real image may be, for example, an image of the target captured with an image-capturing device such as a camera, and a synthetic image may be, for example, an image synthesized with computer software from a three-dimensional model of the target. The source-domain training images may be the synthetic images in the Linemod dataset. In this step, the target 6D pose estimation model is trained on source-domain training images to update the parameters of the entire model, i.e., of the feature extractor, the feature regressor, and the scale regressor. This step is the basic training stage.
Then, in step S120, an adversarial regressor is added to the initially trained model to form a transfer training model. The transfer training model thus comprises the feature extractor, feature regressor, and scale regressor of the initially trained model, plus the newly added adversarial regressor. The feature regressor and the adversarial regressor have identical structures and differ only in their parameters.
Then, in step S130, the transfer training model is trained on source-domain and target-domain training images to update its parameters, obtaining a trained transfer training model. Here, the target-domain training images may be the real images in the Linemod dataset.
Then, in step S140, the final trained target 6D pose estimation model is determined from the trained transfer training model. The trained transfer training model comprises a feature extractor, a feature regressor, a scale regressor, and an adversarial regressor; the final trained target 6D pose estimation model keeps only the feature extractor, feature regressor, and scale regressor of the trained transfer training model.
In this embodiment, the training of the target 6D pose estimation model combines basic training with transfer training and uses an adversarial regressor during the transfer training. Training the model on source-domain training images keeps the three regressors accurate on the source domain; training on target-domain training images deliberately makes the adversarial regressor's predictions inaccurate on the target domain, so that the feature regressor's predictions are pushed as far as possible from the adversarial regressor's predictions, i.e., toward correct predictions. The trained target 6D pose estimation model therefore produces estimates that are more accurate and reliable.
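The overall training schedule of steps S110 to S140 can be sketched as the following skeleton. The function and stage names are illustrative (not from the patent), and each `step` call stands in for a gradient update driven by the corresponding loss; here the callback merely records which stage ran.

```python
def train_schedule(src_images, tgt_images, step):
    """step(stage, image) performs one hypothetical update for that stage."""
    for img in src_images:
        step("update_E_H_Z", img)       # S110: basic training, source domain (loss1)
    # ... the adversarial regressor H' is added here (S120) ...
    for img in src_images:
        step("update_E_H_Z_Hp", img)    # transfer stage 1, source domain (loss2)
    for img in tgt_images:
        step("update_Hp_only", img)     # transfer stage 2, target domain (loss3)
    for img in tgt_images:
        step("update_E_only", img)      # transfer stage 3, target domain (loss4)

log = []
train_schedule(["src0"], ["tgt0"], lambda stage, img: log.append(stage))
# log now lists the four stages in order.
```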
In one embodiment, training the target 6D pose estimation model on source-domain training images to update its parameters and obtain an initially trained model includes:
inputting a source-domain training image into the feature extractor to obtain a feature map;
inputting the feature map into the feature regressor and the scale regressor, respectively, to obtain a coordinate heatmap and a scale heatmap for each of a plurality of keypoints of the source-domain training image. Here, the source-domain training image has a plurality of keypoints, which can be obtained from the Linemod dataset; after the feature map is passed through the feature regressor and the scale regressor, a coordinate heatmap and a scale heatmap are obtained for each keypoint. The value of an element of a coordinate heatmap reflects the likelihood that the current keypoint lies at that element's position, and the position of the element with the largest value is taken as the keypoint's position; the value of an element of a scale heatmap represents the scale factor the current keypoint would have if it were located at that element's position.
Obtaining a first loss function from the coordinate heatmaps and scale heatmaps of the plurality of keypoints. Here, the first loss function is used to train the model and update its parameters; a model trained with the first loss function of this embodiment predicts keypoint positions accurately.
And updating the parameters of the feature extractor, feature regressor, and scale regressor based on the first loss function, obtaining the initially trained model.
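The argmax reading of a coordinate heatmap described above can be sketched as follows. This is a minimal pure-Python illustration with hypothetical values, not the patent's implementation.

```python
def keypoint_from_heatmap(heatmap):
    """heatmap: 2-D list (H x W) of scores; returns (row, col) of the max element."""
    best, best_rc = float("-inf"), (0, 0)
    for r, row in enumerate(heatmap):
        for c, v in enumerate(row):
            if v > best:
                best, best_rc = v, (r, c)
    return best_rc

hm = [[0.1, 0.2, 0.1],
      [0.3, 0.9, 0.2],
      [0.1, 0.2, 0.1]]
# keypoint_from_heatmap(hm) -> (1, 1): the keypoint sits at the largest element.
```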
In one embodiment, obtaining the first loss function from the coordinate heatmaps and scale heatmaps of the plurality of keypoints comprises:
Step S210: determining a keypoint coordinate loss loss_uv' for each keypoint from its coordinate heatmap and ground-truth heatmap.
Here, the ground-truth heatmap for each keypoint may be obtained from the Linemod dataset.
The keypoint coordinate loss loss_uv' for each keypoint can be computed as a mean-squared error over heatmap elements:
loss_uv' = (1/N) · Σᵢ₌₁ᴺ (xᵢ − yᵢ)²
where xᵢ is the value of the i-th element of the keypoint's coordinate heatmap, yᵢ is the value of the i-th element of the keypoint's ground-truth heatmap, and N is the number of elements in the coordinate heatmap (and in the ground-truth heatmap).
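The per-keypoint coordinate loss above, taken as a mean-squared error over flattened heatmap elements, can be sketched as:

```python
def coord_loss(pred, true):
    """pred, true: equal-length flat lists of heatmap element values."""
    n = len(pred)
    return sum((x - y) ** 2 for x, y in zip(pred, true)) / n

# coord_loss([0.0, 1.0], [0.0, 0.0]) -> 0.5
```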
Step S220: accumulating the keypoint coordinate losses over all keypoints to obtain the accumulated keypoint coordinate loss loss_uv.
Step S230: determining a scale factor for each keypoint from its scale heatmap and ground-truth heatmap.
In one implementation, the scale factor of each keypoint may be determined as follows:
determining a probability map from the keypoint's ground-truth heatmap; here, the ground-truth heatmap is passed through softmax to obtain a pixel-level probability map;
then multiplying the value of each element of the keypoint's scale heatmap by the value of the corresponding element of the probability map and summing the products to obtain the keypoint's scale factor.
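The softmax-weighted scale-factor computation just described can be sketched as follows, over flattened heatmaps. This is a minimal pure-Python illustration of the stated steps, not the patent's implementation.

```python
import math

def scale_factor(scale_map, true_map):
    """scale_map, true_map: equal-length flat lists of heatmap element values."""
    # Softmax the ground-truth heatmap into a pixel-level probability map.
    exps = [math.exp(v) for v in true_map]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Probability-weighted sum of the scale heatmap's elements.
    return sum(s * p for s, p in zip(scale_map, probs))

# With a uniform ground-truth map, the result is the mean of the scale map:
# scale_factor([2.0, 4.0], [0.0, 0.0]) -> 3.0
```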
Determining a keypoint scale-factor loss loss_s from each keypoint's scale factor and its ground-truth scale factor; here, the ground-truth scale factor of each keypoint may be obtained from the Linemod dataset.
The keypoint scale-factor loss loss_s can be computed as a mean-squared error over keypoints:
loss_s = (1/M) · Σⱼ₌₁ᴹ (sⱼ − ŝⱼ)²
where sⱼ is the scale factor of the j-th keypoint, ŝⱼ is the ground-truth scale factor of the j-th keypoint, and M is the number of keypoints.
Step S240: determining the first loss function loss1 from the accumulated keypoint coordinate loss loss_uv and the keypoint scale-factor loss loss_s.
Here, the first loss function loss1 may be calculated as:
loss1 = α · loss_uv + loss_s
where α is a constant and can be set to 10.
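Combining the two terms with α = 10 can be sketched as follows, with hypothetical per-keypoint loss values:

```python
ALPHA = 10.0  # the constant α from the first loss function

def loss1(per_keypoint_coord_losses, loss_s):
    """per_keypoint_coord_losses: one loss_uv' value per keypoint."""
    loss_uv = sum(per_keypoint_coord_losses)  # accumulate over keypoints (S220)
    return ALPHA * loss_uv + loss_s           # loss1 = α·loss_uv + loss_s

# loss1([0.25, 0.25], 0.5) -> 10·0.5 + 0.5 = 5.5
```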
In one embodiment, training the transfer training model on source-domain and target-domain training images to update its parameters and obtain a trained transfer training model includes:
Step S310: inputting source-domain training images into the transfer training model for training and updating the parameters of the feature extractor, scale regressor, feature regressor, and adversarial regressor, obtaining a first transfer-trained model. In this step, the parameters of the whole network are updated during training so that the losses of the scale regressor Z, the feature regressor H, and the adversarial regressor H' are as small as possible on the source domain. Training the model on source-domain training images in this step keeps the three regressors accurate on the source domain.
In one implementation, step S310 may specifically include:
inputting a source-domain training image into the feature extractor to obtain a feature map;
inputting the feature map into the feature regressor, the scale regressor, and the adversarial regressor, respectively, to obtain a coordinate heatmap, a scale heatmap, and an adversarial coordinate heatmap for each of a plurality of keypoints of the source-domain training image;
obtaining a second loss function from the coordinate heatmaps, scale heatmaps, and adversarial coordinate heatmaps of the plurality of keypoints;
and updating the parameters of the feature extractor, feature regressor, scale regressor, and adversarial regressor based on the second loss function, obtaining the first transfer-trained model.
Step S320: inputting target-domain training images into the first transfer-trained model for training and updating the parameters of the adversarial regressor, obtaining a second transfer-trained model. In this step, the model is trained on target-domain training images and only the parameters of the adversarial regressor are updated, deliberately making the adversarial regressor inaccurate on the target domain; the feature regressor's predictions are then pushed as far as possible from the adversarial regressor's predictions, i.e., toward correct predictions.
In one implementation, step S320 may specifically include:
inputting a target-domain training image into the feature extractor to obtain a feature map;
inputting the feature map into the feature regressor and the adversarial regressor, respectively, to obtain a coordinate heatmap and an adversarial coordinate heatmap for each of a plurality of keypoints of the target-domain training image;
obtaining a third loss function from the coordinate heatmaps and adversarial coordinate heatmaps of the plurality of keypoints;
and updating the parameters of the adversarial regressor based on the third loss function, obtaining the second transfer-trained model.
Step S330: inputting target-domain training images into the second transfer-trained model for training and updating the parameters of the feature extractor, obtaining the trained transfer training model. In this step, only the feature extractor is updated; target-domain training images are used so that the error effect produced in step S320 is confined to the adversarial regressor and does not affect the feature extractor.
In one implementation, step S330 may specifically include:
inputting the target domain training image into the feature extractor to obtain a feature map;
inputting the feature map into the feature regressor and the adversarial regressor respectively, to obtain the coordinate heatmaps and adversarial coordinate heatmaps corresponding to a plurality of key points of the target domain training image;
obtaining a fourth loss function according to the coordinate heatmaps and the adversarial coordinate heatmaps corresponding to the plurality of key points;
and updating the parameters of the feature extractor based on the fourth loss function to obtain the trained transfer training model.
In one embodiment, deriving the second loss function from the coordinate heatmaps, the scale heatmaps and the adversarial coordinate heatmaps corresponding to the plurality of key points may include:
step S410, determining a key point coordinate loss function loss corresponding to each key point according to the coordinate thermodynamic diagram and the real thermodynamic diagram corresponding to each key point uv1 ′;
Here, the key point coordinate loss function loss corresponding to each key point uv1 ' the following formula can be used:
wherein x is 1i The element value of the ith element in the coordinate thermodynamic diagram corresponding to the key point, y 1i The element value of the ith element in the real thermodynamic diagram corresponding to the key point is represented by NThe corresponding coordinate thermodynamic diagram and the number of elements in the true thermodynamic diagram.
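For illustration, the per-key-point heatmap loss in the MSE form above can be sketched in a few lines (a minimal sketch; the function name and the flattened-list representation of a heatmap are assumptions, not taken from the patent):

```python
def heatmap_loss(pred, target):
    """Mean squared error between a predicted coordinate heatmap and
    the ground-truth heatmap, both flattened to lists of N elements."""
    if len(pred) != len(target):
        raise ValueError("heatmaps must have the same number of elements")
    n = len(pred)
    return sum((x - y) ** 2 for x, y in zip(pred, target)) / n
```

The same helper applies unchanged to the adversarial loss of step S450, which only swaps the predicted heatmap for the adversarial one.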
Step S420, accumulating the key point coordinate loss functions corresponding to all the key points to obtain the accumulated value loss_uv1 of the key point coordinate loss functions.
Step S430, determining the scale factor corresponding to each key point according to the scale heatmap and the ground-truth heatmap corresponding to that key point; the scale factor is determined in the same manner as described above and is not repeated here.
Step S440, determining the key point scale factor loss function according to the scale factor corresponding to each key point and the ground-truth scale factor corresponding to each key point. The key point scale factor loss function loss_s1 can, for example, be determined as:
loss_s1 = (1/M) · Σ_{j=1}^{M} (s_1j − ŝ_1j)²
wherein s_1j is the scale factor corresponding to the j-th key point, ŝ_1j is the ground-truth scale factor corresponding to the j-th key point, and M is the number of key points.
Step S450, determining the adversarial key point coordinate loss function loss_uv1_adv′ corresponding to each key point according to the adversarial coordinate heatmap and the ground-truth heatmap corresponding to that key point. Here, the adversarial key point coordinate loss function loss_uv1_adv′ corresponding to each key point can, for example, be computed as:
loss_uv1_adv′ = (1/N) · Σ_{i=1}^{N} (x_adv1i − y_1i)²
wherein x_adv1i is the value of the i-th element in the adversarial coordinate heatmap corresponding to the key point, y_1i is the value of the i-th element in the ground-truth heatmap corresponding to the key point, and N is the number of elements in each of the adversarial coordinate heatmap and the ground-truth heatmap.
Step S460, accumulating the adversarial key point coordinate loss functions corresponding to all the key points to obtain the accumulated value loss_uv1_adv of the adversarial key point coordinate loss functions.
Step S470, determining the second loss function loss2 according to the accumulated value loss_uv1 of the key point coordinate loss functions, the key point scale factor loss function loss_s1, and the accumulated value loss_uv1_adv of the adversarial key point coordinate loss functions.
Here, the second loss function loss2 may be determined using the following formula:
loss2 = loss_uv1 + β · loss_uv1_adv + γ · loss_s1
where β may be set to 1 and γ may be set to 20.
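The weighted combination of step S470 can be sketched directly; a trivial helper, with the function name illustrative and the defaults taken from the text above:

```python
def second_loss(loss_uv1, loss_uv1_adv, loss_s1, beta=1.0, gamma=20.0):
    """loss2 = loss_uv1 + beta * loss_uv1_adv + gamma * loss_s1,
    with beta = 1 and gamma = 20 as suggested in the text."""
    return loss_uv1 + beta * loss_uv1_adv + gamma * loss_s1
```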
In one embodiment, deriving the third loss function from the coordinate heatmaps and the adversarial coordinate heatmaps corresponding to the plurality of key points includes:
for each key point, determining a map from the coordinate heatmaps corresponding to the key points other than the current key point;
determining the key point coordinate loss function corresponding to the current key point according to the map corresponding to the current key point and the adversarial coordinate heatmap corresponding to the current key point;
here, the key point coordinate loss function loss_uv2′ corresponding to each key point can, for example, be computed as:
loss_uv2′ = (1/N) · Σ_{i=1}^{N} (x_adv2i − y_2i)²
wherein x_adv2i is the value of the i-th element in the adversarial coordinate heatmap corresponding to the key point, y_2i is the value of the i-th element in the map corresponding to the key point, and N is the number of elements in each of the adversarial coordinate heatmap and the map;
and accumulating the key point coordinate loss functions corresponding to all the key points to obtain the third loss function loss3.
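One plausible reading of the "map" built from the other key points' coordinate heatmaps is an element-wise sum renormalised to a distribution; the sketch below (the function name and the normalisation choice are assumptions, not stated in the text) shows how such an adversarial target for the current key point could be formed:

```python
def ground_false_map(coord_heatmaps, current):
    """Sum the coordinate heatmaps of all key points except `current`
    and renormalise so the resulting map sums to 1."""
    n = len(coord_heatmaps[0])
    acc = [0.0] * n
    for j, heatmap in enumerate(coord_heatmaps):
        if j == current:
            continue
        for i, value in enumerate(heatmap):
            acc[i] += value
    total = sum(acc)
    return [v / total for v in acc] if total > 0 else acc
```

Training the adversarial regressor toward such a map, while the feature extractor is later updated to pull predictions away from it, is what makes the two updates adversarial.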
In one embodiment, deriving the fourth loss function from the coordinate heatmaps and the adversarial coordinate heatmaps corresponding to the plurality of key points comprises:
determining the key point coordinate loss function corresponding to each key point according to the coordinate heatmap and the adversarial coordinate heatmap corresponding to that key point;
here, the key point coordinate loss function loss_uv3′ corresponding to each key point can, for example, be computed as:
loss_uv3′ = (1/N) · Σ_{i=1}^{N} (x_2i − y_adv2i)²
wherein x_2i is the value of the i-th element in the coordinate heatmap corresponding to the key point, y_adv2i is the value of the i-th element in the adversarial coordinate heatmap corresponding to the key point, and N is the number of elements in each heatmap;
and accumulating the key point coordinate loss functions corresponding to all the key points to obtain the fourth loss function loss4.
Fig. 2 shows a schematic flowchart of a target 6D pose estimation method according to an embodiment of the application. The target 6D pose estimation method comprises:
Step S510, inputting a target image into the feature extractor of a target 6D pose estimation model to obtain a feature map. Here, the target image may be an image of the target captured by an imaging device such as a camera. The target 6D pose estimation model is obtained by applying the target 6D pose estimation model training method of the above embodiments, and comprises a feature extractor, a feature regressor and a scale regressor.
Step S520, inputting the feature map into the feature regressor and the scale regressor of the target 6D pose estimation model respectively, to obtain the coordinate heatmaps and scale heatmaps corresponding to a plurality of key points of the target image. Here, the key points of the target image are generally chosen as distinctive or landmark points on the target surface; in this case they may be determined from the Linemod dataset.
Step S530, determining the key point coordinates (u, v) corresponding to each key point according to the coordinate heatmap corresponding to that key point. Here, the coordinates of the element with the largest value in the key point's coordinate heatmap may be taken as the key point coordinates.
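Picking the coordinates of the largest heatmap element (step S530) is a simple argmax; a sketch assuming a row-major flattened heatmap of known width (both assumptions for illustration):

```python
def keypoint_coords(heatmap, width):
    """Return (u, v) of the maximum element of a row-major flattened
    coordinate heatmap with `width` columns."""
    index = max(range(len(heatmap)), key=heatmap.__getitem__)
    return index % width, index // width
```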
Step S540, calculating the scale factor s corresponding to each key point according to the coordinate heatmap and the scale heatmap corresponding to that key point.
In one implementation, the scale factor of each key point may be determined as follows:
determining a probability distribution map from the coordinate heatmap corresponding to the key point; here, the coordinate heatmap is passed through a softmax to obtain a pixel-level probability distribution map;
and multiplying the value of each element in the scale heatmap corresponding to the key point by the value of the corresponding element in the probability distribution map, then accumulating the products to obtain the scale factor corresponding to the key point.
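The softmax-then-expectation computation just described can be sketched in pure Python (a minimal illustration; function and variable names are assumptions):

```python
import math

def scale_factor(coord_heatmap, scale_heatmap):
    """Softmax the coordinate heatmap into a pixel-level probability
    map, then return the probability-weighted sum of the scale heatmap."""
    peak = max(coord_heatmap)
    exps = [math.exp(v - peak) for v in coord_heatmap]  # stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    return sum(p * s for p, s in zip(probs, scale_heatmap))
```

Subtracting the peak before exponentiation keeps the softmax numerically stable for heatmaps with large logits.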
Step S550, determining the three-dimensional coordinates of each key point of the target according to the key point coordinates (u, v) and the scale factor s corresponding to each key point.
Here, from the computed u, v and s, the three-dimensional coordinate of the key point in the camera coordinate system is obtained by the formula (x, y, z)ᵀ = s · K⁻¹ · (u, v, 1)ᵀ, where K is the camera intrinsic matrix.
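For a pinhole intrinsic matrix K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]], the back-projection s · K⁻¹ · (u, v, 1)ᵀ has the closed form sketched below (parameter names are illustrative assumptions):

```python
def backproject(u, v, s, fx, fy, cx, cy):
    """Camera-frame 3D point from pixel (u, v) and scale factor s,
    i.e. (x, y, z)^T = s * K^{-1} * (u, v, 1)^T for a pinhole K."""
    x = s * (u - cx) / fx
    y = s * (v - cy) / fy
    z = s
    return (x, y, z)
```

Note that under this formulation the scale factor s plays the role of the key point's depth along the optical axis.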
Step S560, obtaining the 6D pose of the target according to the three-dimensional coordinates of each key point of the target and the three-dimensional coordinates of the corresponding key points of the target three-dimensional model. Here, the three-dimensional coordinates of the key points of the target three-dimensional model may be obtained from the Linemod dataset.
The 6D pose of the target is obtained by aligning the two sets of three-dimensional key point coordinates with a least-squares fit.
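The least-squares alignment of two 3D point sets in step S560 is the classical orthogonal Procrustes (Kabsch) problem; a standard closed-form solution, not spelled out in the text above, is:

```latex
\begin{aligned}
&\min_{R \in SO(3),\; t \in \mathbb{R}^3} \;\sum_{j=1}^{M} \bigl\lVert R\,p_j + t - q_j \bigr\rVert^2,\\
&H = \sum_{j=1}^{M} (p_j - \bar{p})(q_j - \bar{q})^{\top}, \qquad H = U \Sigma V^{\top} \;\text{(SVD)},\\
&R = V \operatorname{diag}\!\bigl(1,\, 1,\, \det(V U^{\top})\bigr)\, U^{\top}, \qquad t = \bar{q} - R\,\bar{p},
\end{aligned}
```

where p_j are the model key points, q_j the estimated camera-frame key points, and p̄, q̄ their centroids; the det(VUᵀ) term guards against reflections.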
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners, and the apparatus embodiments described above are merely illustrative. The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatuses, methods and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code comprising one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in a block may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of such blocks, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is merely illustrative of various embodiments of the present application, and the scope of the present application is not limited thereto. Any variation or substitution that can readily be conceived by a person skilled in the art within the technical scope disclosed herein shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. A method for training a target 6D pose estimation model, comprising:
training a target 6D pose estimation model based on a source domain training image to update parameters in the target 6D pose estimation model, obtaining a once-trained model; the target 6D pose estimation model comprises a feature extractor, a feature regressor and a scale regressor;
adding an adversarial regressor to the once-trained model to form a transfer training model;
training the transfer training model based on the source domain training image and a target domain training image to update parameters in the transfer training model, obtaining a trained transfer training model;
and determining a finally trained target 6D pose estimation model based on the trained transfer training model;
wherein training the target 6D pose estimation model based on the source domain training image to update parameters in the target 6D pose estimation model, obtaining the once-trained model, comprises:
inputting the source domain training image to the feature extractor to obtain a feature map;
inputting the feature map into the feature regressor and the scale regressor respectively, to obtain coordinate heatmaps and scale heatmaps corresponding to a plurality of key points of the source domain training image;
obtaining a first loss function according to the coordinate heatmaps and the scale heatmaps corresponding to the plurality of key points;
and updating the parameters of the feature extractor, the feature regressor and the scale regressor based on the first loss function to obtain the once-trained model.
2. The method of claim 1, wherein obtaining the first loss function according to the coordinate heatmaps and the scale heatmaps corresponding to the plurality of key points comprises:
determining the key point coordinate loss function corresponding to each key point according to the coordinate heatmap and the ground-truth heatmap corresponding to that key point;
accumulating the key point coordinate loss functions corresponding to all the key points to obtain an accumulated value of the key point coordinate loss functions;
determining the scale factor corresponding to each key point according to the scale heatmap and the ground-truth heatmap corresponding to that key point;
determining a key point scale factor loss function according to the scale factor corresponding to each key point and the ground-truth scale factor corresponding to each key point;
and determining the first loss function according to the accumulated value of the key point coordinate loss functions and the key point scale factor loss function.
3. The method of claim 1, wherein training the transfer training model based on the source domain training image and the target domain training image to update parameters in the transfer training model, obtaining the trained transfer training model, comprises:
inputting the source domain training image into the transfer training model for training, and updating the parameters of the feature extractor, the scale regressor, the feature regressor and the adversarial regressor to obtain a primary transfer training model;
inputting the target domain training image into the primary transfer training model for training, and updating the parameters of the adversarial regressor to obtain a secondary transfer training model;
and inputting the target domain training image into the secondary transfer training model for training, and updating the parameters of the feature extractor to obtain the trained transfer training model.
4. The method of claim 3, wherein inputting the source domain training image into the transfer training model for training and updating the parameters of the feature extractor, the scale regressor, the feature regressor and the adversarial regressor to obtain the primary transfer training model comprises:
inputting the source domain training image to the feature extractor to obtain a feature map;
inputting the feature map into the feature regressor, the scale regressor and the adversarial regressor respectively, to obtain coordinate heatmaps, scale heatmaps and adversarial coordinate heatmaps corresponding to a plurality of key points of the source domain training image;
obtaining a second loss function according to the coordinate heatmaps, the scale heatmaps and the adversarial coordinate heatmaps corresponding to the plurality of key points;
and updating the parameters of the feature extractor, the feature regressor, the scale regressor and the adversarial regressor based on the second loss function to obtain the primary transfer training model.
5. The method of claim 4, wherein deriving the second loss function from the coordinate heatmaps, the scale heatmaps and the adversarial coordinate heatmaps corresponding to the plurality of key points comprises:
determining the key point coordinate loss function corresponding to each key point according to the coordinate heatmap and the ground-truth heatmap corresponding to that key point;
accumulating the key point coordinate loss functions corresponding to all the key points to obtain an accumulated value of the key point coordinate loss functions;
determining the scale factor corresponding to each key point according to the scale heatmap and the ground-truth heatmap corresponding to that key point;
determining a key point scale factor loss function according to the scale factor corresponding to each key point and the ground-truth scale factor corresponding to each key point;
determining the adversarial key point coordinate loss function corresponding to each key point according to the adversarial coordinate heatmap and the ground-truth heatmap corresponding to that key point;
accumulating the adversarial key point coordinate loss functions corresponding to all the key points to obtain an accumulated value of the adversarial key point coordinate loss functions;
and determining the second loss function according to the accumulated value of the key point coordinate loss functions, the key point scale factor loss function and the accumulated value of the adversarial key point coordinate loss functions.
6. The method of claim 3, wherein inputting the target domain training image into the primary transfer training model for training and updating the parameters of the adversarial regressor to obtain the secondary transfer training model comprises:
inputting the target domain training image to the feature extractor to obtain a feature map;
inputting the feature map into the feature regressor and the adversarial regressor respectively, to obtain coordinate heatmaps and adversarial coordinate heatmaps corresponding to a plurality of key points of the target domain training image;
obtaining a third loss function according to the coordinate heatmaps and the adversarial coordinate heatmaps corresponding to the plurality of key points;
and updating the parameters of the adversarial regressor based on the third loss function to obtain the secondary transfer training model.
7. The method of claim 6, wherein deriving the third loss function from the coordinate heatmaps and the adversarial coordinate heatmaps corresponding to the plurality of key points comprises:
for each key point, determining a map from the coordinate heatmaps corresponding to the key points other than the current key point;
determining the key point coordinate loss function corresponding to the current key point according to the map corresponding to the current key point and the adversarial coordinate heatmap corresponding to the current key point;
and accumulating the key point coordinate loss functions corresponding to all the key points to obtain the third loss function.
8. The method of claim 3, wherein inputting the target domain training image into the secondary transfer training model for training and updating the parameters of the feature extractor to obtain the trained transfer training model comprises:
inputting the target domain training image to the feature extractor to obtain a feature map;
inputting the feature map into the feature regressor and the adversarial regressor respectively, to obtain coordinate heatmaps and adversarial coordinate heatmaps corresponding to a plurality of key points of the target domain training image;
obtaining a fourth loss function according to the coordinate heatmaps and the adversarial coordinate heatmaps corresponding to the plurality of key points;
and updating the parameters of the feature extractor based on the fourth loss function to obtain the trained transfer training model.
9. A target 6D pose estimation method, comprising:
inputting a target image into a feature extractor of a target 6D pose estimation model to obtain a feature map, the target 6D pose estimation model comprising the feature extractor, a feature regressor and a scale regressor;
inputting the feature map into the feature regressor and the scale regressor respectively, to obtain coordinate heatmaps and scale heatmaps corresponding to a plurality of key points of the target image;
determining the key point coordinates corresponding to each key point according to the coordinate heatmap corresponding to that key point;
calculating the scale factor corresponding to each key point according to the coordinate heatmap and the scale heatmap corresponding to that key point;
determining the three-dimensional coordinates of each key point of the target according to the key point coordinates and the scale factor corresponding to each key point;
and obtaining a 6D pose of the target according to the three-dimensional coordinates of each key point of the target and the three-dimensional coordinates of the key points of a three-dimensional model of the target;
wherein the target 6D pose estimation model is obtained by applying the method of any of claims 1-8.
Publications (2)

Publication Number Publication Date
CN115546295A (en) 2022-12-30
CN115546295B (en) 2023-11-07
