CN113034581A - Spatial target relative pose estimation method based on deep learning - Google Patents
- Publication number: CN113034581A
- Application number: CN202110275862.5A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/70 — Image analysis: determining position or orientation of objects or cameras
- G06N3/045 — Neural networks, architecture: combinations of networks
- G06N3/08 — Neural networks: learning methods
- G06T2207/10028 — Image acquisition modality: range image; depth image; 3D point clouds
- G06T2207/20081 — Special algorithmic details: training; learning
- G06T2207/20084 — Special algorithmic details: artificial neural networks [ANN]
Abstract
The invention relates to a method for estimating the relative pose of a space target based on deep learning, comprising the following steps: a. constructing a labeled sample set from two-dimensional projections of a three-dimensional model of the space target at different positions and postures; b. dividing the labeled sample set into a training set, a verification set and a test set, and constructing a pose estimation neural network; c. inputting the training set and the verification set into the constructed pose estimation neural network for training to obtain a pose estimation model; d. applying the trained pose estimation model to the test set to obtain the pose information of the space target for each test sample. The method can estimate the position and attitude information of a space target simultaneously from a single image with a regression model, and is suitable for target pose estimation under complex space illumination conditions.
Description
Technical Field
The invention relates to a method for estimating relative poses of space targets based on deep learning.
Background
Accurate and robust position and attitude estimation is required in on-orbit servicing, rendezvous and docking, and debris-removal approach operations. However, the space environment is often very complex: illumination may be intense, dim, or strongly reflective. In addition, at some viewing angles the spacecraft appears against a complex textured Earth background, which interferes with pose estimation.
Traditional 6D pose estimation can use Perspective-n-Point (PnP) methods based on local feature matching between a three-dimensional model and an image, but these are unsuitable for poorly textured objects. Template matching and dense feature learning methods can cope with insufficient texture, but they are sensitive to illumination and occlusion, and dense feature learning takes a long time for feature extraction and attitude measurement.
On-orbit spacecraft attitude estimation relies on sensors such as monocular or infrared cameras, stereo cameras, and LiDAR. Monocular attitude estimation has certain advantages on board an aircraft due to the simplicity of the sensor. Monocular solutions for spacecraft tracking and attitude estimation include:
【B. Naasz, J. V. Eepoel, S. Queen, C. M. Southward, and J. Hannah, "Flight results from the HST SM4 relative navigation sensor system," 2010】;
【J. M. Kelsey, J. Byrne, M. Cosgrove, S. Seereeram, and R. K. Mehra, "Vision-based relative pose estimation for autonomous rendezvous and docking," in IEEE Aerospace Conference, 2006】;
【C. Liu and W. Hu, "Relative pose estimation for cylinder-shaped spacecrafts using single image," IEEE Transactions on Aerospace and Electronic Systems, vol. 50, no. 4, 2014】;
These rely on model-based approaches that align a wire-frame model of a spacecraft (or component) with an edge image of the real spacecraft (or component) based on heuristics.
Attitude estimation based on deep learning algorithms has made new progress in ground applications. This type of algorithm moves beyond traditional image processing methods and instead attempts to learn the non-linear transformation between the two-dimensional input image and the 6D output pose in an end-to-end manner. For example:
【Rad M., Lepetit V., "BB8: A scalable, accurate, robust to partial occlusion method for predicting the 3D poses of challenging objects without using depth," ICCV, 2017】 predicts the 2D projection coordinates of the 8 corner points of the object's 3D bounding box; after the 2D coordinates are obtained, the 3D rotation and translation vectors are recovered directly with a PnP algorithm.
【S. Sharma, C. Beierle, and S. D'Amico, "Pose estimation for non-cooperative spacecraft rendezvous using convolutional neural networks," IEEE Aerospace Conference, pp. 1-12, 2018】 first proposed using a convolutional neural network for spacecraft attitude estimation, combining position estimation based on bounding-box detection with attitude estimation based on soft classification. However, this positioning method fails when part of the target lies outside the field of view.
Disclosure of Invention
The invention aims to provide a method for estimating the relative pose of a spatial target based on deep learning.
In order to achieve the above object, the present invention provides a method for estimating a relative pose of a spatial target based on deep learning, comprising the following steps:
a. constructing a labeled sample set by using two-dimensional projections of the three-dimensional model of the space target at different positions and postures;
b. dividing a training set, a verification set and a test set of the labeled sample set, and constructing a pose estimation neural network;
c. inputting the training set and the verification set into a constructed pose estimation neural network for training to obtain a pose estimation model;
d. and testing the test set by using the pose estimation model obtained by training to obtain the pose information of the space target of each sample in the test set.
According to an aspect of the present invention, in the step (a), each sample in the labeled sample set includes a position (x, y, z), a posture, and an image corresponding to the position and the posture, and the posture is expressed by a quaternion.
According to one aspect of the invention, in the step (a), the three-dimensional model of the space target is input into 3dsMax software, a camera is added into the 3dsMax software, the position, the posture, the view angle and the width and the height of the image of the camera are set, the position and the posture of the camera are used as a reference, the position and the posture of the three-dimensional model relative to the camera are adjusted, the camera performs two-dimensional imaging on the three-dimensional model, the position and the posture of the three-dimensional model relative to the camera and the corresponding image under the position and the posture are stored, and the labeled sample set is generated;
or inputting the space target three-dimensional model into imaging simulation software, adding a camera into the imaging simulation software, setting the position, the posture, the field angle and the width and the height of an image of the camera, taking the position and the posture of the camera as a reference, adjusting the position and the posture of the three-dimensional model relative to the camera, performing two-dimensional imaging on the three-dimensional model by the camera, storing the position and the posture of the three-dimensional model relative to the camera and the corresponding image under the position and the posture, and generating the labeled sample set;
the imaging simulation software is developed based on OSG; the camera parameters and the space-target three-dimensional model parameters are specified in the imaging simulation software, and the camera images the three-dimensional model of the space target to generate a two-dimensional image; wherein,
the camera parameters include the camera position (x_cam, y_cam, z_cam), the camera three-axis attitude angles (pitch_cam, yaw_cam, roll_cam), the camera field angle, and the width and height of the image;
the space-target three-dimensional model parameters include the model position (x_obj, y_obj, z_obj) and the model three-axis attitude angles (pitch_obj, yaw_obj, roll_obj).
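Since the simulation software is driven by three-axis attitude angles while each labeled sample stores the posture as a quaternion, an angle-to-quaternion conversion is needed when writing the labels. A minimal sketch — the Z-Y-X (yaw-pitch-roll) rotation order used here is an assumption, as the patent does not fix a convention:

```python
import math

def euler_to_quaternion(pitch, yaw, roll):
    """Convert three-axis attitude angles (radians) to a unit quaternion
    (w, x, y, z). The Z-Y-X (yaw-pitch-roll) order is assumed here."""
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    w = cr * cp * cy + sr * sp * sy
    x = sr * cp * cy - cr * sp * sy
    y = cr * sp * cy + sr * cp * sy
    z = cr * cp * sy - sr * sp * cy
    return (w, x, y, z)

# The identity attitude maps to the identity quaternion.
q_identity = euler_to_quaternion(0.0, 0.0, 0.0)
```

A labeled sample then pairs the relative position (x, y, z) with this quaternion label and the corresponding rendered image.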
According to one aspect of the present invention, in the step (b), the labeled sample set is randomly divided into a training set, a validation set and a testing set according to a ratio of 7:2: 1.
According to one aspect of the invention, in the step (b), the pose estimation neural network is constructed based on the deep convolutional residual network ResNet;
when the network is constructed, a sample image is first input into a network built with a residual convolutional neural network as the backbone; the output two-dimensional feature map is then input into a bottleneck layer of variable dimensionality for dimension-reduction convolution; finally the feature map is flattened into a one-dimensional array, and the position and posture information are output through two separate branches.
According to one aspect of the invention, the backbone network is divided into five parts, namely stage1, stage2, stage3, stage4 and stage 5;
stage1 is composed of 1 convolutional layer and 1 max-pooling layer;
the convolution kernel size of the convolutional layer in stage1 is (7,7), the convolution stride is (2,2) and the number of channels is 64; the max-pooling layer has a down-sampling factor of (3,3) and a stride of (2,2);
stage2 consists of 1 conv_block and 2 identity_blocks; the channel numbers of the conv_block and identity_blocks in stage2 are 64, 64 and 256;
stage3 consists of 1 conv_block and 4 identity_blocks; the channel numbers in stage3 are 128, 128 and 512;
stage4 consists of 1 conv_block and 8 identity_blocks; the channel numbers in stage4 are 256, 256 and 1024;
stage5 consists of 1 conv_block and 2 identity_blocks; the channel numbers in stage5 are 512, 512 and 2048.
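The stage configuration above can be captured in a small table, giving 20 residual blocks across stages 2-5. The three-convolution layout inside each block is an assumption carried over from the standard ResNet design; the patent itself fixes only the block counts and channel widths:

```python
# Block counts and channel widths of the backbone stages, as listed above.
stages = {
    "stage2": {"conv_blocks": 1, "identity_blocks": 2, "channels": (64, 64, 256)},
    "stage3": {"conv_blocks": 1, "identity_blocks": 4, "channels": (128, 128, 512)},
    "stage4": {"conv_blocks": 1, "identity_blocks": 8, "channels": (256, 256, 1024)},
    "stage5": {"conv_blocks": 1, "identity_blocks": 2, "channels": (512, 512, 2048)},
}

def total_blocks(cfg):
    """Total number of residual blocks across stages 2-5."""
    return sum(s["conv_blocks"] + s["identity_blocks"] for s in cfg.values())
```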
According to one aspect of the invention, the output features of stage5 are input into the bottleneck layer, which applies a two-dimensional convolution with a 3 × 3 kernel of adjustable channel number and a stride of (2,2).
According to one aspect of the invention, the output features of the bottleneck layer are input into the position estimation structure and the posture estimation structure, respectively.
According to one aspect of the invention, the output features of the bottleneck layer are input into the position estimation structure, which outputs the three-dimensional position by regression through a two-layer fully connected structure;
the first fully connected layer performs a dimension-reduction operation on the flattened feature-map information, compressing it to 1024 dimensions; after a relu activation function, the output of this layer is input to the next fully connected layer, which finally outputs the three-dimensional coordinate information (x, y, z);
the output features of the bottleneck layer are input into the posture estimation structure, which outputs the estimated quaternion by regression through a two-layer fully connected structure;
the first fully connected layer performs a dimension-reduction operation on the flattened feature-map information, compressing it to 1024 dimensions; after a relu activation function, the output of this layer is input to the next fully connected layer, which finally outputs the posture information represented by a quaternion.
According to one aspect of the present invention, in the network training of step (c), the training samples and the validation samples in the labeled sample set are input into the pose estimation neural network for training, and the model with the minimum loss function is selected as the training model;
wherein the loss function is the sum of a position loss term and a posture loss term:
Loss = Loss_position + Loss_posture;
where Loss is the total loss function, taking a relative-error form, Loss_position is the position loss term and Loss_posture is the posture loss term;
the position loss function Loss_position is:
Loss_position = (1/m) Σ_{i=1}^{m} ‖position_est^(i) − position_true^(i)‖ / ‖position_true^(i)‖;
the posture loss function Loss_posture is:
Loss_posture = (1/m) Σ_{i=1}^{m} ‖q_est^(i) − q_true^(i)‖ / ‖q_true^(i)‖;
where m is the number of training samples, i = 1, 2, …, m; position_est^(i) and position_true^(i) denote the position estimate and the labeled position value of the i-th sample, and q_est^(i) and q_true^(i) denote the quaternion attitude estimate and the labeled attitude value of the i-th sample.
According to the concept of the invention, samples are labeled for the space target of the three-dimensional model: during labeling, two-dimensional projections of the three-dimensional model at different positions and postures are generated, and the corresponding position and posture information of the space target is recorded, yielding a labeled sample set containing position and posture information. A pose estimation neural network is then constructed, the labeled sample set is input into it, and training yields the model with the minimum loss function. Finally, an image of the space target is input into the trained model to obtain the position and posture information of the space target. The invention can therefore estimate the position and posture of a space target simultaneously from a single image with a simple regression model, and is suitable for space-target pose estimation.
Drawings
FIG. 1 is a flow chart of a method for estimating relative poses of spatial objects based on deep learning according to an embodiment of the present invention;
FIG. 2 is a block diagram schematically illustrating a pose estimation neural network according to one embodiment of the present invention;
FIG. 3 is a diagram schematically illustrating a statistical result of attitude estimation accuracy of a test sample according to an embodiment of the present invention;
FIG. 4 is a diagram schematically illustrating statistics of position estimation accuracy for a test sample according to one embodiment of the present invention;
fig. 5 and 6 schematically show pose truth and estimate maps for two examples of the invention, respectively.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
The present invention is described in detail below with reference to the drawings and specific embodiments; however, the embodiments of the present invention are not limited to the following examples.
Referring to fig. 1, in the method for estimating the relative pose of a spatial target based on deep learning of the present invention, first, a sample is labeled on the spatial target of a three-dimensional model, specifically, a labeled sample set containing position and pose information is constructed by using two-dimensional projections of the three-dimensional model of the spatial target at different positions and poses. And then, dividing the labeled sample set into a training set, a verification set and a test set, and constructing a pose estimation neural network. And inputting the training set and the verification set into the constructed pose estimation neural network for training to obtain a pose estimation model. And finally, testing the test set by using the pose estimation model obtained by training to obtain the pose information of the space target of each sample in the test set.
In the invention, each sample in the labeled sample set comprises a position (x, y, z) represented by three translation amounts, a posture, and an image corresponding to that position and posture, wherein the posture is represented by a quaternion. The invention does not restrict the coordinate frame chosen for the three position components and the posture of the space target, as long as the pose of the space target can be determined from them; for example, x, y and z can form a three-dimensional rectangular coordinate system. When extracting a sample, the three-dimensional model of the space target can be input into imaging simulation software; a camera is added in the software, and its position, posture, field angle and image width and height are set. With the position and posture of the camera as reference, the position and posture of the three-dimensional model relative to the camera are adjusted, the camera performs two-dimensional imaging of the three-dimensional model, and the relative position and posture together with the corresponding image are stored, generating the labeled sample set.
In the invention, the imaging simulation software is developed based on OSG. The camera parameters and the space-target three-dimensional model parameters must be specified in the software: the camera parameters comprise the camera position (x_cam, y_cam, z_cam), the camera three-axis attitude angles (pitch_cam, yaw_cam, roll_cam), the camera field angle, and the width and height of the image; the three-dimensional model parameters comprise the model position (x_obj, y_obj, z_obj) and the model three-axis attitude angles (pitch_obj, yaw_obj, roll_obj). The camera then images the three-dimensional model of the space target to generate a two-dimensional image. Alternatively, the three-dimensional model can be input into 3dsMax software: a camera is added, its position, posture, field angle and image width and height are set, the position and posture of the model relative to the camera are adjusted with the camera as reference, the camera performs two-dimensional imaging, and the relative position and posture together with the corresponding image are stored to generate the labeled sample set.
With either method, two-dimensional projection images of the space target can be extracted, and these images complete the construction of the labeled sample set. The labeled sample set is then divided into a training set, a verification set and a test set. In this embodiment, the labeled sample set is randomly divided at a ratio of 7:2:1, that is, into 70% training samples, 20% verification samples and 10% test samples.
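The random 7:2:1 division can be sketched with the standard library; the fixed seed and the `split_dataset` helper name are illustrative, not part of the patent:

```python
import random

def split_dataset(samples, seed=0):
    """Randomly split a labeled sample set into training, verification and
    test subsets at the 7:2:1 ratio described above."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.7 * n)
    n_val = int(0.2 * n)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train, val, test = split_dataset(range(100))  # 70 / 20 / 10 samples
```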
After the sample set is divided, the pose estimation neural network can be constructed. In the invention, the pose estimation neural network is built on a network model modified from the deep convolutional residual network ResNet. The residual block, the basic unit of a residual network, has good properties and avoids the phenomenon of the loss function growing as the depth increases, so this network is adopted as the base model of the design. In general, when the network is constructed, the sample image is first input into a network built with a residual convolutional neural network as the backbone; the output two-dimensional feature map is input into a bottleneck layer of variable dimensionality for dimension-reduction convolution; finally the feature map is flattened into a one-dimensional array, and the position and posture information are output through two separate branches.
As shown in fig. 2, a backbone network (backbone) constructed based on a residual convolutional neural network can be divided into five parts, namely, stage1, stage2, stage3, stage4 and stage5, and the composition and corresponding parameters of each part are as follows:
stage1 is composed of 1 convolutional layer and 1 max-pooling layer;
the convolution kernel size of the convolutional layer in stage1 is (7,7), the convolution stride is (2,2) and the number of channels is 64; the max-pooling layer has a down-sampling factor of (3,3) and a stride of (2,2);
stage2 consists of 1 conv_block and 2 identity_blocks; the channel numbers of the conv_block and identity_blocks in stage2 are 64, 64 and 256;
stage3 consists of 1 conv_block and 4 identity_blocks; the channel numbers in stage3 are 128, 128 and 512;
stage4 consists of 1 conv_block and 8 identity_blocks; the channel numbers in stage4 are 256, 256 and 1024;
stage5 consists of 1 conv_block and 2 identity_blocks; the channel numbers in stage5 are 512, 512 and 2048.
Following the network construction steps, the output features of stage5 are input into the bottleneck layer, which applies a two-dimensional convolution with a 3 × 3 kernel of adjustable channel number and a stride of (2,2). This reduces the output dimensionality, greatly reducing the parameter count for the subsequent fully connected layers and shortening the training time. The output features of the bottleneck layer are then input into the two branches of the position estimation structure and the posture estimation structure, respectively.
When the output features of the bottleneck layer are input into the position estimation structure, a two-layer fully connected structure directly outputs the three-dimensional position by regression. The first fully connected layer performs a dimension-reduction operation on the flattened feature-map information, compressing it to 1024 dimensions; after a relu activation function, the output of this layer is input to the next fully connected layer, which finally outputs the three-dimensional coordinate information (x, y, z). When the output features of the bottleneck layer are input into the posture estimation structure, a two-layer fully connected structure directly outputs the estimated quaternion by regression. Its first fully connected layer likewise compresses the flattened feature-map information to 1024 dimensions; after a relu activation function, the output is input to the next fully connected layer, which finally outputs the posture information represented by the quaternion (q1, q2, q3, q4).
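The shapes flowing through the two regression branches can be checked with a toy forward pass. Random weights stand in for trained parameters, and the 2048-dimensional flattened input matches the stage5 output width; everything else here is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
flat = rng.standard_normal(2048)          # flattened bottleneck features

def branch(x, out_dim, rng):
    """One regression branch: fully connected -> 1024 dims, relu,
    fully connected -> out_dim."""
    w1 = rng.standard_normal((1024, x.size)) * 0.01
    h = np.maximum(w1 @ x, 0.0)           # relu activation
    w2 = rng.standard_normal((out_dim, 1024)) * 0.01
    return w2 @ h

position = branch(flat, 3, rng)           # (x, y, z)
quaternion = branch(flat, 4, rng)         # (q1, q2, q3, q4)
```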
Through the above steps, the basic pose estimation neural network is constructed; the network must then be trained to become a pose estimation model. During training, the training samples and verification samples in the labeled sample set are input into the pose estimation neural network, the network is trained, and the model with the minimum loss function is selected as the final training model. The test samples (i.e. the test set) are then input into the pose estimation neural network, and the trained model is used to test them, yielding the pose test result of the space target, i.e. the pose information of each sample. In the invention, the loss function is the sum of a position loss term and a posture loss term. Because the position and posture information are independent of each other and belong to different categories, two branch structures are designed (the two arrows in the lower part of fig. 2), a corresponding loss function is designed for each branch, and the two are summed, giving the loss function:
Loss = Loss_position + Loss_posture;
where Loss is the total loss function, taking a relative-error form, Loss_position is the position loss term and Loss_posture is the posture loss term.
The position loss function Loss_position is:
Loss_position = (1/m) Σ_{i=1}^{m} ‖position_est^(i) − position_true^(i)‖ / ‖position_true^(i)‖;
the posture loss function Loss_posture is:
Loss_posture = (1/m) Σ_{i=1}^{m} ‖q_est^(i) − q_true^(i)‖ / ‖q_true^(i)‖;
in these expressions m is the number of training samples and i = 1, 2, …, m is the loop variable; position_est^(i) and position_true^(i) denote the position estimate and the labeled position value of the i-th sample, and q_est^(i) and q_true^(i) denote the quaternion attitude estimate and the labeled attitude value of the i-th sample.
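The relative-error loss described above can be sketched in a few lines; the use of the Euclidean norm is an assumption consistent with the relative-error form stated in the text:

```python
import math

def relative_loss(estimates, truths):
    """Mean relative error over m samples: ||est - true|| / ||true||,
    with the Euclidean norm assumed."""
    total = 0.0
    for est, true in zip(estimates, truths):
        diff = math.sqrt(sum((e - t) ** 2 for e, t in zip(est, true)))
        norm = math.sqrt(sum(t ** 2 for t in true))
        total += diff / norm
    return total / len(estimates)

# Total loss = position term + posture (quaternion) term, summed over
# the two branches; the sample values below are illustrative only.
pos_loss = relative_loss([(1.0, 0.0, 10.0)], [(1.1, 0.0, 10.2)])
att_loss = relative_loss([(0.62, 0.13, 0.03, 0.77)], [(0.62, 0.15, 0.02, 0.76)])
total_loss = pos_loss + att_loss
```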
After the final training model is obtained through training, the position and the posture of the image containing the space target can be estimated by utilizing the final training model, so that the position and the posture information of the space target are obtained.
Fig. 3 shows a statistical result chart of the attitude estimation accuracy of 100 test samples, the average value is 3.71 °, and fig. 4 shows a statistical result chart of the position estimation accuracy of 100 test samples, the average value is 0.389 m. Fig. 5 and 6 show a comparison of the target pose annotation value with the estimate value. The left image of FIG. 5 is the true value of the pose labeling, the true value of the position is (-0.9979,0.3473,65.5559), and the true value of the attitude quaternion is (-0.6179,0.1311,0.0299, 0.7747); the right image of fig. 5 is a pose estimate, with position estimates (-0.9779,0.288,65.1297), and pose quaternion estimates (-0.6248,0.146,0.022, 0.7666); the resulting position error was 0.44m and attitude error was 2.28 °. The left image of FIG. 6 is the true value of the pose labeling, the true value of the position is (-3.9179, -16.5673,65.974), and the true value of the attitude quaternion is (-0.8,0.15, -0.059, 0.5778); the right image of fig. 6 is a pose estimate, with position estimates (-3.8477, -16.3197,65.7765), and attitude quaternion estimates (-0.7933,0.1419, -0.0296, 0.591); the resulting position error was 0.32m and attitude error was 3.93 °.
Overall, the estimated position and attitude of the space target differ little from the true values, so the method achieves high accuracy and is suitable for the technical field of space-target pose estimation. Position and attitude are regressed simultaneously from a single image, which makes the method applicable to target pose estimation under complex space illumination. The method therefore addresses the technical problem that the pose of a space target is difficult to estimate.
The above description is only one embodiment of the present invention and is not intended to limit it; it will be apparent to those skilled in the art that various modifications and variations can be made. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within its protection scope.
Claims (10)
1. A method for estimating relative poses of spatial targets based on deep learning comprises the following steps:
a. constructing a labeled sample set by using two-dimensional projections of the three-dimensional model of the space target at different positions and postures;
b. dividing a training set, a verification set and a test set of the labeled sample set, and constructing a pose estimation neural network;
c. inputting the training set and the verification set into a constructed pose estimation neural network for training to obtain a pose estimation model;
d. testing the test set with the trained pose estimation model to obtain the space-target pose information of each sample in the test set.
2. The method of claim 1, wherein in step (a), each sample in the labeled sample set comprises a position (x, y, z), a pose, and an image corresponding to the position and the pose, the pose being represented by a quaternion.
3. The method according to claim 1, wherein in the step (a), the three-dimensional model of the space target is input into 3dsMax software, a camera is added into the 3dsMax software, the position, the posture, the angle of view and the width and the height of the image of the camera are set, the position and the posture of the camera are taken as a reference, the position and the posture of the three-dimensional model relative to the camera are adjusted, the camera performs two-dimensional imaging on the three-dimensional model, the position and the posture of the three-dimensional model relative to the camera and the corresponding image under the position and the posture are saved, and the labeled sample set is generated;
or inputting the space target three-dimensional model into imaging simulation software, adding a camera into the imaging simulation software, setting the position, the posture, the field angle and the width and the height of an image of the camera, taking the position and the posture of the camera as a reference, adjusting the position and the posture of the three-dimensional model relative to the camera, performing two-dimensional imaging on the three-dimensional model by the camera, storing the position and the posture of the three-dimensional model relative to the camera and the corresponding image under the position and the posture, and generating the labeled sample set;
the imaging simulation software is developed based on OSG; camera parameters and space-target three-dimensional-model parameters are specified in the imaging simulation software, and the camera images the space-target three-dimensional model to generate a two-dimensional image; wherein:
the camera parameters include the camera position (x_cam, y_cam, z_cam), the camera three-axis attitude angles (pitch_cam, yaw_cam, roll_cam), the camera field angle, and the width and height of the image;
the space-target three-dimensional-model parameters include the space-target three-dimensional-model position (x_obj, y_obj, z_obj) and the space-target three-dimensional-model three-axis attitude angles (pitch_obj, yaw_obj, roll_obj).
4. The method of claim 1, wherein in step (b), the labeled sample set is randomly divided into a training set, a validation set and a test set in a ratio of 7:2:1.
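The random 7:2:1 split of claim 4 can be sketched with the standard library alone; `samples` and `split_samples` are illustrative names, and the fixed seed is an assumption added for reproducibility:

```python
import random

def split_samples(samples, seed=0):
    """Randomly partition a labeled sample list into train/val/test at 7:2:1."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_train = int(0.7 * len(samples))
    n_val = int(0.2 * len(samples))
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]
    return train, val, test
```

For 100 samples this yields 70/20/10 disjoint subsets that together cover the whole labeled set.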
5. The method according to claim 1, wherein in step (b), a pose estimation neural network is constructed based on a deep convolution residual network (ResNet);
when the network is constructed, a sample image is input into a network whose backbone is a residual convolutional neural network; the output two-dimensional feature map is then fed to a bottleneck layer of variable dimensionality for dimension-reduction convolution; finally, the feature map is flattened into a one-dimensional array, and position and attitude information are output through two separate branches.
6. The method of claim 5, wherein the backbone network is divided into five parts, namely stage1, stage2, stage3, stage4 and stage 5;
stage1 is composed of 1 convolutional layer and 1 maximal pooling layer;
the convolution kernel size of the convolution layer in stage1 is (7,7), the convolution step is (2,2), the number of channels is 64, the maximum pooling layer down-sampling factor is (3,3), and the step is (2, 2);
stage2 consists of 1 conv_block and 2 identity_blocks; the channel numbers of the conv_block and identity_blocks in stage2 are 64, 64 and 256;
stage3 consists of 1 conv_block and 4 identity_blocks; the channel numbers of the conv_block and identity_blocks in stage3 are 128, 128 and 512;
stage4 consists of 1 conv_block and 8 identity_blocks; the channel numbers of the conv_block and identity_blocks in stage4 are 256, 256 and 1024;
stage5 consists of 1 conv_block and 2 identity_blocks; the channel numbers of the conv_block and identity_blocks in stage5 are 512, 512 and 2048.
7. The method of claim 6, wherein the output features of stage5 are input to a bottleneck layer, which performs a two-dimensional convolution with a 3 × 3 kernel, an adjustable number of channels, and a stride of (2, 2).
8. The method according to claim 7, wherein the output features of the bottleneck layer are input to two branches: a position estimation structure and an attitude estimation structure.
9. The method according to claim 8, wherein the output features of the bottleneck layer are input into the position estimation structure, which uses a two-layer fully-connected structure to output three-dimensional position information by regression;
the first fully-connected layer performs a dimension-reduction operation on the flattened feature-map information, compressing it to 1024 dimensions; its output passes through a relu activation function and is input to the next fully-connected layer, which finally outputs the three-dimensional coordinate information (x, y, z);
the output features of the bottleneck layer are also input into the attitude estimation structure, which uses a two-layer fully-connected structure to output the estimated quaternion by regression;
the first fully-connected layer performs a dimension-reduction operation on the flattened feature-map information, compressing it to 1024 dimensions; its output passes through a relu activation function and is input to the next fully-connected layer, which finally outputs the attitude information represented as a quaternion.
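The bottleneck-plus-two-branch head of claims 7–9 can be sketched in PyTorch; the patent names no framework, and `PoseHead`, the default channel count and the 4 × 4 input feature size are illustrative assumptions:

```python
import torch
import torch.nn as nn

class PoseHead(nn.Module):
    """Bottleneck conv followed by position and attitude regression branches."""

    def __init__(self, in_channels=2048, num_channels=256, feat_hw=4):
        super().__init__()
        # 3x3 dimension-reduction convolution with stride (2, 2) -- the bottleneck layer
        self.bottleneck = nn.Conv2d(in_channels, num_channels,
                                    kernel_size=3, stride=2, padding=1)
        flat = num_channels * ((feat_hw + 1) // 2) ** 2
        # position branch: FC -> 1024 -> relu -> FC -> (x, y, z)
        self.position = nn.Sequential(
            nn.Linear(flat, 1024), nn.ReLU(), nn.Linear(1024, 3))
        # attitude branch: FC -> 1024 -> relu -> FC -> quaternion
        self.posture = nn.Sequential(
            nn.Linear(flat, 1024), nn.ReLU(), nn.Linear(1024, 4))

    def forward(self, feat):
        x = torch.flatten(self.bottleneck(feat), 1)
        return self.position(x), self.posture(x)
```

Given a stage5 feature map of shape (batch, 2048, 4, 4), the head returns a (batch, 3) position tensor and a (batch, 4) quaternion tensor from the two branches.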
10. The method according to claim 1, wherein in the training of step (c), the training samples and validation samples in the labeled sample set are input into the pose estimation neural network for training, and the model with the minimum loss function is selected as the training model;
wherein the loss function is obtained by adding a position loss term and an attitude loss term:
Loss = Loss_Position + Loss_Posture;
wherein Loss is the total loss function in relative-error form, Loss_Position is the position loss term and Loss_Posture is the attitude loss term;
the position loss function Loss_Position is:
Loss_Position = (1/m) · Σ_{i=1}^{m} ‖p_est(i) − p_true(i)‖ / ‖p_true(i)‖
the attitude loss function Loss_Posture is:
Loss_Posture = (1/m) · Σ_{i=1}^{m} ‖q_est(i) − q_true(i)‖ / ‖q_true(i)‖
where m is the number of training samples and i = 1, 2, …, m; p_est(i) and p_true(i) are the estimated and labeled positions of the i-th sample, and q_est(i) and q_true(i) are the estimated and labeled attitudes of the i-th sample, both expressed as quaternions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110275862.5A CN113034581A (en) | 2021-03-15 | 2021-03-15 | Spatial target relative pose estimation method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113034581A true CN113034581A (en) | 2021-06-25 |
Family
ID=76468750
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110275862.5A Pending CN113034581A (en) | 2021-03-15 | 2021-03-15 | Spatial target relative pose estimation method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113034581A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113763572A (en) * | 2021-09-17 | 2021-12-07 | 北京京航计算通讯研究所 | 3D entity labeling method based on AI intelligent recognition and storage medium |
CN114187360A (en) * | 2021-12-14 | 2022-03-15 | 西安交通大学 | Head pose estimation method based on deep learning and quaternion |
CN114266824A (en) * | 2021-12-10 | 2022-04-01 | 北京理工大学 | Non-cooperative target relative pose measurement method and system based on deep learning |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109816725A (en) * | 2019-01-17 | 2019-05-28 | 哈工大机器人(合肥)国际创新研究院 | A kind of monocular camera object pose estimation method and device based on deep learning |
CN110349215A (en) * | 2019-07-10 | 2019-10-18 | 北京悉见科技有限公司 | A kind of camera position and orientation estimation method and device |
WO2020161118A1 (en) * | 2019-02-05 | 2020-08-13 | Siemens Aktiengesellschaft | Adversarial joint image and pose distribution learning for camera pose regression and refinement |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||