CN113034581A - Spatial target relative pose estimation method based on deep learning - Google Patents

Spatial target relative pose estimation method based on deep learning Download PDF

Info

Publication number
CN113034581A
CN113034581A
Authority
CN
China
Prior art keywords
camera
posture
loss
layer
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110275862.5A
Other languages
Chinese (zh)
Inventor
李志�
李海超
蒙波
黄剑斌
张志民
杨兴昊
黄良伟
黄龙飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Academy of Space Technology CAST
Original Assignee
China Academy of Space Technology CAST
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Academy of Space Technology CAST filed Critical China Academy of Space Technology CAST
Priority to CN202110275862.5A priority Critical patent/CN113034581A/en
Publication of CN113034581A publication Critical patent/CN113034581A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a method for estimating the relative pose of a space target based on deep learning, which comprises the following steps: a. constructing a labeled sample set by using two-dimensional projections of the three-dimensional model of the space target at different positions and attitudes; b. dividing the labeled sample set into a training set, a validation set and a test set, and constructing a pose estimation neural network; c. inputting the training set and the validation set into the constructed pose estimation neural network for training to obtain a pose estimation model; d. testing the test set with the trained pose estimation model to obtain the pose information of the space target for each sample in the test set. The method can estimate the position and attitude of a space target simultaneously with a regression model operating on a single image, and is suitable for target pose estimation under complex space illumination conditions.

Description

Spatial target relative pose estimation method based on deep learning
Technical Field
The invention relates to a method for estimating relative poses of space targets based on deep learning.
Background
Accurate and robust position and attitude estimation is required for on-orbit servicing, rendezvous and docking, and close-approach operations such as debris removal. However, the space environment is often very complex: illumination may be very strong or very weak, and reflections occur. In addition, at some viewing angles the spacecraft appears against a complex textured Earth background, which interferes with pose estimation.
Traditional 6D target pose estimation can use a PnP (Perspective-n-Point) method based on local feature matching between a three-dimensional model and the image, but this is unsuitable for objects with insufficient texture. Template matching and dense feature learning methods can cope with insufficient texture, but they are sensitive to illumination and shadowing, and dense feature learning requires a long time for feature extraction and pose measurement.
On-orbit spacecraft pose estimation can only rely on sensors such as monocular or infrared cameras, stereo cameras and LiDAR. Pose estimation based on a monocular camera has certain advantages on board a spacecraft because of the simplicity of the sensor. Monocular solutions for spacecraft tracking and attitude estimation include:
【B. Naasz, J. V. Eepoel, S. Queen, C. M. Southward, and J. Hannah, "Flight results from the HST SM4 relative navigation sensor system," 2010】;
【J. M. Kelsey, J. Byrne, M. Cosgrove, S. Seereeram, and R. K. Mehra, "Vision-based relative pose estimation for autonomous rendezvous and docking," IEEE Aerospace Conference, 2006】;
【C. Liu and W. Hu, "Relative pose estimation for cylinder-shaped spacecrafts using single image," IEEE Transactions on Aerospace and Electronic Systems, vol. 50, no. 4, 2014】;
These rely on model-based approaches that align a wire-frame model of the spacecraft (or of a component) with an edge image of the real spacecraft (component) using heuristics.
Attitude estimation based on deep learning has made new progress in ground applications. This type of algorithm moves beyond traditional image-processing pipelines and instead attempts to learn the non-linear mapping between the two-dimensional input image and the 6D output pose in an end-to-end manner. For example:
【M. Rad and V. Lepetit, "BB8: A scalable, accurate, robust to partial occlusion method for predicting the 3D poses of challenging objects without using depth," ICCV, 2017】 predicts the 2D projections of the 8 corner points of the object's 3D bounding box, and once these 2D coordinates are obtained, the 3D rotation vector and translation vector are recovered directly with a PnP algorithm.
【S. Sharma, C. Beierle, and S. D'Amico, "Pose estimation for non-cooperative spacecraft rendezvous using convolutional neural networks," IEEE Aerospace Conference, pp. 1-12, 2018】 first proposed using a convolutional neural network for spacecraft pose estimation, combining position estimation based on bounding-box detection with attitude estimation based on soft classification. However, this positioning approach fails when part of the target lies outside the field of view.
Disclosure of Invention
The invention aims to provide a method for estimating the relative pose of a spatial target based on deep learning.
In order to achieve the above object, the present invention provides a method for estimating a relative pose of a spatial target based on deep learning, comprising the following steps:
a. constructing a labeled sample set by using two-dimensional projections of the three-dimensional model of the space target at different positions and attitudes;
b. dividing the labeled sample set into a training set, a validation set and a test set, and constructing a pose estimation neural network;
c. inputting the training set and the validation set into the constructed pose estimation neural network for training to obtain a pose estimation model;
d. testing the test set with the trained pose estimation model to obtain the pose information of the space target for each sample in the test set.
According to an aspect of the present invention, in step (a), each sample in the labeled sample set includes a position (x, y, z), an attitude, and the image corresponding to that position and attitude, the attitude being expressed as a quaternion.
According to one aspect of the invention, in step (a), the three-dimensional model of the space target is input into 3ds Max software, a camera is added in 3ds Max, the position, attitude, field of view and image width and height of the camera are set, the position and attitude of the three-dimensional model relative to the camera are adjusted with the camera position and attitude as the reference, the camera performs two-dimensional imaging of the three-dimensional model, the position and attitude of the three-dimensional model relative to the camera and the corresponding image at that position and attitude are saved, and the labeled sample set is generated;
or the three-dimensional model of the space target is input into imaging simulation software, a camera is added in the imaging simulation software, the position, attitude, field of view and image width and height of the camera are set, the position and attitude of the three-dimensional model relative to the camera are adjusted with the camera position and attitude as the reference, the camera performs two-dimensional imaging of the three-dimensional model, the position and attitude of the three-dimensional model relative to the camera and the corresponding image at that position and attitude are saved, and the labeled sample set is generated;
the imaging simulation software is developed based on OSG; given the camera parameters and the parameters of the three-dimensional model of the space target, the camera images the three-dimensional model of the space target to generate a two-dimensional image; wherein,
the camera parameters include the camera position (x_cam, y_cam, z_cam), the camera three-axis attitude angles (pitch_cam, yaw_cam, roll_cam), the camera field of view, and the width and height of the image;
the parameters of the three-dimensional model of the space target include the model position (x_obj, y_obj, z_obj) and the model three-axis attitude angles (pitch_obj, yaw_obj, roll_obj).
According to one aspect of the present invention, in step (b), the labeled sample set is randomly divided into a training set, a validation set and a test set in a ratio of 7:2:1.
According to one aspect of the invention, in step (b), the pose estimation neural network is constructed based on a deep convolutional residual network (ResNet);
when the network is constructed, the sample image is first input into a backbone network built from the residual convolutional neural network, the output two-dimensional feature map is then input into a bottleneck layer of variable dimensionality for dimension-reducing convolution, and finally the feature map is flattened into a one-dimensional array and the position and attitude information are output through two separate branches.
According to one aspect of the invention, the backbone network is divided into five parts: stage1, stage2, stage3, stage4 and stage5;
stage1 consists of 1 convolutional layer and 1 max-pooling layer;
the convolution kernel size of the convolutional layer in stage1 is (7, 7), the convolution stride is (2, 2) and the number of channels is 64; the max-pooling layer has a down-sampling factor of (3, 3) and a stride of (2, 2);
stage2 consists of 1 conv_block and 2 identity_blocks; the filter numbers of the conv_block and identity_blocks in stage2 are 64, 64 and 256;
stage3 consists of 1 conv_block and 4 identity_blocks; the filter numbers of the conv_block and identity_blocks in stage3 are 128, 128 and 512;
stage4 consists of 1 conv_block and 8 identity_blocks; the filter numbers of the conv_block and identity_blocks in stage4 are 256, 256 and 1024;
stage5 consists of 1 conv_block and 2 identity_blocks; the filter numbers of the conv_block and identity_blocks in stage5 are 512, 512 and 2048.
According to one aspect of the invention, the output features of stage5 are input to the bottleneck layer, which performs a two-dimensional convolution with a 3 × 3 kernel, an adjustable number of channels, and a stride of (2, 2).
According to one aspect of the invention, the output features of the bottleneck layer are input into the position estimation structure and the attitude estimation structure, respectively.
According to one aspect of the invention, the output features of the bottleneck layer are input into the position estimation structure, which uses a two-layer fully connected structure to output the three-dimensional position by regression;
the first fully connected layer reduces the dimensionality of the flattened feature-map information, compressing it to 1024 dimensions; its output is passed through a ReLU activation function and fed to the next fully connected layer, which finally outputs the three-dimensional coordinate information (x, y, z);
the output features of the bottleneck layer are also input into the attitude estimation structure, which uses a two-layer fully connected structure to output the estimated quaternion by regression;
the first fully connected layer reduces the dimensionality of the flattened feature-map information, compressing it to 1024 dimensions; its output is passed through a ReLU activation function and fed to the next fully connected layer, which finally outputs the attitude information expressed as a quaternion.
According to one aspect of the present invention, in the network training process of step (c), the training samples and validation samples in the labeled sample set are input into the pose estimation neural network for training, and the model with the minimum loss function is selected as the trained model;
wherein the loss function is obtained by adding a position loss term and an attitude loss term:
Loss = Loss_position + Loss_attitude
where Loss is the total loss function, taken in relative-error form, Loss_position is the position loss term and Loss_attitude is the attitude loss term;
the position loss function Loss_position is:
Loss_position = (1/m) Σ_{i=1..m} ‖p_est^(i) − p_true^(i)‖ / ‖p_true^(i)‖
the attitude loss function Loss_attitude is:
Loss_attitude = (1/m) Σ_{i=1..m} ‖q_est^(i) − q_true^(i)‖ / ‖q_true^(i)‖
where m is the number of training samples and i = 1, 2, …, m; p_est^(i) and p_true^(i) are the position estimate and the labeled position of the i-th sample, and q_est^(i) and q_true^(i) are the quaternion attitude estimate and the labeled attitude of the i-th sample.
According to the concept of the invention, samples of the space target are labeled using its three-dimensional model: two-dimensional projections of the three-dimensional model at different positions and attitudes are generated, and the corresponding position and attitude information is recorded, yielding a labeled sample set containing position and attitude information. A pose estimation neural network is then constructed, the labeled sample set is input into it, and training selects the model with the minimum loss function. Finally, an image of the space target is input into the trained model to obtain its position and attitude information. The invention can therefore estimate the position and attitude of a space target simultaneously from a single image with a simple regression model, and is suitable for pose estimation of space targets.
Drawings
FIG. 1 is a flow chart of a method for estimating relative poses of spatial objects based on deep learning according to an embodiment of the present invention;
FIG. 2 is a block diagram schematically illustrating a pose estimation neural network according to one embodiment of the present invention;
FIG. 3 is a diagram schematically illustrating a statistical result of attitude estimation accuracy of a test sample according to an embodiment of the present invention;
FIG. 4 is a diagram schematically illustrating statistics of position estimation accuracy for a test sample according to one embodiment of the present invention;
fig. 5 and 6 schematically show pose truth and estimate maps for two examples of the invention, respectively.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
The present invention is described in detail below with reference to the drawings and specific embodiments; the embodiments of the present invention are not limited to the following embodiments.
Referring to fig. 1, in the method for estimating the relative pose of a space target based on deep learning of the present invention, samples of the space target are first labeled using its three-dimensional model: specifically, a labeled sample set containing position and attitude information is constructed from two-dimensional projections of the three-dimensional model of the space target at different positions and attitudes. The labeled sample set is then divided into a training set, a validation set and a test set, and a pose estimation neural network is constructed. The training set and validation set are input into the constructed pose estimation neural network for training to obtain a pose estimation model. Finally, the test set is evaluated with the trained pose estimation model to obtain the pose information of the space target in each test sample.
In the invention, each sample in the labeled sample set comprises a position (x, y, z) represented by three translation amounts, an attitude, and the image corresponding to that position and attitude, where the attitude is represented by a quaternion. The invention does not place special restrictions on the chosen representation of the position and attitude of the space target, as long as the pose of the space target can be determined from it; for example, x, y and z can form a three-dimensional rectangular coordinate system. To generate a sample, the three-dimensional model of the space target can be input into imaging simulation software: a camera is added in the software, the position, attitude, field of view and image width and height of the camera are set, the position and attitude of the three-dimensional model relative to the camera are adjusted with the camera position and attitude as the reference, the camera performs two-dimensional imaging of the three-dimensional model, and the position and attitude of the three-dimensional model relative to the camera and the corresponding image at that position and attitude are saved to generate the labeled sample set. In the invention, the imaging simulation software is developed based on OSG and requires the camera parameters and the parameters of the three-dimensional model of the space target: the camera parameters include the camera position (x_cam, y_cam, z_cam), the camera three-axis attitude angles (pitch_cam, yaw_cam, roll_cam), the camera field of view, and the width and height of the image; the parameters of the three-dimensional model of the space target include the model position (x_obj, y_obj, z_obj) and the model three-axis attitude angles (pitch_obj, yaw_obj, roll_obj). The camera images the three-dimensional model of the space target to generate a two-dimensional image. Alternatively, the three-dimensional model of the space target can be input into 3ds Max software: a camera is added in 3ds Max, the position, attitude, field of view and image width and height of the camera are set, the position and attitude of the three-dimensional model relative to the camera are adjusted with the camera position and attitude as the reference, the camera performs two-dimensional imaging of the three-dimensional model, and the position and attitude of the three-dimensional model relative to the camera and the corresponding image at that position and attitude are saved to generate the labeled sample set.
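As an illustration of how one labeled sample record can be assembled from the simulator's camera and model parameters, the following Python sketch (not part of the patent; the Euler-angle order, the world-frame convention, the scipy quaternion ordering (x, y, z, w) and the function name are assumptions added for illustration) computes the relative position and quaternion attitude of the target with respect to the camera:

# Illustrative sketch: build one labeled sample from camera and model parameters.
# The Euler-angle order and frame conventions are assumptions; the image path is a placeholder.
import numpy as np
from scipy.spatial.transform import Rotation as R

def make_label(cam_pos, cam_euler_deg, obj_pos, obj_euler_deg):
    """Return the position and quaternion of the target relative to the camera."""
    R_cam = R.from_euler("xyz", cam_euler_deg, degrees=True)   # assumed axis order
    R_obj = R.from_euler("xyz", obj_euler_deg, degrees=True)
    # Position of the target expressed in the camera frame.
    rel_pos = R_cam.inv().apply(np.asarray(obj_pos) - np.asarray(cam_pos))
    # Attitude of the target relative to the camera, as a quaternion (x, y, z, w).
    rel_quat = (R_cam.inv() * R_obj).as_quat()
    return rel_pos, rel_quat

# Example: one sample with the camera at the origin of the world frame.
pos, quat = make_label(cam_pos=[0, 0, 0], cam_euler_deg=[0, 0, 0],
                       obj_pos=[-1.0, 0.3, 65.0], obj_euler_deg=[10.0, -5.0, 30.0])
sample = {"position": pos.tolist(), "quaternion": quat.tolist(), "image": "render_0001.png"}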
With either method, two-dimensional projection images of the space target can be generated and used to construct the labeled sample set. The constructed labeled sample set is then divided into a training set, a validation set and a test set. In this embodiment, the labeled sample set is randomly divided into a training set, a validation set and a test set in a ratio of 7:2:1, i.e., 70% training samples, 20% validation samples and 10% test samples.
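A minimal sketch of this random 7:2:1 split, assuming the labeled samples are held in a Python list of records such as the sample dictionary above; the function name and the seed are illustrative:

# Minimal sketch of the random 7:2:1 split; `samples` is a list of labeled records.
import random

def split_dataset(samples, seed=0):
    samples = list(samples)
    random.Random(seed).shuffle(samples)          # reproducible random shuffle
    n = len(samples)
    n_train, n_val = int(0.7 * n), int(0.2 * n)
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]              # remaining ~10%
    return train, val, test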
After the sample set has been divided, the pose estimation neural network can be constructed and trained with the labeled sample set. In the invention, the pose estimation neural network is built on a network model modified from the deep convolutional residual network ResNet. The residual block, the basic unit of a residual network, has good properties and avoids the degradation phenomenon in which the loss grows as the depth increases, so this network is adopted as the basic model of the design. In general, when the network is constructed, the sample image is first input into a backbone network built from the residual convolutional neural network, the output two-dimensional feature map is then input into a bottleneck layer of variable dimensionality for dimension-reducing convolution, and finally the feature map is flattened into a one-dimensional array and the position and attitude information are output through two separate branches.
As shown in fig. 2, the backbone network constructed from the residual convolutional neural network can be divided into five parts: stage1, stage2, stage3, stage4 and stage5. The composition and corresponding parameters of each part are as follows (a sketch of the conv_block and identity_block units is given after the list):
stage1 consists of 1 convolutional layer and 1 max-pooling layer;
the convolution kernel size of the convolutional layer in stage1 is (7, 7), the convolution stride is (2, 2) and the number of channels is 64; the max-pooling layer has a down-sampling factor of (3, 3) and a stride of (2, 2);
stage2 consists of 1 conv_block and 2 identity_blocks; the filter numbers of the conv_block and identity_blocks in stage2 are 64, 64 and 256;
stage3 consists of 1 conv_block and 4 identity_blocks; the filter numbers of the conv_block and identity_blocks in stage3 are 128, 128 and 512;
stage4 consists of 1 conv_block and 8 identity_blocks; the filter numbers of the conv_block and identity_blocks in stage4 are 256, 256 and 1024;
stage5 consists of 1 conv_block and 2 identity_blocks; the filter numbers of the conv_block and identity_blocks in stage5 are 512, 512 and 2048.
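The patent does not spell out the internals of the conv_block and identity_block units, so the following PyTorch sketch assumes the standard ResNet bottleneck design (1×1, 3×3, 1×1 convolutions with a shortcut connection); the filter triples correspond to the stage table above, and the stride choices are assumptions:

# Sketch (assumed standard ResNet bottleneck design) of the identity_block and
# conv_block units named in the stage table above.
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, in_ch, filters, stride=1):
        super().__init__()
        f1, f2, f3 = filters                      # e.g. (64, 64, 256) for stage2
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, f1, 1, stride=stride, bias=False), nn.BatchNorm2d(f1), nn.ReLU(inplace=True),
            nn.Conv2d(f1, f2, 3, padding=1, bias=False),        nn.BatchNorm2d(f2), nn.ReLU(inplace=True),
            nn.Conv2d(f2, f3, 1, bias=False),                   nn.BatchNorm2d(f3),
        )
        # conv_block: projection shortcut that changes resolution/width;
        # identity_block: plain identity shortcut (in_ch equals f3, stride 1).
        self.shortcut = (nn.Identity() if in_ch == f3 and stride == 1 else
                         nn.Sequential(nn.Conv2d(in_ch, f3, 1, stride=stride, bias=False),
                                       nn.BatchNorm2d(f3)))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))

def make_stage(in_ch, filters, n_identity, stride=2):
    """One conv_block followed by n_identity identity_blocks."""
    blocks = [Bottleneck(in_ch, filters, stride=stride)]
    blocks += [Bottleneck(filters[2], filters) for _ in range(n_identity)]
    return nn.Sequential(*blocks)

# Example: stage2 of the table above, 1 conv_block + 2 identity_blocks, filters (64, 64, 256).
stage2 = make_stage(64, (64, 64, 256), n_identity=2, stride=1)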
Following these construction steps, the output features of stage5 are input into the bottleneck layer, which performs a two-dimensional convolution with a 3 × 3 kernel, an adjustable number of channels, and a stride of (2, 2). This reduces the output dimensionality and greatly reduces the number of parameters for the subsequent fully connected layers, shortening the training time. The output features of the bottleneck layer are then input into the two branches of the position estimation structure and the attitude estimation structure, respectively.
In the step of inputting the output features of the bottleneck layer into the position estimation structure, the invention uses a two-layer fully connected structure to output the three-dimensional position directly by regression. The first fully connected layer reduces the dimensionality of the flattened feature-map information, compressing it to 1024 dimensions; its output is passed through a ReLU activation function and fed to the next fully connected layer, which finally outputs the three-dimensional coordinate information (x, y, z). In the step of inputting the output features of the bottleneck layer into the attitude estimation structure, the invention uses a two-layer fully connected structure to output the estimated quaternion directly by regression. The first fully connected layer likewise compresses the flattened feature-map information to 1024 dimensions; its output is passed through a ReLU activation function and fed to the next fully connected layer, which finally outputs the attitude information expressed as a quaternion (q1, q2, q3, q4).
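The overall structure of fig. 2 (backbone, bottleneck convolution, flattening, and the two two-layer fully connected branches) can be sketched in PyTorch as follows. This is an approximation rather than the patented network itself: torchvision's resnet50 is used as a stand-in backbone (its stage depths 3-4-6-3 differ from the 3-5-9-3 layout described above), and the bottleneck width of 128 channels and the 256 × 256 input resolution are illustrative assumptions:

# Sketch of the pose-estimation network: stand-in backbone, bottleneck conv,
# flatten, and two 1024-dim fully connected branches for position and attitude.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class PoseNet(nn.Module):
    def __init__(self, bottleneck_ch=128):
        super().__init__()
        backbone = resnet50()
        # Keep everything up to the last residual stage (output: N x 2048 x H/32 x W/32).
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        # Bottleneck layer: 3x3 convolution, stride (2, 2), adjustable channel count.
        self.bottleneck = nn.Sequential(
            nn.Conv2d(2048, bottleneck_ch, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        feat_dim = bottleneck_ch * 4 * 4          # assumes a 256x256 input image
        # Two-layer fully connected branches with a 1024-dim hidden layer and ReLU.
        self.pos_head = nn.Sequential(nn.Linear(feat_dim, 1024), nn.ReLU(inplace=True), nn.Linear(1024, 3))
        self.att_head = nn.Sequential(nn.Linear(feat_dim, 1024), nn.ReLU(inplace=True), nn.Linear(1024, 4))

    def forward(self, x):
        f = self.bottleneck(self.backbone(x))     # N x C x 4 x 4 for a 256x256 input
        f = torch.flatten(f, 1)                   # flatten to a one-dimensional array per sample
        return self.pos_head(f), self.att_head(f)

# Quick shape check with a dummy batch.
model = PoseNet()
pos, quat = model(torch.randn(2, 3, 256, 256))
print(pos.shape, quat.shape)                      # torch.Size([2, 3]) torch.Size([2, 4])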
Through the above steps the basic pose estimation neural network can be constructed; the network then needs to be trained before it can serve as a pose estimation model. During training, the training samples and validation samples in the labeled sample set are input into the pose estimation neural network, the network is trained, and the model with the minimum loss function is selected as the final trained model. The test samples (i.e., the test set) are then input into the pose estimation neural network and evaluated with the trained model, yielding the pose test result of the space target, i.e., the pose information of each sample. In the invention, the loss function is the sum of a position loss term and an attitude loss term. Because the position and attitude outputs are independent of each other and belong to different categories, two branch structures are designed (the two branches indicated by the two arrows in the lower part of fig. 2), a corresponding loss function is designed for each branch, and the two are summed, giving the loss function:
Loss = Loss_position + Loss_attitude
where Loss is the total loss function, taken in relative-error form, Loss_position is the position loss term and Loss_attitude is the attitude loss term.
The position loss function Loss_position is:
Loss_position = (1/m) Σ_{i=1..m} ‖p_est^(i) − p_true^(i)‖ / ‖p_true^(i)‖
The attitude loss function Loss_attitude is:
Loss_attitude = (1/m) Σ_{i=1..m} ‖q_est^(i) − q_true^(i)‖ / ‖q_true^(i)‖
In these expressions m is the number of training samples and i = 1, 2, …, m is the loop index; p_est^(i) and p_true^(i) are the position estimate and the labeled position of the i-th sample, and q_est^(i) and q_true^(i) are the quaternion attitude estimate and the labeled attitude of the i-th sample.
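Under the reconstruction of the loss given above (the mean over the batch of the ratio of the error norm to the label norm), the combined loss can be sketched in PyTorch as follows; the function and variable names are illustrative:

# Sketch of the combined relative-error loss; pos_pred/quat_pred are the two
# network outputs and pos_gt/quat_gt the labels, with shapes N x 3 and N x 4.
import torch

def pose_loss(pos_pred, quat_pred, pos_gt, quat_gt, eps=1e-8):
    loss_pos = (torch.norm(pos_pred - pos_gt, dim=1) /
                (torch.norm(pos_gt, dim=1) + eps)).mean()
    loss_att = (torch.norm(quat_pred - quat_gt, dim=1) /
                (torch.norm(quat_gt, dim=1) + eps)).mean()
    return loss_pos + loss_att                     # Loss = Loss_position + Loss_attitude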
After the final trained model has been obtained, the position and attitude of an image containing the space target can be estimated with it, yielding the position and attitude information of the space target.
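A brief usage sketch with the PoseNet defined earlier; the image file name, the preprocessing values and the final quaternion renormalization are assumptions added for illustration and are not stated in the patent:

# Usage sketch: estimate the pose of a single image with the trained model.
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
])

model.eval()
with torch.no_grad():
    img = preprocess(Image.open("target_view.png").convert("RGB")).unsqueeze(0)
    pos, quat = model(img)
    quat = quat / quat.norm(dim=1, keepdim=True)   # renormalize the regressed quaternion
print("position:", pos.squeeze().tolist(), "attitude quaternion:", quat.squeeze().tolist())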
Fig. 3 shows the statistics of the attitude estimation accuracy over 100 test samples, with a mean of 3.71°, and fig. 4 shows the statistics of the position estimation accuracy over 100 test samples, with a mean of 0.389 m. Fig. 5 and 6 compare the labeled target pose values with the estimated values. The left image of fig. 5 is the labeled ground truth, with true position (-0.9979, 0.3473, 65.5559) and true attitude quaternion (-0.6179, 0.1311, 0.0299, 0.7747); the right image of fig. 5 is the pose estimate, with position estimate (-0.9779, 0.288, 65.1297) and attitude quaternion estimate (-0.6248, 0.146, 0.022, 0.7666); the resulting position error is 0.44 m and the attitude error is 2.28°. The left image of fig. 6 is the labeled ground truth, with true position (-3.9179, -16.5673, 65.974) and true attitude quaternion (-0.8, 0.15, -0.059, 0.5778); the right image of fig. 6 is the pose estimate, with position estimate (-3.8477, -16.3197, 65.7765) and attitude quaternion estimate (-0.7933, 0.1419, -0.0296, 0.591); the resulting position error is 0.32 m and the attitude error is 3.93°.
In general, the positions and attitudes estimated by this relative pose estimation method differ little from the true values and the precision is high, so the method is well suited to the technical field of space target pose estimation. The position and attitude of a space target can be estimated simultaneously with a regression model operating on a single image, and the method is suitable for target pose estimation under complex space illumination conditions. It therefore addresses the technical problem that the pose of a space target is difficult to estimate.
The above description is only one embodiment of the present invention, and is not intended to limit the present invention, and it is apparent to those skilled in the art that various modifications and variations can be made in the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for estimating the relative pose of a space target based on deep learning, comprising the following steps:
a. constructing a labeled sample set by using two-dimensional projections of the three-dimensional model of the space target at different positions and attitudes;
b. dividing the labeled sample set into a training set, a validation set and a test set, and constructing a pose estimation neural network;
c. inputting the training set and the validation set into the constructed pose estimation neural network for training to obtain a pose estimation model;
d. testing the test set with the trained pose estimation model to obtain the pose information of the space target for each sample in the test set.
2. The method according to claim 1, wherein in step (a), each sample in the labeled sample set comprises a position (x, y, z), an attitude, and the image corresponding to that position and attitude, the attitude being represented by a quaternion.
3. The method according to claim 1, wherein in step (a), the three-dimensional model of the space target is input into 3ds Max software, a camera is added in 3ds Max, the position, attitude, field of view and image width and height of the camera are set, the position and attitude of the three-dimensional model relative to the camera are adjusted with the camera position and attitude as the reference, the camera performs two-dimensional imaging of the three-dimensional model, the position and attitude of the three-dimensional model relative to the camera and the corresponding image at that position and attitude are saved, and the labeled sample set is generated;
or the three-dimensional model of the space target is input into imaging simulation software, a camera is added in the imaging simulation software, the position, attitude, field of view and image width and height of the camera are set, the position and attitude of the three-dimensional model relative to the camera are adjusted with the camera position and attitude as the reference, the camera performs two-dimensional imaging of the three-dimensional model, the position and attitude of the three-dimensional model relative to the camera and the corresponding image at that position and attitude are saved, and the labeled sample set is generated;
the imaging simulation software is developed based on OSG; given the camera parameters and the parameters of the three-dimensional model of the space target, the camera images the three-dimensional model of the space target to generate a two-dimensional image; wherein,
the camera parameters include the camera position (x_cam, y_cam, z_cam), the camera three-axis attitude angles (pitch_cam, yaw_cam, roll_cam), the camera field of view, and the width and height of the image;
the parameters of the three-dimensional model of the space target include the model position (x_obj, y_obj, z_obj) and the model three-axis attitude angles (pitch_obj, yaw_obj, roll_obj).
4. The method according to claim 1, wherein in step (b), the labeled sample set is randomly divided into a training set, a validation set and a test set in a ratio of 7:2:1.
5. The method according to claim 1, wherein in step (b), the pose estimation neural network is constructed based on a deep convolutional residual network (ResNet);
when the network is constructed, the sample image is first input into a backbone network built from the residual convolutional neural network, the output two-dimensional feature map is then input into a bottleneck layer of variable dimensionality for dimension-reducing convolution, and finally the feature map is flattened into a one-dimensional array and the position and attitude information are output through two separate branches.
6. The method according to claim 5, wherein the backbone network is divided into five parts: stage1, stage2, stage3, stage4 and stage5;
stage1 consists of 1 convolutional layer and 1 max-pooling layer;
the convolution kernel size of the convolutional layer in stage1 is (7, 7), the convolution stride is (2, 2) and the number of channels is 64; the max-pooling layer has a down-sampling factor of (3, 3) and a stride of (2, 2);
stage2 consists of 1 conv_block and 2 identity_blocks; the filter numbers of the conv_block and identity_blocks in stage2 are 64, 64 and 256;
stage3 consists of 1 conv_block and 4 identity_blocks; the filter numbers of the conv_block and identity_blocks in stage3 are 128, 128 and 512;
stage4 consists of 1 conv_block and 8 identity_blocks; the filter numbers of the conv_block and identity_blocks in stage4 are 256, 256 and 1024;
stage5 consists of 1 conv_block and 2 identity_blocks; the filter numbers of the conv_block and identity_blocks in stage5 are 512, 512 and 2048.
7. The method according to claim 6, wherein the output features of stage5 are input to a bottleneck layer, which performs a two-dimensional convolution with a 3 × 3 kernel, an adjustable number of channels, and a stride of (2, 2).
8. The method according to claim 7, wherein the output features of the bottleneck layer are input into the two branches of the position estimation structure and the attitude estimation structure, respectively.
9. The method according to claim 8, wherein the output features of the bottleneck layer are input into the position estimation structure, which uses a two-layer fully connected structure to output the three-dimensional position by regression;
the first fully connected layer reduces the dimensionality of the flattened feature-map information, compressing it to 1024 dimensions; its output is passed through a ReLU activation function and fed to the next fully connected layer, which finally outputs the three-dimensional coordinate information (x, y, z);
the output features of the bottleneck layer are also input into the attitude estimation structure, which uses a two-layer fully connected structure to output the estimated quaternion by regression;
the first fully connected layer reduces the dimensionality of the flattened feature-map information, compressing it to 1024 dimensions; its output is passed through a ReLU activation function and fed to the next fully connected layer, which finally outputs the attitude information expressed as a quaternion.
10. The method according to claim 1, wherein in the network training process of step (c), the training samples and validation samples in the labeled sample set are input into the pose estimation neural network for training, and the model with the minimum loss function is selected as the trained model;
wherein the loss function is the sum of a position loss term and an attitude loss term:
Loss = Loss_position + Loss_attitude
where Loss is the total loss function, taken in relative-error form, Loss_position is the position loss term and Loss_attitude is the attitude loss term;
the position loss function Loss_position is:
Loss_position = (1/m) Σ_{i=1..m} ‖p_est^(i) − p_true^(i)‖ / ‖p_true^(i)‖
the attitude loss function Loss_attitude is:
Loss_attitude = (1/m) Σ_{i=1..m} ‖q_est^(i) − q_true^(i)‖ / ‖q_true^(i)‖
where m is the number of training samples and i = 1, 2, …, m; p_est^(i) and p_true^(i) are the position estimate and the labeled position of the i-th sample, and q_est^(i) and q_true^(i) are the quaternion attitude estimate and the labeled attitude of the i-th sample.
CN202110275862.5A 2021-03-15 2021-03-15 Spatial target relative pose estimation method based on deep learning Pending CN113034581A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110275862.5A CN113034581A (en) 2021-03-15 2021-03-15 Spatial target relative pose estimation method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110275862.5A CN113034581A (en) 2021-03-15 2021-03-15 Spatial target relative pose estimation method based on deep learning

Publications (1)

Publication Number Publication Date
CN113034581A true CN113034581A (en) 2021-06-25

Family

ID=76468750

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110275862.5A Pending CN113034581A (en) 2021-03-15 2021-03-15 Spatial target relative pose estimation method based on deep learning

Country Status (1)

Country Link
CN (1) CN113034581A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113763572A (en) * 2021-09-17 2021-12-07 北京京航计算通讯研究所 3D entity labeling method based on AI intelligent recognition and storage medium
CN114187360A (en) * 2021-12-14 2022-03-15 西安交通大学 Head pose estimation method based on deep learning and quaternion
CN114266824A (en) * 2021-12-10 2022-04-01 北京理工大学 Non-cooperative target relative pose measurement method and system based on deep learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816725A (en) * 2019-01-17 2019-05-28 哈工大机器人(合肥)国际创新研究院 A kind of monocular camera object pose estimation method and device based on deep learning
CN110349215A (en) * 2019-07-10 2019-10-18 北京悉见科技有限公司 A kind of camera position and orientation estimation method and device
WO2020161118A1 (en) * 2019-02-05 2020-08-13 Siemens Aktiengesellschaft Adversarial joint image and pose distribution learning for camera pose regression and refinement

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816725A (en) * 2019-01-17 2019-05-28 哈工大机器人(合肥)国际创新研究院 A kind of monocular camera object pose estimation method and device based on deep learning
WO2020161118A1 (en) * 2019-02-05 2020-08-13 Siemens Aktiengesellschaft Adversarial joint image and pose distribution learning for camera pose regression and refinement
CN110349215A (en) * 2019-07-10 2019-10-18 北京悉见科技有限公司 A kind of camera position and orientation estimation method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113763572A (en) * 2021-09-17 2021-12-07 北京京航计算通讯研究所 3D entity labeling method based on AI intelligent recognition and storage medium
CN114266824A (en) * 2021-12-10 2022-04-01 北京理工大学 Non-cooperative target relative pose measurement method and system based on deep learning
CN114187360A (en) * 2021-12-14 2022-03-15 西安交通大学 Head pose estimation method based on deep learning and quaternion
CN114187360B (en) * 2021-12-14 2024-02-06 西安交通大学 Head pose estimation method based on deep learning and quaternion

Similar Documents

Publication Publication Date Title
CN110458939B (en) Indoor scene modeling method based on visual angle generation
CN109859296B (en) Training method of SMPL parameter prediction model, server and storage medium
JP6681729B2 (en) Method for determining 3D pose of object and 3D location of landmark point of object, and system for determining 3D pose of object and 3D location of landmark of object
CN110084304B (en) Target detection method based on synthetic data set
CN113034581A (en) Spatial target relative pose estimation method based on deep learning
AU2011362799B2 (en) 3D streets
CN112270249A (en) Target pose estimation method fusing RGB-D visual features
EP3182371B1 (en) Threshold determination in for example a type ransac algorithm
CN109242855B (en) Multi-resolution three-dimensional statistical information-based roof segmentation method, system and equipment
CN106503671A (en) The method and apparatus for determining human face posture
CN109816769A (en) Scene based on depth camera ground drawing generating method, device and equipment
CN109035327B (en) Panoramic camera attitude estimation method based on deep learning
CN112639846A (en) Method and device for training deep learning model
EP3420532B1 (en) Systems and methods for estimating pose of textureless objects
EP3185212B1 (en) Dynamic particle filter parameterization
CN110243390A (en) The determination method, apparatus and odometer of pose
CN114581571A (en) Monocular human body reconstruction method and device based on IMU and forward deformation field
CN114897136A (en) Multi-scale attention mechanism method and module and image processing method and device
CN111127556B (en) Target object identification and pose estimation method and device based on 3D vision
CN114972646A (en) Method and system for extracting and modifying independent ground objects of live-action three-dimensional model
CN111198563A (en) Terrain recognition method and system for dynamic motion of foot type robot
CN116612513A (en) Head posture estimation method and system
CN113379899B (en) Automatic extraction method for building engineering working face area image
Dong et al. Learning stratified 3D reconstruction
Zhang et al. A multiple camera system with real-time volume reconstruction for articulated skeleton pose tracking

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination