CN112561995A - Real-time efficient 6D pose estimation network, construction method and estimation method - Google Patents

Real-time efficient 6D pose estimation network, construction method and estimation method

Info

Publication number
CN112561995A
Authority
CN
China
Prior art keywords
network
real-time efficient
pose estimation
LINEMOD
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011430902.0A
Other languages
Chinese (zh)
Other versions
CN112561995B (en)
Inventor
Penglei Liu (刘鹏磊)
Qieshi Zhang (张锲石)
Jun Cheng (程俊)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202011430902.0A
Publication of CN112561995A
Application granted
Publication of CN112561995B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Abstract

The invention discloses a real-time, efficient 6D pose estimation network, together with a construction method and an estimation method, belonging to the technical field of computer vision and relating to the field of 6D pose estimation. A multidirectional feature fusion pyramid network MFPN is used to fuse and express features, so that multi-scale features can be expressed and processed effectively and cases of occlusion and complex backgrounds can be handled. With the cross-stage partial network CSPNet as a basic module and the YOLO framework integrated, a backbone network capable of effectively extracting features is constructed and then combined with the MFPN, yielding a new network, MFPN-6D, for 6D pose estimation. The network effectively handles objects with insufficient texture and occlusion, improves the prediction accuracy and computation speed of the model, and enhances robustness.

Description

Real-time efficient 6D pose estimation network, construction method and estimation method
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a real-time, efficient 6D pose estimation network, a construction method and an estimation method.
Background
6D pose estimation refers to estimating the 6D pose of an object, i.e. its 3D position and 3D orientation, in the camera coordinate system. The object's own coordinate system can be regarded as the world coordinate system, so the task amounts to recovering the rotation-translation transformation [R|t] from the world system in which the object sits to the camera system. "Rigid" means that the object does not deform; the significance of 6D pose estimation of a rigid body is that the accurate pose of the object can be obtained to support fine manipulation, with the main applications being robotic grasping and augmented reality. The latest research trend in 6D pose estimation is to train a deep neural network to directly predict the 2D projection positions of 3D keypoints from an image, establish the 2D-3D correspondences, and finally estimate the pose with a Perspective-n-Point (PnP) algorithm. The current challenges are that detection accuracy drops when the object has little texture or when occlusion and scene clutter are present, and that most existing models are computationally heavy and cannot meet real-time requirements.
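To make that final step concrete, the following is a minimal sketch of recovering the pose from such 2D-3D correspondences with OpenCV's solvePnP; the 3D keypoints, predicted 2D projections, and camera intrinsics are hypothetical placeholders standing in for a network's output, not values from the invention.

```python
import numpy as np
import cv2

# Hypothetical 3D keypoints on the object model (e.g., bounding-box corners),
# expressed in the object's own (world) coordinate system, in meters.
object_points = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0],
                          [0.0, 0.1, 0.0], [0.0, 0.0, 0.1],
                          [0.1, 0.1, 0.0], [0.1, 0.0, 0.1]], dtype=np.float64)

# Hypothetical 2D projections of those keypoints, as a network might predict.
image_points = np.array([[320.0, 240.0], [400.0, 238.0], [322.0, 160.0],
                         [318.0, 242.0], [402.0, 158.0], [398.0, 236.0]],
                        dtype=np.float64)

# Hypothetical pinhole camera intrinsic matrix; no lens distortion assumed.
K = np.array([[572.4, 0.0, 325.3],
              [0.0, 573.6, 242.0],
              [0.0, 0.0, 1.0]], dtype=np.float64)

# PnP recovers the rigid transform from the world system to the camera system.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
R, _ = cv2.Rodrigues(rvec)  # axis-angle rotation -> 3x3 rotation matrix
print("R =\n", R, "\nt =\n", tvec)
```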
The 6D pose estimation methods of the related art mainly fall into two types: those based on depth information (RGB-D) and those based on image information (RGB). Although current methods that use RGB-D cameras are reliable, depth cameras are only suitable for indoor scenes and consume considerable power. In contrast, RGB cameras suit a wider range of scenes and save power. In the image-based field, 6D object pose estimation algorithms include keypoint matching and edge matching; although these effectively handle richly textured objects, they fail on objects with no or little texture. To solve this problem, deep-learning-based approaches have recently been applied to pose estimation, for example BB8 and PVNet, which predict 2D-3D correspondences by training a deep neural network and then solve the pose with the PnP algorithm. Although they achieve good accuracy, these methods require a post-processing stage and therefore struggle to meet real-time requirements. Some algorithms, such as YOLO-6D, achieve good results in terms of speed, but they work poorly on occluded objects and small objects.
Therefore, the related art has two shortcomings with respect to 6D pose estimation: when the target object has little texture, or when occlusion and complex scenes are present, detection accuracy drops and detection may even fail entirely; and most existing methods require a large number of parameters, so the models are large and mostly cannot meet real-time requirements.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a real-time, efficient 6D pose estimation network, a construction method and an estimation method, which can effectively handle insufficient texture on the object surface or occlusion of the target object by other objects, improve detection accuracy and speed, and offer high robustness.
In order to achieve the above purpose, the invention provides a real-time, efficient 6D pose estimation network, which includes a multidirectional feature fusion pyramid network and a backbone network; the two are combined to form the 6D pose estimation network, with the multidirectional feature fusion pyramid network used for fusing and expressing features and the backbone network used for feature extraction.
Further, the multidirectional feature fusion pyramid network comprises a residual structure, and the residual structure is fused into forward propagation and vertical propagation of the multidirectional feature fusion pyramid network.
Further, the backbone network takes the CSPNet network as a basic module and integrates the YOLO framework.
Further, the total data set of the 6D pose estimation network comprises the LINEMOD standard data set and the Occluded-LINEMOD standard data set, and the 6D pose estimation network is trained and validated on the LINEMOD standard data set and the Occluded-LINEMOD standard data set.
Further, the LINEMOD standard data set includes 13 sequences, each sequence containing the true pose of a single object in a cluttered environment and providing CAD models of all objects; the Occluded-LINEMOD standard data set is a data set containing a plurality of target objects with occlusion.
Further, the total data set of the 6D pose estimation network includes a training set and a test set, wherein the training set accounts for 20% of the total data set and the test set accounts for 80% of the total data set.
Further, the 6D pose estimation network operates at a speed of 56 FPS.
The invention also provides a method for constructing the real-time, efficient 6D pose estimation network, which comprises the following steps: first, fuse a residual structure into forward propagation and vertical propagation to establish the multidirectional feature fusion pyramid network; then, take the CSPNet network as a basic module and integrate the YOLO framework to establish the backbone network; finally, combine the multidirectional feature fusion pyramid network and the backbone network to form the 6D pose estimation network.
Further, in the construction method, the 6D pose estimation network is trained and validated on the LINEMOD standard data set and the Occluded-LINEMOD standard data set.
The invention also provides a 6D pose estimation method, which adopts the above real-time, efficient 6D pose estimation network.
Compared with the prior art, the method can solve the 6D pose estimation problem for rigid bodies. The multidirectional feature fusion pyramid network MFPN is used to fuse and express features; it can effectively express and process multi-scale features and handle occlusion and complex backgrounds. The cross-stage partial network CSPNet is used as a basic module and integrated with the YOLO framework to construct a backbone network capable of effectively extracting features, which is then combined with the MFPN to yield a new network, MFPN-6D, for 6D pose estimation. The network effectively handles objects with insufficient texture and occlusion, improves the prediction accuracy and computation speed of the model, and enhances robustness.
Drawings
FIG. 1 is a schematic diagram of the 6D pose estimation neural network MFPN-6D of the present invention;
FIG. 2a is a schematic diagram of the feature pyramid network FPN; FIG. 2b is a schematic diagram of the PANet network; FIG. 2c is a schematic diagram of the BiFPN network; FIG. 2d is a schematic diagram of the multidirectional feature fusion pyramid network MFPN of the present invention.
Detailed Description
The present invention will be further explained below with reference to the drawings and specific embodiments. It should be understood that the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the given embodiments without creative effort shall fall within the protection scope of the present application.
Referring to FIG. 1, an embodiment of the present invention provides a real-time, efficient 6D pose estimation network MFPN-6D, which includes a multidirectional feature fusion pyramid network MFPN and a backbone network; the two are combined to form the 6D pose estimation network MFPN-6D, with the MFPN used for fusing and expressing features and the backbone network used for feature extraction. The MFPN includes a residual structure, which is fused into its forward propagation and vertical propagation. The backbone network takes the cross-stage partial network CSPNet as a basic module and integrates the YOLO framework.
The total data set of the 6D pose estimation network MFPN-6D comprises the LINEMOD standard data set and the Occluded-LINEMOD standard data set, on which the network is trained and validated. The LINEMOD standard data set consists of 13 sequences, each containing the true pose of a single target in a cluttered environment and providing a CAD model of the target; the Occluded-LINEMOD standard data set contains multiple target objects with occlusion. The total data set is divided into a training set and a test set, with the training set accounting for 20% of the total and the test set 80%. The 6D pose estimation network MFPN-6D runs at 56 FPS and is currently the fastest method in the field of 6D pose estimation.
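As an illustration of the split just described, the following sketch partitions one LINEMOD sequence's image list into a 20% training set and an 80% test set; the directory layout, file pattern, and function name are assumptions for the example, not the patent's tooling.

```python
import random
from pathlib import Path

def split_sequence(image_dir, train_ratio=0.2, seed=0):
    """Split one LINEMOD object sequence into ~20% train / ~80% test images."""
    images = sorted(Path(image_dir).glob("*.png"))
    random.Random(seed).shuffle(images)
    cut = int(len(images) * train_ratio)
    return images[:cut], images[cut:]

# Hypothetical path to one of the 13 LINEMOD object sequences.
train_imgs, test_imgs = split_sequence("LINEMOD/ape/rgb")
```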
One of the main difficulties in 6D pose estimation is the efficient representation and processing of multi-scale features. As shown in FIG. 2a, the feature pyramid network FPN introduces a top-down path to combine multi-scale features, but FPN is inherently limited by its unidirectional information flow. To address this, PANet adds a bottom-up path aggregation network on top of FPN, as shown in FIG. 2b. PANet is accurate but requires more parameters and computation. To improve model efficiency, Google researchers proposed the BiFPN network, shown in FIG. 2c, an effective bidirectional cross-scale connection and weighted feature fusion network; BiFPN is more accurate and less costly than PANet. BiFPN is among the most advanced feature networks, but it only considers forward feature propagation and ignores vertical propagation, so features are lost when propagating in the vertical direction and not all feature information can be used effectively.
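To ground this comparison, here is a minimal PyTorch sketch of the weighted feature fusion used in BiFPN-style nodes (the "fast normalized fusion" of EfficientDet, cited in the non-patent literature below); the shapes and names are illustrative, and this is the baseline that the MFPN extends, not the patent's own code.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """BiFPN-style fast normalized fusion: same-resolution feature maps are
    combined with learnable, non-negative, normalized scalar weights."""
    def __init__(self, num_inputs, eps=1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, features):
        w = torch.relu(self.weights)   # keep fusion weights non-negative
        w = w / (w.sum() + self.eps)   # normalize so the weights sum to ~1
        return sum(wi * fi for wi, fi in zip(w, features))

# Fuse a top-down feature with a lateral feature of the same shape.
fusion = WeightedFusion(num_inputs=2)
p4 = fusion([torch.randn(1, 64, 40, 40), torch.randn(1, 64, 40, 40)])
```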
To process and represent multi-scale features more efficiently, the idea of a residual network is applied in the 6D pose estimation network MFPN-6D of the present invention. A residual structure is fused into both the forward propagation and the vertical propagation, yielding the multidirectional feature fusion pyramid network MFPN, shown in FIG. 2d. On the basis of BiFPN, forward residual structures and residual structures in the vertical direction are added; the resulting MFPN improves feature utilization in both forward and vertical propagation and represents and processes multi-scale features more effectively.
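The patent does not give the exact MFPN fusion equations, so the following is only a rough sketch, under stated assumptions, of the idea just described: residual shortcuts in both the forward (same-level) and vertical (cross-level) directions on top of a BiFPN-like fusion node. Every module name, the channel width, and the fusion order are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MFPNNode(nn.Module):
    """Illustrative MFPN-style node: fuse a same-level (forward) input with a
    resized neighboring pyramid level (vertical), with residual shortcuts on
    both paths so information survives propagation in both directions."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, lateral, neighbor):
        # Vertical residual: resize the neighboring level to this level's
        # resolution and add it, rather than letting it be consumed and lost.
        neighbor = F.interpolate(neighbor, size=lateral.shape[-2:], mode="nearest")
        fused = self.bn(self.conv(lateral + neighbor))
        # Forward residual: a same-level shortcut around the fusion conv.
        return F.relu(fused + lateral)

node = MFPNNode(64)
out = node(torch.randn(1, 64, 40, 40), torch.randn(1, 64, 20, 20))
```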
In designing the backbone network, the invention adopts the advanced cross-stage partial network CSPNet as the basic module and integrates the idea of the YOLO network framework to design the final feature extraction backbone. Combined with the multidirectional feature fusion pyramid network MFPN, this forms the neural network MFPN-6D for 6D pose estimation, as shown in FIG. 1: the backbone network built on the CSPNet structure efficiently extracts features from images, the MFPN combined with the backbone serves as the neck network, and a YOLO network serves as the final detection network. The resulting network performs 6D pose estimation on objects efficiently and accurately, and runs at 56 FPS, currently the fastest in the field of 6D pose estimation.
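For context on the backbone's basic module, here is a minimal sketch of a CSPNet-style block of the kind such a backbone stacks: the input channels are split, one part passes through a small convolutional stack while the other crosses the stage unchanged, and the two parts are concatenated and merged. The widths, depths, and activation choice are illustrative assumptions, not the patent's configuration.

```python
import torch
import torch.nn as nn

class CSPBlock(nn.Module):
    """Cross-stage partial block: half the channels are processed, half are
    carried across the stage, and both halves are concatenated and merged."""
    def __init__(self, channels, num_blocks=2):
        super().__init__()
        half = channels // 2
        self.split_a = nn.Conv2d(channels, half, 1)   # processed branch
        self.split_b = nn.Conv2d(channels, half, 1)   # cross-stage branch
        self.blocks = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(half, half, 3, padding=1),
                          nn.BatchNorm2d(half),
                          nn.SiLU())
            for _ in range(num_blocks)])
        self.merge = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        a = self.blocks(self.split_a(x))
        b = self.split_b(x)
        return self.merge(torch.cat([a, b], dim=1))

block = CSPBlock(64)
y = block(torch.randn(1, 64, 80, 80))  # output shape preserved: (1, 64, 80, 80)
```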
The aim of the invention is 6D pose estimation that is efficient, fast, and able to handle the occlusion problem effectively. First, the multidirectional feature fusion pyramid network MFPN is designed so that features can be fused and expressed effectively; then a backbone network for feature extraction is designed with CSPNet as the basic module and the YOLO framework integrated; finally, the backbone network and the MFPN are combined to form the 6D pose estimation network MFPN-6D.
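Putting the three steps together, the overall composition can be sketched as follows; the three submodules stand in for the structures described above, and the interface is an assumption made for illustration.

```python
import torch.nn as nn

class MFPN6D(nn.Module):
    """Illustrative composition of the described pipeline:
    CSPNet-based backbone -> MFPN neck -> YOLO-style detection head."""
    def __init__(self, backbone, mfpn, head):
        super().__init__()
        self.backbone = backbone  # multi-scale feature extraction
        self.mfpn = mfpn          # multidirectional feature fusion (neck)
        self.head = head          # keypoint/pose predictions

    def forward(self, image):
        features = self.backbone(image)
        fused = self.mfpn(features)
        return self.head(fused)

# Placeholder submodules just to show the wiring; real modules would follow
# the CSP backbone, MFPN, and YOLO head structures described above.
model = MFPN6D(nn.Identity(), nn.Identity(), nn.Identity())
```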
The multidirectional feature fusion pyramid network MFPN can effectively represent and process multi-scale features and handle occlusion and complex backgrounds. The 6D pose estimation network MFPN-6D built on it can estimate the pose of a target object quickly and accurately. Compared with other methods, it is superior in efficiency, speed, and robustness; it effectively handles insufficient surface texture on an object and occlusion of the target object by other objects, and improves detection accuracy while maintaining speed.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A real-time efficient 6D pose estimation network, characterized by comprising a multidirectional feature fusion pyramid network and a backbone network, wherein the multidirectional feature fusion pyramid network and the backbone network are combined to form the 6D pose estimation network, the multidirectional feature fusion pyramid network is used for fusing and expressing features, and the backbone network is used for feature extraction.
2. The real-time efficient 6D pose estimation network of claim 1, wherein the multidirectional feature fusion pyramid network comprises a residual structure that is fused into a forward propagation and a vertical propagation of the multidirectional feature fusion pyramid network.
3. The real-time efficient 6D pose estimation network of claim 2, wherein the backbone network takes the CSPNet network as a basic module and integrates the YOLO framework.
4. The real-time efficient 6D pose estimation network of claim 1, wherein the total dataset of the 6D pose estimation network comprises a LINEMOD standard dataset and an Occluded-LINEMOD standard dataset, and wherein the 6D pose estimation network is trained and validated on the LINEMOD standard dataset and the Occluded-LINEMOD standard dataset.
5. The real-time efficient 6D pose estimation network of claim 4, wherein the LINEMOD standard dataset comprises 13 sequences, each sequence containing the true pose of a single object in a cluttered environment and providing CAD models of all objects; the Occluded-LINEMOD standard dataset is a dataset containing a plurality of target objects with occlusion.
6. The real-time efficient 6D pose estimation network according to claim 5, wherein a total data set of the 6D pose estimation network comprises a training set and a test set, wherein the training set accounts for 20% of the total data set, and wherein the test set accounts for 80% of the total data set.
7. The real-time efficient 6D pose estimation network of claim 1, wherein the 6D pose estimation network operates at 56 FPS.
8. A method for constructing the real-time efficient 6D pose estimation network according to any of claims 1 to 7, comprising: first, fusing a residual structure into forward propagation and vertical propagation to establish the multidirectional feature fusion pyramid network; then, taking the CSPNet network as a basic module and integrating the YOLO framework to establish the backbone network; and finally, combining the multidirectional feature fusion pyramid network and the backbone network to form the 6D pose estimation network.
9. The method of claim 8, wherein the 6D pose estimation network is trained and validated on a LINEMOD standard dataset and an Occluded-LINEMOD standard dataset.
10. A 6D pose estimation method, employing the real-time efficient 6D pose estimation network according to any of claims 1 to 7.
CN202011430902.0A 2020-12-09 2020-12-09 Real-time and efficient 6D pose estimation network, construction method and estimation method Active CN112561995B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011430902.0A CN112561995B (en) Real-time and efficient 6D pose estimation network, construction method and estimation method


Publications (2)

Publication Number Publication Date
CN112561995A (en) 2021-03-26
CN112561995B CN112561995B (en) 2024-04-23

Family

ID=75060013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011430902.0A Active CN112561995B (en) Real-time and efficient 6D pose estimation network, construction method and estimation method

Country Status (1)

Country Link
CN (1) CN112561995B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101473439A (en) * 2006-04-17 2009-07-01 全视Cdm光学有限公司 Arrayed imaging systems and associated methods
US20150146029A1 (en) * 2013-11-26 2015-05-28 Pelican Imaging Corporation Array Camera Configurations Incorporating Multiple Constituent Array Cameras
CN110533721A (en) * 2019-08-27 2019-12-03 杭州师范大学 A kind of indoor objects object 6D Attitude estimation method based on enhancing self-encoding encoder
CN111145253A (en) * 2019-12-12 2020-05-12 深圳先进技术研究院 Efficient object 6D attitude estimation algorithm
CN111968235A (en) * 2020-07-08 2020-11-20 杭州易现先进科技有限公司 Object attitude estimation method, device and system and computer equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
""基于3D多视图的物体识别及姿态估计方法"", 《中国知网 硕士电子期刊》, no. 08, pages 5 *
MINGXING TAN 等: ""EfficientDet: Scalable and Efficient Object Detection"", 《PROCEEDINGS OF THE IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》, pages 2 - 5 *
PENGLEI LIU 等: "MFPN-6D : Real-time One-stage Pose Estimation of Objects on RGB Images", 《2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021)》, pages 12939 - 12945 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113436251A (en) * 2021-06-24 2021-09-24 东北大学 Pose estimation system and method based on improved YOLO6D algorithm
CN113436251B (en) * 2021-06-24 2024-01-09 东北大学 Pose estimation system and method based on improved YOLO6D algorithm

Also Published As

Publication number Publication date
CN112561995B (en) 2024-04-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant