CN113362467A - Point cloud preprocessing and ShuffleNet-based mobile terminal three-dimensional pose estimation method - Google Patents
- Publication number
- CN113362467A (application CN202110634620.0A)
- Authority
- CN
- China
- Prior art keywords
- dimensional
- target
- mobile terminal
- calculating
- pose
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/005—General purpose rendering architectures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/20—Finite element generation, e.g. wire-frame surface description, tesselation
Abstract
The invention discloses a mobile terminal three-dimensional pose estimation method based on point cloud preprocessing and ShuffleNet. Preprocessing is performed at the PC end: the target point cloud data are reconstructed into a three-dimensional model and imported into a three-dimensional rendering engine. Two-dimensional photos of the target under different viewing angles are obtained with a rotation photographing algorithm in the three-dimensional engine, the photos are labelled with the key voxel block extraction algorithm provided by the invention, and a training data set is established. A detection model for the target's key voxel blocks is trained with ShuffleNetv2-YOLOv3, which is lightweight, high-performance and suitable for mobile terminal computation. A video stream is then read from the mobile terminal camera, the target's key voxel blocks are detected with the ShuffleNetv2-YOLOv3 model, and the 2D-3D point pairs corresponding to the key voxel block centre points are processed with the RANSAC and EPNP algorithms to obtain the relative pose of the target. Finally, exploiting the mobile terminal's advantages, the pose of the target in the actual three-dimensional world is calculated from the data provided by the built-in IMU and GPS.
Description
Technical Field
The invention belongs to the technical field of computers, and relates to a mobile terminal three-dimensional pose estimation method based on point cloud preprocessing and ShuffleNet, which can be widely applied to the fields of robot grabbing, vehicle intelligent navigation, augmented reality, medical diagnosis and the like.
Background
Three-dimensional pose estimation plays a key role in fields such as robot grasping, intelligent vehicle navigation, augmented reality and medical diagnosis. Current mainstream pose estimation methods fall into two categories. The first is recognition based on two-dimensional images: for an input RGB or RGB-D image, the method predicts one central point and eight corner points of the object, then obtains the object's 6D pose through the PNP or EPNP algorithm. Such algorithms have good real-time performance but low accuracy. The second category localizes from point cloud data: a deep network first establishes the correspondence between the 3D point cloud data and the 2D image, and the 6D pose of the object is then obtained through the PNP or EPNP algorithm. Because point cloud data are used, the accuracy is higher than the first category, but the speed is comparatively lower.
Mobile phones have the advantages of a high penetration rate and easy portability, but their hardware configuration is far below that of a PC, and the recognition speed of conventional algorithms on them can hardly meet requirements. Externally connected laser radar or depth cameras undermine the portability advantage, so the mobile terminal can only adopt an RGB video stream recognition scheme, whose pose analysis accuracy is limited.
The invention mainly aims at the requirement of a mobile terminal on target pose estimation in the field of auxiliary industrial application, and provides a mobile terminal three-dimensional pose estimation method based on point cloud preprocessing and ShuffleNet.
The design scheme adopted by the invention is as follows: a mobile terminal three-dimensional pose estimation method based on point cloud preprocessing and ShuffleNet comprises the following steps:
step 1: performing three-dimensional reconstruction on target point cloud data obtained by laser scanning; importing a three-dimensional model obtained by three-dimensional reconstruction into a rendering engine for photographing;
step 2: respectively acquiring two-dimensional photos and camera poses of the target under different visual angles by adopting a positioning rotation photographing algorithm; extracting two-dimensional photo feature points through SIFT, calculating the corresponding three-dimensional feature points, dividing the target model into equal-size voxel blocks, and screening the target's key voxel blocks according to the number of three-dimensional feature points; generating two-dimensional projections of the key voxel blocks on the photo set and establishing a training data set; training a feature detection model for the target through the ShuffleNetv2-YOLOv3 lightweight network;
step 3: inputting the video stream into the trained ShuffleNetv2-YOLOv3 detection model of target key voxel blocks, identifying the key voxel blocks to obtain 2D-3D matching point pairs, and calculating the relative pose of the target by combining the RANSAC and EPNP algorithms;
step 4: calculating the absolute pose of the target in the three-dimensional world by combining the GPS and IMU information of the mobile terminal.
The invention combines the advantages of the two categories of three-dimensional pose estimation algorithms and first performs preprocessing at the PC end: a three-dimensional model is reconstructed from the target point cloud data through the Delaunay algorithm. The method adopts positioning rotation photographing and a key voxel block extraction algorithm to automatically generate a target voxel feature detection data set, and trains the feature detection model with ShuffleNetv2-YOLOv3, which is lightweight, high-performance and suitable for mobile terminal computation.
In the recognition stage the invention makes full use of the mobile terminal's hardware, introducing GPS and IMU data to locate the pose of the mobile terminal itself. Key voxel blocks of the target are detected with the trained ShuffleNetv2-YOLOv3 model, and the relative pose between the target and the mobile terminal camera is calculated with the RANSAC and EPNP algorithms. Finally, the absolute pose of the target in the three-dimensional world is calculated. At present the penetration rate of mobile phones exceeds 90%, so the invention can provide, on the mobile terminal, pose estimation that does not depend on a depth camera or laser equipment and that meets the real-time and accuracy requirements of industrial auxiliary applications; it is portable, practical and easy to popularize.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
Fig. 2 is a schematic block diagram of an embodiment of the present invention.
Detailed Description
In order to facilitate the understanding and implementation of the present invention for those of ordinary skill in the art, the present invention is further described in detail with reference to the accompanying drawings and examples, it is to be understood that the embodiments described herein are merely illustrative and explanatory of the present invention and are not restrictive thereof.
Referring to fig. 1 and fig. 2, the method for estimating the three-dimensional pose of the mobile terminal based on point cloud preprocessing and ShuffleNet provided by the invention comprises the following steps:
step 1: performing three-dimensional reconstruction on target point cloud data obtained by laser scanning; importing a three-dimensional model obtained by three-dimensional reconstruction into a rendering engine for photographing;
the specific implementation of the embodiment includes the following sub-steps:
step 1.1: performing three-dimensional reconstruction on the target point cloud data obtained by laser scanning through the Delaunay algorithm to obtain a target three-dimensional model;
step 1.2: and importing the target three-dimensional model into a rendering engine, calculating a target three-dimensional model bounding box and a central point, and moving the target three-dimensional model to enable the central point to reach an origin point.
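Step 1.2 amounts to computing the axis-aligned bounding box of the model and translating the model so the box centre sits at the origin. A minimal numpy sketch (the function name `center_model` is illustrative, not from the patent):

```python
import numpy as np

def center_model(vertices):
    """Translate a model so its axis-aligned bounding-box centre is at the origin.

    vertices: (N, 3) array of model vertex coordinates.
    Returns the translated vertices and the translation that was applied.
    """
    lo = vertices.min(axis=0)      # bounding-box minimum corner
    hi = vertices.max(axis=0)      # bounding-box maximum corner
    center = (lo + hi) / 2.0       # bounding-box centre point
    return vertices - center, -center

# Example: a small cloud offset from the origin
pts = np.array([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
centered, t = center_model(pts)
```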
Step 2: respectively acquiring two-dimensional photos and camera poses of the target under different visual angles by adopting a positioning rotation photographing algorithm; extracting two-dimensional photo feature points through SIFT, calculating the corresponding three-dimensional feature points, dividing the target model into equal-size voxel blocks, and screening the target's key voxel blocks according to the number of three-dimensional feature points; generating two-dimensional projections of the key voxel blocks on the photo set and establishing a training data set; training a feature detection model for the target through the ShuffleNetv2-YOLOv3 lightweight network;
the specific implementation of the embodiment includes the following sub-steps:
step 2.1: shooting a target in a three-dimensional engine to obtain a target photo;
step 2.2: detecting two-dimensional feature points in the target photo through the SIFT algorithm to obtain a feature point set K = {k1, ..., kn};
Step 2.3: calculating, through a screen ray-casting algorithm, the three-dimensional coordinate point pi corresponding to each feature point ki, and recording the three-dimensional feature point set corresponding to the two-dimensional feature points as P = {p1, ..., pn};
Step 2.4: the camera rotates around the target and continues to take pictures of the target, and the step 2.2 and the step 2.3 are repeated until a multi-view picture of the target is obtained and a three-dimensional feature point set PS is obtained through calculation1,...,PNN is the number of photos, and P is a three-dimensional feature point set of each photo;
step 2.5: dividing the target's three-dimensional voxels (Volume Pixels) into M blocks of the same voxel size, B = {b1, ..., bM}; setting the frequency with which the three-dimensional feature points appear in each voxel block as the voxel block weight q, and screening the m blocks with the largest weights, KB = {b1, ..., bm}, as key voxel blocks, where m < M;
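Step 2.5 can be read as a 3-D histogram: bin the three-dimensional feature points into a regular voxel grid over the model's bounding box and keep the m most populated cells. A numpy sketch under that reading (the grid resolution, defaults and names are assumptions for illustration):

```python
import numpy as np

def select_key_voxel_blocks(points, bbox_min, bbox_max, grid=4, m=8):
    """Divide the bounding box into grid^3 equal voxel blocks, weight each block
    by how many 3-D feature points fall inside it (the weight q), and keep the
    m heaviest blocks as key voxel blocks."""
    size = (bbox_max - bbox_min) / grid
    idx = np.floor((points - bbox_min) / size).astype(int)
    idx = np.clip(idx, 0, grid - 1)                   # points on the max face
    flat = idx[:, 0] * grid * grid + idx[:, 1] * grid + idx[:, 2]
    weights = np.bincount(flat, minlength=grid ** 3)  # q: feature points per block
    key = np.argsort(weights)[::-1][:m]               # indices of heaviest blocks
    return key, weights

# Demo: five feature points crowd one corner cell, one point sits elsewhere
pts = np.array([[0.1, 0.1, 0.1], [0.2, 0.3, 0.1], [0.3, 0.2, 0.2],
                [0.4, 0.4, 0.3], [0.2, 0.2, 0.4], [1.5, 1.5, 1.5]])
key, q = select_key_voxel_blocks(pts, np.zeros(3), np.full(3, 2.0), grid=2, m=1)
```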
Step 2.6: taking the key voxel blocks as categories, calculating the areas of the key voxel blocks on the two-dimensional photo set according to a projection transformation formula, and generating marking information to obtain a training data set;
step 2.7: and training ShuffleNetv2-YOLOv3 through the generated data set to obtain a detection model aiming at the target key voxel block.
Step 3: inputting the video stream into the trained ShuffleNetv2-YOLOv3 detection model of target key voxel blocks, identifying the key voxel blocks to obtain 2D-3D matching point pairs, and calculating the relative pose of the target by combining the RANSAC and EPNP algorithms;
the specific implementation of the embodiment includes the following sub-steps:
step 3.1: reading the video stream and inputting it into the trained ShuffleNetv2-YOLOv3 detection model of key voxel blocks, whose output is the two-dimensional regions corresponding to a plurality of key voxel blocks;
step 3.2: calculating the detected central point of the two-dimensional region, and forming a 2D-3D matching point pair with the central point of the corresponding key voxel block;
step 3.3: and calculating the relative pose between the target and the mobile terminal camera through RANSAC and EPNP algorithms.
Step 4: calculating the pose of the mobile terminal camera in the three-dimensional world through the GPS and IMU information of the mobile terminal, and calculating the absolute pose of the target in the three-dimensional world by combining the relative pose between the target and the mobile terminal camera;
the specific implementation of the embodiment includes the following sub-steps:
step 4.1: reading GPS and IMU data of a mobile terminal;
step 4.2: calculating the positioning of the mobile terminal in the three-dimensional world through the data acquired in the step 4.1;
step 4.3: calculating the absolute pose of the target in the three-dimensional world from the mobile terminal pose calculated in step 4.2 and the relative pose of the target calculated in step 3.
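Step 4 is a composition of rigid transforms: if GPS and IMU give the camera's pose in the world, T_world_cam, and step 3 gives the target's pose in the camera frame, T_cam_target, the absolute target pose is T_world_target = T_world_cam · T_cam_target. A minimal sketch with 4x4 homogeneous matrices (values invented for illustration):

```python
import numpy as np

def se3(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Camera pose in the world (GPS position + IMU orientation, assumed already
# converted to a local Cartesian frame) and the target pose from step 3.
T_world_cam = se3(np.eye(3), [10.0, 0.0, 0.0])
T_cam_target = se3(np.eye(3), [0.0, 0.0, 2.0])
T_world_target = T_world_cam @ T_cam_target   # absolute pose of the target
```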
In the recognition stage, real-time pose estimation is performed on the mobile terminal: a video stream is read from the mobile terminal camera, feature points of the target in the RGB image are obtained with the ShuffleNet model, and the relative pose is obtained through the RANSAC and EPNP algorithms. The invention makes full use of the mobile terminal's advantages: through the positioning information provided by its GPS and IMU, combined with the relative pose, the pose of the target in the actual three-dimensional world is calculated.
It should be understood that parts of the specification not set forth in detail are well within the prior art.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (5)
1. A mobile terminal three-dimensional pose estimation method based on point cloud preprocessing and ShuffleNet comprises the following steps:
step 1: performing three-dimensional reconstruction on target point cloud data obtained by laser scanning; importing a three-dimensional model obtained by three-dimensional reconstruction into a rendering engine for photographing;
step 2: respectively acquiring two-dimensional photos and camera poses of the target under different visual angles by adopting a positioning rotation photographing algorithm; extracting two-dimensional photo feature points through SIFT, calculating the corresponding three-dimensional feature points, dividing the target model into equal-size voxel blocks, and screening the target's key voxel blocks according to the number of three-dimensional feature points; generating two-dimensional projections of the key voxel blocks on the photo set and establishing a training data set; training a feature detection model for the target through the ShuffleNetv2-YOLOv3 lightweight network;
step 3: inputting the video stream into the trained ShuffleNetv2-YOLOv3 detection model of target key voxel blocks, identifying the key voxel blocks to obtain 2D-3D matching point pairs, and calculating the relative pose of the target through the RANSAC and EPNP algorithms;
step 4: calculating the absolute pose of the target in the three-dimensional world by combining the GPS and IMU information of the mobile terminal.
2. The point cloud preprocessing and ShuffleNet-based mobile terminal three-dimensional pose estimation method according to claim 1, characterized in that: in step 1, three-dimensional reconstruction is performed on the target point cloud data obtained by laser scanning through the Delaunay algorithm to obtain a target three-dimensional model; the target three-dimensional model is imported into a rendering engine, the model's bounding box and central point are calculated, and the model is moved so that its central point reaches the origin.
3. The method for estimating the three-dimensional pose of the mobile terminal based on point cloud preprocessing and ShuffleNet according to claim 1, wherein the concrete implementation of step 2 comprises the following substeps:
step 2.1: shooting a target in a three-dimensional engine to obtain a target photo;
step 2.2: detecting two-dimensional feature points in the target photo through the SIFT algorithm to obtain a feature point set K = {k1, ..., kn};
Step 2.3: calculating, through a screen ray-casting algorithm, the three-dimensional coordinate point pi corresponding to each feature point ki, and recording the three-dimensional feature point set corresponding to the two-dimensional feature points as P = {p1, ..., pn};
Step 2.4: rotating the camera around the target while continuing to photograph it, and repeating step 2.2 and step 2.3 until multi-view photos of the target are obtained and the three-dimensional feature point sets PS = {P1, ..., PN} are calculated, where N is the number of photos and each Pj is the three-dimensional feature point set of one photo;
step 2.5: dividing the target's three-dimensional voxels into M blocks of the same voxel size, B = {b1, ..., bM}; setting the frequency with which the three-dimensional feature points appear in each voxel block as the voxel block weight q, and screening the m blocks with the largest weights, KB = {b1, ..., bm}, as key voxel blocks, where m < M;
Step 2.6: taking the key voxel blocks as categories, calculating the areas of the key voxel blocks on the two-dimensional photo set according to a projection transformation formula, and generating marking information to obtain a training data set;
step 2.7: and training ShuffleNetv2-YOLOv3 through the generated data set to obtain a detection model aiming at the target key voxel block.
4. The method for estimating the three-dimensional pose of the mobile terminal based on point cloud preprocessing and ShuffleNet according to claim 1, wherein the concrete implementation of step 3 comprises the following substeps:
step 3.1: reading the video stream and inputting it into the trained ShuffleNetv2-YOLOv3 detection model of key voxel blocks, whose output is the two-dimensional regions corresponding to a plurality of key voxel blocks;
step 3.2: calculating the detected central point of the two-dimensional region, and forming a 2D-3D matching point pair with the central point of the corresponding key voxel block;
step 3.3: and calculating the relative pose between the target and the mobile terminal camera through RANSAC and EPNP algorithms.
5. The method for estimating the three-dimensional pose of the mobile terminal based on point cloud preprocessing and ShuffleNet according to any one of claims 1 to 4, wherein the concrete implementation of step 4 comprises the following substeps:
step 4.1: reading GPS and IMU data of a mobile terminal;
step 4.2: calculating the positioning of the mobile terminal in the three-dimensional world through the data acquired in the step 4.1;
step 4.3: calculating the absolute pose of the target in the three-dimensional world from the mobile terminal pose calculated in step 4.2 and the relative pose of the target calculated in step 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110634620.0A CN113362467B (en) | 2021-06-08 | 2021-06-08 | Point cloud preprocessing and ShuffleNet-based mobile terminal three-dimensional pose estimation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113362467A true CN113362467A (en) | 2021-09-07 |
CN113362467B CN113362467B (en) | 2023-04-07 |
Family
ID=77532983
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110634620.0A Active CN113362467B (en) | 2021-06-08 | 2021-06-08 | Point cloud preprocessing and ShuffleNet-based mobile terminal three-dimensional pose estimation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113362467B (en) |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090232353A1 (en) * | 2006-11-10 | 2009-09-17 | University Of Maryland | Method and system for markerless motion capture using multiple cameras |
US20150269438A1 (en) * | 2014-03-18 | 2015-09-24 | Sri International | Real-time system for multi-modal 3d geospatial mapping, object recognition, scene annotation and analytics |
EP3276575A1 (en) * | 2016-07-25 | 2018-01-31 | Nuctech Company Limited | Method, apparatus and system for reconstructing images of 3d surface |
WO2018047687A1 (en) * | 2016-09-12 | 2018-03-15 | パナソニックIpマネジメント株式会社 | Three-dimensional model generating device and three-dimensional model generating method |
WO2018148924A1 (en) * | 2017-02-17 | 2018-08-23 | 深圳市大疆创新科技有限公司 | Method and device for reconstructing three-dimensional point cloud |
CN109241990A (en) * | 2018-07-19 | 2019-01-18 | 杭州电子科技大学 | A kind of threedimensional model mask method propagated based on multi-tag |
CN109387204A (en) * | 2018-09-26 | 2019-02-26 | 东北大学 | The synchronous positioning of the mobile robot of dynamic environment and patterning process in faced chamber |
CN109961504A (en) * | 2017-12-25 | 2019-07-02 | 宏达国际电子股份有限公司 | Method for reconstructing three-dimensional model, electronic device and non-transient computer-readable recording medium |
US20190213402A1 (en) * | 2018-01-09 | 2019-07-11 | Futurewei Technologies, Inc. | Head pose and distraction estimation |
CN110517209A (en) * | 2018-05-21 | 2019-11-29 | 北京京东尚科信息技术有限公司 | Data processing method, device, system and computer readable storage medium |
CN111814683A (en) * | 2020-07-09 | 2020-10-23 | 北京航空航天大学 | Robust visual SLAM method based on semantic prior and deep learning features |
CN111899301A (en) * | 2020-06-02 | 2020-11-06 | 广州中国科学院先进技术研究所 | Workpiece 6D pose estimation method based on deep learning |
US20210073641A1 (en) * | 2019-09-05 | 2021-03-11 | Kabushiki Kaisha Toshiba | Learning device, learning system, and learning method |
CN112489129A (en) * | 2020-12-18 | 2021-03-12 | 深圳市优必选科技股份有限公司 | Pose recognition model training method and device, pose recognition method and terminal equipment |
US20210158023A1 (en) * | 2018-05-04 | 2021-05-27 | Northeastern University | System and Method for Generating Image Landmarks |
US20210407039A1 (en) * | 2020-06-30 | 2021-12-30 | Intel Corporation | Apparatus and method for approximate trilinear interpolation for scene reconstruction |
Non-Patent Citations (4)
Title |
---|
MENGDAN FENG et al.: "2D3D-MatchNet: Learning to Match Keypoints Across 2D Image and 3D Point Cloud", 2019 International Conference on Robotics and Automation (ICRA) * |
FU Tai et al.: "A Design Method for an Aircraft CAD Model Projection Image Library with Accurate Poses", Computer Science * |
ZHAN Chuan: "Research on Anti-Spam Information Filtering Technology", Chengdu: University of Electronic Science and Technology of China Press, 31 May 2016 * |
MA Xiaoqiu et al.: "Three-dimensional Reconstruction Simulation of BIM Buildings Based on Perspective Augmented Reality", Computer Simulation * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115222731A (en) * | 2022-09-07 | 2022-10-21 | 西南交通大学 | Train fastener abnormity detection method based on two-dimensional image-point cloud mapping |
CN115222731B (en) * | 2022-09-07 | 2022-12-02 | 西南交通大学 | Train fastener abnormity detection method based on two-dimensional image-point cloud mapping |
CN115578265A (en) * | 2022-12-06 | 2023-01-06 | 中汽智联技术有限公司 | Point cloud enhancement method, system and storage medium |
CN115578265B (en) * | 2022-12-06 | 2023-04-07 | 中汽智联技术有限公司 | Point cloud enhancement method, system and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||