CN109903332A - A kind of object's pose estimation method based on deep learning - Google Patents


Info

Publication number
CN109903332A
CN109903332A (application CN201910016293.5A)
Authority
CN
China
Prior art keywords
target object, body surface, plane, obtains, acquired
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910016293.5A
Other languages
Chinese (zh)
Inventor
高明煜
杜宇杰
杨宇翔
何志伟
曾毓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN201910016293.5A
Publication of CN109903332A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The present invention relates to an object pose estimation method based on deep learning. When a robotic arm grasps a target object in a real industrial environment, the spatial position and orientation of the target object must first be obtained. Because camera systems are inexpensive, methods that estimate object pose from visual information are the most widely used. Traditional vision algorithms, however, struggle to extract effective features for pose estimation, so their accuracy is limited. The method of the present invention exploits the strengths of neural networks: a neural network algorithm extracts the important region of the target object, a model is then fitted to the object surface, and the object pose is estimated from that model. The method is highly adaptable; for a different type of object, one only needs to collect a data set and retrain the network, without redesigning a feature extractor. The method determines pose accurately and, using the powerful feature-extraction capability of neural networks, can analyze objects in most scenes.

Description

An object pose estimation method based on deep learning
Technical field
The invention belongs to the field of computer vision, and in particular relates to an object pose estimation method based on deep learning.
Background technique
Obtaining the pose of a target object from visual information is a vital task in computer vision. In a real industrial environment, a robotic arm must grasp target objects, which requires the spatial three-dimensional position and orientation of each target object. The accuracy of pose estimation is therefore critical for robotic grasping. Current pose estimation methods fall mainly into two categories: 1) learning-based pose estimation; 2) model-based pose estimation.
Model-based pose estimation combines the feature-point information or the geometric relations of an object to estimate its pose. The structure and shape of the object are represented by geometric features or a geometric model, and the pose of the object in three-dimensional space is determined by establishing a direct matching relationship between the image and the model. Such methods typically compare the established model with the features extracted from the input image, and determine the model pose through a feature-matching algorithm.
Learning-based pose estimation uses machine learning or deep learning to learn the relationship between three-dimensional pose and two-dimensional observations from a training data set, and then applies the learned classification or regression model to test data to obtain the pose of the target object. Such methods generally extract information from the whole image rather than only part of the object's features, so, comparatively, learning-based pose estimation has better anti-interference capability and stronger robustness.
Summary of the invention
The present invention exploits the powerful model-fitting capability of neural networks and proposes a pose estimation method based on deep learning. The method extracts the effective region of the target object surface with a neural network, fits a model to that surface, and estimates the object pose from it. The concrete steps are:
Step (1): acquire a color image and a depth image with an RGB-D camera
The scene containing the target object is captured with an RGB-D camera, yielding a color image and a depth image whose pixels correspond one-to-one with the color image.
Step (2): extract the pixel region of interest from the image with a semantic segmentation network
Using a semantic segmentation neural network, features are extracted from the input image by convolution and pooling, then up-sampled and fused with shallow contour information, finally producing a segmentation feature map with the same resolution as the input image. This performs a pixel-level classification of the input image and determines the pixel region of interest on the target object surface. Once the target object region of interest has been extracted from the input image, the pose of the target object can be determined from that part of the object surface.
After the semantic segmentation network, the extracted image information is combined with the depth information to obtain the pose of the target object.
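The combination of the segmentation result with the depth image can be sketched as follows. This is an illustrative Python/NumPy sketch, not the patent's implementation; the pinhole intrinsics fx, fy, cx, cy and the function name are our own assumptions:

```python
import numpy as np

def mask_to_points(depth, mask, fx, fy, cx, cy):
    """Back-project the depth pixels selected by the segmentation mask
    into 3-D camera coordinates using the pinhole camera model."""
    v, u = np.nonzero(mask)            # pixel rows/columns inside the object region
    z = depth[v, u]                    # depth readings at those pixels
    valid = z > 0                      # discard pixels with no depth measurement
    u, v, z = u[valid], v[valid], z[valid]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1) # (N, 3) surface points of the target object
```

The resulting point set is what the surface-modeling step below operates on.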
Step (3): model the object surface and determine the object pose
Using the object-surface plane region detected by the semantic segmentation network and the depth information of the corresponding fused depth image, a plane model is fitted to the object surface:
Ax+By+Cz+D=0
The parameters of the plane equation are obtained with the principal component analysis (PCA) algorithm, computed as follows:
1. Using the formula Σ = E(aa^T) - E(a)E(a^T), compute the covariance matrix of all points on the object plane, where a is a sample data point and E denotes the expectation (averaging) operation.
2. Compute the eigenvalues and eigenvectors of the covariance matrix and choose the eigenvector corresponding to the smallest eigenvalue as the normal vector of the object-surface plane, obtaining the plane equation parameters A, B, C.
3. Compute the mean coordinates of all surface points, substitute them into the plane equation, and solve for the parameter D.
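The three PCA steps above can be sketched in Python/NumPy (a minimal illustration under our own naming; np.cov computes the sample covariance corresponding to the formula in step 1):

```python
import numpy as np

def fit_plane_pca(points):
    """Fit the plane Ax + By + Cz + D = 0 to an (N, 3) array of surface points.

    Step 1: covariance matrix Sigma = E(aa^T) - E(a)E(a^T).
    Step 2: the eigenvector of the smallest eigenvalue is the unit normal (A, B, C).
    Step 3: D is solved by substituting the mean coordinate into the plane equation."""
    mean = points.mean(axis=0)
    cov = np.cov(points.T)                    # step 1: sample covariance of the points
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues returned in ascending order
    normal = eigvecs[:, 0]                    # step 2: smallest-eigenvalue eigenvector
    A, B, C = normal
    D = -normal @ mean                        # step 3: A*mx + B*my + C*mz + D = 0
    return A, B, C, D
```

For points that lie exactly on a plane, the smallest eigenvalue is zero and the recovered normal is exact up to sign.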
The plane equation of the object surface is thus obtained by PCA, and the pose of the target object is estimated from it. However, when solving for the plane equation from the fused depth map, the considerable noise in the depth camera's measurements seriously affects the resulting plane equation, so the noise points must first be filtered out of the raw data.
The plane model is therefore obtained after filtering out the noise points by combining the random sample consensus (RANSAC) algorithm with the PCA algorithm. The steps of this process are as follows:
1. Randomly select four sample points sufficient to determine the equation parameters, and compute the parameters of the plane equation from these points to obtain a plane model.
2. Compute the error of all data points against the fitted plane model. A point whose error is below a given threshold is considered an inlier; otherwise it is an outlier.
3. Count the inliers. If the number of inliers exceeds a set quantity, fit the current model to all the inliers with PCA.
4. Compute the mean error of all inliers under the model fitted from the inlier set. If the current error is smaller than the error of the stored best model, update the best model and its error.
5. Repeat the steps above until the maximum number of iterations is reached, yielding the final plane model.
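The five steps above can be sketched as follows (an illustrative Python sketch; the threshold, inlier count, and iteration parameters are placeholder values, not values specified by the patent):

```python
import numpy as np

def pca_plane(pts):
    """PCA plane fit: unit normal (A, B, C) from the smallest-eigenvalue
    eigenvector of the covariance matrix, D from the mean coordinate."""
    mean = pts.mean(axis=0)
    _, vecs = np.linalg.eigh(np.cov(pts.T))
    n = vecs[:, 0]
    return n, -n @ mean

def ransac_plane(points, thresh=0.01, min_inliers=10, iters=100, seed=0):
    """RANSAC combined with PCA, following steps 1-5: sample 4 points,
    fit a candidate plane, classify inliers by an error threshold, refit
    on all inliers, and keep the model with the smallest mean inlier error."""
    rng = np.random.default_rng(seed)
    best, best_err = None, np.inf
    for _ in range(iters):
        sample = points[rng.choice(len(points), 4, replace=False)]
        n, d = pca_plane(sample)              # step 1: candidate plane model
        err = np.abs(points @ n + d)          # step 2: point-to-plane distance
        inliers = points[err < thresh]
        if len(inliers) >= min_inliers:       # step 3: refit on all inliers
            n, d = pca_plane(inliers)
            mean_err = np.abs(inliers @ n + d).mean()
            if mean_err < best_err:           # step 4: keep the best model so far
                best, best_err = (n, d), mean_err
    return best                               # step 5: final model after all iterations
```

Because the candidate normal is a unit vector, |n·p + d| is directly the point-to-plane distance used for the inlier test.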
After the normal vector of the object surface has been obtained, the mean of all spatial points on the object surface is computed as the center point of the surface. The plane center point and the plane normal vector together determine the pose of the target object.
Beneficial effects of the present invention: the method exploits the strengths of neural networks, extracting the important region of the target object with a neural network algorithm and then modeling the object surface to estimate the object pose. The method is highly adaptable: for a different type of object, one only needs to collect a data set and retrain the network, without redesigning a feature extractor. The method determines pose accurately and, using the powerful feature-extraction capability of neural networks, can analyze objects in most scenes.
Description of the drawings
Fig. 1 shows the semantic segmentation network.
Specific implementation steps
Step (1): acquire a color image and a depth image with an RGB-D camera
The scene containing the target object is captured with an RGB-D camera, yielding a color image and a depth image whose pixels correspond one-to-one with the color image.
Step (2): extract the pixel region of interest from the image with a semantic segmentation network
As shown in Figure 1, using a semantic segmentation neural network, features are extracted from the input image by convolution and pooling, then up-sampled and fused with shallow contour information, finally producing a segmentation feature map with the same resolution as the input image. This performs a pixel-level classification of the input image and determines the pixel region of interest on the target object surface. Once the target object region of interest has been extracted from the input image, the pose of the target object can be determined from that part of the object surface.
After the semantic segmentation network, the extracted image information is combined with the depth information to obtain the pose of the target object.
Step (3): model the object surface and determine the object pose
Using the object-surface plane region detected by the semantic segmentation network and the depth information of the corresponding fused depth image, a plane model is fitted to the object surface:
Ax+By+Cz+D=0
The parameters of the plane equation are obtained with the principal component analysis (PCA) algorithm, computed as follows:
1. Using the formula Σ = E(aa^T) - E(a)E(a^T), compute the covariance matrix of all points on the object plane, where a is a sample data point and E denotes the expectation (averaging) operation.
2. Compute the eigenvalues and eigenvectors of the covariance matrix and choose the eigenvector corresponding to the smallest eigenvalue as the normal vector of the object-surface plane, obtaining the plane equation parameters A, B, C.
3. Compute the mean coordinates of all surface points, substitute them into the plane equation, and solve for the parameter D.
The plane equation of the object surface is thus obtained by PCA, and the pose of the target object is estimated from it. However, when solving for the plane equation from the fused depth map, the considerable noise in the depth camera's measurements seriously affects the resulting plane equation, so the noise points must first be filtered out of the raw data.
The plane model is therefore obtained after filtering out the noise points by combining the random sample consensus (RANSAC) algorithm with the PCA algorithm. The steps of this process are as follows:
1. Randomly select four sample points sufficient to determine the equation parameters, and compute the parameters of the plane equation from these points to obtain a plane model.
2. Compute the error of all data points against the fitted plane model. A point whose error is below a given threshold is considered an inlier; otherwise it is an outlier.
3. Count the inliers. If the number of inliers exceeds a set quantity, fit the current model to all the inliers with PCA.
4. Compute the mean error of all inliers under the model fitted from the inlier set. If the current error is smaller than the error of the stored best model, update the best model and its error.
5. Repeat the steps above until the maximum number of iterations is reached, yielding the final plane model.
After the normal vector of the object surface has been obtained, the mean of all spatial points on the object surface is computed as the center point of the surface. The plane center point and the plane normal vector together determine the pose of the target object.
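One common way to turn the plane center point and normal vector into a full 6-DoF pose is to complete the normal into an orthonormal frame; the construction below is our illustrative choice (the patent states only that the center and normal determine the pose):

```python
import numpy as np

def pose_from_plane(center, normal):
    """Build a 4x4 homogeneous pose whose translation is the surface
    center point and whose z-axis is aligned with the plane normal."""
    center = np.asarray(center, dtype=float)
    z = np.asarray(normal, dtype=float)
    z = z / np.linalg.norm(z)
    # choose a reference axis that is not parallel to the normal
    ref = np.array([1.0, 0.0, 0.0]) if abs(z[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    x = np.cross(ref, z)
    x = x / np.linalg.norm(x)
    y = np.cross(z, x)                        # completes a right-handed frame
    T = np.eye(4)
    T[:3, 0], T[:3, 1], T[:3, 2], T[:3, 3] = x, y, z, center
    return T
```

The rotation about the normal itself is left arbitrary here, which is sufficient for grasping a planar surface.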

Claims (1)

1. An object pose estimation method based on deep learning, characterized in that the method comprises the following concrete steps:
Step (1): acquire a color image and a depth image with an RGB-D camera
The scene containing the target object is captured with an RGB-D camera, yielding a color image and a depth image whose pixels correspond one-to-one with the color image;
Step (2): extract the pixel region of interest from the image with a semantic segmentation network
Using a semantic segmentation neural network, features are extracted from the input image by convolution and pooling, then up-sampled and fused with shallow contour information, finally producing a segmentation feature map with the same resolution as the input image; a pixel-level classification of the input image is performed to determine the pixel region of interest on the target object surface; once the target object region of interest has been extracted from the input image, the pose of the target object can be determined from that part of the object surface;
After the semantic segmentation network, the extracted image information is combined with the depth information to obtain the pose of the target object;
Step (3): model the object surface and determine the object pose
Using the object-surface plane region detected by the semantic segmentation network and the depth information of the corresponding fused depth image, a plane model is fitted to the object surface:
Ax+By+Cz+D=0
The parameters of the plane equation are obtained with the principal component analysis (PCA) algorithm, computed as follows:
1. Using the formula Σ = E(aa^T) - E(a)E(a^T), compute the covariance matrix of all points on the object plane, where a is a sample data point and E denotes the expectation (averaging) operation;
2. Compute the eigenvalues and eigenvectors of the covariance matrix and choose the eigenvector corresponding to the smallest eigenvalue as the normal vector of the object-surface plane, obtaining the plane equation parameters A, B, C;
3. Compute the mean coordinates of all surface points, substitute them into the plane equation, and solve for the parameter D;
The plane equation of the object surface is thus obtained by PCA, and the pose of the target object is estimated;
The plane model is obtained after filtering out the noise points in the data by combining the random sample consensus (RANSAC) algorithm with the PCA algorithm; the steps of this process are as follows:
1. Randomly select four sample points sufficient to determine the equation parameters, and compute the parameters of the plane equation from these points to obtain a plane model;
2. Compute the error of all data points against the fitted plane model; a point whose error is below a given threshold is considered an inlier, otherwise it is an outlier;
3. Count the inliers; if the number of inliers exceeds a set quantity, fit the current model to all the inliers with PCA;
4. Compute the mean error of all inliers under the model fitted from the inlier set; if the current error is smaller than the error of the stored best model, update the best model and its error;
5. Repeat the steps above until the maximum number of iterations is reached, yielding the final plane model;
After the normal vector of the object surface has been obtained, the mean of all spatial points on the object surface is computed as the center point of the object surface; the plane center point and the plane normal vector together determine the pose information of the target object.
CN201910016293.5A, filed 2019-01-08, priority 2019-01-08: A kind of object's pose estimation method based on deep learning. Status: Pending. Published as CN109903332A (en).

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910016293.5A CN109903332A (en) 2019-01-08 2019-01-08 A kind of object's pose estimation method based on deep learning


Publications (1)

Publication Number Publication Date
CN109903332A 2019-06-18

Family

ID=66943698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910016293.5A Pending CN109903332A (en) 2019-01-08 2019-01-08 A kind of object's pose estimation method based on deep learning

Country Status (1)

Country Link
CN (1) CN109903332A (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276805A (en) * 2019-06-28 2019-09-24 联想(北京)有限公司 A kind of data processing method and electronic equipment
CN110322510A (en) * 2019-06-27 2019-10-11 电子科技大学 A kind of 6D position and orientation estimation method using profile information
CN110910447A (en) * 2019-10-31 2020-03-24 北京工业大学 Visual odometer method based on dynamic and static scene separation
CN110992372A (en) * 2019-11-21 2020-04-10 浙江大华技术股份有限公司 Article grabbing method and device, storage medium and electronic device
CN110992422A (en) * 2019-11-04 2020-04-10 浙江工业大学 Medicine box posture estimation method based on 3D vision
CN111524184A (en) * 2020-04-21 2020-08-11 湖南视普瑞智能科技有限公司 Intelligent unstacking method and system based on 3D vision
CN111597976A (en) * 2020-05-14 2020-08-28 杭州相芯科技有限公司 Multi-person three-dimensional attitude estimation method based on RGBD camera
CN113313810A (en) * 2021-06-18 2021-08-27 广东工业大学 6D attitude parameter calculation method for transparent object
CN113370217A (en) * 2021-06-29 2021-09-10 华南理工大学 Method for recognizing and grabbing object posture based on deep learning for intelligent robot
CN113393421A (en) * 2021-05-08 2021-09-14 深圳市识农智能科技有限公司 Fruit evaluation method and device and inspection equipment
CN113643433A (en) * 2020-04-27 2021-11-12 成都术通科技有限公司 Form and attitude estimation method, device, equipment and storage medium
CN114612939A (en) * 2022-03-25 2022-06-10 珠海视熙科技有限公司 Sitting posture identification method and device based on TOF camera and intelligent desk lamp
CN115578461A (en) * 2022-11-14 2023-01-06 之江实验室 Object attitude estimation method and device based on bidirectional RGB-D feature fusion
CN116968022A (en) * 2023-07-14 2023-10-31 武汉纺织大学 Method and system for grabbing target object by mechanical arm based on visual guidance

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170277981A1 (en) * 2016-01-08 2017-09-28 Siemens Healthcare Gmbh Deep Image-to-Image Network Learning for Medical Image Analysis
CN107330439A (en) * 2017-07-14 2017-11-07 腾讯科技(深圳)有限公司 A kind of determination method, client and the server of objects in images posture
CN107403430A (en) * 2017-06-15 2017-11-28 中山大学 A kind of RGBD image, semantics dividing method
CN108280856A (en) * 2018-02-09 2018-07-13 哈尔滨工业大学 The unknown object that network model is inputted based on mixed information captures position and orientation estimation method
KR20180096355A (en) * 2017-02-21 2018-08-29 서강대학교산학협력단 Depth Map Filtering and Two-level Predictor-Corrector Method for Precision Enhancement of RGB-D Camera Pose Estimation
CN108491880A (en) * 2018-03-23 2018-09-04 西安电子科技大学 Object classification based on neural network and position and orientation estimation method
CN109035327A (en) * 2018-06-25 2018-12-18 北京大学 Panorama camera Attitude estimation method based on deep learning


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
ERIC BRACHMANN et al.: "Learning 6D Object Pose Estimation Using 3D Object Coordinates", European Conference on Computer Vision 2014 *
JAY M. WONG et al.: "SegICP: Integrated Deep Semantic Segmentation and Pose Estimation", 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) *
JONATHAN LONG et al.: "Fully convolutional networks for semantic segmentation", 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
杨扬: "Research on Intelligent Grasping by Service Robots Based on Machine Vision", China Doctoral Dissertations Full-text Database, Information Science and Technology (monthly) *
石广升: "Research on 3D Model Construction and Pose Estimation of Objects Based on Kinect", China Masters' Theses Full-text Database, Information Science and Technology (monthly) *
翁健: "Research and Algorithm Implementation of Omnidirectional Scene Segmentation Based on Fully Convolutional Neural Networks", China Masters' Theses Full-text Database, Information Science and Technology *


Similar Documents

Publication Publication Date Title
CN109903332A (en) A kind of object's pose estimation method based on deep learning
CN109255813B (en) Man-machine cooperation oriented hand-held object pose real-time detection method
CN102657532B (en) Height measuring method and device based on body posture identification
CN105023010B (en) A kind of human face in-vivo detection method and system
US9019267B2 (en) Depth mapping with enhanced resolution
CN110866969B (en) Engine blade reconstruction method based on neural network and point cloud registration
CN104794737B (en) A kind of depth information Auxiliary Particle Filter tracking
CN106971406B (en) Object pose detection method and device
JP6016716B2 (en) Bin picking performance evaluation apparatus and method
US20230085384A1 (en) Characterizing and improving of image processing
CN109410248A (en) A kind of flotation froth motion feature extracting method based on r-K algorithm
CN110555908A (en) three-dimensional reconstruction method based on indoor moving target background restoration
CN107194985A (en) A kind of three-dimensional visualization method and device towards large scene
CN105869166A (en) Human body action identification method and system based on binocular vision
CN105279769A (en) Hierarchical particle filtering tracking method combined with multiple features
CN114494594B (en) Deep learning-based astronaut operation equipment state identification method
Migniot et al. 3D human tracking from depth cue in a buying behavior analysis context
CN108388854A (en) A kind of localization method based on improvement FAST-SURF algorithms
Migniot et al. 3d human tracking in a top view using depth information recorded by the xtion pro-live camera
CN106933976B (en) Method for establishing human body 3D net model and application thereof in 3D fitting
Singh et al. Fusing semantics and motion state detection for robust visual SLAM
Sokhib et al. A combined method of skin-and depth-based hand gesture recognition.
CN113470073A (en) Animal center tracking method based on deep learning
CN111531546B (en) Robot pose estimation method, device, equipment and storage medium
Haggag et al. LGT/VOT tracking performance evaluation of depth images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2019-06-18)