CN104732587B - Indoor 3D semantic map construction method based on a depth sensor - Google Patents

Indoor 3D semantic map construction method based on a depth sensor

Info

Publication number
CN104732587B
CN104732587B (application CN201510175129.0A)
Authority
CN
China
Prior art keywords
image
rgb
map
indoor
acquisition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510175129.0A
Other languages
Chinese (zh)
Other versions
CN104732587A (en)
Inventor
赵哲
陈小平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN201510175129.0A priority Critical patent/CN104732587B/en
Publication of CN104732587A publication Critical patent/CN104732587A/en
Application granted Critical
Publication of CN104732587B publication Critical patent/CN104732587B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an indoor 3D semantic map construction method based on a depth sensor. The method comprises: acquiring color-depth (RGB-D) images of an indoor environment with a depth sensor, and constructing an indoor 3D map from them; segmenting the acquired RGB-D images, computing the color and shape features of the segmented RGB-D images, and obtaining the corresponding semantic information; and fusing the obtained semantic information with the indoor 3D map to obtain an indoor 3D semantic map. With the disclosed method, a map containing semantic information such as structural semantics and furniture semantics can be constructed, enabling a robot to perform high-level intelligent operations.

Description

Indoor 3D semantic map construction method based on a depth sensor
Technical field
The present invention relates to the technical field of robot vision and scene understanding, and in particular to an indoor 3D semantic map construction method based on a depth sensor.
Background technique
Semantic awareness is a core, critical technology for indoor service robots. Traditionally, robots build indoor maps with lasers, which has limitations. On one hand, a laser-built map is a 2D map; lacking effective 3D information, the robot can only avoid obstacles on the ground and cannot avoid objects at a certain height when moving. On the other hand, with a laser-built map the robot can only perform low-level operations such as obstacle avoidance, movement, and path planning; the robot does not truly understand its surroundings. For a household service robot, truly understanding the environment and the user's needs is crucial, and such intelligence is one of the key factors distinguishing intelligent robots from industrial robots.
In recent years, with the development of depth sensors (e.g., Kinect), the indoor 3D sensing capability of robots has gradually improved. In academia, a series of 3D indoor environment mapping methods have emerged, such as RGB-D Mapping and KinectFusion. In industry, Google's ATAP division disclosed its latest development, Project Tango, which is developing a smartphone with 3D environment sensing technology and is expected to build real-time 3D structural maps of the environment at any time. However, these technologies use depth sensing only to build indoor 3D structural maps; the maps lack semantic understanding, e.g., where the walls are or where a table is.
Summary of the invention
The object of the present invention is to provide an indoor 3D semantic map construction method based on a depth sensor, which can construct a map containing semantic information such as structural semantics and furniture semantics, enabling a robot to perform high-level intelligent operations.
The purpose of the present invention is achieved through the following technical solutions:
An indoor 3D semantic map construction method based on a depth sensor, the method comprising:
acquiring color-depth (RGB-D) images of an indoor environment with a depth sensor, and constructing an indoor 3D map from them;
segmenting the acquired RGB-D images, computing the color and shape features of the segmented RGB-D images, and obtaining the corresponding semantic information;
fusing the obtained semantic information with the indoor 3D map to obtain an indoor 3D semantic map.
Further, acquiring the color-depth RGB-D images of the indoor environment with the depth sensor and constructing the indoor 3D map from them comprises:
scanning the indoor environment with a hand-held device equipped with a depth sensor, or with a mobile robot equipped with a depth sensor, to obtain a continuous sequence of RGB-D images;
preprocessing the continuous RGB-D images to obtain the rotation-translation matrices between consecutive RGB-D images, and stitching the continuous RGB-D images into the indoor 3D map.
Further, preprocessing the continuous RGB-D images to obtain the rotation-translation matrices between them comprises the following steps:
computing the corner points in each RGB-D frame, and tracking the computed corner points with an optical-flow algorithm;
finding all consistent corner points between two adjacent RGB-D frames with the random sample consensus (RANSAC) algorithm and taking them as feature-point pairs, filtering out the remaining corner points;
judging the inter-frame distance from the distances of the feature-point pairs between the two adjacent RGB-D frames; if the inter-frame distance is greater than a preset value, indicating that the number of corner points is insufficient, computing ORB feature points and their descriptors in the RGB-D images, and computing the ORB feature-point pairs of the two adjacent frames from the descriptors;
filtering the obtained ORB feature points with the RANSAC algorithm, and then computing the rotation-translation matrix of the two adjacent RGB-D frames with the following formula:

E(R, T) = Σ_{i=1}^{n} ||(R · p_i + T) − q_i||²

where n is the number of feature-point pairs between the two adjacent RGB-D frames, p_i and q_i are the i-th feature points of the two frames respectively, and R and T are the rotation and translation matrices; the R and T that minimize E are taken as the final result.
Further, before stitching the continuous RGB-D images into the indoor 3D map, the method further includes: performing sparsification (point-cloud down-sampling) on each RGB-D frame.
Further, segmenting the acquired RGB-D images, computing the color and shape features of the segmented RGB-D images, and obtaining the corresponding semantic information comprises:
performing superpixel segmentation on each acquired RGB-D frame with a 2D image segmentation algorithm;
computing the color and shape features of each superpixel with computer-vision algorithms, then training support vector machine (SVM) classifiers on the computed color and shape features to obtain classifiers for several semantic classes;
classifying the superpixels in each RGB-D frame with the obtained semantic-class classifiers, thereby obtaining the semantic information of the RGB-D images.
Further, fusing the obtained semantic information with the indoor 3D map to obtain the indoor 3D semantic map comprises:
dividing the indoor 3D map into several voxels;
performing voxel-level semantic inference with a Dense CRF model from machine learning, combined with the semantic information of the obtained RGB-D images, to determine the semantic information of each voxel and form the final semantic 3D scene map.
As can be seen from the technical solution provided above, a 3D scene map of the environment can be built by scanning the surroundings while the surroundings are understood automatically; semantic labels (wall, floor, ceiling, sofa, chair, etc.) are added to the environment, and an indoor 3D semantic map is finally obtained. An intelligent robot can then not only perform low-level operations (movement, obstacle avoidance, etc.) on the 3D environment map it builds, but also understand its surroundings, achieving genuine intelligent semantic perception.
Detailed description of the invention
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an indoor 3D semantic map construction method based on a depth sensor according to an embodiment of the present invention;
Fig. 2 is a flowchart of the RGB-D image preprocessing process according to an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flowchart of an indoor 3D semantic map construction method based on a depth sensor according to an embodiment of the present invention. As shown in Fig. 1, the method mainly includes the following steps:
Step 11: acquire RGB-D images of the indoor environment with a depth sensor, and construct an indoor 3D map from them.
In the embodiment of the invention, the indoor environment can be scanned with a hand-held device equipped with a depth sensor (e.g., a Kinect) or with a mobile robot equipped with a depth sensor, obtaining a continuous sequence of RGB-D (color-depth) images. The continuous RGB-D images are then preprocessed to obtain the rotation-translation matrices between them, and the continuous RGB-D images are stitched together to construct the indoor 3D map.
As shown in Fig. 2, the RGB-D image preprocessing process mainly includes the following steps:
Step 21: compute the corner points in each RGB-D frame, and track the computed corner points with an optical-flow algorithm.
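The corner tracking in step 21 rests on the optical-flow brightness-constancy equations. The following is a minimal single-window Lucas-Kanade sketch in numpy, not the embodiment's tracker (a real system would use a pyramidal multi-corner implementation); the window size and the synthetic blob are assumptions for the example:

```python
import numpy as np

def lucas_kanade_window(prev, curr, y, x, win=15):
    """Estimate the (dx, dy) motion of the window centred at (y, x) by
    solving the brightness-constancy equations Ix*dx + Iy*dy = -It
    in the least-squares sense (the core of optical-flow tracking)."""
    h = win // 2
    Iy, Ix = np.gradient(prev.astype(float))      # spatial gradients
    It = curr.astype(float) - prev.astype(float)  # temporal gradient
    sl = (slice(y - h, y + h + 1), slice(x - h, x + h + 1))
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)
    b = -It[sl].ravel()
    (dx, dy), *_ = np.linalg.lstsq(A, b, rcond=None)
    return dx, dy

# Synthetic check: a smooth blob shifted one pixel to the right.
yy, xx = np.mgrid[0:64, 0:64]
prev = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / 50.0)
curr = np.roll(prev, 1, axis=1)                   # true motion: dx = +1, dy = 0
dx, dy = lucas_kanade_window(prev, curr, 32, 32)
```

For the one-pixel shift above, the least-squares estimate lands close to (1, 0), which is exactly the per-corner displacement the tracker feeds into the next step.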
Step 22: find all consistent corner points between two adjacent RGB-D frames with the RANSAC algorithm and take them as feature-point pairs; filter out the remaining corner points.
The quality of the corner points tracked in step 21 varies. In the embodiment of the invention, the RANSAC (random sample consensus) algorithm is used to find all consistent corner points between two adjacent RGB-D frames and take them as feature-point pairs.
Illustratively, three feature-point pairs can be selected at random to compute a rotation transformation matrix T, while counting the number of feature-point pairs consistent with T. Repeating this process many times, the T supported by the most feature-point pairs is the best rotation transformation matrix (best T); feature-point pairs that do not satisfy the best T correspond to bad tracks and are filtered out.
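The sample-three-pairs loop described above can be sketched as follows. The inlier threshold, iteration count, and the closed-form three-point fit (the SVD method the description uses later in step 24) are choices for the example, not values from the patent:

```python
import numpy as np

def estimate_rt(P, Q):
    # Closed-form rigid transform from matched points (SVD / Kabsch method)
    pc, qc = P.mean(axis=0), Q.mean(axis=0)
    U, _, Vt = np.linalg.svd((P - pc).T @ (Q - qc))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:              # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, qc - R @ pc

def ransac_rt(P, Q, iters=250, thresh=0.05, seed=0):
    """Sample 3 feature-point pairs, fit R, T, count the pairs consistent
    with the hypothesis, and keep the hypothesis with the most inliers."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(P), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(P), size=3, replace=False)
        R, T = estimate_rt(P[idx], Q[idx])
        err = np.linalg.norm(P @ R.T + T - Q, axis=1)
        inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return best

# Synthetic pairs: 25 good tracks plus 5 bad ones offset by a constant.
rng = np.random.default_rng(1)
a = 0.3
R_true = np.array([[np.cos(a), -np.sin(a), 0],
                   [np.sin(a),  np.cos(a), 0],
                   [0, 0, 1]])
P = rng.normal(size=(30, 3))
Q = P @ R_true.T + np.array([0.1, 0.2, 0.3])
Q[25:] += 1.0                             # simulate bad tracking
inliers = ransac_rt(P, Q)
```

With these settings the surviving mask marks exactly the 25 consistent pairs, which is the filtering behaviour the paragraph describes.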
Step 23: judge the inter-frame distance from the distances of the feature-point pairs between the two adjacent RGB-D frames; if the inter-frame distance is greater than a preset value, indicating that the number of corner points is insufficient, compute ORB feature points and their descriptors in the RGB-D images, and compute the ORB feature-point pairs of the two adjacent frames from the descriptors.
In the embodiment of the invention, if the inter-frame distance is too small, the camera has essentially not moved, and the frame is not added to the global scene. If the inter-frame distance is greater than a preset value (which can be set according to the actual situation or from experience), the number of corner points may be insufficient; in that case other feature points are needed, for example SURF (Speeded-Up Robust Features) or ORB (Oriented FAST and Rotated BRIEF).
In the embodiment of the invention, ORB feature points are used for reasons of efficiency. The ORB feature points and descriptors can be computed with the CUDA (GPU) implementation of ORB in OpenCV (the open-source computer-vision library), and the ORB feature-point pairs of the two adjacent RGB-D frames are then computed from the descriptors.
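As a hedged illustration of what an ORB descriptor is (not the embodiment's implementation, which uses OpenCV's CUDA ORB), the BRIEF-style binary descriptor at the core of ORB can be sketched in numpy. The patch size, the number of test pairs, and the random sampling pattern are assumptions for the example; real ORB additionally adds orientation compensation and a learned pattern:

```python
import numpy as np

def brief_descriptor(patch, pairs):
    """One bit per sampled pixel pair: 1 if the first pixel is darker.
    Descriptors are compared with the Hamming distance."""
    a = patch[pairs[:, 0], pairs[:, 1]]
    b = patch[pairs[:, 2], pairs[:, 3]]
    return (a < b).astype(np.uint8)

rng = np.random.default_rng(0)
pairs = rng.integers(0, 31, size=(256, 4))   # 256 random test pairs in a 31x31 patch
patch = rng.random((31, 31))

d1 = brief_descriptor(patch, pairs)
d2 = brief_descriptor(patch, pairs)          # same patch -> identical bits
d3 = brief_descriptor(1.0 - patch, pairs)    # inverted patch -> almost every bit flips
```

The identical patch reproduces the descriptor bit-for-bit, while inverting the patch flips essentially all 256 bits; that Hamming behaviour is what makes the descriptor matching in the next step cheap.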
Step 24: filter the obtained ORB feature points with the RANSAC algorithm, and then compute the rotation-translation matrix of the two adjacent RGB-D frames with the following formula (point-to-point ICP, i.e., nearest-point search):

E(R, T) = Σ_{i=1}^{n} ||(R · p_i + T) − q_i||²

where n is the number of feature-point pairs between the two adjacent RGB-D frames (including the feature-point pairs obtained in step 22 and the ORB feature-point pairs after filtering), p_i and q_i are the i-th feature points of the two frames respectively (i.e., (p_i, q_i) is the i-th feature-point pair), and R and T are the rotation and translation matrices; the R and T that minimize E are taken as the final result.
The above formula has a closed-form (analytic) solution. Common methods include singular value decomposition (SVD), quaternions, the orthogonal-matrix method, and dual quaternions; this embodiment uses singular value decomposition and parallelizes it.
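The SVD route mentioned above is the standard closed form (often called the Kabsch method). A minimal numpy sketch consistent with the energy E of step 24 follows; the determinant check is a standard safeguard against reflections that the text does not spell out:

```python
import numpy as np

def svd_rigid_transform(P, Q):
    """Closed-form minimiser of E = sum_i ||R p_i + T - q_i||^2:
    centre both point sets, SVD the 3x3 cross-covariance, read off R, T."""
    p_bar, q_bar = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_bar).T @ (Q - q_bar)       # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:              # flip to a proper rotation
        Vt[-1] *= -1
        R = Vt.T @ U.T
    T = q_bar - R @ p_bar
    return R, T

# Recover a known transform from exact correspondences.
rng = np.random.default_rng(2)
R_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R_true) < 0:
    R_true[:, 0] *= -1                    # make it a rotation, not a reflection
T_true = np.array([0.5, -1.0, 2.0])
P = rng.normal(size=(20, 3))
Q = P @ R_true.T + T_true
R, T = svd_rigid_transform(P, Q)
```

On exact correspondences the recovered (R, T) matches the ground truth to machine precision, which is why the embodiment can treat this step as a cheap, deterministic subroutine.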
Further, considering the limited memory and the large space occupied by the image data, each RGB-D frame also needs to be sparsified before the continuous RGB-D images are stitched into the indoor 3D map.
Illustratively, the point cloud of a single 640×480 image occupies 3.5 MB of memory. To save space and process more images, this example keeps only one point per 1 cm cube (i.e., a resolution of 1 cm).
In this example, a 3 m × 3 m × 3 m space is sparsified; since the resolution is 1 cm, a state array of size 300×300×300 is needed. On the GPU, each thread handles one point of the image: it computes the point's index into the state array from its coordinates and checks from the state array whether the cell is occupied; if not occupied, the cell is marked as occupied; if already occupied, the point is not added to the global scene.
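The sparsification just described can be sketched on the CPU with integer voxel keys instead of the embodiment's 300×300×300 GPU state array; using np.unique to drop duplicate keys is an assumption made for brevity, not the patent's implementation:

```python
import numpy as np

def sparsify(points, resolution=0.01):
    """Keep at most one point per (resolution)^3 cube: quantise each
    coordinate to a voxel index and drop points whose voxel is taken."""
    keys = np.floor(points / resolution).astype(np.int64)
    _, first = np.unique(keys, axis=0, return_index=True)
    return points[np.sort(first)]          # first point seen in each voxel

pts = np.array([
    [0.001, 0.002, 0.003],   # same 1 cm cube as the next point
    [0.004, 0.008, 0.009],
    [0.025, 0.000, 0.000],   # a different cube
])
kept = sparsify(pts)
```

The two points inside the same 1 cm cube collapse to one, mirroring the occupied/not-occupied test the state array performs per thread.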
On the other hand, the above processing of the embodiment of the invention can be implemented on a GPU (graphics processor), which yields much higher throughput than a CPU.
Illustratively, when executing step 22, the GPU algorithm is as follows: each block simulates one RANSAC iteration, so 250 iterations are 250 blocks. One thread in each block randomly selects three feature-point pairs and computes T; afterwards every thread in the block checks whether its pair satisfies T, and the counts are accumulated. Finally, the T with the largest count is selected as the final T. This uses the random-number API of CUDA (the general-purpose parallel computing architecture), atomic operations, shared memory, parallel reduction, and similar mechanisms.
When executing the added ORB feature-point processing of step 23, the GPU finds the two nearest candidate correspondences of each feature point with a parallel reduction; the sum operation of the reduction is simply replaced by a minimum operation. Meanwhile, if the distance to the nearest feature point is less than 0.8 times the distance to the second-nearest one, the match is considered good; this step only needs one thread per feature-point pair, processed in parallel.
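The nearest/second-nearest test described above can be sketched on binary descriptors stored as 0/1 bit arrays and compared by Hamming distance. A minimal CPU sketch; argsort stands in for the GPU reduction, and only the 0.8 ratio comes from the text:

```python
import numpy as np

def ratio_match(des1, des2, ratio=0.8):
    """For each descriptor in des1, find its two nearest neighbours in
    des2 by Hamming distance and keep the match only if the best is
    clearly better than the runner-up (Lowe-style ratio test)."""
    matches = []
    for i, d in enumerate(des1):
        dist = np.count_nonzero(des2 != d, axis=1)   # Hamming per candidate
        j, k = np.argsort(dist)[:2]
        if dist[j] < ratio * dist[k]:
            matches.append((i, j))
    return matches

rng = np.random.default_rng(0)
des2 = rng.integers(0, 2, size=(4, 64), dtype=np.uint8)
des2[3] = des2[2]                      # two identical database descriptors
des1 = np.stack([des2[0], des2[2]])    # exact copies of entries 0 and 2

matches = ratio_match(des1, des2)
```

The unambiguous query matches entry 0, while the query that is equidistant from two database entries is rejected by the ratio test, which is exactly what keeps ambiguous correspondences out of the ICP step.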
When computing the rotation-translation matrix in step 24, the GPU computes the 3×3 matrix H with a parallel reduction:

H = Σ_{i=1}^{n} (p_i − p̄)(q_i − q̄)ᵀ

where p̄ and q̄ are the centroids of the two point sets. Afterwards, the CPU performs the singular value decomposition H = U·Σ·Vᵀ and then computes R = V·Uᵀ and T = q̄ − R·p̄, which solves the rotation matrix R and translation matrix T. Here p_i and q_i are the i-th feature points of the two adjacent RGB-D frames (i.e., (p_i, q_i) is the i-th feature-point pair), and R, T are the rotation and translation matrices; this is one of the methods for computing the rotation-translation matrix, namely the singular value decomposition method.
To show the efficiency of GPU-based versus CPU-based processing intuitively, this embodiment also ran a comparison experiment; the results are shown in Table 1.
Method | CPU | GPU
Corner detection | 36 | 14.15
Corner pairing | 21 | 12.8
RANSAC | 40 | 3.6
ORB | 118 | 24.5
Find ORB pairs | 44 | 1.43
Point-to-point ICP | 12 | 0.08
Filtering | 162 | 15

Table 1: Comparison results
As can be seen from Table 1, the GPU program is faster than the CPU program in every step. The CPU program processes 4.12 frames of data per second, while the GPU program processes 12.5 frames per second; GPU-based processing thus yields roughly a 3× speedup.
Step 12: segment the acquired RGB-D images, compute the color and shape features of the segmented RGB-D images, and obtain the corresponding semantic information.
In the embodiment of the invention, a semantic segmentation and labeling is generated for every RGB-D frame. Specifically:
First, superpixel segmentation is performed on each acquired RGB-D frame with a 2D image segmentation algorithm.
Then, the superpixels need to be classified semantically. The semantic classes include structural classes (wall, floor, etc.) and furniture classes (table, chair, bed, etc.). Illustratively, the semantic classification of the superpixels can be realized with SVM (support vector machine) classifiers: the color and shape features of each superpixel are first computed with computer-vision algorithms, and SVM classifiers are then trained on the computed color and shape features, yielding classifiers for several semantic classes.
Finally, the superpixels in each RGB-D frame are classified with the obtained semantic-class classifiers, thereby obtaining the semantic information of the RGB-D images.
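The classifier stage above can be sketched with scikit-learn's SVC. This is a hedged sketch assuming toy hand-made features (mean color plus one compactness-like shape number); the class prototypes and feature values are invented for the example, not taken from the patent:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy per-superpixel features: [mean R, mean G, mean B, shape compactness].
# Class 0 is wall-like (bright, flat), class 1 is chair-like (dark, compact);
# these distributions are assumptions for the example only.
wall  = rng.normal([0.8, 0.8, 0.75, 0.2], 0.05, size=(50, 4))
chair = rng.normal([0.3, 0.2, 0.15, 0.8], 0.05, size=(50, 4))
X = np.vstack([wall, chair])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="rbf").fit(X, y)               # one classifier over the classes
pred = clf.predict([[0.82, 0.79, 0.74, 0.18],   # wall-like superpixel
                    [0.28, 0.22, 0.16, 0.82]])  # chair-like superpixel
```

Each superpixel's feature vector is mapped to a semantic class label; in the embodiment these per-superpixel labels are then projected into the 3D map as the unary evidence for step 13.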
Step 13: fuse the obtained semantic information with the indoor 3D map to obtain the indoor 3D semantic map.
First, the indoor 3D map is divided into several voxels.
Then, semantic information is added to each voxel. In this embodiment, voxel-level semantic inference is performed with the Dense CRF (dense conditional random field) model from machine learning, combined with the semantic information of the obtained RGB-D images. In the Dense CRF, the node (unary) potentials can be obtained from the semantic information of the RGB-D images, and the pairwise potentials can be defined by Gaussian functions. Through Dense CRF inference, the semantic information of each voxel can be determined, forming the final semantic 3D scene map.
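The voxel-level reasoning can be illustrated with a tiny mean-field update over a fully connected CRF: the unary term comes from the per-voxel image semantics and the pairwise term is a Gaussian kernel on voxel positions, as in the text. This is a simplified sketch assuming a Potts-style compatibility and toy energies; real Dense CRF inference (e.g., efficient filtering-based mean field) is considerably more elaborate:

```python
import numpy as np

def dense_crf_mean_field(unary, positions, w=2.0, sigma=1.0, iters=10):
    """unary: (N, L) energies (low = preferred label) from image semantics.
    positions: (N, D) voxel centres.  Gaussian pairwise weights reward
    nearby voxels that agree on a label (Potts model, mean-field update)."""
    d2 = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(-1)
    K = w * np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(K, 0.0)               # no self-message
    Q = np.exp(-unary)
    Q /= Q.sum(1, keepdims=True)
    for _ in range(iters):
        Q = np.exp(-unary + K @ Q)         # neighbours voting for each label
        Q /= Q.sum(1, keepdims=True)
    return Q.argmax(1)

# Five voxels on a line; the middle one has a slightly wrong unary.
positions = np.arange(5, dtype=float)[:, None]
unary = np.array([[0.0, 3.0]] * 5)
unary[2] = [1.5, 1.0]                      # noisy voxel prefers label 1 alone
labels = dense_crf_mean_field(unary, positions)
labels_no_pairwise = dense_crf_mean_field(unary, positions, w=0.0)
```

With the Gaussian pairwise term switched on, the noisy middle voxel is smoothed to agree with its neighbours; with the pairwise weight at zero it keeps its (wrong) unary label, which is the effect the Dense CRF step contributes to the final map.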
In the scheme of the embodiment of the invention, a 3D scene map of the environment can be built by scanning the surroundings while the surroundings are understood automatically; semantic labels (wall, floor, ceiling, sofa, chair, etc.) are added to the environment, and an indoor 3D semantic map is finally obtained. An intelligent robot can then not only perform low-level operations (movement, obstacle avoidance, etc.) on the 3D environment map it builds, but also understand its surroundings, achieving genuine intelligent semantic perception.
Through the above description of the embodiments, those skilled in the art can clearly understand that the above embodiments can be implemented in software, or in software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the above embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (a CD-ROM, USB flash drive, removable hard disk, etc.) and includes instructions that cause a computer device (a personal computer, server, network device, etc.) to execute the methods described in the embodiments of the present invention.
The foregoing is only a preferred embodiment of the present invention, but the protection scope of the invention is not limited to it. Any change or substitution that can readily be thought of by a person skilled in the art within the technical scope disclosed by the invention shall be covered by the protection scope of the invention. Therefore, the protection scope of the invention shall be subject to the protection scope of the claims.

Claims (3)

1. An indoor 3D semantic map construction method based on a depth sensor, characterized in that the method comprises:
acquiring color-depth (RGB-D) images of an indoor environment with a depth sensor, and constructing an indoor 3D map from them;
segmenting the acquired RGB-D images, computing the color and shape features of the segmented RGB-D images, and obtaining the corresponding semantic information;
fusing the obtained semantic information with the indoor 3D map to obtain an indoor 3D semantic map;
wherein acquiring the color-depth RGB-D images of the indoor environment with the depth sensor and constructing the indoor 3D map from them comprises: scanning the indoor environment with a hand-held device equipped with a depth sensor, or with a mobile robot equipped with a depth sensor, to obtain a continuous sequence of RGB-D images; preprocessing the continuous RGB-D images to obtain the rotation-translation matrices between consecutive RGB-D images, and stitching the continuous RGB-D images into the indoor 3D map;
and before stitching the continuous RGB-D images into the indoor 3D map, further comprising: performing sparsification on each RGB-D frame, the process being: keeping only one point per 1 cm cube, i.e., a resolution of 1 cm; sparsifying a 3 m × 3 m × 3 m space at a resolution of 1 cm then requires a state array of size 300×300×300; on the GPU, each thread handles one point of the image, computes its index into the state array from its coordinates, and checks from the state array whether the cell is occupied; if not occupied, the cell is marked as occupied; if occupied, the point is not added to the global scene;
wherein preprocessing the continuous RGB-D images to obtain the rotation-translation matrices between them comprises the following steps:
computing the corner points in each RGB-D frame, and tracking the computed corner points with an optical-flow algorithm;
finding all consistent corner points between two adjacent RGB-D frames with the random sample consensus (RANSAC) algorithm and taking them as feature-point pairs, filtering out the remaining corner points;
judging the inter-frame distance from the distances of the feature-point pairs between the two adjacent RGB-D frames; if the inter-frame distance is greater than a preset value, indicating that the number of corner points is insufficient, computing ORB feature points and their descriptors in the RGB-D images, and computing the ORB feature-point pairs of the two adjacent frames from the descriptors;
filtering the obtained ORB feature points with the RANSAC algorithm, and then computing the rotation-translation matrix of the two adjacent RGB-D frames with the following formula:

E(R, T) = Σ_{i=1}^{n} ||(R · p_i + T) − q_i||²

where n is the number of feature-point pairs between the two adjacent RGB-D frames, p_i and q_i are the i-th feature points of the two frames respectively, and R and T are the rotation and translation matrices; the R and T that minimize E are taken as the final result.
2. The method according to claim 1, characterized in that segmenting the acquired RGB-D images, computing the color and shape features of the segmented RGB-D images, and obtaining the corresponding semantic information comprises:
performing superpixel segmentation on each acquired RGB-D frame with a 2D image segmentation algorithm;
computing the color and shape features of each superpixel with computer-vision algorithms, then training support vector machine (SVM) classifiers on the computed color and shape features to obtain classifiers for several semantic classes, wherein the obtained classifiers correspond to the color and shape feature classes, and the number of semantic classes also corresponds to the color and shape feature classes;
classifying the superpixels in each RGB-D frame with the obtained semantic-class classifiers, thereby obtaining the semantic information of the RGB-D images.
3. The method according to claim 1 or 2, characterized in that fusing the obtained semantic information with the indoor 3D map to obtain the indoor 3D semantic map comprises:
dividing the indoor 3D map into several voxels;
performing voxel-level semantic inference with a Dense CRF model from machine learning, combined with the semantic information of the obtained RGB-D images, to determine the semantic information of each voxel and form the final semantic 3D scene map.
CN201510175129.0A 2015-04-14 2015-04-14 Indoor 3D semantic map construction method based on a depth sensor Active CN104732587B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510175129.0A CN104732587B (en) 2015-04-14 2015-04-14 Indoor 3D semantic map construction method based on a depth sensor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510175129.0A CN104732587B (en) 2015-04-14 2015-04-14 Indoor 3D semantic map construction method based on a depth sensor

Publications (2)

Publication Number Publication Date
CN104732587A CN104732587A (en) 2015-06-24
CN104732587B true CN104732587B (en) 2019-02-01

Family

ID=53456455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510175129.0A Active CN104732587B (en) 2015-04-14 2015-04-14 Indoor 3D semantic map construction method based on a depth sensor

Country Status (1)

Country Link
CN (1) CN104732587B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10019821B2 (en) 2014-09-02 2018-07-10 Naver Business Platform Corp. Apparatus and method for constructing indoor map using cloud point
CN105045263B (en) * 2015-07-06 2016-05-18 杭州南江机器人股份有限公司 A kind of robot method for self-locating based on Kinect depth camera
WO2017079918A1 (en) * 2015-11-11 2017-05-18 中国科学院深圳先进技术研究院 Indoor scene scanning reconstruction method and apparatus
TWI639133B (en) 2015-11-13 2018-10-21 南韓商納寶商務平台股份有限公司 Apparatus and method for constructing indoor map using cloud point
CN105665970B (en) * 2016-03-01 2018-06-22 中国科学院自动化研究所 For the path point automatic creation system and method for welding robot
CN106052674B (en) * 2016-05-20 2019-07-26 青岛克路德机器人有限公司 A kind of SLAM method and system of Indoor Robot
CN106067191A (en) * 2016-05-25 2016-11-02 深圳市寒武纪智能科技有限公司 The method and system of semantic map set up by a kind of domestic robot
CN106840141A (en) * 2017-02-02 2017-06-13 王恒升 A kind of semantic map of mobile robot indoor navigation
CN107063258A (en) * 2017-03-07 2017-08-18 重庆邮电大学 A kind of mobile robot indoor navigation method based on semantic information
CN107992036B (en) * 2017-11-20 2021-03-16 汕头大学 Method and device for planning vehicle access path in intelligent parking garage and storage medium
WO2019127102A1 (en) * 2017-12-27 2019-07-04 深圳前海达闼云端智能科技有限公司 Information processing method and apparatus, cloud processing device, and computer program product
CN108955682A (en) * 2018-04-03 2018-12-07 哈尔滨工业大学深圳研究生院 Mobile phone indoor positioning air navigation aid
CN109506658B (en) * 2018-12-26 2021-06-08 广州市申迪计算机系统有限公司 Robot autonomous positioning method and system
CN111679661A (en) * 2019-02-25 2020-09-18 北京奇虎科技有限公司 Semantic map construction method based on depth camera and sweeping robot
CN110516641B (en) * 2019-08-30 2022-04-12 苏州大学 Construction method of environment map and related device
CN112669355B (en) * 2021-01-05 2023-07-25 北京信息科技大学 Method and system for splicing and fusing focusing stack data based on RGB-D super pixel segmentation
CN112904437B (en) * 2021-01-14 2023-03-24 支付宝(杭州)信息技术有限公司 Detection method and detection device of hidden component based on privacy protection
CN112837372A (en) * 2021-03-02 2021-05-25 浙江商汤科技开发有限公司 Data generation method and device, electronic equipment and storage medium
CN113313832B (en) * 2021-05-26 2023-07-04 Oppo广东移动通信有限公司 Semantic generation method and device of three-dimensional model, storage medium and electronic equipment
CN113989376B (en) * 2021-12-23 2022-04-26 贝壳技术有限公司 Method and device for acquiring indoor depth information and readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103712617A (en) * 2013-12-18 2014-04-09 北京工业大学 Visual-content-based method for establishing multi-level semantic map
CN104330090A (en) * 2014-10-23 2015-02-04 北京化工大学 Robot distributed type representation intelligent semantic map establishment method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014146668A2 (en) * 2013-03-18 2014-09-25 Aalborg Universitet Method and device for modelling room acoustic based on measured geometrical data

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103712617A (en) * 2013-12-18 2014-04-09 北京工业大学 Visual-content-based method for establishing multi-level semantic map
CN104330090A (en) * 2014-10-23 2015-02-04 北京化工大学 Robot distributed type representation intelligent semantic map establishment method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Feng (王锋), "Research on Indoor Semantic Map Construction for Service Robots" (面向服务机器人的室内语义地图构建的研究), China Doctoral Dissertations Full-text Database, Information Science and Technology, 2014-10-15; abstract and pp. 41-64, 103-107

Also Published As

Publication number Publication date
CN104732587A (en) 2015-06-24

Similar Documents

Publication Publication Date Title
CN104732587B (en) Indoor 3D semantic map construction method based on a depth sensor
US11107272B2 (en) Scalable volumetric 3D reconstruction
US11216971B2 (en) Three-dimensional bounding box from two-dimensional image and point cloud data
Tatarchenko et al. Tangent convolutions for dense prediction in 3d
Zhang et al. Meshstereo: A global stereo model with mesh alignment regularization for view interpolation
CN107622244B (en) Indoor scene fine analysis method based on depth map
Zhu et al. RGB-D local implicit function for depth completion of transparent objects
CN111243093B (en) Three-dimensional face grid generation method, device, equipment and storage medium
Liang et al. Image based localization in indoor environments
Monica et al. Contour-based next-best view planning from point cloud segmentation of unknown objects
CN109737974A (en) A kind of 3D navigational semantic map updating method, device and equipment
Rusu Semantic 3D object maps for everyday robot manipulation
CN110363817B (en) Target pose estimation method, electronic device, and medium
Parkison et al. Semantic Iterative Closest Point through Expectation-Maximization.
CN110648397A (en) Scene map generation method and device, storage medium and electronic equipment
Bewley et al. Advantages of exploiting projection structure for segmenting dense 3D point clouds
CN112784873A (en) Semantic map construction method and equipment
EP4365841A1 (en) Object pose detection method and apparatus, computer device, and storage medium
CN115439607A (en) Three-dimensional reconstruction method and device, electronic equipment and storage medium
Tao et al. Indoor 3D semantic robot VSLAM based on mask regional convolutional neural network
Zhang et al. Research on 3D architectural scenes construction technology based on augmented reality
CN107330930B (en) Three-dimensional image depth information extraction method
Wang et al. Understanding of wheelchair ramp scenes for disabled people with visual impairments
Li et al. Improving autonomous exploration using reduced approximated generalized voronoi graphs
CN112819937B (en) Self-adaptive multi-object light field three-dimensional reconstruction method, device and equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant