CN107563388A

CN107563388A - A convolutional neural network object recognition method based on depth-information pre-segmentation

Info

Publication number: CN107563388A
Application number: CN201710838112.8A
Authority: CN
Inventors: 王晟; 左东昊; 谢丽萍; 钱唯; 刘正阳; 方郅昊; 高英淇; 成奕霖
Original assignee: Northeastern University China
Current assignee: Northeastern University China
Priority date: 2017-09-18
Filing date: 2017-09-18
Publication date: 2018-01-09

Abstract

The present invention relates to a convolutional neural network object recognition method based on depth-information pre-segmentation, comprising the following steps. Step 1: acquire a depth image and a color image of the scene. Step 2: segment the depth image of the object from the depth image of the scene. Step 3: according to the segmented region of the object's depth image, segment the color image of the object from the color image of the scene. Step 4: pad the segmented color image. Step 5: input the padded color image into a convolutional neural network for recognition and output the recognition result. The method of the present invention can recognize multiple objects in a complex image, trains and recognizes quickly, has low hardware requirements, and reduces overfitting of the convolutional neural network.

Description

A convolutional neural network object recognition method based on depth-information pre-segmentation

Technical field

The present invention relates to the fields of image processing and object recognition, and in particular to a convolutional neural network object recognition method based on depth-information pre-segmentation.

Background art

Computer interpretation of objects plays a critical role in applications such as robotics and artificial intelligence. Converting object images captured by sensors into information that humans can understand (such as text, sound, or images) is a focus of frontier research.

Existing object recognition methods mainly recognize plain RGB images. Such methods may treat the global features of an image as local features, which causes overfitting and lowers the recognition rate. Another, newer approach feeds both the RGB image and the depth image of the object into a convolutional neural network for training and recognition. This approach achieves higher accuracy than the first, but its computational cost becomes excessive when training on and recognizing many objects.

To address these problems in current image recognition, a new algorithm is proposed that reduces the amount of computation while improving the recognition rate.

Summary of the invention

In view of the shortcomings of the prior art, the present invention provides an object recognition method based on depth-information pre-segmentation and a convolutional neural network. The method can recognize multiple objects in a complex image, recognizes quickly, and reduces the occurrence of overfitting in the convolutional neural network.

The present invention provides a convolutional neural network object recognition method based on depth-information pre-segmentation, comprising the following steps:

Step 1: acquire the depth image and the color image of the scene;

Step 2: segment the depth image of the object from the depth image of the scene;

Step 3: according to the segmented region of the object's depth image, segment the color image of the object from the color image of the scene;

Step 4: pad and scale the segmented color image;

Step 5: input the padded color image into the convolutional neural network for recognition, and output the recognition result.

In the method of the present invention, Step 1 specifically comprises:

The depth image and the color image of the same scene are acquired at the same moment; the image capture device may be a Kinect device or a multi-view camera system.

In the method of the present invention, Step 2 specifically comprises:

Step 2.1: divide the depth image of the scene into foreground and background using the Otsu algorithm; the foreground represents the target object, whose depth lies within a certain range;

Step 2.2: segment the depth image of the target object out of the depth image of the scene using a seeded region growing algorithm.
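As a minimal sketch of Step 2.1, the function below computes Otsu's threshold, the gray level that maximizes the between-class variance of the histogram. It assumes the depth image has already been quantized to 8-bit gray levels (0 to 255) and is passed as a flat list; the name `otsu_threshold` is hypothetical. Which side of the threshold counts as foreground depends on how depth is encoded as gray, so that choice is left to the caller.

```python
from collections import Counter


def otsu_threshold(pixels):
    """Return the threshold t (0-255) that maximizes between-class variance.

    Pixels with value <= t form one class and pixels with value > t the other.
    `pixels` is any iterable of integer gray levels in the range 0-255.
    """
    hist = Counter(pixels)
    total = sum(hist.values())
    sum_all = sum(level * count for level, count in hist.items())

    best_t, best_var = 0, -1.0
    weight_low = 0  # number of pixels at or below the candidate threshold
    sum_low = 0     # sum of their gray levels
    for t in range(256):
        weight_low += hist.get(t, 0)
        sum_low += t * hist.get(t, 0)
        weight_high = total - weight_low
        if weight_low == 0 or weight_high == 0:
            continue  # one class is empty, variance undefined
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Between-class variance, up to the constant factor 1 / total**2.
        var_between = weight_low * weight_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t
```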

In the method of the present invention, Step 2.2 specifically comprises:

Step 2.2.1: randomly select five pixels within the depth range of the foreground as seed points;

Step 2.2.2: traverse the 8 pixels surrounding each seed point, and absorb into the seed set any pixel whose gray-level change is less than 4;

Step 2.2.3: repeat Step 2.2.2 until every pixel in the image has been classified as either a seed point or a non-seed point;

Step 2.2.4: segment out the image formed by the seed points to obtain the depth image of the object.
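Steps 2.2.1 to 2.2.4 can be sketched as follows. The depth image is a list of rows of gray values, and `pick_seeds` and `region_grow` are hypothetical names. The text does not say what the gray-level change of less than 4 is measured against; here it is assumed to be the difference between a neighbour and the already-accepted pixel it is reached from. Growing stops when no further pixel can be absorbed, at which point every pixel is either a seed point or a non-seed point (Step 2.2.3).

```python
import random
from collections import deque

# Offsets (row, column) of the 8 neighbours of a pixel.
NEIGHBOURS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                (0, 1), (1, -1), (1, 0), (1, 1)]


def pick_seeds(depth, lo, hi, k=5, rng=random):
    """Step 2.2.1: randomly choose k pixels whose value lies in [lo, hi]."""
    candidates = [(r, c) for r, row in enumerate(depth)
                  for c, v in enumerate(row) if lo <= v <= hi]
    return rng.sample(candidates, min(k, len(candidates)))


def region_grow(depth, seeds, max_diff=4):
    """Steps 2.2.2-2.2.4: grow a region from `seeds` over 8-connected pixels.

    Returns a binary mask: 1 = seed (object) pixel, 0 = non-seed pixel.
    """
    h, w = len(depth), len(depth[0])
    mask = [[0] * w for _ in range(h)]
    queue = deque()
    for r, c in seeds:
        if not mask[r][c]:
            mask[r][c] = 1
            queue.append((r, c))
    while queue:  # repeat until no new pixel can be absorbed
        r, c = queue.popleft()
        for dr, dc in NEIGHBOURS_8:
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not mask[nr][nc]:
                if abs(depth[nr][nc] - depth[r][c]) < max_diff:
                    mask[nr][nc] = 1
                    queue.append((nr, nc))
    return mask
```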

In the method of the present invention, Step 3 specifically comprises:

The pixels of the scene's color image and depth image correspond one to one, so the segmented position of the object's depth image can be mapped onto the scene's color image, and the color image of the object is then segmented out.
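Because the pixels correspond one to one, Step 3 reduces to applying the extent of the depth mask to the color image. The sketch below assumes the "segmentation range" is the bounding box of the object mask produced in Step 2; `bounding_box` and `crop_color` are hypothetical helpers. Pixels inside the box but outside the mask are kept unchanged here, since the text does not say whether they are blanked.

```python
def bounding_box(mask):
    """Return (top, bottom, left, right), inclusive, of the nonzero mask pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        raise ValueError("mask contains no object pixels")
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], rows[-1], cols[0], cols[-1]


def crop_color(color, mask):
    """Cut the object's region out of the scene's color image.

    Assumes `color` (rows of (R, G, B) tuples) is already pixel-aligned with
    the depth mask, so the mask's bounding box applies to it directly.
    """
    top, bottom, left, right = bounding_box(mask)
    return [row[left:right + 1] for row in color[top:bottom + 1]]
```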

In the method of the present invention, Step 4 comprises:

Step 4.1: set the RGB color value of the fill region, and pad the segmented color image to an aspect ratio of 1:1;

Step 4.2: resize the padded color image to a prescribed size using bilinear interpolation.
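Steps 4.1 and 4.2 can be sketched in plain Python as below, with images stored as rows of pixel tuples. Centring the object inside the padded square and using an "align corners" coordinate mapping for bilinear interpolation are assumptions, since the text specifies neither; `pad_to_square` and `resize_bilinear` are hypothetical names.

```python
def pad_to_square(img, fill):
    """Step 4.1: pad an H x W image with `fill` to a max(H, W) square (centred)."""
    h, w = len(img), len(img[0])
    side = max(h, w)
    top, left = (side - h) // 2, (side - w) // 2
    out = [[fill] * side for _ in range(side)]
    for r in range(h):
        out[top + r][left:left + w] = img[r]
    return out


def resize_bilinear(img, out_h, out_w):
    """Step 4.2: resize an image of pixel tuples with bilinear interpolation.

    Uses an "align corners" mapping, so the input's corner pixels land
    exactly on the output's corners.
    """
    h, w = len(img), len(img[0])
    channels = len(img[0][0])

    def source_coord(i, out_len, in_len):
        # Map output index i to a fractional input coordinate.
        return i * (in_len - 1) / (out_len - 1) if out_len > 1 else 0.0

    out = []
    for i in range(out_h):
        y = source_coord(i, out_h, h)
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = source_coord(j, out_w, w)
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            pixel = []
            for ch in range(channels):
                top = img[y0][x0][ch] * (1 - fx) + img[y0][x1][ch] * fx
                bottom = img[y1][x0][ch] * (1 - fx) + img[y1][x1][ch] * fx
                pixel.append(round(top * (1 - fy) + bottom * fy))
            row.append(tuple(pixel))
        out.append(row)
    return out
```

For the embodiment's 224 by 224 input, the two steps are chained as `resize_bilinear(pad_to_square(crop, fill), 224, 224)`.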

In the method of the present invention, the RGB color value of the fill region in Step 4.1 may be set using any of the following methods:

A. Average the RGB color values of the object's edge pixels; this color is called the edge average, and the RGB color value of the fill region is the inverse of the edge average;

B. Set the RGB color value of the fill region to (0,0,0), i.e., fill with black;

C. Set the RGB color value of the fill region to (255,255,255), i.e., fill with white.
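The three fill options can be sketched as below, under two assumptions the text leaves open: an "edge pixel" is taken to be an object pixel with at least one 4-neighbour outside the object mask (or outside the image), and the "inverse" of the edge average is taken to be its complementary color, 255 minus each channel. The helper names are hypothetical. Presumably the complement is chosen so that the padding stays visually distinct from the object's border.

```python
def edge_pixels(mask):
    """Object pixels that touch the background (or the image border) 4-wise."""
    h, w = len(mask), len(mask[0])
    edges = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < h and 0 <= nc < w) or not mask[nr][nc]:
                    edges.append((r, c))
                    break
    return edges


def fill_color(method, color=None, mask=None):
    """Return the RGB fill value for padding method "A", "B" or "C"."""
    if method == "A":
        # Edge average over the object's border pixels, then its complement.
        pts = edge_pixels(mask)
        avg = [sum(color[r][c][ch] for r, c in pts) / len(pts) for ch in range(3)]
        return tuple(255 - round(v) for v in avg)
    if method == "B":
        return (0, 0, 0)  # black
    if method == "C":
        return (255, 255, 255)  # white
    raise ValueError(f"unknown fill method: {method!r}")
```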

In the method of the present invention, Step 5 comprises:

Step 5.1: build the convolutional neural network model;

Step 5.2: collect images of a variety of objects to form a training set, and train the convolutional neural network model with it;

Step 5.3: input the padded color image into the trained convolutional neural network model for recognition, and output the result.

In the method of the present invention, the convolutional neural network built in Step 5.1 is a neural network comprising 20 hidden layers, specifically:

The first layer is convolutional layer conv1, in which 64 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The second layer is convolutional layer conv2, in which 64 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The third layer is pooling layer subsampling1, which uses max pooling;

The fourth layer is convolutional layer conv3, in which 128 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The fifth layer is convolutional layer conv4, in which 128 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The sixth layer is pooling layer subsampling2, which uses max pooling;

The seventh layer is convolutional layer conv5, in which 256 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The eighth layer is convolutional layer conv6, in which 256 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The ninth layer is convolutional layer conv7, in which 256 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The tenth layer is pooling layer subsampling3, which uses max pooling;

The eleventh layer is convolutional layer conv8, in which 512 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The twelfth layer is convolutional layer conv9, in which 512 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The thirteenth layer is convolutional layer conv10, in which 512 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The fourteenth layer is pooling layer subsampling4, which uses max pooling;

The fifteenth layer is convolutional layer conv11, in which 512 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The sixteenth layer is convolutional layer conv12, in which 512 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The seventeenth layer is convolutional layer conv13, in which 512 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The eighteenth layer is pooling layer subsampling5, which uses max pooling;

The nineteenth layer is the fully connected layer Fc, which uses average pooling to increase the training and prediction speed of the neural network;

The twentieth layer is the classification layer Softmax: the feature vector output by the fully connected layer Fc is fed into the classification layer to obtain the classification label of the recognized object, the probability of each classification label is computed, and the label with the highest probability is output.
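The 18 convolution and pooling layers above match the familiar VGG-16 convolutional stack. The sketch below encodes them as plain data, traces the feature-map size for a 224×224×3 input while counting convolution parameters, and then shows global average pooling and the Softmax decision of the twentieth layer. A padding of 1 pixel (which keeps 3×3, stride-1 convolutions size-preserving), 2×2 max pooling with stride 2, and reading "Fc uses average pooling" as global average pooling over the final feature map are all assumptions; every name is hypothetical.

```python
import math

# ("conv", n): n filters of 3x3, stride 1, padding 1 (assumed), then ReLU.
# ("pool",):   2x2 max pooling with stride 2 (window size assumed).
LAYERS = (
    [("conv", 64)] * 2 + [("pool",)]
    + [("conv", 128)] * 2 + [("pool",)]
    + [("conv", 256)] * 3 + [("pool",)]
    + [("conv", 512)] * 3 + [("pool",)]
    + [("conv", 512)] * 3 + [("pool",)]
)


def trace_shapes(size=224, in_ch=3):
    """Return the (H, W, C) shape after layer 18 and the conv parameter count."""
    h = w = size
    c = in_ch
    params = 0
    for layer in LAYERS:
        if layer[0] == "conv":
            out = layer[1]
            params += 3 * 3 * c * out + out  # weights + biases
            c = out  # padding 1 keeps H and W unchanged
        else:
            h, w = h // 2, w // 2  # each pooling layer halves H and W
    return (h, w, c), params


def global_average_pool(fmap):
    """Average an H x W x C feature map (nested lists) into a length-C vector."""
    h, w, channels = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [sum(fmap[r][col][ch] for r in range(h) for col in range(w)) / (h * w)
            for ch in range(channels)]


def softmax(logits):
    """Numerically stable softmax: probabilities for each classification label."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, labels):
    """Layer 20: return the label with the highest probability and that probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

Under these assumptions the final pooled map is 7×7×512, and the 13 convolutional layers hold 14,714,688 weights and biases, the same count as the VGG-16 convolutional base.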

The convolutional neural network object recognition method based on depth-information pre-segmentation of the present invention can recognize multiple objects in a complex image, reduces the amount of training required, recognizes quickly, has low hardware requirements, and reduces overfitting of the convolutional neural network.

Brief description of the drawings

Fig. 1 is a flowchart of the convolutional neural network object recognition method based on depth-information pre-segmentation of the present invention;

Fig. 2 is a schematic diagram of the structure of the convolutional neural network of the present invention.

Embodiment

The invention provides a convolutional neural network object recognition method based on depth-information pre-segmentation. As shown in Fig. 1, the recognition method comprises the following steps:

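Step 1: acquire the depth image and the color image of the scene;

Step 2: segment the depth image of the object from the depth image of the scene;

Step 3: according to the segmented region of the object's depth image, segment the color image of the object from the color image of the scene;

Step 4: pad and scale the segmented color image;

Step 5: input the padded color image into the convolutional neural network for recognition, and output the recognition result.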

Step 1 specifically comprises:

The color image can be obtained with a camera. Methods of acquiring the depth image include, but are not limited to, the following:

1. Capture images with a binocular (stereo) camera and perform pattern matching to obtain the two different coordinates of the object in the two cameras; combined with the distance between the two cameras, the depth image is then computed from the geometric relationship. This method is inexpensive but has low precision, and it has difficulty resolving objects more than five meters away.

2. Scan with a lidar rotating at high speed to obtain the distances from the surrounding objects to the sensor. This technology is widely used on driverless cars; for example, on December 10, 2015, Baidu's driverless car completed a successful road test, and the Velodyne HDL-64E mounted on its body is exactly this kind of lidar. Such equipment is very expensive: a single unit can cost hundreds of thousands of RMB.

3. Use commercial devices such as Microsoft's Kinect, which obtains depth information by combining dual cameras with an infrared camera. This approach strikes a good balance between price and precision, but its algorithm is proprietary.
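Method 1 rests on the standard rectified-stereo relation Z = f·B/d, where f is the focal length in pixels, B the baseline between the two cameras, and d the disparity (the difference between the object's two image coordinates). The sketch below also computes the approximate depth change caused by a one-pixel disparity error, Z²/(f·B), which illustrates why precision drops with distance. The function names are hypothetical, and the focal length and baseline used in the note are illustrative values, not parameters of any particular device.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth Z = f * B / d for a rectified stereo pair (result in metres)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


def depth_step_per_pixel(focal_px, baseline_m, depth_m):
    """Approximate depth change caused by a 1-pixel disparity error: Z**2 / (f * B)."""
    return depth_m ** 2 / (focal_px * baseline_m)
```

With f = 600 px and B = 0.075 m, an object at 5 m has a disparity of only 9 px, and a one-pixel matching error shifts the estimate by roughly 0.56 m.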

The depth information of the image is computed from two cameras, whereas the color information is captured by a camera at a different position, so the pixels of the two images do not correspond directly; a correction algorithm is needed to align the depth image with the color image. Taking the Microsoft Kinect as an example, its accompanying software development kit provides a function that aligns the two images so that their pixels correspond.
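The alignment can be illustrated with the standard pinhole-camera registration: back-project each depth pixel to a 3-D point using the depth camera's intrinsics, move it into the color camera's frame with the extrinsic rotation R and translation t, and project it with the color camera's intrinsics. This is a sketch of the general technique, not the Kinect SDK's own algorithm; `register_depth_pixel` is a hypothetical name and the calibration values in the test are made up.

```python
def register_depth_pixel(u, v, z, k_depth, k_color, rotation, translation):
    """Map depth pixel (u, v) with depth z (metres) to color-image coordinates.

    k_depth, k_color: (fx, fy, cx, cy) intrinsics of each camera, in pixels.
    rotation: 3x3 nested list R; translation: length-3 list t (metres),
    taking points from the depth-camera frame to the color-camera frame.
    """
    fx, fy, cx, cy = k_depth
    # Back-project the depth pixel to a 3-D point in the depth-camera frame.
    point = ((u - cx) * z / fx, (v - cy) * z / fy, z)
    # Transform into the color-camera frame: p_c = R p + t.
    pc = [sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
          for i in range(3)]
    # Project onto the color image plane with the color intrinsics.
    fx_c, fy_c, cx_c, cy_c = k_color
    return fx_c * pc[0] / pc[2] + cx_c, fy_c * pc[1] / pc[2] + cy_c
```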

Step 2 specifically comprises:

At this point the depth of the foreground lies within a certain range. For example, when the target object is segmented as the foreground by the Otsu algorithm, its distance from the sensor lies between 35 cm and 42 cm; this distance is called the foreground distance range. However, because other noise in the image also lies within the foreground distance range, we use the gray value corresponding to 37 cm as the seed for seed growing.

Step 2.2: segment the depth image of the target object out of the depth image of the scene using a seeded region growing algorithm, specifically:

During depth-image segmentation, the Otsu algorithm may be applied multiple times to split the image, and the region growing algorithm is applied to each segmentation result to remove irrelevant information. For example, if two cups are placed on a table, the region growing algorithm can separate the two cups.
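The two-cup example amounts to splitting one foreground mask into separate objects. One common way to do this, shown here as a swapped-in technique rather than the patent's exact procedure, is connected-component labeling: region growing over a binary mask, started from every foreground pixel not yet assigned to an object. The mask is assumed to come from an Otsu threshold, 8-connectivity mirrors Step 2.2.2, and `label_components` is a hypothetical name.

```python
from collections import deque


def label_components(mask):
    """Label 8-connected foreground regions of a binary mask.

    Returns (labels, count): labels[r][c] is 0 for background and 1..count
    for each separate object (e.g. each of two cups on a table).
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not labels[r][c]:
                count += 1  # a new, not-yet-reached object starts here
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and mask[ny][nx] and not labels[ny][nx]):
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count
```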

Step 3 specifically comprises:

Images segmented by Otsu's method and seeded region growing vary in size, but the convolutional neural network used in the present invention requires a uniform input size of 224 pixels wide by 224 pixels high. We therefore apply two operations to the image: padding and scaling.

Step 4, padding and scaling the segmented color image, specifically comprises:

The RGB color value of the fill region in Step 4.1 may be set using any of the following methods:

Padding: assume the image has width and height (w, h). If w > h, the image is padded to (w, w); if w < h, it is padded to (h, h).

In the present embodiment, the padded image of size (x, x) is scaled to (224, 224).

After image processing is complete, the pixels of the image are arranged in order from left to right and from top to bottom and input into the convolutional neural network for image recognition. Step 5 comprises:

Step 5.1: build the convolutional neural network model;

In a specific implementation, the convolutional neural network built is a neural network comprising 20 hidden layers, specifically:

The present invention uses convolutional-layer weights pre-trained on ImageNet to improve generalization performance.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent substitution, improvement, or the like made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims

1. A convolutional neural network object recognition method based on depth-information pre-segmentation, characterized by comprising the following steps:

Step 1: acquire the depth image and the color image of the scene;

Step 2: segment the depth image of the object from the depth image of the scene;

Step 3: according to the segmented region of the object's depth image, segment the color image of the object from the color image of the scene;

Step 4: pad and scale the segmented color image;

Step 5: input the padded color image into the convolutional neural network for recognition, and output the recognition result.

2. The convolutional neural network object recognition method based on depth-information pre-segmentation according to claim 1, characterized in that Step 1 specifically comprises:

The depth image and the color image of the same scene are acquired at the same moment; the image capture device may be a Kinect device or a multi-view camera system.

3. The convolutional neural network object recognition method based on depth-information pre-segmentation according to claim 1, characterized in that Step 2 specifically comprises:

Step 2.1: divide the depth image of the scene into foreground and background using the Otsu algorithm; the foreground represents the target object, whose depth lies within a certain range;

Step 2.2: segment the depth image of the target object out of the depth image of the scene using a seeded region growing algorithm.

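4. The convolutional neural network object recognition method based on depth-information pre-segmentation according to claim 3, characterized in that Step 2.2 specifically comprises:

Step 2.2.1: randomly select five pixels within the depth range of the foreground as seed points;

Step 2.2.2: traverse the 8 pixels surrounding each seed point, and absorb into the seed set any pixel whose gray-level change is less than 4;

Step 2.2.3: repeat Step 2.2.2 until every pixel in the image has been classified as either a seed point or a non-seed point;

Step 2.2.4: segment out the image formed by the seed points to obtain the depth image of the object.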

5. The convolutional neural network object recognition method based on depth-information pre-segmentation according to claim 1, characterized in that Step 3 specifically comprises:

The pixels of the scene's color image and depth image correspond one to one, so the segmented position of the object's depth image can be mapped onto the scene's color image, and the color image of the object is then segmented out.

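6. The convolutional neural network object recognition method based on depth-information pre-segmentation according to claim 1, characterized in that Step 4 comprises:

Step 4.1: set the RGB color value of the fill region, and pad the segmented color image to an aspect ratio of 1:1;

Step 4.2: resize the padded color image to a prescribed size using bilinear interpolation.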

7. The convolutional neural network object recognition method based on depth-information pre-segmentation according to claim 6, characterized in that the RGB color value of the fill region in Step 4.1 may be set using any of the following methods:

A. Average the RGB color values of the object's edge pixels; this color is called the edge average, and the RGB color value of the fill region is the inverse of the edge average;

B. Set the RGB color value of the fill region to (0,0,0), i.e., fill with black;

C. Set the RGB color value of the fill region to (255,255,255), i.e., fill with white.

8. The convolutional neural network object recognition method based on depth-information pre-segmentation according to claim 1, characterized in that Step 5 comprises:

Step 5.1: build the convolutional neural network model;

Step 5.2: collect images of a variety of objects to form a training set, and train the convolutional neural network model with it;

Step 5.3: input the padded color image into the trained convolutional neural network model for recognition, and output the result.

9. The convolutional neural network object recognition method based on depth-information pre-segmentation according to claim 8, characterized in that the convolutional neural network built in Step 5.1 is a neural network comprising 20 hidden layers, specifically:

The first layer is convolutional layer conv1, in which 64 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The second layer is convolutional layer conv2, in which 64 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The third layer is pooling layer subsampling1, which uses max pooling;

The fourth layer is convolutional layer conv3, in which 128 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The fifth layer is convolutional layer conv4, in which 128 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The sixth layer is pooling layer subsampling2, which uses max pooling;

The seventh layer is convolutional layer conv5, in which 256 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The eighth layer is convolutional layer conv6, in which 256 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The ninth layer is convolutional layer conv7, in which 256 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The tenth layer is pooling layer subsampling3, which uses max pooling;

The eleventh layer is convolutional layer conv8, in which 512 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The twelfth layer is convolutional layer conv9, in which 512 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The thirteenth layer is convolutional layer conv10, in which 512 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The fourteenth layer is pooling layer subsampling4, which uses max pooling;

The fifteenth layer is convolutional layer conv11, in which 512 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The sixteenth layer is convolutional layer conv12, in which 512 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The seventeenth layer is convolutional layer conv13, in which 512 filters of size 3×3 convolve with a stride of 1 pixel; edge padding (Padding) is applied before the convolution, and the result then passes through a nonlinear activation layer using the ReLU activation function;

The eighteenth layer is pooling layer subsampling5, which uses max pooling;

The nineteenth layer is the fully connected layer Fc, which uses average pooling to increase the training and prediction speed of the neural network;

The twentieth layer is the classification layer Softmax: the feature vector output by the fully connected layer Fc is fed into the classification layer to obtain the classification label of the recognized object, the probability of each classification label is computed, and the label with the highest probability is output.