CN111461125A - Continuous segmentation method of panoramic image - Google Patents
- Publication number
- CN111461125A (application CN202010198068.0A; granted publication CN111461125B)
- Authority
- CN
- China
- Prior art keywords
- image
- panoramic
- segmentation
- encoder
- characteristic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a method for continuously segmenting a panoramic image. The method is trained on a segmentation data set of conventional pinhole images; when the segmentation model generated by training is deployed by the disclosed method, the output segmentation image is continuous, seamless, smooth and reliable over 360 degrees.
Description
Technical Field
The invention belongs to the technical fields of image segmentation, scene perception, pattern recognition, image processing and computer vision, and relates to a method for continuously segmenting a panoramic image.
Background
Panoramic vision refers to acquiring, in a single capture, all visual information of a three-dimensional space larger than a hemispherical field of view (360 degrees × 180 degrees). Because of this wide field of view, panoramic vision is of great significance to the many industries in the civil, military and aerospace fields that rely on visual information for decision-making.
Image segmentation provides pixel-level classification of a scene and can simultaneously accomplish the detection of various scene elements. Image segmentation methods, including semantic segmentation, have been widely applied in fields such as intelligent vehicles, robots, visual aids, and augmented reality systems.
However, current segmentation techniques are typically designed for conventional pinhole cameras and can therefore only acquire information within a limited angle of view. Segmentation based on convolutional neural networks also requires large amounts of labeled data for training, and most large-scale data sets in the industry consist of images acquired by ordinary pinhole cameras. Semantic segmentation network models trained on such image data sets are not suitable for panoramic images and cannot be directly applied to panoramic cameras to realize 360-degree segmentation.
Disclosure of Invention
The invention aims to provide a panoramic image continuity segmentation method, which adopts a segmentation model F comprising N encoders and decoders and performs continuity processing at the feature image boundaries. Specifically, instead of the default zero padding, the boundaries are padded with element values taken from the feature images produced by the convolutional layers of the encoders Fi+1 and Fi-1 that process the adjacent panorama segment images Pi+1 and Pi-1. By modifying the padding mode of the convolutional layers, the prediction for each panorama segment image takes the information of the adjacent images into account, so that continuous and seamless semantic prediction is achieved, gaps caused by segmenting are avoided, and blind areas are eliminated.
Another object of the present invention is to provide a panoramic image continuity segmentation method that segments images using a segmentation model F, comprising N encoders and decoders, trained on existing segmentation data sets, thereby eliminating the need to label a panoramic image data set and reducing the time and cost of data preparation.
In order to achieve the above object, the present invention comprises the steps of:
(1) unfolding the panoramic image to obtain an image Pu; the unfolding may employ the existing OCamCalib tool (Scaramuzza, D., Martinelli, A. and Siegwart, R., 2006. A toolbox for easily calibrating omnidirectional cameras. In 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 5695-5701). IEEE).
(2) Averagely dividing the image Pu into N segments along the unfolding direction to obtain the panorama segment images P1, P2, …, Pi, …, PN, i = 1, 2, …, N;
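As a non-limiting illustration, the equal-width split of step (2) can be sketched in plain Python as follows; the function name and the list-of-rows image representation are illustrative assumptions, not part of the specification:

```python
def split_panorama(pu, n):
    """Split the unfolded image Pu (a list of pixel rows) into n
    equal-width segments along the unfolding (horizontal) direction.

    Assumes the image width is divisible by n, as in the averaged
    division described in step (2).
    """
    w = len(pu[0]) // n
    return [[row[i * w:(i + 1) * w] for row in pu] for i in range(n)]
```

For example, a 2×4 image split into N = 2 segments yields two 2×2 segments.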
(3) Using an image segmentation data set, an encoder-decoder type segmentation network is trained to obtain the segmentation model F. The N panorama segment images are respectively input into the encoders of the segmentation model F to obtain the feature images T1, T2, …, Ti, …, TN corresponding to the panorama segment images P1, P2, …, Pi, …, PN, i = 1, 2, …, N;
Wherein, for a panorama segment image Pi, the encoder Fi pads the boundary of the feature image output by the k-th convolutional layer with adjacent element values: the left boundary of the feature image is padded with the right-boundary element values of the feature image output by the (k-1)-th convolutional layer of encoder FL, and the right boundary is padded with the left-boundary element values of the feature image output by the (k-1)-th convolutional layer of encoder FR, with k ≥ 1; the feature image output by the 0-th convolutional layer is the original image, i.e., the panorama segment image Pi.
That is, the boundary of the feature image output by the 1st convolutional layer of encoder Fi is padded with the right-boundary element values of the original image on the left and the left-boundary element values of the original image on the right; and for every convolutional layer k ≥ 2 of encoder Fi, the left boundary of its output feature image is padded with the right-boundary element values of the feature image output by the (k-1)-th convolutional layer of encoder FL, and the right boundary with the left-boundary element values of the feature image output by the (k-1)-th convolutional layer of encoder FR.
The encoders Fi, FL and FR denote the encoders that process the panorama segment images Pi, PL and PR, respectively.
Subscripts L, R satisfy: L = i − 1 (taking L = N when i = 1) and R = i + 1 (taking R = 1 when i = N), i.e., the panorama segments are treated as circularly adjacent.
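As a non-limiting illustration, the neighbor-value padding described above can be sketched in plain Python; the function name, the list-of-rows feature representation, and the circular modulo indexing of the neighbors are illustrative assumptions:

```python
def pad_with_neighbors(feats, i, pad=1):
    """Pad segment i's feature map horizontally with its circular
    neighbors' boundary columns instead of zeros.

    feats: list of N feature maps, each a list of rows (H x W).
    Returns an H x (W + 2*pad) map whose leftmost columns come from
    the right edge of segment i-1 and whose rightmost columns come
    from the left edge of segment i+1; indices wrap around, giving
    360-degree continuity across all segment boundaries.
    """
    n = len(feats)
    left, right = feats[(i - 1) % n], feats[(i + 1) % n]
    padded = []
    for r, row in enumerate(feats[i]):
        padded.append(left[r][-pad:] + row + right[r][:pad])
    return padded
```

For example, with three 1×2 segments, padding segment 0 draws its left column from segment 2 (the circular left neighbor) and its right column from segment 1.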
(4) The feature images T1 to TN are stitched along the unfolding direction of the panoramic image to obtain the stitched feature image T.
(5) The stitched feature image T is pooled along the unfolding direction of the panoramic image with pooling ratio N to obtain the pooled feature image Tp.
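As a non-limiting illustration, steps (4) and (5) can be sketched together in plain Python; the function name, the list-of-rows representation, and the use of average pooling are illustrative assumptions:

```python
def stitch_and_pool(feats, n):
    """Stitch the N segment feature maps along the width (unfolding)
    direction, then average-pool along that direction with ratio n.

    feats: list of N feature maps, each a list of rows (H x W).
    Returns an H x W map (stitched width N*W divided by ratio n == N
    when W columns remain after pooling each group of n columns).
    """
    # stitch: concatenate corresponding rows of every segment
    t = [sum((f[r] for f in feats), []) for r in range(len(feats[0]))]
    # pool: collapse every n consecutive columns into their mean
    return [[sum(row[j:j + n]) / n for j in range(0, len(row), n)]
            for row in t]
```

With two 1×2 segments and n = 2, stitching gives one 1×4 row and pooling averages each pair of columns.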
(6) The feature image Tp is input into the decoder and up-sampled to the resolution of the unfolded panoramic image Pu to obtain the panoramic segmentation image Ps. The up-sampling may use bilinear interpolation.
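As a non-limiting illustration, the bilinear up-sampling mentioned in step (6) can be sketched in plain Python; the function name, the list-of-rows representation, and the align-corners sampling convention are illustrative assumptions:

```python
def bilinear_upsample(img, out_h, out_w):
    """Up-sample a 2D grid (list of lists) to (out_h, out_w) using
    bilinear interpolation, mapping grid corners to corners."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y); y1 = min(y0 + 1, in_h - 1); fy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x); x1 = min(x0 + 1, in_w - 1); fx = x - x0
            # interpolate along x on the two bracketing rows, then along y
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bot * fy)
        out.append(row)
    return out
```

Up-sampling a 2×2 grid to 3×3 fills the new center cell with the mean of its four neighbors.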
Further, the segmentation is semantic. Correspondingly, the segmentation model F is trained on a semantic segmentation data set of pinhole camera images. The training set may be Cityscapes or Mapillary Vistas; encoder-decoder type semantic segmentation networks such as ERFNet, SegNet or ERF-PSPNet can be used. Cityscapes, Mapillary Vistas and the semantic segmentation networks ERFNet, SegNet and ERF-PSPNet are common knowledge in the field; specifically:
Data set Cityscapes: Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S. and Schiele, B., 2016. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3213-3223).
Data set Mapillary Vistas: Neuhold, G., Ollmann, T., Rota Bulò, S. and Kontschieder, P., 2017. The Mapillary Vistas dataset for semantic understanding of street scenes. In Proceedings of the IEEE International Conference on Computer Vision (pp. 4990-4999).
Semantic segmentation network ERFNet: Romera, E., Alvarez, J.M., Bergasa, L.M. and Arroyo, R., 2017. ERFNet: Efficient residual factorized ConvNet for real-time semantic segmentation. IEEE Transactions on Intelligent Transportation Systems, 19(1), pp. 263-272.
Semantic segmentation network SegNet: Badrinarayanan, V., Kendall, A. and Cipolla, R., 2017. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12), pp. 2481-2495.
Semantic segmentation network ERF-PSPNet: Yang, K., Wang, K., Bergasa, L.M., Romera, E., Hu, W., Sun, D., Sun, J., Cheng, R., Chen, T. and López, E., 2018. Unifying terrain awareness for the visually impaired through real-time semantic segmentation. Sensors, 18(5), p. 1506.
Compared with conventional panoramic image semantic segmentation, the invention has the following advantages:
1. Low cost, compact system and low delay. The invention completes 360-degree semantic segmentation with only one panoramic camera and one small processor. Existing approaches to acquiring 360-degree semantic information require multiple pinhole cameras or multiple fisheye cameras; the invention therefore saves equipment and cost, keeps the system compact, and suits systems such as intelligent vehicles, robots and vision assistance. Moreover, multiple cameras must be synchronized, and the images they acquire and their segmentation results must be fused, which increases delay; the invention performs 360-degree semantic segmentation by processing the image collected by a single panoramic camera, reducing redundancy and delay.
2. No new images need to be labeled, saving data preparation time and cost. The invention requires only a semantic segmentation data set of conventional pinhole camera images for training and does not require labeling a panoramic image data set, reducing the time and cost of data preparation.
3. The generated semantic segmentation model is highly reliable. Because training uses semantic segmentation data sets of conventional pinhole camera images, the invention can exploit the abundant and diverse data already existing in the industry to train a reliable model.
4. 360-degree continuous and seamless semantic segmentation. Because the padding mode of the convolutional layers is modified at deployment and the prediction of each panorama segment image takes the information of adjacent images into account, continuous and seamless semantic prediction is achieved, gaps caused by segmenting are avoided, and blind areas are eliminated.
5. Smooth semantic segmentation. At deployment, the feature images of the different segments are stitched and pooled, which filters out noise and yields smooth semantic predictions.
Drawings
FIG. 1 is a schematic diagram of module connections;
FIG. 2 is a panoramic image;
FIG. 3 is a panoramic expansion image;
FIG. 4 shows the result of semantic segmentation of a panoramic image.
Detailed Description
The implementation of the present invention and its technical effects are described in detail below with reference to examples.
In the following embodiments, a panoramic camera is used to acquire a panoramic image as shown in fig. 2, and the image is segmented according to the following steps:
(1) the encoder-decoder type semantic segmentation network ERF-PSPNet is trained with the Cityscapes data set to obtain a semantic segmentation model F; as shown in the following table, layers 1 to 16 are the encoder part and layers 17 to 20 are the decoder part.
(2) The panoramic image shown in fig. 2 is expanded, specifically:
A plane coordinate system is set with the center of the annular panoramic image as the origin O(0,0) and with X and Y axes; the inner radius of the panoramic image is r, the outer radius is R, the radius of the middle circle is r1 = (R + r)/2, and the azimuth angle is β = tan⁻¹(y*/x*). The panoramic image is unfolded into a cylindrical expansion image starting from the intersection (r, 0) of the inner circle with the X axis and proceeding along the azimuth direction β. The correspondence between a pixel coordinate (x**, y**) of the cylindrical expansion image and the pixel coordinate (x*, y*) of the panoramic image is:
x* = (y** + r)·cos β
y* = (y** + r)·sin β
β = 360°·x**/(π(R + r))
In the formulas, x**, y** are pixel coordinate values of the panoramic cylindrical expansion image, x*, y* are pixel coordinate values of the panoramic image, R is the outer radius of the annular panoramic image, r is the inner radius, and β is the azimuth angle of the annular panoramic image coordinates.
The image obtained after unfolding is shown in fig. 3 and named Pu.
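As a non-limiting illustration, the coordinate mapping above can be sketched in Python; the function name and the radian-based azimuth computation are illustrative assumptions (the formulas state β in degrees, which is equivalent):

```python
import math

def unwrap_coords(x_u, y_u, R, r):
    """Map a pixel (x_u, y_u) of the cylindrical expansion image back
    to its source pixel (x_p, y_p) in the annular panoramic image.

    The unfolded width is taken as pi*(R + r), the circumference of
    the middle circle of radius (R + r)/2, so a full traversal of x_u
    sweeps the full 360-degree azimuth.
    """
    beta = 2.0 * math.pi * x_u / (math.pi * (R + r))  # azimuth, radians
    rho = y_u + r                                     # radial distance
    return rho * math.cos(beta), rho * math.sin(beta)
```

At x_u = 0 the mapping lands on the inner circle's intersection with the X axis, matching the stated unfolding origin.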
Those skilled in the art can also use the OCamCalib tool to unfold the panoramic image shown in fig. 2; see Scaramuzza, D., Martinelli, A. and Siegwart, R., 2006. A toolbox for easily calibrating omnidirectional cameras. In 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 5695-5701). IEEE.
(3) The image Pu is divided into 4 segments along the unfolding direction, i.e. the horizontal direction in the figure, obtaining the panorama segment images P1, P2, P3, P4.
(4) The 4 panorama segment images P1, P2, P3, P4 are respectively input into the encoders of the segmentation model F to obtain the corresponding feature images T1, T2, T3, T4; one encoder corresponds to one panorama segment image.
For a panorama segment image Pi, the encoder Fi pads the boundary of the feature image output by the k-th convolutional layer with adjacent element values: the left boundary of the feature image is padded with the right-boundary element values of the feature image output by the (k-1)-th convolutional layer of encoder FL, and the right boundary is padded with the left-boundary element values of the feature image output by the (k-1)-th convolutional layer of encoder FR, with k ≥ 2.
The encoders Fi, FL and FR denote the encoders that process the panorama segment images Pi, PL and PR, respectively.
Subscripts L, R satisfy: L = i − 1 (taking L = 4 when i = 1) and R = i + 1 (taking R = 1 when i = 4), i.e., the panorama segments are treated as circularly adjacent.
Specifically, in this embodiment, for layers 1 to 8 of the corresponding table, one element is taken from the boundary of the adjacent feature image each time padding is performed in the boundary convolution calculation; for layers 9 to 16, the number of elements taken matches the dilation rate of the dilated convolution.
(5) The feature images T1 to T4 are stitched along the unfolding direction of the panoramic image to obtain the stitched feature image T.
(6) The stitched feature image T is pooled along the unfolding direction of the panoramic image with pooling ratio 4 to obtain the pooled feature image Tp.
(7) The feature image Tp is input into the decoder and up-sampled with bilinear interpolation to the resolution of the unfolded panoramic image Pu, obtaining the panoramic segmentation image Ps, as shown in fig. 4, where different gray values represent different segmentation categories.
Comparing fig. 4 with fig. 2 manually, it can be seen that the segments transition smoothly, with no discontinuity caused by segmenting the panoramic image; elements such as roads, vehicles, houses, and even finer street lamps and pedestrians are clearly and accurately segmented with high precision; and because the trained semantic segmentation model is used, the results are highly reliable.
In addition, the pipeline from the image input of step 2 to the image output of step 7 runs at up to 40 frames per second on an Nvidia Titan RTX, and the accuracy (mean intersection-over-union) improves by more than 25% over existing segmentation methods that directly input the whole panoramic image without adaptation.
Claims (4)
1. A method for continuous segmentation of a panoramic image, the method comprising at least:
(1) unfolding the panoramic image to obtain an image Pu;
(2) averagely dividing the image Pu into N segments along the unfolding direction to obtain panorama segment images, sequentially denoted from left to right as P1, P2, …, Pi, …, PN, i = 1, 2, …, N;
(3) training an encoder-decoder type segmentation network using an image segmentation data set to obtain a segmentation model F, and respectively inputting the N panorama segment images into the encoders of the segmentation model F to obtain feature images T1, T2, …, Ti, …, TN corresponding to the panorama segment images P1, P2, …, Pi, …, PN, i = 1, 2, …, N;
wherein, for a panorama segment image Pi, the encoder Fi pads the boundary of the feature image output by the k-th convolutional layer with adjacent element values: the left boundary of the feature image is padded with the right-boundary element values of the feature image output by the (k-1)-th convolutional layer of encoder FL, and the right boundary is padded with the left-boundary element values of the feature image output by the (k-1)-th convolutional layer of encoder FR, with k ≥ 1; the feature image output by the 0-th convolutional layer is the original image, i.e., the panorama segment image Pi;
the encoders Fi, FL, FR respectively denote the encoders for processing the panorama segment images Pi, PL, PR, and the subscripts L, R satisfy: L = i − 1 (taking L = N when i = 1) and R = i + 1 (taking R = 1 when i = N);
(4) stitching the feature images T1 to TN along the unfolding direction of the panoramic image to obtain a stitched feature image T;
(5) pooling the stitched feature image T along the unfolding direction of the panoramic image with pooling ratio N to obtain a pooled feature image Tp;
(6) inputting the feature image Tp into a decoder and up-sampling to the resolution of the unfolded panoramic image Pu to obtain a panoramic segmentation image Ps.
2. The method of claim 1, wherein the segmenting is based on semantics.
3. The method of claim 2, wherein the segmentation model F is trained using a semantic segmentation dataset of pinhole camera images.
4. The method of claim 1, wherein in step 6, the upsampling method may use bilinear interpolation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010198068.0A CN111461125B (en) | 2020-03-19 | 2020-03-19 | Continuous segmentation method of panoramic image |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010198068.0A CN111461125B (en) | 2020-03-19 | 2020-03-19 | Continuous segmentation method of panoramic image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111461125A true CN111461125A (en) | 2020-07-28 |
CN111461125B CN111461125B (en) | 2022-09-20 |
Family
ID=71683586
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010198068.0A Active CN111461125B (en) | 2020-03-19 | 2020-03-19 | Continuous segmentation method of panoramic image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111461125B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH10191361A (en) * | 1996-10-24 | 1998-07-21 | Matsushita Electric Ind Co Ltd | Supplement method for image signal, image signal coder and image signal decoder |
WO2014043814A1 (en) * | 2012-09-21 | 2014-03-27 | Tamaggo Inc. | Methods and apparatus for displaying and manipulating a panoramic image by tiles |
CN109285168A (en) * | 2018-07-27 | 2019-01-29 | 河海大学 | A kind of SAR image lake boundary extraction method based on deep learning |
CN109961427A (en) * | 2019-03-12 | 2019-07-02 | 北京羽医甘蓝信息技术有限公司 | The method and apparatus of whole scenery piece periapical inflammation identification based on deep learning |
CN110188817A (en) * | 2019-05-28 | 2019-08-30 | 厦门大学 | A kind of real-time high-performance street view image semantic segmentation method based on deep learning |
CN110197529A (en) * | 2018-08-30 | 2019-09-03 | 杭州维聚科技有限公司 | Interior space three-dimensional rebuilding method |
CN110503651A (en) * | 2019-08-09 | 2019-11-26 | 北京航空航天大学 | A kind of significant object segmentation methods of image and device |
CN110675401A (en) * | 2018-07-02 | 2020-01-10 | 浙江大学 | Panoramic image pixel block filtering method and device |
- 2020-03-19 CN CN202010198068.0A patent/CN111461125B/en active Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH10191361A (en) * | 1996-10-24 | 1998-07-21 | Matsushita Electric Ind Co Ltd | Supplement method for image signal, image signal coder and image signal decoder |
WO2014043814A1 (en) * | 2012-09-21 | 2014-03-27 | Tamaggo Inc. | Methods and apparatus for displaying and manipulating a panoramic image by tiles |
CN110675401A (en) * | 2018-07-02 | 2020-01-10 | 浙江大学 | Panoramic image pixel block filtering method and device |
CN109285168A (en) * | 2018-07-27 | 2019-01-29 | 河海大学 | A kind of SAR image lake boundary extraction method based on deep learning |
CN110197529A (en) * | 2018-08-30 | 2019-09-03 | 杭州维聚科技有限公司 | Interior space three-dimensional rebuilding method |
CN109961427A (en) * | 2019-03-12 | 2019-07-02 | 北京羽医甘蓝信息技术有限公司 | The method and apparatus of whole scenery piece periapical inflammation identification based on deep learning |
CN110188817A (en) * | 2019-05-28 | 2019-08-30 | 厦门大学 | A kind of real-time high-performance street view image semantic segmentation method based on deep learning |
CN110503651A (en) * | 2019-08-09 | 2019-11-26 | 北京航空航天大学 | A kind of significant object segmentation methods of image and device |
Non-Patent Citations (3)
Title |
---|
AMER, Y.Y. et al.: "An Efficient Segmentation Algorithm for Panoramic Dental Images", 《PROCEDIA COMPUTER SCIENCE》 *
KAILUN YANG等: "PASS: Panoramic Annular Semantic Segmentation", 《IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS》 * |
TANG, Yiping et al.: "Research on face recognition technology in unconstrained environments", 《Journal of Zhejiang University of Technology》 *
Also Published As
Publication number | Publication date |
---|---|
CN111461125B (en) | 2022-09-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111462329B (en) | Three-dimensional reconstruction method of unmanned aerial vehicle aerial image based on deep learning | |
CN111325794B (en) | Visual simultaneous localization and map construction method based on depth convolution self-encoder | |
CN110570371A (en) | image defogging method based on multi-scale residual error learning | |
CN105550995A (en) | Tunnel image splicing method and system | |
CN110298884B (en) | Pose estimation method suitable for monocular vision camera in dynamic environment | |
CN111950477A (en) | Single-image three-dimensional face reconstruction method based on video surveillance | |
CN112767467B (en) | Double-image depth estimation method based on self-supervision deep learning | |
CN111612825B (en) | Image sequence motion shielding detection method based on optical flow and multi-scale context | |
CN113222124B (en) | SAUNet + + network for image semantic segmentation and image semantic segmentation method | |
KR102157610B1 (en) | System and method for automatically detecting structural damage by generating super resolution digital images | |
CN113284251B (en) | Cascade network three-dimensional reconstruction method and system with self-adaptive view angle | |
JP4772789B2 (en) | Method and apparatus for determining camera pose | |
CN112509106A (en) | Document picture flattening method, device and equipment | |
KR101915540B1 (en) | Compose System on Image Similarity Analysis And Compose Method on Image Similarity Analysis | |
CN115115859A (en) | Long linear engineering construction progress intelligent identification and analysis method based on unmanned aerial vehicle aerial photography | |
CN113283525A (en) | Image matching method based on deep learning | |
CN111654621B (en) | Dual-focus camera continuous digital zooming method based on convolutional neural network model | |
CN116778288A (en) | Multi-mode fusion target detection system and method | |
CN114526728B (en) | Monocular vision inertial navigation positioning method based on self-supervision deep learning | |
CN113506342B (en) | SLAM omni-directional loop correction method based on multi-camera panoramic vision | |
CN111461125B (en) | Continuous segmentation method of panoramic image | |
CN117315169A (en) | Live-action three-dimensional model reconstruction method and system based on deep learning multi-view dense matching | |
CN115063717B (en) | Video target detection and tracking method based on real scene modeling of key area | |
CN116109778A (en) | Face three-dimensional reconstruction method based on deep learning, computer equipment and medium | |
CN111080533A (en) | Digital zooming method based on self-supervision residual error perception network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||