CN109788270B - 3D-360-degree panoramic image generation method and device


Info

Publication number
CN109788270B
CN109788270B (application CN201811619904.7A)
Authority
CN
China
Prior art keywords
image
right eye
eye views
layer
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811619904.7A
Other languages
Chinese (zh)
Other versions
CN109788270A (en)
Inventor
周强 (Zhou Qiang)
高宏彬 (Gao Hongbin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Magewell Electronic Technology Co., Ltd.
Original Assignee
Nanjing Magewell Electronic Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Magewell Electronic Technology Co., Ltd.
Priority to CN201811619904.7A
Publication of CN109788270A
Application granted
Publication of CN109788270B
Legal status: Active

Landscapes

  • Image Processing (AREA)
  • Studio Devices (AREA)

Abstract

The invention relates to a 3D-360-degree panoramic image generation method and device. A deep-learning neural network generates left- and right-eye views directly from the image frames of a surrounding multi-camera rig, and the views are then spliced into 3D-360-degree panoramic frames. The entire mapping from camera frames to left- and right-eye views is learned by automatic training of a network model, which interpolates suitable features automatically, so no separate optical-flow computation is required. This greatly reduces the computational load, increases the robustness of panorama generation, avoids unacceptable artifacts, and improves the panoramic image generation speed. The image generation also copes with scenes containing complex lighting, near-field objects, and insufficient texture; it delivers high image quality with high efficiency and stability, and can meet the demands of real-time panoramic image acquisition.

Description

3D-360-degree panoramic image generation method and device
Technical Field
The invention relates to the technical field of camera shooting and image processing, in particular to a 3D-360-degree panoramic image generation method and device.
Background
With the continuing development of Virtual Reality (VR) technology, the production of VR content is becoming a bottleneck for the development of the whole industry. 3D-360-degree panoramic video is an important direction for the video industry within VR content, and the performance of the acquisition equipment plays a crucial role in video image quality.
A 3D-360-degree panoramic image has a captured field of view spanning 360 degrees in the horizontal direction and contains left- and right-eye panoramic views with horizontal parallax. Displayed respectively on the left and right screens of VR glasses, these views produce a realistic stereoscopic panoramic display effect.
The current implementation of 3D-360-degree panoramic image stitching is as follows: a surrounding multi-camera rig captures a set of images, and optical flow is then computed between each pair of adjacent images; the optical flow gives, for every pixel in the overlapping region of two images, its corresponding position in the other image. Using this pixel-level correspondence, the left-eye and right-eye views are interpolated: if a pair of virtual cameras simulating human eyes is imagined behind two real cameras, the imagery those virtual cameras would see between the two real cameras can be obtained entirely by interpolation from the two real views. To achieve the 3D effect, left- and right-eye images of the same scene must both be synthesized, and a 360-degree panorama is obtained by connecting, in sequence, the left- and right-eye images synthesized from each pair of cameras.
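For illustration of this prior-art step (the notation is ours, not the patent's): writing F12(x) for the optical flow of pixel x from image I1 to image I2, and α ∈ [0, 1] for the position of the virtual camera between the two real cameras, a common flow-based view interpolation is

    Iα(x) = (1 − α) · I1(x + α · F12(x)) + α · I2(x − (1 − α) · F21(x)),

so every pixel of the virtual view is blended from flow-shifted samples of the two real images, and any error in the estimated flow propagates directly into the interpolated view.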
In this conventional 3D-360 panoramic stitching process, the step that directly determines stitching quality is the optical-flow computation between adjacent images: only an accurate pixel-level correspondence between adjacent views yields a flawless, artifact-free interpolation result. Optical-flow estimation, however, remains a hard problem in computer vision; it fails readily in scenes with complex lighting, close objects, or poor texture, and the stitched image then exhibits visible errors. Moreover, because optical flow must establish the position of every pixel of one image in the other view, the computational load is large. This makes real-time panoramic stitching at high frame rates difficult, raises the hardware computation cost of panoramic cameras, and limits the large-scale application of panoramic stitching.
Disclosure of Invention
The invention aims to improve on the existing 3D-360-degree panoramic image generation approach, and accordingly provides a 3D-360-degree panoramic image generation method and device.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
A 3D-360-degree panoramic image generation method comprises the following steps:
acquiring the images shot by a surrounding multi-camera rig, one image per camera;
preprocessing a plurality of acquired images into images meeting the input requirements of a network model; the network model is obtained by performing iterative training on a plurality of groups of image sample sets generated by the virtual camera and 3D left and right eye views corresponding to the image sample sets;
inputting the preprocessed multiple images into a network model, and calculating to obtain multiple left and right eye views;
carrying out post-processing on the obtained multiple left and right eye views, and recovering the size and the pixel value range of the original image;
splicing the post-processed left-eye views in sequence to obtain a left-eye panoramic view, splicing the post-processed right-eye views in sequence to obtain a right-eye panoramic view, and splicing the left-eye panoramic view and the right-eye panoramic view up and down to obtain the required panoramic image.
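As a minimal sketch of this final splicing step (the array layout and function name are our illustration, not the patent's), assuming the post-processed views arrive as equal-sized NumPy arrays in ring order:

```python
import numpy as np

def stitch_panorama(left_views, right_views):
    """Splice per-pair strips into one top/bottom 3D-360 frame.

    left_views, right_views: lists of HxWx3 uint8 arrays, one strip per
    adjacent camera pair, already ordered around the ring.
    """
    left_pano = np.hstack(left_views)    # splice left-eye views in sequence
    right_pano = np.hstack(right_views)  # splice right-eye views in sequence
    # splice the two panoramas "up and down"; left-on-top is an assumed convention
    return np.vstack([left_pano, right_pano])
```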
Further, the network model is obtained through convolutional neural network training; the convolutional neural network comprises, connected in sequence, a plurality of first convolution layers, a first activation function layer, a pooling layer, a plurality of deconvolution layers, a second convolution layer, and a second activation function layer.
Further, the training method of the network model comprises the following steps:
acquiring an image sample set generated by a virtual camera and a plurality of 3D left and right eye views corresponding to the image sample set;
preprocessing an image sample set;
inputting each group of preprocessed image sample set into a convolutional neural network, outputting a plurality of generated left and right eye views, calculating prediction errors according to the left and right eye views obtained by calculation of each group of image sample set and the left and right eye views obtained by a virtual camera, and performing iterative training on the convolutional neural network by adopting a supervised back propagation method to obtain a deep learning network model.
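A minimal sketch of this training loop, assuming a PyTorch model; the L1 loss and Adam optimizer are our assumptions, since the patent specifies only a prediction error minimized by supervised back propagation:

```python
import torch

def train(model, loader, epochs=100, lr=1e-4):
    """loader yields (inputs, target): preprocessed multi-camera images and
    the ground-truth left/right eye views rendered by the virtual cameras."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.L1Loss()  # "prediction error" -- exact loss is assumed
    for _ in range(epochs):
        for inputs, target in loader:
            pred = model(inputs)          # generated left and right eye views
            loss = loss_fn(pred, target)  # error vs. virtual-camera ground truth
            opt.zero_grad()
            loss.backward()               # supervised back propagation
            opt.step()
    return model
```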
Further, the method for generating a plurality of left and right eye views by using the convolutional neural network comprises the following steps:
S1: performing a convolution operation on the acquired image through the first convolution layer, applying a nonlinear transformation to the convolution result through the first activation function layer, and applying a pooling operation to the nonlinear transformation result through the pooling layer;
S2: repeating S1 to obtain a series of feature maps of successively decreasing scale;
S3: upsampling the feature maps produced by the first half of the network through the deconvolution layers to obtain feature maps of successively increasing scale; then concatenating first-half and second-half feature maps of the same scale, performing a convolution operation on the concatenated result through the second convolution layer, and applying a nonlinear transformation to that convolution result through the second activation function layer;
S4: repeating S3 to obtain the prediction results for the plurality of left and right eye views.
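Steps S1-S4 describe an encoder-decoder with same-scale skip connections (a U-Net-like topology). A minimal PyTorch sketch under that reading; the depth, channel counts, and the packing of two adjacent camera images into a 6-channel input (and left/right eye views into a 6-channel output) are our assumptions:

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    """U-Net-like reading of steps S1-S4; sizes and channels are illustrative."""

    def __init__(self, in_ch=6, out_ch=6, base=32):
        super().__init__()
        # S1: first convolution layer + first activation (ReLU); S2 repeats
        # this with max pooling to shrink the scale
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(base, base * 2, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)
        # S3: deconvolution layer raises the scale again
        self.up = nn.ConvTranspose2d(base * 2, base * 2, 2, stride=2)
        # second convolution layer + second activation, applied after the
        # same-scale first-half / second-half feature maps are concatenated
        self.dec = nn.Sequential(
            nn.Conv2d(base * 2 + base, base, 3, padding=1), nn.ReLU(),
            nn.Conv2d(base, out_ch, 3, padding=1), nn.ReLU())

    def forward(self, x):                  # x: N x 6 x H x W (two camera images)
        f1 = self.enc1(x)                  # full-scale feature map (S1)
        f2 = self.enc2(self.pool(f1))      # half-scale feature map (S2)
        u = self.up(f2)                    # back to full scale (S3)
        u = torch.cat([u, f1], dim=1)      # skip connection: same-scale concat
        return self.dec(u)                 # S4: predicted left/right eye views
```

For example, `EncoderDecoder()(torch.rand(1, 6, 256, 256))` yields a 1x6x256x256 tensor, read here as the stacked left- and right-eye view prediction.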
Further, the activation functions adopted in the first activation function layer and the second activation function layer are linear rectification functions (ReLU); the pooling layer uses max pooling.
Further, the method for preprocessing the image or the image sample set comprises the following steps:
scaling the image to a standard size;
normalizing the scaled image pixel values so that all pixel values are between 0 and 1;
and zero-centering each pixel value in the normalized image (subtracting the mean so that the values are zero-mean).
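A sketch of these three steps; the standard size and the per-channel mean (taken over the training set) are assumptions, as the patent fixes neither:

```python
import cv2
import numpy as np

STD_SIZE = (512, 512)  # assumed "standard size"; the patent does not fix it

def preprocess(img, mean):
    """img: HxWx3 uint8 camera frame; mean: assumed per-channel training mean."""
    img = cv2.resize(img, STD_SIZE)          # scale the image to a standard size
    img = img.astype(np.float32) / 255.0     # normalize pixel values into [0, 1]
    return img - mean                        # zero-centre each pixel value
```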
Further, the image post-processing method comprises the following steps:
multiplying the pixel value of the image by a coefficient to restore the pixel value to an original pixel value range;
and enlarging the range-restored image back to the standard size.
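A sketch of the corresponding inverse; re-adding the mean before multiplying by the coefficient mirrors the preprocessing above and is our assumption, as is the 0-255 (8-bit) source range:

```python
import cv2
import numpy as np

def postprocess(view, mean, orig_size):
    """Invert preprocessing for one predicted view; orig_size = (width, height)."""
    view = (view + mean) * 255.0                      # restore original pixel range
    view = np.clip(view, 0.0, 255.0).astype(np.uint8)
    return cv2.resize(view, orig_size)                # enlarge back to full size
```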
Further, the method for acquiring the image sample set and the 3D left and right eye views corresponding to the image sample set by the virtual camera includes:
simulating a plurality of virtual cameras to be placed in a surrounding mode by using a VR graphic engine to form an annular virtual camera set;
setting a first group of virtual cameras to be completely horizontally arranged in equal proportion, wherein a virtual imaging scene comprises objects and textures of depth of field, and each virtual camera performs imaging independently to obtain a group of image sets;
placing a second group of two virtual cameras inside the ring of the annular virtual camera set, where these two cameras record only the vertical column of pixels directly in front of their optical centres; simulating rotation of this pair about the centre of the annular camera set while recording the scan imaged directly in front of each optical centre, so that each time the scan passes a pair of outer virtual cameras a left-eye view and a right-eye view are recorded; after the pair has rotated through 360 degrees, the left- and right-eye views corresponding to every two adjacent outer virtual cameras have been recorded;
and repeating the steps to obtain a plurality of groups of image samples and left and right eye view data corresponding to the image samples.
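A geometric sketch of this rig; the outer-camera count and ring radius are illustrative, and the 6.4 cm stereo baseline is taken from the embodiment described below:

```python
import numpy as np

def ring_positions(n_cams=8, radius=0.15):
    """Outer virtual cameras: evenly spaced on a ring, facing outward.

    Returns (x, y, yaw) triples; camera count and radius are illustrative.
    """
    angles = np.linspace(0.0, 2.0 * np.pi, n_cams, endpoint=False)
    return [(radius * np.cos(a), radius * np.sin(a), a) for a in angles]

def stereo_pair_at(theta, baseline=0.064):
    """Inner rotating stereo pair at scan angle theta (radians).

    Each camera records only the vertical pixel column straight ahead of its
    optical centre, so sweeping theta through 360 degrees accumulates complete
    left-eye and right-eye panoramas.
    """
    half = baseline / 2.0
    # offset the two eyes perpendicular to the viewing direction (cos, sin)
    lx, ly = -half * np.sin(theta), half * np.cos(theta)
    return (lx, ly, theta), (-lx, -ly, theta)  # (left eye, right eye) poses
```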
The device of the invention is realized by the following technical scheme:
a 3D-360 degree panoramic image generation apparatus comprising:
the surrounding multi-camera image acquisition module is used for acquiring images shot by surrounding multi-cameras, and each camera acquires an image;
the network model training module is used for training a network model; the network model is obtained by performing iterative training on a plurality of groups of image sample sets generated by the virtual camera and 3D left and right eye views corresponding to the image sample sets;
the image preprocessing module is connected with the surrounding multi-camera image acquisition module and used for preprocessing the acquired image into an image meeting the input requirement of a network model;
the view prediction module is connected with the network model training module and the image preprocessing module and is used for inputting a plurality of preprocessed images into the network model generated by the network model training module to obtain a plurality of left and right eye views;
the view post-processing module is connected with the view prediction module and used for restoring the acquired multiple left and right eye views to the original image size and the pixel value range;
and the panoramic image splicing module is connected with the view post-processing module and used for splicing all the left eye views subjected to post-processing to obtain a left eye panoramic view, splicing all the right eye views subjected to post-processing to obtain a right eye panoramic view, and splicing the left eye panoramic view and the right eye panoramic view up and down to obtain the required panoramic image.
Further, the network model training module performs training based on a convolutional neural network, which comprises, connected in sequence, a plurality of first convolution layers, a first activation function layer, a pooling layer, a plurality of deconvolution layers, a second convolution layer and a second activation function layer;
The first convolution layer performs convolution operation on the acquired image;
the first activation function layer carries out nonlinear transformation on the convolution operation result;
the pooling layer performs a pooling operation on the nonlinear transformation result, yielding a series of feature maps of successively decreasing scale;
the deconvolution layers upsample the feature maps, and each upsampled feature map is concatenated with the first-half feature map of the same scale;
the second convolution layer carries out convolution operation on the processing result of the deconvolution layer;
and the second activation function layer performs nonlinear transformation on the convolution operation result of the second convolution layer to obtain a plurality of prediction results of left and right eye views.
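Putting the modules together, a hedged sketch of the device's data path; `cam.capture()` and `predict_views` are hypothetical stand-ins, and `preprocess`, `postprocess` and `stitch_panorama` refer to the illustrative sketches earlier in this document:

```python
def generate_panorama(cameras, model, mean, orig_size):
    """Acquisition -> preprocessing -> prediction -> post-processing -> stitching."""
    frames = [cam.capture() for cam in cameras]         # image acquisition module
    batch = [preprocess(f, mean) for f in frames]       # image preprocessing module
    pairs = predict_views(model, batch)                 # view prediction module;
    # pairs: one (left, right) strip per adjacent camera pair, in ring order
    lefts = [postprocess(l, mean, orig_size) for l, r in pairs]   # post-processing
    rights = [postprocess(r, mean, orig_size) for l, r in pairs]
    return stitch_panorama(lefts, rights)               # panoramic stitching module
```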
The invention adopts a deep-learning neural network method: left- and right-eye views are generated directly from the image frames of a surrounding multi-camera rig and then spliced into a 3D-360-degree panoramic image. The whole process of obtaining the left- and right-eye views is learned by automatic training of the network model, which interpolates suitable features automatically, so no optical-flow computation is needed. This greatly reduces the computational load, increases the robustness of panorama generation, avoids unacceptable artifacts, and improves the panoramic image generation speed. The image generation also copes with scenes containing complex lighting, near-field objects and insufficient texture, produces high image quality with high efficiency and stability, and can meet the demands of real-time panoramic image acquisition.
Drawings
FIG. 1 is a view showing the structure of the apparatus of the present invention.
Detailed Description
The technical solution of the present invention will be further described with reference to the accompanying drawings and detailed description.
Example 1
This example specifically illustrates an implementation of the method of the present invention.
The 3D-360 degree panoramic image generation method comprises the following steps:
s100, acquiring images of cameras surrounding a plurality of cameras, wherein each camera has one image;
s200, preprocessing a plurality of acquired images into images meeting the input requirements of a network model; the method comprises the following steps:
scaling the image to a standard size;
normalizing the scaled image pixel values so that all pixel values are between 0 and 1;
and zero-centering each pixel value in the normalized image.
S300, inputting the preprocessed multiple images into a network model, and calculating to obtain multiple left and right eye views;
the network model is obtained through convolutional neural network training; the convolutional neural network comprises, connected in sequence, a plurality of first convolution layers, a first activation function layer, a pooling layer, a plurality of deconvolution layers, a second convolution layer and a second activation function layer, and the training method comprises the following steps:
s310, acquiring an image sample set generated by the virtual camera and a plurality of 3D left and right eye views corresponding to the image sample set;
the method for acquiring the image sample set and the 3D left and right eye views corresponding to the image sample set through the virtual camera comprises the following steps:
simulating a plurality of virtual cameras placed in a surrounding arrangement using a VR graphics engine (such as Unreal Engine, Unity 3D, CryENGINE and the like) to form an annular virtual camera group;
the method comprises the steps that a first group of virtual cameras are arranged in a system and are arranged in a horizontal equal proportion completely, a virtual imaging scene comprises objects and textures of depth of field, the simulated scene comprises an indoor space and an outdoor space, the simulated objects comprise people, buildings, office supplies, trees, flowers and plants, large stadiums, parks, the sky, the sea bottom, tunnels and the like, real world textures can be attached to the virtual scene, and each virtual camera is used for imaging independently to obtain a group of image sets.
A second group of two virtual cameras is placed inside the ring of the annular virtual camera set, with the distance between the two cameras set to 6.4 cm (a typical interpupillary distance). These two cameras record only the vertical column of pixels directly in front of their optical centres. The pair is simulated rotating about the centre of the annular camera set while the scan directly in front of each optical centre is recorded; each time the scan passes a pair of outer virtual cameras, a left-eye view and a right-eye view are recorded, so one full 360-degree rotation of the pair records the left- and right-eye views corresponding to every two adjacent outer virtual cameras.
Repeating this process yields a large quantity of surrounding multi-camera imaging data for training, together with the corresponding left- and right-eye view data.
S320, preprocessing an image sample set;
the method comprises the following steps:
scaling the image to a standard size;
normalizing the scaled image pixel values so that all pixel values are between 0 and 1;
and zero-centering each pixel value in the normalized image.
S330, inputting each group of preprocessed image sample sets into a convolutional neural network, and outputting a plurality of generated left and right eye views, wherein the steps are as follows:
S331, performing a convolution operation on the acquired image through the first convolution layer, applying a nonlinear transformation to the convolution result through the first activation function layer, and applying a pooling operation to the nonlinear transformation result through the pooling layer; the activation function adopted in the first activation function layer is a linear rectification function, and the pooling layer uses max pooling;
S332, repeating S331 to obtain a series of feature maps of successively decreasing scale;
S333, upsampling the feature maps produced by the first half of the network through the deconvolution layers to obtain feature maps of successively increasing scale; then concatenating first-half and second-half feature maps of the same scale, performing a convolution operation on the concatenated result through the second convolution layer, and applying a nonlinear transformation to that convolution result through the second activation function layer; the activation function adopted in the second activation function layer is a linear rectification function;
S334, repeating S333 to obtain the predicted left and right eye views.
And calculating prediction errors according to the left and right eye views obtained by calculating each group of image sample sets and the left and right eye views obtained by the virtual camera, and performing iterative training on the convolutional neural network by adopting a supervised back propagation method to obtain a deep learning network model.
S400, post-processing the obtained left and right eye views to restore the original image size and pixel value range;
the method comprises the following steps:
multiplying each pixel value of the image by a coefficient to restore it to the original pixel value range; for example, for an image whose original pixel values span 0-255, multiplying the pixel values by a coefficient of 255;
and enlarging the range-restored image back to the standard size.
S500, splicing the plurality of left-eye views after the post-processing in sequence to obtain a left-eye panoramic view, splicing the plurality of right-eye views in sequence to obtain a right-eye panoramic view, and splicing the left-eye panoramic view and the right-eye panoramic view up and down to obtain the required panoramic image.
Example 2
This example specifically illustrates an implementation of the apparatus of the present invention.
The 3D-360-degree panoramic image generation apparatus, as shown in FIG. 1, includes:
the surrounding multi-camera image acquisition module is used for acquiring images shot by surrounding multi-cameras, and each camera acquires an image;
the network model training module is used for training a network model; the network model is obtained by performing iterative training on a plurality of groups of image sample sets generated by the virtual camera and 3D left and right eye views corresponding to the image sample sets;
the image preprocessing module is connected with the surrounding multi-camera image acquisition module and used for preprocessing the acquired image into an image meeting the input requirement of a network model;
the view prediction module is connected with the network model training module and the image preprocessing module and is used for inputting a plurality of preprocessed images into the network model generated by the network model training module to obtain a plurality of left and right eye views;
the view post-processing module is connected with the view prediction module and used for restoring the acquired multiple left and right eye views to the original image size and the pixel value range;
and the panoramic image splicing module is connected with the view post-processing module and used for splicing all the left eye views subjected to post-processing to obtain a left eye panoramic view, splicing all the right eye views subjected to post-processing to obtain a right eye panoramic view, and splicing the left eye panoramic view and the right eye panoramic view up and down to obtain the required panoramic image.
The network model training module performs training based on a convolutional neural network and comprises, connected in sequence, a plurality of first convolution layers, a first activation function layer, a pooling layer, a plurality of deconvolution layers, a second convolution layer and a second activation function layer;
The first convolution layer performs convolution operation on the acquired image;
the first activation function layer carries out nonlinear transformation on the convolution operation result;
the pooling layer performs a pooling operation on the nonlinear transformation result, yielding a series of feature maps of successively decreasing scale;
the deconvolution layers upsample the feature maps, and each upsampled feature map is concatenated with the first-half feature map of the same scale;
the second convolution layer carries out convolution operation on the processing result of the deconvolution layer;
and the second activation function layer performs nonlinear transformation on the convolution operation result of the second convolution layer to obtain a plurality of prediction results of left and right eye views.
And calculating prediction errors according to the predicted left and right eye views and the left and right eye views obtained by the virtual camera corresponding to each group of images, and performing iterative training on the convolutional neural network and the deconvolution network by adopting a supervised back propagation method to obtain a deep learning network model.
The image preprocessing module comprises:
a scaling unit scaling the image to a standard size;
the normalization unit normalizes the scaled image pixel values so that all pixel values lie between 0 and 1;
and the mean-centering unit zero-centers each pixel value in the normalized image.
The apparatus further comprises a virtual camera module, used to acquire the image sample sets required by the network model training module together with the corresponding left- and right-eye view records, and to simulate, through the parameter settings of the virtual cameras, perturbations of camera position and orientation during imaging.
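A sketch of this pose-perturbation idea, applied to a camera pose before rendering a training sample; the jitter magnitudes are illustrative assumptions:

```python
import numpy as np

def perturb_pose(x, y, yaw, pos_sigma=0.002, yaw_sigma_deg=0.5):
    """Randomly disturb a virtual camera pose for data augmentation.

    pos_sigma (metres) and yaw_sigma_deg (degrees) are illustrative values,
    meant to mimic the mounting tolerances of a real multi-camera rig.
    """
    rng = np.random.default_rng()
    return (x + rng.normal(0.0, pos_sigma),
            y + rng.normal(0.0, pos_sigma),
            yaw + np.deg2rad(rng.normal(0.0, yaw_sigma_deg)))
```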
The view post-processing module includes:
the value range recovery unit restores the pixel values of the predicted left and right eye views to the original image value range;
and a scaling unit scaling the image to a standard size.

Claims (6)

1. A 3D-360-degree panoramic image generation method, characterized by comprising the following steps:
acquiring the images shot by a surrounding multi-camera rig, one image per camera;
preprocessing a plurality of acquired images into images meeting the input requirements of a network model; the network model is obtained by performing iterative training on a plurality of groups of image sample sets generated by the virtual camera and 3D left and right eye views corresponding to the image sample sets; the network model is obtained through convolutional neural network training, and the convolutional neural network comprises a plurality of first convolutional layers, a first activation function layer, a pooling layer, a plurality of deconvolution layers, a second convolutional layer and a second activation function layer which are sequentially connected;
the training method of the network model comprises the following steps:
acquiring an image sample set generated by a virtual camera and a plurality of 3D left and right eye views corresponding to the image sample set;
preprocessing an image sample set;
inputting each group of preprocessed image sample set into a convolutional neural network, outputting a plurality of generated left and right eye views, calculating prediction errors according to the left and right eye views obtained by calculation of each group of image sample set and the left and right eye views obtained by a virtual camera, and performing iterative training on the convolutional neural network by adopting a supervised back propagation method to obtain a deep learning network model;
the method for generating a plurality of left and right eye views by using a convolutional neural network comprises the following steps:
S1: performing a convolution operation on the acquired image through the first convolution layer, applying a nonlinear transformation to the convolution result through the first activation function layer, and applying a pooling operation to the nonlinear transformation result through the pooling layer;
S2: repeating S1 to obtain a series of feature maps of successively decreasing scale;
S3: upsampling the feature maps produced by the first half of the network through the deconvolution layers to obtain feature maps of successively increasing scale; then concatenating first-half and second-half feature maps of the same scale, performing a convolution operation on the concatenated result through the second convolution layer, and applying a nonlinear transformation to that convolution result through the second activation function layer;
S4: repeating S3 to obtain the prediction results for the plurality of left and right eye views;
inputting the preprocessed multiple images into a network model, and calculating to obtain multiple left and right eye views;
carrying out post-processing on the obtained multiple left and right eye views, and recovering the size and the pixel value range of the original image;
splicing the plurality of left-eye views after the post-processing according to a sequence to obtain a left-eye panoramic view, splicing the plurality of right-eye views according to a sequence to obtain a right-eye panoramic view, and splicing the left-eye panoramic view and the right-eye panoramic view up and down to obtain the required panoramic image.
2. The method for generating the 3D-360 degree panoramic image according to claim 1, wherein the activation functions adopted in the first activation function layer and the second activation function layer are linear rectification functions; the pooling layer adopts a maximum pooling mode.
3. A method for generating a 3D-360 degree panoramic image according to claim 1, wherein the method for preprocessing the image or image sample set comprises:
scaling the image to a standard size;
normalizing the scaled image pixel values so that all pixel values are between 0 and 1;
and zero-centering each pixel value in the normalized image.
4. A 3D-360 degree panorama image generating method according to claim 1, wherein said image post-processing method comprises:
multiplying the pixel value of the image by a coefficient to restore the pixel value to an original pixel value range;
and enlarging the range-restored image back to the standard size.
5. The method of claim 1, wherein the method of obtaining the image sample set and the 3D left and right eye views corresponding thereto by the virtual camera comprises:
simulating a plurality of virtual cameras to be placed in a surrounding mode by using a VR graphic engine to form an annular virtual camera set;
setting a first group of virtual cameras to be completely horizontally arranged in equal proportion, wherein a virtual imaging scene comprises objects and textures of depth of field, and each virtual camera performs imaging independently to obtain a group of image sets;
placing a second group of two virtual cameras inside the ring of the annular virtual camera set, where these two cameras record only the vertical column of pixels directly in front of their optical centres; simulating rotation of this pair about the centre of the annular camera set while recording the scan imaged directly in front of each optical centre, so that each time the scan passes a pair of outer virtual cameras a left-eye view and a right-eye view are recorded; after the pair has rotated through 360 degrees, the left- and right-eye views corresponding to every two adjacent outer virtual cameras have been recorded;
and repeating the steps to obtain a plurality of groups of image samples and left and right eye view data corresponding to the image samples.
6. A 3D-360 degree panoramic image generation apparatus, comprising:
the surrounding multi-camera image acquisition module is used for acquiring images shot by surrounding multi-cameras, and each camera acquires an image;
the network model training module is used for training a network model; the network model is obtained by performing iterative training on a plurality of groups of image sample sets generated by the virtual camera and 3D left and right eye views corresponding to the image sample sets;
the network model training module performs training based on a convolutional neural network and comprises, connected in sequence, a plurality of first convolution layers, a first activation function layer, a pooling layer, a plurality of deconvolution layers, a second convolution layer and a second activation function layer;
the first convolution layer performs convolution operation on the acquired image;
the first activation function layer carries out nonlinear transformation on the convolution operation result;
the pooling layer performs a pooling operation on the nonlinear transformation result, yielding a series of feature maps of successively decreasing scale;
the deconvolution layers upsample the feature maps, and each upsampled feature map is concatenated with the first-half feature map of the same scale;
the second convolution layer carries out convolution operation on the processing result of the deconvolution layer;
the second activation function layer carries out nonlinear transformation on the convolution operation result of the second convolution layer to obtain a plurality of prediction results of left and right eye views;
the image preprocessing module is connected with the surrounding multi-camera image acquisition module and used for preprocessing the acquired image into an image meeting the input requirement of a network model;
the view prediction module is connected with the network model training module and the image preprocessing module and is used for inputting a plurality of preprocessed images into the network model generated by the network model training module to obtain a plurality of left and right eye views;
the view post-processing module is connected with the view prediction module and used for restoring the acquired multiple left and right eye views to the original image size and the pixel value range;
and the panoramic image splicing module is connected with the view post-processing module and used for splicing all the left eye views subjected to post-processing to obtain a left eye panoramic view, splicing all the right eye views subjected to post-processing to obtain a right eye panoramic view, and splicing the left eye panoramic view and the right eye panoramic view up and down to obtain the required panoramic image.
CN201811619904.7A (filed 2018-12-28, priority 2018-12-28): 3D-360-degree panoramic image generation method and device; granted as CN109788270B (Active).

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811619904.7A CN109788270B (en) 2018-12-28 2018-12-28 3D-360-degree panoramic image generation method and device


Publications (2)

Publication Number Publication Date
CN109788270A (en) 2019-05-21
CN109788270B (en) 2021-04-09

Family

ID=66498593

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811619904.7A Active CN109788270B (en) 2018-12-28 2018-12-28 3D-360-degree panoramic image generation method and device

Country Status (1)

Country Link
CN (1) CN109788270B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3078564B1 (en) * 2018-03-01 2020-09-11 4D View Solutions THREE-DIMENSIONAL MODELING SYSTEM OF A SCENE BY MULTI-VIEW PHOTOGRAMMETRY
CN111476759B (en) * 2020-03-13 2022-03-25 深圳市鑫信腾机器人科技有限公司 Screen surface detection method and device, terminal and storage medium
CN113313771B (en) * 2021-07-19 2021-10-12 山东捷瑞数字科技股份有限公司 Omnibearing measuring method for industrial complex equipment
CN114742703A (en) * 2022-03-11 2022-07-12 影石创新科技股份有限公司 Method, device and equipment for generating binocular stereoscopic panoramic image and storage medium


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108648161B (en) * 2018-05-16 2020-09-01 江苏科技大学 Binocular vision obstacle detection system and method of asymmetric kernel convolution neural network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011144793A1 (en) * 2010-05-18 2011-11-24 Teknologian Tutkimuskeskus Vtt Mobile device, server arrangement and method for augmented reality applications
CN107018400A (en) * 2017-04-07 2017-08-04 华中科技大学 It is a kind of by 2D Video Quality Metrics into 3D videos method
CN108616746A (en) * 2017-10-12 2018-10-02 叠境数字科技(上海)有限公司 The method that 2D panoramic pictures based on deep learning turn 3D panoramic pictures
CN108830890A (en) * 2018-04-24 2018-11-16 广州启辰电子科技有限公司 A method of scene geometric information being estimated from single image using production confrontation network

Also Published As

Publication number Publication date
CN109788270A (en) 2019-05-21


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant