CN112634127B

CN112634127B - Unsupervised stereo image redirection method

Info

Publication number: CN112634127B
Application number: CN202011528334.8A
Authority: CN
Inventors: 雷建军; 范晓婷; 张哲�; 彭勃
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2020-12-22
Filing date: 2020-12-22
Publication date: 2022-07-29
Anticipated expiration: 2040-12-22
Also published as: CN112634127A

Abstract

The invention discloses an unsupervised stereo image redirection method, which comprises the following steps: acquiring an attention diagram of the three-dimensional image by using a multi-stage attention generating module; constructing viewpoint synthesis loss by using the correlation between viewpoints of the left image and the right image; constructing a three-dimensional cycle consistency loss by utilizing the consistency of the three-dimensional images before and after redirection; and constructing a total loss function based on viewpoint synthesis loss and stereoscopic circulation consistency loss, and training unsupervised stereoscopic image redirection through the total loss function to obtain a redirected stereoscopic image. The invention utilizes an unsupervised deep learning mode, adopts a multi-level attention generation module to extract high-level characteristics, extracts information of a salient region, and utilizes unsupervised viewpoint synthesis loss and three-dimensional cycle consistency loss to ensure the geometric structure and depth information of a three-dimensional image and realize the redirection of the three-dimensional image.

Description

Unsupervised stereo image redirection method

Technical Field

The invention relates to the technical field of image processing and stereoscopic vision, in particular to an unsupervised stereoscopic image redirection method.

Background

Stereoscopic images can provide an immersive visual experience and are of great interest to the industry and academia. As different stereoscopic display devices grow, there is a need to display stereoscopic images and video on devices of different resolutions and target aspect ratios. The stereo image redirection technology aims at intelligently processing multimedia contents of display equipment, enables the multimedia contents to adapt to screens of different sizes, and can be widely applied to the fields of virtual reality, human-computer interaction and the like.

Currently, 2D image redirection methods are classified into discrete methods and continuous methods. The discrete method is to change the original size by removing or inserting pixels that contribute the least amount of energy to the image. However, this approach tends to cause discontinuity artifacts in the salient content, resulting in visual distortion. In contrast, the continuous method, which uses a quadrilateral mesh or a triangular mesh to maintain a salient region, achieves image scaling by calculating an optimal non-uniform mesh deformation. However, the continuous method may cause deformation of the salient region. In recent years, 2D image retargeting algorithms based on deep learning have been developed. For example, Cho et al propose an image redirection method based on a deep convolutional neural network, which utilizes a codec model learning attention map and designs a content-aware based offset layer warped image. Lin et al propose a framework for image redirection from coarse to fine and redirect feature maps to target size with uniform resampling on each convolution layer. The existing 2D image reorientation model based on deep learning shows that the deep learning has good performance in the aspects of understanding the salient content of the image and extracting the interested region.

Compared with conventional 2D image redirection, stereoscopic image redirection not only needs to avoid image content and shape distortion, but also needs to ensure parallax consistency of the stereoscopic image. Stereoscopic image redirection is also classified into two categories, namely discrete methods and continuous methods. The discrete method addresses the stereoscopic image redirection problem by consistently removing seams of uniform regions in the left and right images. For example, Utsugi et al and Basha et al extend the Seam-carving algorithm for 2D images into the stereoscopic image retargeting task by introducing depth constraints. The continuous method realizes image scaling by optimizing deformed grids in the stereo image. Chang et al define stereo image retargeting as an energy minimization problem and handle left and right image deformation by sparse stereo correspondences in the mesh deformation field.

In the process of implementing the invention, the inventor finds that at least the following disadvantages and shortcomings exist in the prior art:

in the method in the prior art, a 2D image redirection technology is utilized to process a stereo image, parallax information of the stereo image is ignored, deformation of a salient region is inconsistent, and depth perception of a 3D scene is further weakened; existing discrete-based stereo image redirection methods may cause distortion in the shape and content of the stereo image, while continuous-based stereo image redirection methods typically cause parallax distortion of the stereo image.

Disclosure of Invention

The invention provides an unsupervised stereo image redirection method, which utilizes an unsupervised deep learning mode, adopts a multi-level attention generation module to extract high-level features, extracts information of a salient region, and utilizes unsupervised viewpoint synthesis loss and stereo cycle consistency loss to ensure the geometric structure and depth information of a stereo image to realize stereo image redirection, and the detailed description is as follows:

an unsupervised stereo image redirection method, the method comprising the steps of:

acquiring an attention diagram of a three-dimensional image by using a multi-stage attention generation module;

constructing viewpoint synthesis loss by using the correlation between viewpoints of the left image and the right image;

constructing a three-dimensional cycle consistency loss by utilizing the consistency of the three-dimensional images before and after redirection;

and constructing a total loss function based on viewpoint synthesis loss and stereoscopic circulation consistency loss, and training unsupervised stereoscopic image redirection through the total loss function to obtain a redirected stereoscopic image.

Wherein the viewpoint synthesis loss is used to facilitate generation of a high quality target stereoscopic image having an accurate inter-viewpoint relationship; and (3) stereo cycle consistency loss, wherein the significant information and parallax relation used for encouraging the reconstructed image are similar to the corresponding original image.

Further, the viewpoint synthesis loss is:

wherein H 'and W' represent the width and height of the reoriented stereoscopic image, D _t (u, v) is a disparity map of the target stereo image, (u, v) is pixel coordinates in the image,

a pixel representing a left image of the object,

the pixels representing the right image of the object,

represented as a composite target left image obtained by image deformation

The stereo cyclic consistency loss consists of a content consistency item and a parallax consistency item, and the stereo cyclic consistency loss L _SL The definition is as follows:

L _SL ＝L _CL +κL _DL

wherein L is _CL Representing content consistency items, L _DL A disparity consistency item is represented, and kappa represents weight parameters of shape consistency and depth consistency;

the parallax consistency item is used for constructing parallax constraint between the deformed reconstructed stereo image and the original stereo image, the parallax between the reconstructed left and right images is close to the parallax between the original left and right images, and the parallax consistency item L _DL Is defined as:

wherein H is the width of the original stereoscopic image, W is the height of the original stereoscopic image,

in the form of the original left image,

in the form of the original right image,

and

to re-enter the target image into the reconstructed left and right images obtained in the proposed unsupervised stereo image retargeting model, (u, v) are the pixel coordinates in the image.

The technical scheme provided by the invention has the beneficial effects that:

1. the method can uniformly zoom the background area while keeping the structure of the significant object in the stereo image, effectively keep the parallax consistency of the stereo image and obtain the high-quality redirected stereo image;

2. the invention solves the problem of three-dimensional image redirection in an unsupervised deep learning mode, positions the salient objects in an unsupervised mode and learns the three-dimensional relationship to obtain the target three-dimensional images with consistent structures, and retains the original depth values of the three-dimensional scenes.

Drawings

FIG. 1 is a flow chart of an unsupervised stereo image redirection method;

fig. 2 is a diagram illustrating depth distortion score versus result of a redirected stereo image.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.

The embodiment of the invention designs an unsupervised stereo image redirection method. The method consists of three parts: the multi-level attention generation module extracts high-level semantic features to obtain information extraction of the salient region; viewpoint synthesis loss is used to generate high quality redirected stereo images; the stereo cyclic consistency loss is used to guarantee the geometry and depth information of the stereo image. The stereo image redirection method realizes the self-adaptive adjustment of the resolution of the image, and maintains the structure and parallax consistency relation of the salient objects in the image, which is described in detail in the following:

An unsupervised stereo image redirection method, see fig. 1, comprising the steps of:

step 1: acquiring an attention diagram of a three-dimensional image by using a multi-stage attention generation module;

in order to obtain the information of the salient objects in the image, the embodiment of the invention designs the attention diagram of the image acquired by the multi-stage attention generating module. The module consists of a basic feature extraction network and a plurality of attention modules, wherein the basic feature extraction network adopts an encoding and decoding structure, obtains multilayer features through a plurality of convolution layers, and recovers feature space information by utilizing a plurality of deconvolution layers. The encoder architecture is based primarily on VGG-16 capturing high-level feature maps, which the decoder network upsamples to preserve the original resolution of the input image. Furthermore, three convolution block attention models are inserted into the encoder of the basic feature extraction network to adequately learn salient objects from coarse to fine.

Specifically, the attention maps are generated by fusing the three feature maps generated by Conv 3-3, Conv 4-3 and Conv 5-3 with the convolution block attention model. Then, a plurality of deconvolution upsampling is carried out on the three attention diagrams, the attention diagrams are restored to the original space dimension and further fused, and a final attention diagram is generated. After the attention map is acquired, the attention map is input to an offset layer, and the reorientation of the stereo image in the depth feature space is realized.

And 2, step: constructing viewpoint synthesis loss by using the correlation between viewpoints of the left image and the right image;

to describe inter-viewpoint correlation between left and right images and further supervise inter-viewpoint relation of a target stereoscopic image. The embodiment of the invention deforms the target right image and the target disparity map to obtain the synthesized target left image, and promotes the synthesized target left image to be consistent with the content and the depth information of the original target left image in the stereo image as far as possible. Suppose that

A pixel representing a left image of the object,

the pixels representing the right image of the object,

represented as a composite target left image obtained by image deformation

Wherein the view point synthesis loss function L _VL The definition is as follows:

wherein H 'and W' represent the width and height of the reoriented stereoscopic image, D _t (u, v) is a disparity map of the target stereoscopic image, and (u, v) is a pixel coordinate in the image.

And step 3: and constructing a stereo cycle consistency loss by utilizing the consistency of the stereo images before and after redirection.

The goal of stereo image redirection is to maintain the original shape of visually salient objects while minimizing visual distortion. Furthermore, it is also necessary to maintain depth perception between the input and output stereo images.

Therefore, the embodiment of the invention designs the consistency loss of the stereo circulation by utilizing the consistency of the stereo images before and after the redirection, keeps the global structure of the stereo images and enhances the 3D visual experience. The stereo cyclic consistency loss consists of a content consistency term and a disparity consistency term.

Stereo cyclic consistency loss L _SL The definition is as follows: l is a radical of an alcohol _SL ＝L _CL +κL _DL (2)

Wherein L is _CL Representing content consistency items, L _DL The term "disparity consistency" is expressed, and κ represents the weight parameters of shape consistency and depth consistency.

1) Content consistency item

In order to maintain the shape structure of the salient object, the embodiment of the invention designs a content consistency item to evaluate the similarity between the reconstructed image and the original image. It is an object of embodiments of the present invention that when modifying the aspect ratio of an original stereo image, the reconstructed stereo image should be similar to the original stereo image. In this term, Structural Similarity (SSIM) and the L1 norm combine as a content consistency term,for comparing the original image with its reconstructed image. Content consistency item L _CL Is defined as:

wherein the content of the first and second substances,

and

representing original left and right images, H and W representing the width and height of the original stereoscopic image,

and

representing reconstructed left and right images obtained by re-entering the target image into the proposed unsupervised stereo image retargeting model, S (-) representing structural similarity, η representing a weight factor,

For the content consistency item of the left image,

is the content consistency item for the right image.

2) Disparity consistency term

In order to maintain the parallax relationship of the stereoscopic image and obtain a 3D visual experience similar to that of the original stereoscopic image, the embodiment of the present invention designs a parallax consistency item based on parallax clues. The disparity consistency term is used to construct disparity constraints between the deformed reconstructed stereoscopic image and the original stereoscopic image. Specifically, between the reconstructed left and right imagesThe disparity should be close to the disparity between the original left and right images. Disparity consistency term L _DL Is defined as:

in summary, the total loss function L is used _total And training an unsupervised stereo image redirection model (namely a deep learning model consisting of a multi-stage attention generation module, a shift layer, viewpoint synthesis loss and stereo cycle consistency loss) to obtain a redirected stereo image. Total loss function L _total The method is composed of the two loss functions:

L _total ＝L _SL +αL _VL (6)

wherein L is _SL For stereo cyclic consistency loss that encourages the similarity of the saliency information and disparity relationships of the reconstructed images to the corresponding original images, L _VL To view synthesis penalty, which facilitates the generation of high quality target stereo images with more precise inter-view relationships, α is a weighting factor.

Fig. 2 lists depth distortion score comparison results for a post-reorientation stereo image, the comparison algorithm comprising: a WSSDCNN method and a DPS method, wherein the WSSDCNN method is a 2D image redirection algorithm and the DPS method is a stereoscopic image redirection algorithm. The smaller the depth distortion, the higher the stereoscopic experience. As shown, the WSSDCNN method introduces large depth distortion due to the lack of information supplementation between the left and right images. In addition, the depth distortion results for the DPS method are higher than those of the present invention because the method cannot maintain geometric consistency in some test data. As can be seen from fig. 2, this method can achieve better redirection results with accurate inter-viewpoint relationships and disparity through viewpoint synthesis and stereo cycle consistency loss.

In the embodiment of the present invention, except for the specific description of the model of each device, the model of other devices is not limited, as long as the device can perform the above functions.

Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-described embodiments of the present invention are merely provided for description and do not represent the merits of the embodiments.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. An unsupervised stereo image redirection method, characterized in that the method comprises the steps of:

acquiring an attention diagram of the stereo image by using a multi-stage attention generation module, inputting the attention diagram into an offset layer, and realizing the redirection of the stereo image in a depth feature space;

constructing viewpoint synthesis loss by using the correlation between viewpoints of the left image and the right image, and deforming the target right image and the target disparity map to obtain a synthesized target left image and promote the content and depth information of the synthesized target left image to be consistent with those of the original target left image in the stereo image;

constructing a total loss function based on viewpoint synthesis loss and stereoscopic circulation consistency loss, and training an unsupervised stereoscopic image redirection model by using the total loss function to obtain a redirected stereoscopic image, wherein the unsupervised stereoscopic image redirection model is a deep learning model consisting of a multi-stage attention generation module, a shift layer, viewpoint synthesis loss and stereoscopic circulation consistency loss;

wherein the content of the first and second substances,

the viewpoint synthesis loss is:

wherein H 'and W' represent the width and height of the reoriented stereoscopic image, D _t (u, v) is a disparity map of the target stereo image,(u, v) are pixel coordinates in the image,

a pixel representing a left image of the object,

the pixels representing the right image of the object,

represented as a composite target left image obtained by image deformation

Wherein the stereo cycle consistency loss is composed of a content consistency item and a parallax consistency item, and the stereo cycle consistency loss L _SL The definition is as follows:

L _SL ＝L _CL +κL _DL

h and W represent the width and height of the original stereoscopic image,

for the content consistency item of the left image,

is the content consistency item for the right image.

2. An unsupervised stereo image redirection method according to claim 1,

a viewpoint synthesis penalty for facilitating generation of a high quality target stereoscopic image having an accurate inter-viewpoint relationship;

and (3) stereo cycle consistency loss, wherein the significant information and parallax relation used for encouraging the reconstructed image are similar to the corresponding original image.

3. An unsupervised stereo image redirection method according to claim 1 or 2, wherein the disparity consistency term is used to construct a disparity constraint between the deformed reconstructed stereo image and the original stereo image, the disparity between the reconstructed left and right images is close to the disparity between the original left and right images, and the disparity consistency term L is used to determine the disparity between the original left and right images _DL Is defined as:

in the form of the original left image,

in the form of the original right image,

and