CN112634127B - Unsupervised stereo image redirection method - Google Patents

Unsupervised stereo image redirection method

Info

Publication number
CN112634127B
CN112634127B (application CN202011528334.8A)
Authority
CN
China
Prior art keywords
image
consistency
loss
stereo
redirection
Prior art date
Legal status
Active
Application number
CN202011528334.8A
Other languages
Chinese (zh)
Other versions
CN112634127A (en)
Inventor
雷建军
范晓婷
张哲
彭勃
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Priority date: 2020-12-22
Filing date: 2020-12-22
Publication date: 2022-07-29
Application filed by Tianjin University
Priority to CN202011528334.8A
Publication of CN112634127A
Application granted
Publication of CN112634127B

Classifications

    • G06T3/04

Abstract

The invention discloses an unsupervised stereo image redirection method comprising the following steps: acquiring an attention map of the stereo image with a multi-level attention generation module; constructing a viewpoint synthesis loss from the inter-viewpoint correlation between the left and right images; constructing a stereo cycle consistency loss from the consistency of the stereo images before and after redirection; and constructing a total loss function based on the viewpoint synthesis loss and the stereo cycle consistency loss, then training the unsupervised stereo image redirection model with this total loss function to obtain the redirected stereo image. Working in an unsupervised deep-learning fashion, the invention employs a multi-level attention generation module to extract high-level features and salient-region information, and uses the unsupervised viewpoint synthesis loss and stereo cycle consistency loss to preserve the geometric structure and depth information of the stereo image, thereby realizing stereo image redirection.

Description

Unsupervised stereo image redirection method
Technical Field
The invention relates to the technical field of image processing and stereoscopic vision, and in particular to an unsupervised stereo image redirection method.
Background
Stereoscopic images provide an immersive visual experience and have attracted great interest from industry and academia. As the variety of stereoscopic display devices grows, stereo images and videos must be displayed on devices with different resolutions and target aspect ratios. Stereo image redirection technology aims to intelligently adapt the multimedia content of display devices to screens of different sizes, and can be widely applied in fields such as virtual reality and human-computer interaction.
Currently, 2D image redirection methods fall into discrete and continuous categories. Discrete methods change the original size by removing or inserting the pixels that contribute the least energy to the image; however, they tend to cause discontinuity artifacts in salient content, resulting in visual distortion. Continuous methods instead preserve salient regions with a quadrilateral or triangular mesh and achieve image scaling by computing an optimal non-uniform mesh deformation, but they may deform the salient regions. In recent years, 2D image retargeting algorithms based on deep learning have been developed. For example, Cho et al. propose an image redirection method based on a deep convolutional neural network that learns an attention map with an encoder-decoder model and designs a content-aware shift layer to warp the image. Lin et al. propose a coarse-to-fine image redirection framework that retargets the feature maps to the target size by uniform resampling at each convolution layer. These existing deep-learning-based 2D image redirection models show that deep learning performs well at understanding the salient content of an image and extracting regions of interest.
Compared with conventional 2D image redirection, stereo image redirection must not only avoid distortion of image content and shape but also preserve the disparity consistency of the stereo pair. Stereo image redirection methods likewise fall into two categories, discrete and continuous. Discrete methods address the problem by consistently removing seams from homogeneous regions of the left and right images; for example, Utsugi et al. and Basha et al. extend the seam-carving algorithm for 2D images to the stereo image retargeting task by introducing depth constraints. Continuous methods achieve scaling by optimizing mesh deformation in the stereo pair; Chang et al. formulate stereo image retargeting as an energy minimization problem and handle the deformation of the left and right images through sparse stereo correspondences in the mesh deformation field.
In the process of implementing the invention, the inventors found at least the following shortcomings in the prior art:
Methods that process stereo images with 2D redirection techniques ignore the disparity information of the stereo pair, deform the salient regions inconsistently, and thereby weaken depth perception of the 3D scene. Existing discrete stereo image redirection methods may distort the shape and content of the stereo image, while continuous stereo image redirection methods typically introduce disparity distortion.
Disclosure of Invention
The invention provides an unsupervised stereo image redirection method. Using an unsupervised deep-learning approach, it employs a multi-level attention generation module to extract high-level features and salient-region information, and uses an unsupervised viewpoint synthesis loss and a stereo cycle consistency loss to preserve the geometric structure and depth information of the stereo image, thereby realizing stereo image redirection. The method is described in detail below:
an unsupervised stereo image redirection method, the method comprising the steps of:
acquiring an attention map of the stereo image with a multi-level attention generation module;
constructing a viewpoint synthesis loss from the inter-viewpoint correlation between the left and right images;
constructing a stereo cycle consistency loss from the consistency of the stereo images before and after redirection;
and constructing a total loss function based on the viewpoint synthesis loss and the stereo cycle consistency loss, then training the unsupervised stereo image redirection model with this total loss function to obtain the redirected stereo image.
The viewpoint synthesis loss promotes the generation of a high-quality target stereo image with accurate inter-viewpoint relationships; the stereo cycle consistency loss encourages the salient information and disparity relationships of the reconstructed images to resemble those of the corresponding original images.
Further, the viewpoint synthesis loss is:

$L_{VL} = \frac{1}{H'W'} \sum_{(u,v)} \left| I_l^t(u,v) - \tilde{I}_l^t(u,v) \right|$

where $H'$ and $W'$ denote the width and height of the redirected stereo image, $D_t(u,v)$ is the disparity map of the target stereo image, $(u,v)$ are pixel coordinates in the image, $I_l^t(u,v)$ denotes a pixel of the target left image, $I_r^t(u,v)$ denotes a pixel of the target right image, and $\tilde{I}_l^t(u,v)$ is the synthesized target left image obtained by image deformation, i.e., by warping the target right image with the target disparity map.
The stereo cycle consistency loss consists of a content consistency term and a disparity consistency term, and the stereo cycle consistency loss $L_{SL}$ is defined as:

$L_{SL} = L_{CL} + \kappa L_{DL}$

where $L_{CL}$ denotes the content consistency term, $L_{DL}$ denotes the disparity consistency term, and $\kappa$ weights shape consistency against depth consistency;
the disparity consistency term constructs a disparity constraint between the deformed reconstructed stereo image and the original stereo image, the disparity between the reconstructed left and right images staying close to the disparity between the original left and right images; the disparity consistency term $L_{DL}$ is defined as:

$L_{DL} = \frac{1}{HW} \sum_{(u,v)} \left| \hat{D}(u,v) - D(u,v) \right|$

where $H$ is the width of the original stereo image, $W$ is the height of the original stereo image, $I_l$ is the original left image, $I_r$ is the original right image, $\hat{I}_l$ and $\hat{I}_r$ are the reconstructed left and right images obtained by feeding the target image back into the proposed unsupervised stereo image retargeting model, $\hat{D}(u,v)$ and $D(u,v)$ denote the disparity between the reconstructed pair and between the original pair respectively, and $(u,v)$ are pixel coordinates in the image.
The technical scheme provided by the invention has the following beneficial effects:
1. the method uniformly scales the background region while preserving the structure of salient objects in the stereo image, effectively maintains the disparity consistency of the stereo pair, and obtains a high-quality redirected stereo image;
2. the invention solves the stereo image redirection problem in an unsupervised deep-learning fashion, locating salient objects without supervision and learning the stereo relationship to obtain a structurally consistent target stereo image that retains the original depth values of the 3D scene.
Drawings
FIG. 1 is a flow chart of the unsupervised stereo image redirection method;
FIG. 2 is a diagram comparing depth distortion scores of redirected stereo images.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
The embodiment of the invention designs an unsupervised stereo image redirection method consisting of three parts: a multi-level attention generation module that extracts high-level semantic features to obtain salient-region information; a viewpoint synthesis loss used to generate high-quality redirected stereo images; and a stereo cycle consistency loss used to preserve the geometry and depth information of the stereo image. The method adaptively adjusts the resolution of the image while maintaining the structure of salient objects and the disparity consistency of the image, as detailed below:
An unsupervised stereo image redirection method, referring to FIG. 1, comprises the following steps:
step 1: acquiring an attention diagram of a three-dimensional image by using a multi-stage attention generation module;
in order to obtain the information of the salient objects in the image, the embodiment of the invention designs the attention diagram of the image acquired by the multi-stage attention generating module. The module consists of a basic feature extraction network and a plurality of attention modules, wherein the basic feature extraction network adopts an encoding and decoding structure, obtains multilayer features through a plurality of convolution layers, and recovers feature space information by utilizing a plurality of deconvolution layers. The encoder architecture is based primarily on VGG-16 capturing high-level feature maps, which the decoder network upsamples to preserve the original resolution of the input image. Furthermore, three convolution block attention models are inserted into the encoder of the basic feature extraction network to adequately learn salient objects from coarse to fine.
Specifically, attention maps are generated by fusing the three feature maps produced by Conv3-3, Conv4-3 and Conv5-3 with the convolutional block attention models. The three attention maps are then upsampled by several deconvolutions, restored to the original spatial dimensions, and fused to generate the final attention map. After the attention map is acquired, it is fed into a shift layer, which realizes the redirection of the stereo image in the deep feature space, as sketched below.
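For illustration, the following minimal PyTorch sketch shows one way such a multi-level attention generation module could be assembled. It is a sketch under assumptions, not the patented implementation: torchvision's VGG-16 layer layout is used to locate Conv3-3, Conv4-3 and Conv5-3, the convolutional block attention model is simplified to channel attention only, bilinear upsampling stands in for the deconvolution layers, and the fusion is a single 1x1 convolution; all class and function names are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16

class ChannelAttention(nn.Module):
    """Simplified stand-in for a convolutional block attention model (CBAM)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))

    def forward(self, x):
        w = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))))  # (B, C) channel weights
        return x * w[:, :, None, None]

class MultiLevelAttention(nn.Module):
    """Attention map from Conv3-3/4-3/5-3 features: attended, upsampled, fused."""
    def __init__(self):
        super().__init__()
        feats = vgg16(weights=None).features
        # torchvision VGG-16: ReLU of conv3_3 is layer 15, conv4_3 is 22, conv5_3 is 29
        self.stage3 = feats[:16]
        self.stage4 = feats[16:23]
        self.stage5 = feats[23:30]
        self.att3, self.att4, self.att5 = (
            ChannelAttention(c) for c in (256, 512, 512))
        self.fuse = nn.Conv2d(256 + 512 + 512, 1, kernel_size=1)

    def forward(self, x):
        # attention inserted into the encoder, coarse-to-fine
        f3 = self.att3(self.stage3(x))
        f4 = self.att4(self.stage4(f3))
        f5 = self.att5(self.stage5(f4))
        size = x.shape[2:]
        # restore each attended map to the input resolution, then fuse
        up = [F.interpolate(f, size=size, mode='bilinear', align_corners=False)
              for f in (f3, f4, f5)]
        return torch.sigmoid(self.fuse(torch.cat(up, dim=1)))  # (B, 1, H, W)

attention_map = MultiLevelAttention()(torch.rand(1, 3, 224, 224))

In a full model the resulting attention map, together with the stereo features, would drive the shift layer that resamples columns to the target width.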
Step 2: constructing a viewpoint synthesis loss from the inter-viewpoint correlation between the left and right images;
to describe inter-viewpoint correlation between left and right images and further supervise inter-viewpoint relation of a target stereoscopic image. The embodiment of the invention deforms the target right image and the target disparity map to obtain the synthesized target left image, and promotes the synthesized target left image to be consistent with the content and the depth information of the original target left image in the stereo image as far as possible. Suppose that
Figure BDA0002851496250000041
A pixel representing a left image of the object,
Figure BDA0002851496250000042
the pixels representing the right image of the object,
Figure BDA0002851496250000043
represented as a composite target left image obtained by image deformation
Figure BDA0002851496250000044
The viewpoint synthesis loss function $L_{VL}$ is defined as:

$L_{VL} = \frac{1}{H'W'} \sum_{(u,v)} \left| I_l^t(u,v) - \tilde{I}_l^t(u,v) \right| \quad (1)$

where $H'$ and $W'$ denote the width and height of the redirected stereo image, $D_t(u,v)$ is the disparity map of the target stereo image used for the warping, and $(u,v)$ are pixel coordinates in the image.
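For illustration, a minimal sketch of this loss in the same assumed PyTorch setting follows. The grid_sample-based backward warp and the horizontal shift direction are assumptions; the patent states only that the synthesized left view is obtained by image deformation of the target right image and the target disparity map.

import torch
import torch.nn.functional as F

def warp_with_disparity(img_right, disparity):
    """Backward-warp the right image toward the left view using per-pixel disparity."""
    b, _, h, w = img_right.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing='ij')
    xs = xs[None] + disparity.squeeze(1)        # shifted sampling columns, (B, H, W)
    ys = ys[None].expand_as(xs)
    grid = torch.stack([2 * xs / (w - 1) - 1,   # normalize to [-1, 1], x then y
                        2 * ys / (h - 1) - 1], dim=-1)
    return F.grid_sample(img_right, grid, align_corners=True)

def viewpoint_synthesis_loss(left_t, right_t, disp_t):
    """L_VL: mean absolute difference, i.e. the sum divided by H' * W'."""
    left_synth = warp_with_disparity(right_t, disp_t)
    return (left_t - left_synth).abs().mean()

loss = viewpoint_synthesis_loss(torch.rand(1, 3, 64, 96),
                                torch.rand(1, 3, 64, 96),
                                torch.rand(1, 1, 64, 96) * 5.0)

Because grid_sample performs differentiable bilinear resampling, the loss can back-propagate through both the synthesized image and the disparity map.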
Step 3: constructing the stereo cycle consistency loss from the consistency of the stereo images before and after redirection.
The goal of stereo image redirection is to maintain the original shape of visually salient objects while minimizing visual distortion. In addition, depth perception must be maintained between the input and output stereo images.
Therefore, the embodiment of the invention designs the stereo cycle consistency loss using the consistency of the stereo images before and after redirection, preserving the global structure of the stereo image and enhancing the 3D visual experience. The stereo cycle consistency loss consists of a content consistency term and a disparity consistency term.
The stereo cycle consistency loss $L_{SL}$ is defined as:

$L_{SL} = L_{CL} + \kappa L_{DL} \quad (2)$

where $L_{CL}$ denotes the content consistency term, $L_{DL}$ denotes the disparity consistency term, and $\kappa$ weights shape consistency against depth consistency.
1) Content consistency term
In order to maintain the shape and structure of salient objects, the embodiment of the invention designs a content consistency term that evaluates the similarity between the reconstructed image and the original image: when the aspect ratio of the original stereo image is modified, the reconstructed stereo image should remain similar to the original. In this term, structural similarity (SSIM) and the L1 norm are combined to compare the original images with their reconstructions. The content consistency term $L_{CL}$ is the sum of a left-image term and a right-image term:

$L_{CL}^{l} = \eta \left( 1 - S(I_l, \hat{I}_l) \right) + \frac{1-\eta}{HW} \sum_{(u,v)} \left| I_l(u,v) - \hat{I}_l(u,v) \right| \quad (3)$

$L_{CL}^{r} = \eta \left( 1 - S(I_r, \hat{I}_r) \right) + \frac{1-\eta}{HW} \sum_{(u,v)} \left| I_r(u,v) - \hat{I}_r(u,v) \right| \quad (4)$

where $I_l$ and $I_r$ denote the original left and right images, $H$ and $W$ denote the width and height of the original stereo image, $\hat{I}_l$ and $\hat{I}_r$ denote the reconstructed left and right images obtained by feeding the target image back into the proposed unsupervised stereo image retargeting model, $S(\cdot)$ denotes structural similarity, $\eta$ is a weighting factor, $L_{CL}^{l}$ is the content consistency term for the left image, and $L_{CL}^{r}$ is the content consistency term for the right image.
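A hedged sketch of the content consistency term follows, under the same PyTorch assumptions: a 3x3 average-pooled SSIM stands in for $S(\cdot)$, and eta balances the SSIM dissimilarity against the mean L1 difference for each view. The pooled-SSIM form and the default eta value are common conventions assumed here, not values taken from the patent.

import torch
import torch.nn.functional as F

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Mean structural similarity over 3x3 neighborhoods (a stand-in for S)."""
    mu_x = F.avg_pool2d(x, 3, 1, 1)
    mu_y = F.avg_pool2d(y, 3, 1, 1)
    var_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    cov = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    s = ((2 * mu_x * mu_y + c1) * (2 * cov + c2) /
         ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))
    return s.mean()

def content_consistency_view(orig, recon, eta=0.85):
    """Per-view term: eta weights SSIM dissimilarity against the mean L1 difference."""
    return eta * (1 - ssim(orig, recon)) + (1 - eta) * (orig - recon).abs().mean()

def content_consistency_loss(left, right, left_rec, right_rec, eta=0.85):
    """L_CL: sum of the left-image and right-image content consistency terms."""
    return (content_consistency_view(left, left_rec, eta) +
            content_consistency_view(right, right_rec, eta))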
2) Disparity consistency term
In order to maintain the disparity relationship of the stereo image and obtain a 3D visual experience similar to that of the original stereo pair, the embodiment of the invention designs a disparity consistency term based on disparity cues. The disparity consistency term constructs a disparity constraint between the deformed reconstructed stereo image and the original stereo image: the disparity between the reconstructed left and right images should stay close to the disparity between the original left and right images. The disparity consistency term $L_{DL}$ is defined as:

$L_{DL} = \frac{1}{HW} \sum_{(u,v)} \left| \hat{D}(u,v) - D(u,v) \right| \quad (5)$

where $\hat{D}(u,v)$ denotes the disparity between the reconstructed left and right images $\hat{I}_l$ and $\hat{I}_r$, and $D(u,v)$ denotes the disparity between the original left and right images $I_l$ and $I_r$.
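Read this way, the disparity consistency term reduces to a mean absolute difference between the two disparity maps, as in the one-function sketch below; how the disparities of the reconstructed and original pairs are obtained (for example, from a stereo matcher run on each pair) is an assumption left outside the sketch.

def disparity_consistency_loss(disp_orig, disp_recon):
    """L_DL: mean |D_hat - D| over all pixels, i.e. the sum divided by H * W."""
    return (disp_recon - disp_orig).abs().mean()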
In summary, the total loss function $L_{total}$ is used to train the unsupervised stereo image redirection model (namely, the deep learning model consisting of the multi-level attention generation module, the shift layer, the viewpoint synthesis loss, and the stereo cycle consistency loss) to obtain the redirected stereo image. The total loss function $L_{total}$ is composed of the two losses above:

$L_{total} = L_{SL} + \alpha L_{VL} \quad (6)$

where $L_{SL}$ is the stereo cycle consistency loss, which encourages the salient information and disparity relationships of the reconstructed images to resemble those of the corresponding original images, $L_{VL}$ is the viewpoint synthesis loss, which promotes the generation of high-quality target stereo images with accurate inter-viewpoint relationships, and $\alpha$ is a weighting factor.
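Finally, a sketch of how the pieces combine into one training objective. The model interface (the retarget and retarget_back methods) and the weight values are hypothetical placeholders standing in for the multi-level attention generation module plus shift layer; only the composition L_total = L_SL + alpha * L_VL with L_SL = L_CL + kappa * L_DL follows the text.

def total_loss(batch, model, alpha=1.0, kappa=1.0, eta=0.85):
    """L_total = L_SL + alpha * L_VL, with L_SL = L_CL + kappa * L_DL."""
    left, right, disp = batch['left'], batch['right'], batch['disp']
    # hypothetical interface: retarget to the target size, then map the target
    # pair back to the original size to obtain the reconstructed pair (the cycle)
    left_t, right_t, disp_t = model.retarget(left, right, disp)
    left_rec, right_rec, disp_rec = model.retarget_back(left_t, right_t, disp_t)
    l_vl = viewpoint_synthesis_loss(left_t, right_t, disp_t)
    l_cl = content_consistency_loss(left, right, left_rec, right_rec, eta)
    l_dl = disparity_consistency_loss(disp, disp_rec)
    return l_cl + kappa * l_dl + alpha * l_vl

# one optimization step, reusing the loss sketches above:
# optimizer.zero_grad(); loss = total_loss(batch, model); loss.backward(); optimizer.step()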
Fig. 2 lists depth distortion score comparison results for redirected stereo images. The comparison algorithms are the WSSDCNN method, a 2D image redirection algorithm, and the DPS method, a stereo image redirection algorithm. The smaller the depth distortion, the better the stereoscopic experience. As shown, the WSSDCNN method introduces large depth distortion because it lacks information exchange between the left and right images. The depth distortion of the DPS method is also higher than that of the present invention, because DPS cannot maintain geometric consistency on some test data. As can be seen from Fig. 2, the proposed method achieves better redirection results with accurate inter-viewpoint relationships and disparity through the viewpoint synthesis and stereo cycle consistency losses.
In the embodiment of the present invention, unless otherwise specified, the models of the devices are not limited, as long as the devices can perform the functions described above.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and that the above-described embodiments of the present invention are provided for description only and do not represent the superiority or inferiority of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (3)

1. An unsupervised stereo image redirection method, characterized in that the method comprises the following steps:
acquiring an attention map of the stereo image with a multi-level attention generation module, inputting the attention map into a shift layer, and realizing the redirection of the stereo image in the deep feature space;
constructing a viewpoint synthesis loss from the inter-viewpoint correlation between the left and right images, warping the target right image with the target disparity map to obtain a synthesized target left image, and encouraging the content and depth information of the synthesized target left image to be consistent with those of the original target left image of the stereo pair;
constructing a stereo cycle consistency loss from the consistency of the stereo images before and after redirection;
and constructing a total loss function based on the viewpoint synthesis loss and the stereo cycle consistency loss, and training an unsupervised stereo image redirection model with the total loss function to obtain the redirected stereo image, wherein the unsupervised stereo image redirection model is a deep learning model consisting of the multi-level attention generation module, the shift layer, the viewpoint synthesis loss, and the stereo cycle consistency loss;
wherein the viewpoint synthesis loss is:

$L_{VL} = \frac{1}{H'W'} \sum_{(u,v)} \left| I_l^t(u,v) - \tilde{I}_l^t(u,v) \right|$

where $H'$ and $W'$ denote the width and height of the redirected stereo image, $D_t(u,v)$ is the disparity map of the target stereo image, $(u,v)$ are pixel coordinates in the image, $I_l^t(u,v)$ denotes a pixel of the target left image, $I_r^t(u,v)$ denotes a pixel of the target right image, and $\tilde{I}_l^t(u,v)$ is the synthesized target left image obtained by warping the target right image with the target disparity map;
wherein the stereo cycle consistency loss consists of a content consistency term and a disparity consistency term, the stereo cycle consistency loss $L_{SL}$ being defined as:

$L_{SL} = L_{CL} + \kappa L_{DL}$

where $L_{CL}$ denotes the content consistency term, $L_{DL}$ denotes the disparity consistency term, and $\kappa$ weights shape consistency against depth consistency; the content consistency term is:

$L_{CL} = L_{CL}^{l} + L_{CL}^{r}$

where $H$ and $W$ denote the width and height of the original stereo image, $L_{CL}^{l}$ is the content consistency term for the left image, and $L_{CL}^{r}$ is the content consistency term for the right image, each combining structural similarity and the L1 norm between the original image and its reconstruction.
2. The unsupervised stereo image redirection method according to claim 1, wherein the viewpoint synthesis loss promotes the generation of a high-quality target stereo image with accurate inter-viewpoint relationships, and the stereo cycle consistency loss encourages the salient information and disparity relationships of the reconstructed images to resemble those of the corresponding original images.
3. The unsupervised stereo image redirection method according to claim 1 or 2, wherein the disparity consistency term constructs a disparity constraint between the deformed reconstructed stereo image and the original stereo image, the disparity between the reconstructed left and right images being close to the disparity between the original left and right images, the disparity consistency term $L_{DL}$ being defined as:

$L_{DL} = \frac{1}{HW} \sum_{(u,v)} \left| \hat{D}(u,v) - D(u,v) \right|$

where $H$ is the width of the original stereo image, $W$ is the height of the original stereo image, $I_l$ is the original left image, $I_r$ is the original right image, $\hat{I}_l$ and $\hat{I}_r$ are the reconstructed left and right images obtained by feeding the target image back into the proposed unsupervised stereo image retargeting model, $\hat{D}(u,v)$ and $D(u,v)$ denote the disparity between the reconstructed pair and between the original pair respectively, and $(u,v)$ are pixel coordinates in the image.
CN202011528334.8A 2020-12-22 2020-12-22 Unsupervised stereo image redirection method Active CN112634127B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011528334.8A CN112634127B (en) 2020-12-22 2020-12-22 Unsupervised stereo image redirection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011528334.8A CN112634127B (en) 2020-12-22 2020-12-22 Unsupervised stereo image redirection method

Publications (2)

Publication Number Publication Date
CN112634127A CN112634127A (en) 2021-04-09
CN112634127B (en) 2022-07-29

Family

ID=75321445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011528334.8A Active CN112634127B (en) 2020-12-22 2020-12-22 Unsupervised stereo image redirection method

Country Status (1)

Country Link
CN (1) CN112634127B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113506217B (en) * 2021-07-09 2022-08-16 天津大学 Three-dimensional image super-resolution reconstruction method based on cyclic interaction
CN113516698B (en) * 2021-07-23 2023-11-17 香港中文大学(深圳) Indoor space depth estimation method, device, equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
WO2011026850A1 (en) * 2009-09-01 2011-03-10 Markus Gross Method for art-directable retargeting for streaming video
CN106504186A (en) * 2016-09-30 2017-03-15 天津大学 A kind of stereo-picture reorientation method
CN108537806A (en) * 2018-04-17 2018-09-14 福州大学 A kind of stereo-picture line clipping reorientation method based on cumlative energy
CN111724459A (en) * 2020-06-22 2020-09-29 合肥工业大学 Method and system for reorienting movement facing heterogeneous human skeleton

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US9445072B2 (en) * 2009-11-11 2016-09-13 Disney Enterprises, Inc. Synthesizing views based on image domain warping
US20150022631A1 (en) * 2013-07-17 2015-01-22 Htc Corporation Content-aware display adaptation methods and editing interfaces and methods for stereoscopic images
CN106447702B (en) * 2016-08-31 2019-05-31 天津大学 A kind of Stereo image matching figure calculation method
CN107610110B (en) * 2017-09-08 2020-09-25 北京工业大学 Global and local feature combined cross-scale image quality evaluation method
EP3740847B1 (en) * 2018-01-17 2024-03-13 Magic Leap, Inc. Display systems and methods for determining registration between a display and a user's eyes

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
WO2011026850A1 (en) * 2009-09-01 2011-03-10 Markus Gross Method for art-directable retargeting for streaming video
CN106504186A (en) * 2016-09-30 2017-03-15 天津大学 A kind of stereo-picture reorientation method
CN108537806A (en) * 2018-04-17 2018-09-14 福州大学 A kind of stereo-picture line clipping reorientation method based on cumlative energy
CN111724459A (en) * 2020-06-22 2020-09-29 合肥工业大学 Method and system for reorienting movement facing heterogeneous human skeleton

Non-Patent Citations (4)

Title
Cycle-IR: Deep Cyclic Image Retargeting; Weimin Tan et al.; arXiv:1905.03556v1; 2019-05-09; pp. 1-12 *
Recycle-GAN: Unsupervised Video Retargeting; Aayush Bansal et al.; ECCV 2018; 2018; pp. 1-17 *
Research on retargeting methods for 3D visual media (三维可视媒体重定向方法研究); Lin Wenchong; China Master's Theses Full-text Database; 2018-02-15; I138-2284 *
Unsupervised monocular image depth estimation using residual dense networks (应用残差稠密网络的无监督单幅图像深度估计); Ma Li et al.; Journal of Chinese Computer Systems (小型微型计算机系统); 2019-11-30; pp. 2439-2444 *

Also Published As

Publication number Publication date
CN112634127A (en) 2021-04-09

Similar Documents

Publication Publication Date Title
CN100576934C (en) Virtual visual point synthesizing method based on the degree of depth and block information
Cao et al. Semi-automatic 2D-to-3D conversion using disparity propagation
CN111028150B (en) Rapid space-time residual attention video super-resolution reconstruction method
CN101588445B (en) Video area-of-interest exacting method based on depth
US9485497B2 (en) Systems and methods for converting two-dimensional images into three-dimensional images
EP2595116A1 (en) Method for generating depth maps for converting moving 2d images to 3d
US9165401B1 (en) Multi-perspective stereoscopy from light fields
Lee et al. Discontinuity-adaptive depth map filtering for 3D view generation
CN112634127B (en) Unsupervised stereo image redirection method
CN110113593B (en) Wide baseline multi-view video synthesis method based on convolutional neural network
CN112019828B (en) Method for converting 2D (two-dimensional) video into 3D video
CN111899295B (en) Monocular scene depth prediction method based on deep learning
CN111950477A (en) Single-image three-dimensional face reconstruction method based on video surveillance
CN115512073A (en) Three-dimensional texture grid reconstruction method based on multi-stage training under differentiable rendering
CN115298708A (en) Multi-view neural human body rendering
Lu et al. A survey on multiview video synthesis and editing
CN104661014A (en) Space-time combined cavity filling method
Liu et al. Learning-based stereoscopic view synthesis with cascaded deep neural networks
Jammal et al. Multiview video quality enhancement without depth information
Evain et al. A lightweight neural network for monocular view generation with occlusion handling
Zhang et al. SivsFormer: Parallax-aware transformers for single-image-based view synthesis
Li et al. Point-based neural scene rendering for street views
Liu et al. Stereoscopic view synthesis based on region-wise rendering and sparse representation
CN112907641B (en) Multi-view depth estimation method based on detail information retention
Fan et al. A comprehensive review of image retargeting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant