CN112634127B - Unsupervised stereo image redirection method - Google Patents
Unsupervised stereo image redirection method
- Publication number
- CN112634127B (application CN202011528334.8A)
- Authority
- CN
- China
- Prior art keywords: image, consistency, loss, stereo, redirection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06T3/04—
Abstract
The invention discloses an unsupervised stereo image redirection method, which comprises the following steps: acquiring an attention map of the stereo image by using a multi-level attention generation module; constructing a viewpoint synthesis loss from the correlation between the viewpoints of the left and right images; constructing a stereo cycle consistency loss from the consistency of the stereo image before and after redirection; and constructing a total loss function based on the viewpoint synthesis loss and the stereo cycle consistency loss, and training the unsupervised stereo image redirection through the total loss function to obtain the redirected stereo image. The invention uses unsupervised deep learning: a multi-level attention generation module extracts high-level features and information about salient regions, while the unsupervised viewpoint synthesis loss and stereo cycle consistency loss preserve the geometric structure and depth information of the stereo image, realizing stereo image redirection.
Description
Technical Field
The invention relates to the technical field of image processing and stereoscopic vision, in particular to an unsupervised stereoscopic image redirection method.
Background
Stereoscopic images can provide an immersive visual experience and are of great interest to industry and academia. As the variety of stereoscopic display devices grows, there is a need to display stereoscopic images and video on devices of different resolutions and aspect ratios. Stereo image redirection technology aims to intelligently process the multimedia content of a display device so that it adapts to screens of different sizes, and can be widely applied in fields such as virtual reality and human-computer interaction.
Currently, 2D image redirection methods are classified into discrete methods and continuous methods. Discrete methods change the original size by removing or inserting the pixels that contribute the least energy to the image. However, this tends to cause discontinuity artifacts in salient content, resulting in visual distortion. In contrast, continuous methods maintain salient regions with a quadrilateral or triangular mesh and achieve image scaling by computing an optimal non-uniform mesh deformation; they may, however, deform the salient regions themselves. In recent years, 2D image retargeting algorithms based on deep learning have been developed. For example, Cho et al. propose an image redirection method based on a deep convolutional neural network, which uses an encoder-decoder model to learn an attention map and a content-aware shift layer to warp the image. Lin et al. propose a coarse-to-fine image redirection framework that retargets the feature maps to the target size by uniform resampling at each convolution layer. These existing deep-learning-based 2D redirection models show that deep learning performs well at understanding salient image content and extracting regions of interest.
Compared with conventional 2D image redirection, stereoscopic image redirection not only needs to avoid image content and shape distortion, but also needs to ensure parallax consistency of the stereoscopic image. Stereoscopic image redirection is also classified into two categories, namely discrete methods and continuous methods. The discrete method addresses the stereoscopic image redirection problem by consistently removing seams of uniform regions in the left and right images. For example, Utsugi et al and Basha et al extend the Seam-carving algorithm for 2D images into the stereoscopic image retargeting task by introducing depth constraints. The continuous method realizes image scaling by optimizing deformed grids in the stereo image. Chang et al define stereo image retargeting as an energy minimization problem and handle left and right image deformation by sparse stereo correspondences in the mesh deformation field.
In the process of implementing the invention, the inventor finds that at least the following disadvantages and shortcomings exist in the prior art:
prior-art methods that process stereo images with 2D redirection techniques ignore the disparity information of the stereo image and deform the salient regions inconsistently, thereby weakening the depth perception of the 3D scene; existing discrete stereo image redirection methods may distort the shape and content of the stereo image, while continuous stereo image redirection methods typically cause disparity distortion.
Disclosure of Invention
The invention provides an unsupervised stereo image redirection method. It uses unsupervised deep learning: a multi-level attention generation module extracts high-level features and information about salient regions, and the unsupervised viewpoint synthesis loss and stereo cycle consistency loss preserve the geometric structure and depth information of the stereo image, realizing stereo image redirection. The detailed description is as follows:
an unsupervised stereo image redirection method, the method comprising the steps of:
acquiring an attention map of the stereo image by using a multi-level attention generation module;
constructing viewpoint synthesis loss by using the correlation between viewpoints of the left image and the right image;
constructing a stereo cycle consistency loss by utilizing the consistency of the stereo images before and after redirection;
and constructing a total loss function based on the viewpoint synthesis loss and the stereo cycle consistency loss, and training the unsupervised stereo image redirection through the total loss function to obtain the redirected stereo image.
Wherein the viewpoint synthesis loss is used to facilitate the generation of a high-quality target stereo image with accurate inter-viewpoint relationships, and the stereo cycle consistency loss encourages the salient information and disparity relation of the reconstructed image to remain similar to those of the corresponding original image.
Further, the viewpoint synthesis loss is:

L_VL = (1/(H'·W')) Σ_(u,v) | I_t^l(u,v) - Ĩ_t^l(u,v) |,  with  Ĩ_t^l(u,v) = I_t^r(u + D_t(u,v), v)

wherein H' and W' represent the width and height of the redirected stereo image, D_t(u,v) is the disparity map of the target stereo image, (u,v) are pixel coordinates in the image, I_t^l(u,v) is a pixel of the target left image, I_t^r(u,v) is a pixel of the target right image, and Ĩ_t^l is the synthesized target left image obtained by image warping.
The stereo cycle consistency loss consists of a content consistency term and a disparity consistency term; the stereo cycle consistency loss L_SL is defined as:

L_SL = L_CL + κ·L_DL

wherein L_CL represents the content consistency term, L_DL represents the disparity consistency term, and κ is the weight parameter between shape consistency and depth consistency;
the parallax consistency item is used for constructing parallax constraint between the deformed reconstructed stereo image and the original stereo image, the parallax between the reconstructed left and right images is close to the parallax between the original left and right images, and the parallax consistency item L DL Is defined as:
wherein H is the width of the original stereoscopic image, W is the height of the original stereoscopic image,in the form of the original left image,in the form of the original right image,andto re-enter the target image into the reconstructed left and right images obtained in the proposed unsupervised stereo image retargeting model, (u, v) are the pixel coordinates in the image.
The technical scheme provided by the invention has the beneficial effects that:
1. the method can uniformly scale the background region while preserving the structure of salient objects in the stereo image, effectively maintains the disparity consistency of the stereo image, and obtains a high-quality redirected stereo image;
2. the invention solves the stereo image redirection problem in an unsupervised deep learning manner, locates salient objects without supervision, and learns the stereo relationship to obtain target stereo images with consistent structure while retaining the original depth values of the 3D scene.
Drawings
FIG. 1 is a flow chart of an unsupervised stereo image redirection method;
fig. 2 is a comparison of depth distortion scores for the redirected stereo images.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
The embodiment of the invention designs an unsupervised stereo image redirection method. The method consists of three parts: a multi-level attention generation module extracts high-level semantic features to obtain information about salient regions; a viewpoint synthesis loss is used to generate high-quality redirected stereo images; and a stereo cycle consistency loss guarantees the geometry and depth information of the stereo image. The method adaptively adjusts the resolution of the image while maintaining the structure of salient objects and the disparity consistency relation, as described in detail below:
An unsupervised stereo image redirection method, see fig. 1, comprising the steps of:
step 1: acquiring an attention diagram of a three-dimensional image by using a multi-stage attention generation module;
in order to obtain the information of the salient objects in the image, the embodiment of the invention designs the attention diagram of the image acquired by the multi-stage attention generating module. The module consists of a basic feature extraction network and a plurality of attention modules, wherein the basic feature extraction network adopts an encoding and decoding structure, obtains multilayer features through a plurality of convolution layers, and recovers feature space information by utilizing a plurality of deconvolution layers. The encoder architecture is based primarily on VGG-16 capturing high-level feature maps, which the decoder network upsamples to preserve the original resolution of the input image. Furthermore, three convolution block attention models are inserted into the encoder of the basic feature extraction network to adequately learn salient objects from coarse to fine.
Specifically, the attention maps are generated by fusing the three feature maps produced by Conv3-3, Conv4-3 and Conv5-3 with the convolutional block attention model. The three attention maps are then upsampled by several deconvolution layers, restored to the original spatial dimensions, and fused to produce the final attention map. After the attention map is acquired, it is fed into a shift layer, which realizes the redirection of the stereo image in the deep feature space.
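As a rough illustration of the fusion step described above, the sketch below upsamples several coarse attention maps to the input resolution and averages them. It is a minimal pure-Python stand-in: the function names are hypothetical, and nearest-neighbour upsampling with plain averaging replaces the deconvolution layers and learned fusion of the actual module.

```python
def upsample_nearest(att, out_h, out_w):
    """Upsample a 2D attention map to (out_h, out_w) by nearest neighbour."""
    in_h, in_w = len(att), len(att[0])
    return [[att[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)] for r in range(out_h)]

def fuse_attention(stage_maps, out_h, out_w):
    """Upsample each per-stage attention map and average them element-wise."""
    ups = [upsample_nearest(m, out_h, out_w) for m in stage_maps]
    return [[sum(u[r][c] for u in ups) / len(ups)
             for c in range(out_w)] for r in range(out_h)]

# Three toy maps at decreasing resolutions, fused to a 4x4 attention map.
a3 = [[0.2, 0.8], [0.4, 0.6]]   # e.g. from Conv3-3 + attention (toy values)
a4 = [[0.5]]                    # e.g. from Conv4-3 + attention (toy values)
a5 = [[0.1]]                    # e.g. from Conv5-3 + attention (toy values)
fused = fuse_attention([a3, a4, a5], 4, 4)
```

The fused map then plays the role of the attention map that is passed to the shift layer.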
Step 2: construct the viewpoint synthesis loss by using the correlation between the viewpoints of the left and right images;
to describe inter-viewpoint correlation between left and right images and further supervise inter-viewpoint relation of a target stereoscopic image. The embodiment of the invention deforms the target right image and the target disparity map to obtain the synthesized target left image, and promotes the synthesized target left image to be consistent with the content and the depth information of the original target left image in the stereo image as far as possible. Suppose thatA pixel representing a left image of the object,the pixels representing the right image of the object,represented as a composite target left image obtained by image deformation
The viewpoint synthesis loss function L_VL is defined as:

L_VL = (1/(H'·W')) Σ_(u,v) | I_t^l(u,v) - Ĩ_t^l(u,v) |,  with  Ĩ_t^l(u,v) = I_t^r(u + D_t(u,v), v)   (1)

wherein H' and W' represent the width and height of the redirected stereo image, D_t(u,v) is the disparity map of the target stereo image, and (u,v) are pixel coordinates in the image.
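A minimal pure-Python sketch of this viewpoint synthesis loss on grayscale images follows; the function names, integer disparities, and edge clamping are illustrative simplifications, since the actual model warps with a differentiable sampler.

```python
def synthesize_left(right, disparity):
    """Warp the target right image along the disparity map to get a left view.

    Integer disparities and edge clamping keep the sketch simple; a real
    implementation would use differentiable bilinear sampling.
    """
    h, w = len(right), len(right[0])
    return [[right[v][min(w - 1, max(0, u + int(disparity[v][u])))]
             for u in range(w)] for v in range(h)]

def viewpoint_synthesis_loss(target_left, target_right, disparity):
    """Mean absolute error between the target left image and its synthesis."""
    synth = synthesize_left(target_right, disparity)
    h, w = len(target_left), len(target_left[0])
    return sum(abs(target_left[v][u] - synth[v][u])
               for v in range(h) for u in range(w)) / (h * w)

# With zero disparity and identical views, the synthesized left view matches
# the target left view and the loss vanishes.
left = [[1.0, 2.0], [3.0, 4.0]]
zero_disp = [[0, 0], [0, 0]]
loss_same = viewpoint_synthesis_loss(left, left, zero_disp)
```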
Step 3: construct the stereo cycle consistency loss by utilizing the consistency of the stereo images before and after redirection.
The goal of stereo image redirection is to maintain the original shape of visually salient objects while minimizing visual distortion. Furthermore, it is also necessary to maintain depth perception between the input and output stereo images.
Therefore, the embodiment of the invention designs the stereo cycle consistency loss from the consistency of the stereo images before and after redirection, preserving the global structure of the stereo image and enhancing the 3D visual experience. The stereo cycle consistency loss consists of a content consistency term and a disparity consistency term.
The stereo cycle consistency loss L_SL is defined as: L_SL = L_CL + κ·L_DL (2)

wherein L_CL represents the content consistency term, L_DL represents the disparity consistency term, and κ is the weight parameter between shape consistency and depth consistency.
1) Content consistency term
In order to maintain the shape structure of salient objects, the embodiment of the invention designs a content consistency term to evaluate the similarity between the reconstructed image and the original image: when the aspect ratio of the original stereo image is modified, the reconstructed stereo image should remain similar to the original stereo image. In this term, Structural Similarity (SSIM) and the L1 norm are combined for comparing the original image with its reconstruction. The content consistency term L_CL is defined as:

L_CL = L_CL^l + L_CL^r,  with  L_CL^l = (1/(H·W)) Σ_(u,v) [ η·(1 - S(I^l(u,v), Î^l(u,v)))/2 + (1 - η)·| I^l(u,v) - Î^l(u,v) | ]

and L_CL^r defined analogously for the right view, wherein I^l and I^r represent the original left and right images, H and W represent the width and height of the original stereo image, Î^l and Î^r represent the reconstructed left and right images obtained by feeding the target image back into the proposed unsupervised stereo image retargeting model, S(·) denotes structural similarity, η is a weighting factor, L_CL^l is the content consistency term of the left image, and L_CL^r is the content consistency term of the right image.
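The SSIM-plus-L1 combination can be sketched in pure Python as follows; a simplified whole-image SSIM replaces the usual windowed SSIM, and all names are illustrative rather than the authors' code.

```python
def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified whole-image SSIM (no sliding window), for illustration."""
    xs = [p for row in x for p in row]
    ys = [p for row in y for p in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((p - mx) ** 2 for p in xs) / n
    vy = sum((p - my) ** 2 for p in ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def content_consistency(orig, recon, eta=0.85):
    """Weighted SSIM + L1 comparison of one view with its reconstruction."""
    h, w = len(orig), len(orig[0])
    l1 = sum(abs(orig[v][u] - recon[v][u])
             for v in range(h) for u in range(w)) / (h * w)
    return eta * (1.0 - global_ssim(orig, recon)) / 2.0 + (1.0 - eta) * l1
```

An identical reconstruction gives SSIM of 1 and an L1 term of 0, so the content consistency term vanishes; any deviation makes it positive.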
2) Disparity consistency term
In order to maintain the disparity relation of the stereo image and obtain a 3D visual experience similar to that of the original stereo image, the embodiment of the invention designs a disparity consistency term based on disparity cues. The term constructs disparity constraints between the warped, reconstructed stereo image and the original stereo image. Specifically, the disparity between the reconstructed left and right images should be close to the disparity between the original left and right images. The disparity consistency term L_DL is defined as:

L_DL = (1/(H·W)) Σ_(u,v) | D_rec(u,v) - D_org(u,v) |

wherein D_org and D_rec denote the disparities of the original pair (I^l, I^r) and the reconstructed pair (Î^l, Î^r), respectively.
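Assuming the disparity maps of the original pair and the reconstructed pair have already been estimated, the disparity consistency term reduces to a mean absolute difference, sketched here with illustrative names:

```python
def disparity_consistency_loss(disp_orig, disp_recon):
    """Mean absolute difference between original and reconstructed disparity.

    disp_orig / disp_recon are 2D maps of the same size; in the full model
    they would come from the original and the cycled (reconstructed) pair.
    """
    h, w = len(disp_orig), len(disp_orig[0])
    return sum(abs(disp_orig[v][u] - disp_recon[v][u])
               for v in range(h) for u in range(w)) / (h * w)
```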
in summary, the total loss function L is used total And training an unsupervised stereo image redirection model (namely a deep learning model consisting of a multi-stage attention generation module, a shift layer, viewpoint synthesis loss and stereo cycle consistency loss) to obtain a redirected stereo image. Total loss function L total The method is composed of the two loss functions:
L total =L SL +αL VL (6)
wherein L is SL For stereo cyclic consistency loss that encourages the similarity of the saliency information and disparity relationships of the reconstructed images to the corresponding original images, L VL To view synthesis penalty, which facilitates the generation of high quality target stereo images with more precise inter-view relationships, α is a weighting factor.
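The combination of the individual terms into the total training objective can be sketched as follows; κ and α are hyperparameters, and the default values here are placeholders rather than the ones used in the patent.

```python
def total_loss(l_cl, l_dl, l_vl, kappa=0.5, alpha=1.0):
    """Combine content, disparity, and viewpoint-synthesis terms.

    L_SL = L_CL + kappa * L_DL   (stereo cycle consistency)
    L_total = L_SL + alpha * L_VL
    """
    l_sl = l_cl + kappa * l_dl
    return l_sl + alpha * l_vl
```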
Fig. 2 lists depth distortion score comparisons for the redirected stereo images. The compared algorithms are the WSSDCNN method, a 2D image redirection algorithm, and the DPS method, a stereo image redirection algorithm. The smaller the depth distortion, the better the stereoscopic experience. As shown, the WSSDCNN method introduces large depth distortion because it lacks the complementary information between the left and right images. The depth distortion of the DPS method is also higher than that of the invention, because it cannot maintain geometric consistency on some test data. As can be seen from fig. 2, the present method achieves better redirection results, with accurate inter-viewpoint relations and disparity, through the viewpoint synthesis and stereo cycle consistency losses.
In the embodiment of the present invention, except for the specific description of the model of each device, the model of other devices is not limited, as long as the device can perform the above functions.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-described embodiments of the present invention are merely provided for description and do not represent the merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (3)
1. An unsupervised stereo image redirection method, characterized in that the method comprises the steps of:
acquiring an attention map of the stereo image by using a multi-level attention generation module, inputting the attention map into a shift layer, and realizing the redirection of the stereo image in the deep feature space;
constructing a viewpoint synthesis loss by using the correlation between the viewpoints of the left and right images: warping the target right image with the target disparity map to obtain a synthesized target left image, and encouraging its content and depth information to be consistent with those of the original target left image in the stereo image;
constructing a stereo cycle consistency loss by utilizing the consistency of the stereo images before and after redirection;
constructing a total loss function based on the viewpoint synthesis loss and the stereo cycle consistency loss, and training an unsupervised stereo image redirection model with the total loss function to obtain the redirected stereo image, wherein the unsupervised stereo image redirection model is a deep learning model composed of the multi-level attention generation module, the shift layer, the viewpoint synthesis loss and the stereo cycle consistency loss;
wherein the viewpoint synthesis loss is:

L_VL = (1/(H'·W')) Σ_(u,v) | I_t^l(u,v) - Ĩ_t^l(u,v) |,  with  Ĩ_t^l(u,v) = I_t^r(u + D_t(u,v), v)

wherein H' and W' represent the width and height of the redirected stereo image, D_t(u,v) is the disparity map of the target stereo image, (u,v) are pixel coordinates in the image, I_t^l(u,v) is a pixel of the target left image, I_t^r(u,v) is a pixel of the target right image, and Ĩ_t^l is the synthesized target left image obtained by image warping;
wherein the stereo cycle consistency loss is composed of a content consistency term and a disparity consistency term, and the stereo cycle consistency loss L_SL is defined as:

L_SL = L_CL + κ·L_DL

wherein L_CL represents the content consistency term, L_DL represents the disparity consistency term, and κ is the weight parameter between shape consistency and depth consistency.
2. An unsupervised stereo image redirection method according to claim 1, wherein the viewpoint synthesis loss is used to facilitate the generation of a high-quality target stereo image with accurate inter-viewpoint relationships, and the stereo cycle consistency loss is used to encourage the salient information and disparity relation of the reconstructed image to be similar to those of the corresponding original image.
3. An unsupervised stereo image redirection method according to claim 1 or 2, wherein the disparity consistency term is used to construct a disparity constraint between the warped, reconstructed stereo image and the original stereo image, such that the disparity between the reconstructed left and right images is close to the disparity between the original left and right images, the disparity consistency term L_DL being defined as:

L_DL = (1/(H·W)) Σ_(u,v) | D_rec(u,v) - D_org(u,v) |

wherein H is the width of the original stereo image, W is the height of the original stereo image, I^l is the original left image, I^r is the original right image, Î^l and Î^r are the reconstructed left and right images obtained by feeding the target image back into the proposed unsupervised stereo image retargeting model, (u,v) are pixel coordinates in the image, and D_org and D_rec denote the disparities of the original and reconstructed pairs, respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011528334.8A CN112634127B (en) | 2020-12-22 | 2020-12-22 | Unsupervised stereo image redirection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112634127A CN112634127A (en) | 2021-04-09 |
CN112634127B true CN112634127B (en) | 2022-07-29 |
Family
ID=75321445
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011528334.8A Active CN112634127B (en) | 2020-12-22 | 2020-12-22 | Unsupervised stereo image redirection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112634127B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113506217B (en) * | 2021-07-09 | 2022-08-16 | 天津大学 | Three-dimensional image super-resolution reconstruction method based on cyclic interaction |
CN113516698B (en) * | 2021-07-23 | 2023-11-17 | 香港中文大学(深圳) | Indoor space depth estimation method, device, equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011026850A1 (en) * | 2009-09-01 | 2011-03-10 | Markus Gross | Method for art-directable retargeting for streaming video |
CN106504186A (en) * | 2016-09-30 | 2017-03-15 | 天津大学 | A kind of stereo-picture reorientation method |
CN108537806A (en) * | 2018-04-17 | 2018-09-14 | 福州大学 | A kind of stereo-picture line clipping reorientation method based on cumlative energy |
CN111724459A (en) * | 2020-06-22 | 2020-09-29 | 合肥工业大学 | Method and system for reorienting movement facing heterogeneous human skeleton |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9445072B2 (en) * | 2009-11-11 | 2016-09-13 | Disney Enterprises, Inc. | Synthesizing views based on image domain warping |
US20150022631A1 (en) * | 2013-07-17 | 2015-01-22 | Htc Corporation | Content-aware display adaptation methods and editing interfaces and methods for stereoscopic images |
CN106447702B (en) * | 2016-08-31 | 2019-05-31 | 天津大学 | A kind of Stereo image matching figure calculation method |
CN107610110B (en) * | 2017-09-08 | 2020-09-25 | 北京工业大学 | Global and local feature combined cross-scale image quality evaluation method |
EP3740847B1 (en) * | 2018-01-17 | 2024-03-13 | Magic Leap, Inc. | Display systems and methods for determining registration between a display and a user's eyes |
- 2020-12-22 CN CN202011528334.8A patent/CN112634127B/en active Active
Non-Patent Citations (4)
Title |
---|
Cycle-IR: Deep Cyclic Image Retargeting; Weimin Tan, et al.; arXiv:1905.03556v1; 2019-05-09; pp. 1-12 *
Recycle-GAN: Unsupervised Video Retargeting; Aayush Bansal, et al.; ECCV 2018; 2018; pp. 1-17 *
Research on retargeting methods for 3D visual media; Lin Wenchong; China Masters' Theses Full-text Database; 2018-02-15; I138-2284 *
Unsupervised single-image depth estimation using a residual dense network; Ma Li, et al.; Journal of Chinese Computer Systems; 2019-11; pp. 2439-2444 *
Also Published As
Publication number | Publication date |
---|---|
CN112634127A (en) | 2021-04-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |