CN108886598B - Compression method and device of panoramic stereo video system - Google Patents
- Publication number
- CN108886598B (application CN201680078558.9A)
- Authority
- CN
- China
- Prior art keywords
- value
- saliency
- determining
- partition
- saliency value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/18—Image warping, e.g. rearranging pixels individually
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4038—Image mosaicing, e.g. composing plane images from plane sub-images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/33—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
- G06T7/337—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06T7/74—Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
- G06T7/85—Stereo camera calibration
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/128—Adjusting depth or disparity
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/204—Image signal generators using stereoscopic image cameras
- H04N13/239—Image signal generators using stereoscopic image cameras using two 2D image sensors having a relative position equal to or related to the interocular distance
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/204—Image signal generators using stereoscopic image cameras
- H04N13/243—Image signal generators using stereoscopic image cameras using three or more 2D image sensors
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/204—Image signal generators using stereoscopic image cameras
- H04N13/246—Calibration of cameras
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/282—Image signal generators for generating image signals corresponding to three or more geometrical viewpoints, e.g. multi-view systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
- H04N19/126—Details of normalisation or weighting functions, e.g. normalisation matrices or variable uniform quantisers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/136—Incoming video signal characteristics or properties
- H04N19/137—Motion inside a coding unit, e.g. average field, frame or block difference
- H04N19/139—Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/136—Incoming video signal characteristics or properties
- H04N19/14—Coding unit complexity, e.g. amount of activity or edge presence estimation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/157—Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
- H04N19/159—Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/523—Motion estimation or motion compensation with sub-pixel accuracy
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/698—Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
- G06T2207/10021—Stereoscopic video; Stereoscopic image sequence
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N2013/0074—Stereoscopic image analysis
- H04N2013/0081—Depth or disparity estimation from stereoscopic image signals
Abstract
There is provided a method of compressing stereoscopic video including left-view frames and right-view frames, the method comprising: determining a texture saliency value of a first partition within the left view frame by intra prediction (1101); determining a motion saliency value (1102) for the first partition by motion estimation; determining a disparity saliency value between the first partition and a respective second partition within the right view frame (1103); determining a quantization parameter (1104) from the disparity saliency value, the texture saliency value, and the motion saliency value; and quantizing (1105) the first partition according to the quantization parameter.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of and priority to International Patent Application No. PCT/CN2016/070712, entitled "Calibration Method and Apparatus for Panoramic Stereo Video System", filed on January 12, 2016, and International Patent Application No. PCT/CN2016/070823, entitled "Stitching Method and Apparatus for Panoramic Stereo Video System", filed on January 13, 2016. The entire disclosures of both applications are incorporated herein by reference.
Technical Field
The present invention relates to a panoramic stereoscopic video system for capturing, processing, compressing, and displaying 3D panoramic stereoscopic video, and more particularly, to a method and apparatus for 3D panoramic stereoscopic video compression in a panoramic stereoscopic video system.
Background
The panoramic stereoscopic video system proposed in the above-mentioned applications achieves an immersive 3D experience by displaying stereoscopic panoramic video on a head-mounted display (HMD). The resolution and persistence of the stereoscopic video are two main features that determine the user experience. The system stitches together images captured by 16 high-definition (HD) cameras to generate the stereoscopic video, and the resolution of each field of view is at least 3840×2160 (4K). Since the frame rate of the system is 50 fps, motion blur and flicker are greatly reduced. On the other hand, however, the ultra-high resolution and high refresh rate produce a huge amount of video data, posing a challenge for 3D video services and broadcasting.
The coding efficiency of existing hybrid video coding methods such as H.264, VC-1, and HEVC has improved significantly over the last decade, and temporal and spatial redundancies in video sequences have been greatly reduced through dense spatio-temporal prediction. Recent advances in 3D technologies such as MV-HEVC and 3D-HEVC have further exploited disparity prediction between different fields of view. However, to achieve better compression performance for stereoscopic panoramic video, subjective video quality also needs to be improved by taking human visual characteristics and panorama-oriented characteristics into account.
In general, a 360-degree panoramic image spans a very wide field of view, most of which is likely to be mere background. The user, however, may only be interested in the small portions of the field of view where the color, texture, motion, or depth contrast is significant.
The basic principle of compression methods based on human visual features is that only a small number of selected regions of interest are encoded with high priority to obtain high subjective video quality, while regions of less interest are processed with low priority to save bits. To achieve this, an attention prediction method is generally used to predict the regions the user is likely to focus on.
Existing 2D image saliency computations primarily consider the contrast of features such as color, shape, orientation, texture, and curvature. In image sequences or video, region-of-interest detection focuses on motion information that can distinguish the foreground from the background. However, since existing video compression methods do not consider stereoscopic contrast, they are not well suited to stereoscopic video. In addition, when a salient object has no visual uniqueness at the spatial level and no motion at the temporal level, existing methods have difficulty detecting the corresponding attention region.
Therefore, there is a need for a new stereoscopic video compression method that uses texture, motion, and stereo contrast simultaneously for saliency analysis.
Disclosure of Invention
To solve the problems in the prior art, embodiments of the present invention provide a new method for compressing stereoscopic video that uses texture, motion, and stereo contrast simultaneously for saliency analysis. In particular, by employing block-based stereo vision detection, the method further provides depth cues, which play an important role in human vision.
According to an embodiment of the present invention, there is provided a method for compressing stereoscopic video including a left-view frame and a right-view frame, the method including: determining, by intra prediction, a texture saliency value of a first partition within the left-view frame; determining a motion saliency value of the first partition by motion estimation; determining a disparity saliency value between the first partition and a corresponding second partition within the right-view frame; and determining a quantization parameter according to the disparity saliency value, the texture saliency value, and the motion saliency value.
Preferably, the method further comprises: quantizing the first partition according to the quantization parameter.
Preferably, the method further comprises: determining a hybrid stereo saliency map of the left view frame; reducing the size of the hybrid stereo saliency map to match the size of a Transform Unit (TU); determining a second quantization parameter for the transform unit; and quantizing the transform unit according to the second quantization parameter.
Preferably, the method further comprises: determining the texture saliency value from a DC mode intra prediction output of High Efficiency Video Coding (HEVC).
Preferably, the method further comprises: determining a motion saliency value for the first partition from a motion estimation output of High Efficiency Video Coding (HEVC).
Preferably, the method further comprises: determining a mixed stereo saliency value for the first partition by superimposing the disparity saliency value, texture saliency value, and motion saliency value with a weighting parameter.
Preferably, the left-view frame and the right-view frame are rectified in a first direction, and the method further comprises: searching for the disparity saliency value in a second direction perpendicular to the first direction.
Preferably, the disparity saliency value comprises a non-integer value.
Preferably, the method further comprises: determining the disparity saliency value from 1/4-pixel samples generated by sub-pixel motion estimation of High Efficiency Video Coding (HEVC).
According to another embodiment of the present invention, there is provided a non-transitory computer-readable medium having stored thereon computer-executable instructions for performing a method of compressing stereoscopic video comprising left-view and right-view frames, the method comprising: determining, by intra prediction, a texture saliency value of a first partition within the left-view frame; determining a motion saliency value of the first partition by motion estimation; determining a disparity saliency value between the first partition and a corresponding second partition within the right-view frame; and determining a quantization parameter according to the disparity saliency value, the texture saliency value, and the motion saliency value.
Preferably, the method further comprises: quantizing the first partition according to the quantization parameter.
Preferably, the method further comprises: determining a hybrid stereo saliency map of the left view frame; reducing the size of the hybrid stereo saliency map to match the size of a Transform Unit (TU); determining a second quantization parameter for the transform unit; and quantizing the transform unit according to the second quantization parameter.
Preferably, the method further comprises: determining the texture saliency value from an output of DC mode intra prediction for High Efficiency Video Coding (HEVC).
Preferably, the method further comprises: determining a motion saliency value for the first partition from an output of a motion estimation for High Efficiency Video Coding (HEVC).
Preferably, the method further comprises: determining a mixed stereo saliency value for the first partition by superimposing the disparity saliency value, texture saliency value, and motion saliency value with a weighting parameter.
Preferably, the left-view frame and the right-view frame are rectified in a first direction, and the method further comprises: searching for the disparity saliency value in a second direction perpendicular to the first direction.
Preferably, the disparity saliency value comprises a non-integer value.
Preferably, the method further comprises: determining the disparity saliency value from 1/4-pixel samples generated by sub-pixel motion estimation of High Efficiency Video Coding (HEVC).
According to embodiments of the present invention, a region-of-interest based video coding scheme with visual-attention-based bit allocation is employed, in which spatial, temporal, and stereo cues are all taken into account in video attention prediction. The spatial and temporal contrast features are extracted directly from the existing video coding process without introducing additional computation. In addition, sub-pixel disparity intensity estimation is employed to improve the visual saliency accuracy of the stereoscopic system. As a result, the stereoscopic video is compressed efficiently without affecting the perceived quality for the end user.
Drawings
In order to better explain technical features of embodiments of the present invention, various embodiments of the present invention will be briefly described below with reference to the accompanying drawings.
Fig. 1 is an exemplary schematic diagram of a panoramic stereoscopic video system according to an embodiment of the present invention.
Fig. 2 is an exemplary schematic diagram of a camera array of a panoramic stereoscopic video system according to an embodiment of the present invention.
Fig. 3 is an exemplary schematic diagram of a data processing unit of a panoramic stereoscopic video system according to an embodiment of the present invention.
Fig. 4 is an exemplary flowchart of a panoramic stereo video stitching method according to an embodiment of the present invention.
Fig. 5 is an exemplary flowchart of a panoramic stereoscopic video display method according to an embodiment of the present invention.
Fig. 6 is an exemplary diagram of HEVC spatial prediction mode according to an embodiment of the present invention.
Fig. 7 is an exemplary diagram of block-based motion estimation using motion vector prediction according to an embodiment of the present invention.
Fig. 8 is an exemplary diagram of a motion intensity map obtained by motion estimation according to an embodiment of the present invention.
Fig. 9 is an exemplary diagram of block-based disparity estimation for stereoscopic video coding according to an embodiment of the present invention.
Fig. 10 is an exemplary schematic diagram of a stereoscopic video compression system based on a mixed region of interest according to an embodiment of the present invention.
Fig. 11 is an exemplary flowchart of a method for compressing stereoscopic video based on a mixed region of interest according to an embodiment of the present invention.
Detailed Description
To better illustrate the objects, technical features and advantages of the embodiments of the present invention, various embodiments of the present invention are further described below with reference to the accompanying drawings. It is to be understood that the drawings are for purposes of illustrating exemplary embodiments of the invention and that other drawings may be devised by those skilled in the art without departing from the principles of the invention.
According to an embodiment of the present invention, there is provided a panoramic stereoscopic video system with multi-camera video capture, data processing, stereoscopic video coding, transmission, and 3D display functions. The panoramic stereoscopic video system employs real-time multi-view video capture, image rectification and preprocessing, and region-of-interest (ROI) based stereoscopic video compression. After transmission and decoding, the left and right fields of view are displayed on a head-mounted display (HMD) headset.
1. Overview of the System
Fig. 1 is an exemplary schematic diagram of a panoramic stereoscopic video system according to an embodiment of the present invention. The panoramic stereoscopic video system captures 3D panoramic video with a camera array and displays the captured 3D panoramic video on a 3D television or a head-mounted virtual-reality display device. As shown in fig. 1, the panoramic stereoscopic video system includes a data acquisition unit 200, a data processing unit 300, and a data display unit 400. The data acquisition unit 200 includes a plurality of cameras within a camera array 210 and a camera calibration unit 220. The data processing unit 300 includes a data preprocessing unit 310 and an advanced stereoscopic video transcoding unit 320. The data display unit 400 includes a decoding unit 410 and a display headset 420.
2. Data acquisition unit
As shown in fig. 1, the data acquisition unit 200 includes a plurality of cameras in a camera array 210, and a camera calibration unit 220 that calibrates the camera array 210.
2.1 Camera array
Fig. 2 is an exemplary schematic diagram of a camera array in a panoramic stereoscopic video system according to an embodiment of the invention.
As shown in fig. 2, the camera array 210 has 16 high-definition cameras c1–c16 mounted on a regular-octagon mounting frame, with a pair of cameras on each side of the octagon. The two cameras on each side, e.g., c1 and c2, have parallel optical axes and are spaced a distance d apart. Raw video data collected by the camera array 210 is sent over cables to a computer for further processing. The camera parameters are listed in table 1 below.
TABLE 1
It should be noted that although the camera array is shown as a regular octagon in fig. 2, the camera array may be provided in other shapes in other embodiments of the present invention. In particular, in one embodiment of the invention, each camera is mounted on a rigid frame such that the relative position between the plurality of cameras is substantially constant. In a further embodiment of the invention, the cameras are arranged substantially in the same plane, for example on each side of a polygon.
2.2 Camera calibration
In order to stitch together the images taken by the cameras and generate a 3D effect, it is necessary to obtain both the internal and external parameters of the cameras. The external parameters include rotation and translation between the cameras so that images taken by different cameras can be corrected and aligned in the horizontal direction. In addition, the images captured by the cameras may be distorted, and in order to obtain an undistorted image, it is necessary to know the distortion parameters of the cameras. These parameters may be obtained during calibration of the camera.
2.2.1 internal and distortion parameter calibration
The internal and distortion parameters for each camera can be obtained by various methods, such as the calibration method proposed by Zhengyou Zhang. In addition, tools such as MatLab can be used to obtain such parameters.
2.2.2 external parameter calibration
After the internal parameters of each camera are obtained, the rotation and translation between the cameras are obtained using a structure-from-motion based method, which has the following advantages:
high efficiency: the cameras do not need to be calibrated pair by pair. On the contrary, all cameras shoot a scene simultaneously in the calibration process, and the external parameters of all cameras can be obtained simultaneously.
Accuracy: in pattern-based calibration methods, a pattern must be photographed by two adjacent cameras, which often degrades the resolution of the pattern and hence the calibration accuracy. In the structure-from-motion based method of the present invention, the motion of each camera is estimated independently to obtain the above parameters, and adjacent cameras do not need overlapping fields of view. Each camera can therefore be placed closer to the scene to be captured, achieving higher accuracy.
Scalability: since adjacent cameras do not need overlapping fields of view, the method can even accommodate cameras placed in a back-to-back attitude.
2.3 data acquisition method
Data from the 16 cameras are collected and stored by software and then provided to the data processing unit. The image data of each frame captured by each camera can be collected with software such as FFmpeg and DirectShow (or DShow). The frames captured by each camera are compressed and saved as a video file. Since the number of cameras is large, the frames captured by the different cameras must be synchronized, for example using time stamps. For example, each frame captured by each camera may be placed in a queue after being time-stamped, so that it is synchronized with the other frames bearing the same time stamp. The synchronized frames are encoded into a video stream and stored locally or transmitted over the network.
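As a minimal sketch of the timestamp-based synchronization described above (the function name, the queue layout, and the tolerance threshold are illustrative assumptions, not details taken from the system itself), frames from the camera streams could be grouped as follows:

```python
def group_synchronized_frames(streams, tolerance_ms=10):
    """Group frames from multiple camera streams by time stamp.

    streams: one list per camera of (timestamp_ms, frame) tuples,
    each sorted by time stamp.  Frames whose time stamps fall within
    `tolerance_ms` of the reference camera's frame are treated as
    simultaneous and grouped together.
    """
    groups = []
    cursors = [0] * len(streams)
    for ts, frame in streams[0]:          # camera 0 is the reference
        group = [frame]
        matched = True
        for cam in range(1, len(streams)):
            stream, i = streams[cam], cursors[cam]
            # skip frames that are too old to match the reference
            while i < len(stream) and stream[i][0] < ts - tolerance_ms:
                i += 1
            cursors[cam] = i
            if i < len(stream) and abs(stream[i][0] - ts) <= tolerance_ms:
                group.append(stream[i][1])
            else:
                matched = False           # drop incomplete groups
                break
        if matched:
            groups.append(group)
    return groups
```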
3. Data processing unit
As shown in fig. 1, the data processing unit 300 includes a data preprocessing unit 310 and an advanced stereoscopic video transcoding unit 320.
Fig. 3 is an exemplary schematic diagram of the data processing unit in a panoramic stereoscopic video system according to an embodiment of the present invention. As shown in fig. 3, the data preprocessing unit 310 includes: a time-axis synchronization unit 311 for synchronizing the images captured by the respective cameras; a number of decoders 312 for decoding the raw video streams; a number of rectifiers 313 for rectifying the raw video; an encoder 314 for video processing including noise reduction and editing; and a stitching unit for stitching the videos to generate the panoramic video. The data preprocessing unit 310 outputs the left-eye video and the right-eye video to the advanced stereoscopic video transcoding unit 320. The advanced stereoscopic video transcoding unit 320 generates a motion map 321 and a texture map 322 of the video. A mixed region-of-interest (ROI) generation unit 323 identifies regions of interest in the video from the motion map 321 and the texture map 322. A bit allocation unit 324 allocates bits according to the identified regions of interest, and an HEVC coding unit 325 encodes the video. An H.265 packer 326 packs the encoded video for transmission.
Fig. 4 is an exemplary flowchart of a panoramic stereo video stitching method according to an embodiment of the present invention.
3.1 distortion correction and preprocessing
The frames captured by each camera are warped according to the distortion parameters obtained during calibration so as to obtain distortion-free frames. To improve the accuracy of image alignment and stitching, each frame is first filtered to reduce noise.
3.2 image alignment
Image alignment is performed for each pair of cameras mounted on each side of the octagon, so that the images captured by each pair of cameras are aligned in the horizontal direction. According to one embodiment of the present invention, each frame captured by a pair of cameras is warped to a plane parallel to the optical axes of that pair of cameras.
4. Panoramic video stitching
The camera array has 8 pairs of cameras. After all frames captured by the left cameras are projected onto a cylinder, they are stitched into a panoramic image. Repeating these steps for all frames captured by each left camera yields one panoramic video; processing the frames captured by each right camera in the same manner yields another. The two panoramic videos form the panoramic stereoscopic video.
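As a rough sketch of the cylindrical projection step mentioned above (the function name and the nearest-neighbour sampling are illustrative choices; a production pipeline would interpolate and blend across seams), a single frame could be warped as follows:

```python
import numpy as np

def warp_to_cylinder(image, focal_px):
    """Project a pinhole-camera frame onto a cylindrical surface.

    image: H x W x 3 array; focal_px: focal length in pixels.
    Each output pixel is interpreted as cylinder coordinates
    (angle, height) and the corresponding source pixel is sampled.
    """
    h, w = image.shape[:2]
    cx, cy = w / 2.0, h / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    theta = (xs - cx) / focal_px              # angle around the cylinder
    height = (ys - cy) / focal_px             # height on the cylinder
    x_src = focal_px * np.tan(theta) + cx     # back-project to image plane
    y_src = focal_px * height / np.cos(theta) + cy
    out = np.zeros_like(image)
    valid = (x_src >= 0) & (x_src < w) & (y_src >= 0) & (y_src < h)
    out[valid] = image[y_src[valid].astype(int), x_src[valid].astype(int)]
    return out
```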
5. Data display unit
As shown in fig. 1, the data display unit 400 includes a decoding unit 410 and a display headset 420. After passing through the codec system, the panoramic stereoscopic video is played on the display headset 420, which may be a wearable virtual-reality (VR) device such as one provided by Oculus VR. The panoramic stereoscopic video is rendered on the left-eye and right-eye displays of the Oculus device respectively. The displayed area of the panoramic stereoscopic video is adjusted according to movement detected by the device, so as to simulate the change of viewing angle in virtual reality.
Fig. 5 is an exemplary flowchart of a panoramic stereoscopic video display method according to an embodiment of the present invention. As shown in fig. 5, in step 501, the encoded video stream is first decoded into YUV frames. In step 502, position calculation and field-of-view selection are performed based on the Oculus sensor data. In step 503, the left-eye and right-eye images are rendered separately. In step 504, the rendered images are displayed on the Oculus headset.
6. Stereo video compression
In the stereoscopic panoramic video system, the video processing module stitches the left and right ultra-high-resolution videos, but the huge amount of video data makes compression and transmission difficult. According to an embodiment of the present invention, there is provided a region-of-interest based video coding scheme that employs visual-attention-based bit allocation. Spatial, temporal, and stereo cues are all taken into account in video attention prediction. Spatial and temporal contrast features are extracted directly from the video encoding process without introducing additional computation. In addition, the visual saliency accuracy of the stereoscopic system is improved through sub-pixel disparity intensity estimation. The reuse of sub-pixel samples and the block-based matching ensure that the algorithm of the present invention achieves real-time detection with good performance. Overall, the scheme greatly improves the video compression rate without affecting the perceived quality for the end user.
6.1 region of interest detection
6.1.1 extraction of spatial features by Intra prediction
In the HEVC coding standard, intra prediction (or spatial prediction) is used to encode blocks that must be compressed independently of previously encoded frames; pixel-level spatial correlation is exploited through neighboring samples from previously encoded and reconstructed blocks. The prediction samples are then subtracted from the original pixel values to obtain the residual block. This residual, obtained from intra prediction, contains texture contrast information and is used to generate the spatial saliency map.
In HEVC video coding, spatial prediction includes 33 directional prediction modes (compared with only 8 such modes in H.264) for prediction unit (PU) selection, a DC prediction mode (ensemble averaging), and a planar (surface-fitting) prediction mode. Fig. 6 is an exemplary diagram of the HEVC spatial prediction modes according to an embodiment of the present invention. All 35 prediction modes are shown in fig. 6. The HEVC prediction unit size is selected from 64×64 down to 8×8, and all 35 modes can be searched to obtain the best block partition and the best residual. To reduce complexity, in one embodiment the block-based residual map reuses the results of performing DC mode prediction on fixed 8×8 blocks. The residual of block k is calculated as:

D_k = Σ_{i,j} |C_{ij} − R_{ij}|  (1)

wherein C_{ij} and R_{ij} are the (i, j)-th elements of the current original block C and the reconstructed block R, respectively. The texture saliency value S_T of each block is then calculated from the block residual and normalized to the range [0, 1]:

S_T(k) = D_k / max_{1≤n≤N}(D_n)  (2)
wherein N is the number of partitions in a frame. Since the spatial residual detection of 8 × 8 partitions is performed by the HEVC intra prediction method, no additional computation needs to be introduced.
In other embodiments, each frame may be partitioned into non-overlapping blocks of other sizes, such as 64×64 or 16×16 pixels; the texture saliency map may be computed from the results of other video coding methods similar or comparable to intra prediction; and the video may be compressed according to other coding standards such as H.264/AVC or AVS. Preferably, the intra prediction or other video processing is based on same-size partitions partitioned from the frame.
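A minimal sketch of the per-block texture saliency of equations (1) and (2) is given below, assuming 8×8 blocks and frame dimensions that are multiples of the block size; the reconstructed frame is assumed to come from the encoder's DC-mode intra prediction, which this sketch does not reimplement:

```python
import numpy as np

def texture_saliency(original, reconstructed, block=8):
    """Per-block texture saliency map from the intra-prediction residual.

    original, reconstructed: H x W luma arrays (H, W multiples of `block`).
    Returns an (H/block) x (W/block) map normalized to [0, 1].
    """
    residual = np.abs(original.astype(np.int32) - reconstructed.astype(np.int32))
    h, w = residual.shape
    # equation (1): sum of absolute residuals per block
    d = residual.reshape(h // block, block, w // block, block).sum(axis=(1, 3))
    # equation (2): normalize by the largest block residual in the frame
    peak = d.max()
    return d / peak if peak > 0 else d.astype(np.float64)
```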
6.1.2 extracting temporal features by motion estimation
A fast-moving object tends to attract visual attention. However, since a video sequence is captured by a camera in motion, it also contains global motion. The local motion saliency therefore needs to be measured by estimating the motion vector difference (MVD) with HEVC inter-prediction motion estimation.
Motion estimation in most video coding standards is based on block matching, in which motion vectors follow a 2D translational model and each block is matched against all candidate locations within a predetermined search area. Since the motion vectors of neighboring blocks are typically highly correlated, the motion vector prediction technique employed in HEVC predicts the motion vector of the current block from the motion vectors of nearby coded blocks.
Fig. 7 is an exemplary diagram of block-based motion estimation using motion vector (MV) prediction according to an embodiment of the present invention. As shown in fig. 7, for a current block 711 in the current frame 710, a prediction vector mv_pred 712 is derived from the motion vectors of neighboring blocks, and the corresponding block 721 is matched against all candidate locations within the predetermined search area 725. Finally, the difference between the best vector mv_best 723 and the prediction vector mv_pred 712 is encoded and transmitted.
In one embodiment, the motion vector difference produced by 8×8 block motion estimation is used, where the magnitude of the motion vector difference is defined as:

MVD_k = ||mv_best(k) − mv_pred(k)||  (3)

A motion saliency map S_M is then calculated by normalizing the motion vector differences within the same frame:

S_M(k) = MVD_k / max_{1≤n≤N}(MVD_n)  (4)
The motion saliency map is computed from the results of motion estimation, which is a core process of HEVC video coding; the motion features are therefore extracted without introducing any additional processing. Fig. 8 is an exemplary diagram of a motion intensity map obtained by motion estimation according to an embodiment of the present invention.
In other embodiments, each frame may be partitioned into non-overlapping blocks of other sizes, such as 64×64 or 16×16 pixels; the motion saliency map may be computed from the results of other video coding methods similar or comparable to inter-prediction motion estimation; and other coding standards such as H.264/AVC or AVS may be used. Preferably, the motion estimation or other video processing is based on same-size partitions partitioned from the frame.
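A minimal sketch of equations (3) and (4), assuming the encoder exposes the best and predicted motion vectors of the N blocks of one frame (the array layout is an assumption for illustration):

```python
import numpy as np

def motion_saliency(mv_best, mv_pred):
    """Per-block motion saliency from motion vector differences.

    mv_best, mv_pred: N x 2 arrays of the best and predicted motion
    vectors of the N blocks of one frame, as produced by block-based
    motion estimation with motion vector prediction.
    Returns a length-N saliency vector normalized to [0, 1].
    """
    mvd = np.linalg.norm(mv_best - mv_pred, axis=1)   # equation (3)
    peak = mvd.max()
    return mvd / peak if peak > 0 else mvd            # equation (4)
```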
6.1.3 disparity estimation by disparity prediction
Stereo vision is also used in the saliency analysis to further provide depth cues, which play an important role in stereoscopic panoramic video. A block-based disparity estimation method is introduced for the disparity mapping process.
Fig. 9 is a schematic diagram illustrating block-based disparity estimation for stereoscopic video coding according to an embodiment of the present invention. As shown in fig. 9, in the high-resolution video system, both the left field of view 910 and the right field of view 920 of the stereoscopic image are well rectified. Each field of view is divided into non-overlapping blocks of 8×8 pixels, and all pixels within a block are assumed to have the same disparity. The block of the left field of view that matches a given block of the right field of view can therefore be found on the same scan line, and the disparity 922 reduces to a one-dimensional vector (its vertical component equals zero). The disparity matching scheme is similar to motion estimation in inter prediction. Specifically, the search area 925 is limited to a range of ±32 in the horizontal direction. The initial search position is set to the position of the corresponding block 921 in the right field of view 920, and the sum of absolute differences (SAD) is used as the matching criterion.
To achieve better prediction accuracy, non-integer disparity values are also taken into account, and the HEVC 7/8-tap filters are used to interpolate sub-pixel intensities. Since sub-pixel sample interpolation is one of the most complex operations, the sub-pixel disparity search of the present invention directly reuses the 1/4-pixel samples generated by HEVC sub-pixel motion estimation. By reusing the 7-tap interpolation of HEVC, the computational complexity is reduced significantly. A block-wise disparity saliency map is then generated from the block disparity values d_k:

S_D(k) = |d_k| / max_{1≤n≤N}(|d_n|)  (5)
in other embodiments, each frame may be partitioned into different sized non-overlapping blocks of 64 × 64 or 16 × 16 pixels, etc.; and may compute a disparity map from the results of other video coding methods similar or comparable to motion estimation; but also according to other coding standards such as h.264/AVC or AVS. Preferably, the motion estimation process is based on equally sized blocks segmented from the frame.
6.1.4 hybrid region of interest determination
In one embodiment, the spatio-temporal features are combined with the disparity feature, i.e., the texture contrast S_T in equation (2), the motion contrast S_M in equation (4), and the disparity intensity S_D in equation (5), to detect the region of interest. Although each feature has its own advantages and disadvantages, the best results are achieved by combining all of the features. First, each feature map is normalized to the range [0, 1]. Then, S_M, S_T, and S_D are superimposed to form a mixed stereo saliency map S:

S(b_i) = λ_T·S_T + λ_M·S_M + λ_D·S_D  (6)

wherein λ_T, λ_M, and λ_D are weighting parameters.
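Equation (6) amounts to a per-block weighted sum of the three normalized maps; a one-line sketch follows, with placeholder weight values, since the choice of weighting parameters is left open here:

```python
def mixed_saliency(s_t, s_m, s_d, lam_t=0.3, lam_m=0.3, lam_d=0.4):
    """Equation (6): weighted superposition of the texture (s_t),
    motion (s_m), and disparity (s_d) saliency maps, each already
    normalized to [0, 1].  The weights shown are example values only.
    """
    return lam_t * s_t + lam_m * s_m + lam_d * s_d
```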
Fig. 10 is an exemplary schematic diagram of a stereoscopic video compression system based on a mixed region of interest according to an embodiment of the present invention. As shown in fig. 10, the stereoscopic video compression system has a spatial prediction module 1001, a temporal prediction module 1002, and a disparity prediction module 1003. The results generated by the spatial prediction module 1001, the temporal prediction module 1002, and the disparity prediction module 1003 are input to the mixed region-of-interest generation module 1004, and after the salient regions are identified, the corresponding bit allocation is performed. The transform and quantization module 1005 performs quantization according to the bit allocation determined by the mixed region-of-interest generation module 1004, and the entropy coding module then encodes each frame to generate a compressed frame 1006.
6.2 region of interest based stereoscopic video coding
One of the key ideas of region-of-interest based compression is that bit allocation is done in favor of the salient regions. The hybrid region-of-interest detection method can generate high-quality, high-accuracy saliency maps. In addition, to improve video compression performance, the advanced HEVC video standard, with its high compression efficiency, is selected.
Since the region-of-interest detection of the present invention is based on 8×8 blocks, the estimated saliency map needs to be downscaled to match the size of the current transform unit, which may be selected as 32×32, 16×16, or 8×8. The new QP value is then calculated as:

Q′ = max(Q − ψ·(S − E_S), 0)  (8)

wherein Q is the original quantization parameter value, S is the saliency value of the coding unit, and E_S denotes the mean saliency. In this way, the QP value of coding units containing salient regions is reduced, while the QP value of coding units without salient regions is increased. The parameter ψ can be selected by the user and controls the bit-rate distribution between salient and non-salient regions: the higher the value of ψ, the more bits are allocated to the salient regions.
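A minimal sketch of this saliency-driven QP modulation, assuming the per-8×8-block saliency map is downscaled to the transform-unit grid by simple averaging (the averaging step and the value of ψ are assumptions for illustration; the downscaling equation of the original text is not reproduced here):

```python
import numpy as np

def tu_quantization_parameters(q_base, saliency_map, tu_size=16, block=8, psi=8.0):
    """Equation (8): saliency-modulated QP per transform unit.

    saliency_map: per-8x8-block mixed stereo saliency of one frame.
    tu_size: transform unit size in pixels (8, 16, or 32).
    psi: user-selected strength of the bit-rate shift toward salient
    regions (example value).
    Returns a grid of adjusted QP values, one per transform unit.
    """
    n = tu_size // block                        # 8x8 blocks per TU side
    h, w = saliency_map.shape
    # downscale the saliency map to the TU grid by averaging
    s_tu = saliency_map[:h - h % n, :w - w % n] \
        .reshape(h // n, n, w // n, n).mean(axis=(1, 3))
    e_s = saliency_map.mean()                   # E_S: frame-mean saliency
    return np.maximum(q_base - psi * (s_tu - e_s), 0.0)
```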
Fig. 11 is an exemplary flowchart of a method for compressing stereoscopic video based on a mixed region of interest according to an embodiment of the present invention. As shown in fig. 11, the compression method includes the steps of:
step 1101: a texture saliency value for a first partition within a left view frame is determined by intra prediction. Preferably, the texture saliency value is determined from an output of DC mode intra prediction for High Efficiency Video Coding (HEVC).
Step 1102: determining a motion saliency value of the first partition by motion estimation. Preferably, the motion saliency value is determined from a motion estimation output of High Efficiency Video Coding (HEVC).

Step 1103: determining a disparity saliency value between the first partition and a corresponding second partition within the right-view frame. Preferably, the left-view frame and the right-view frame are first rectified in a first direction, and the disparity search is then performed in a second direction perpendicular to the first direction.
Step 1104: determining a quantization parameter according to the disparity saliency value, the texture saliency value, and the motion saliency value. Preferably, a mixed stereo saliency value is determined by superimposing the disparity, texture, and motion saliency values with weighting parameters.
Step 1105: quantizing the first partition according to the quantization parameter. If the size of the partition differs from the size of the current transform unit, the mixed stereo saliency map is resized to match the current transform unit, and a new quantization parameter is calculated.
According to an embodiment of the present invention, a region-of-interest based video coding scheme with visual-attention-based bit allocation is employed, in which spatial, temporal, and stereo cues are all taken into account in video attention prediction. The spatial and temporal contrast features are extracted directly from the video encoding process without introducing additional computation. In addition, sub-pixel disparity intensity estimation is used to improve visual saliency accuracy. Thus, the stereoscopic video is compressed efficiently without affecting the perceived quality for the end user.
The various modules, units, and components described above may be implemented as: an application-specific integrated circuit (ASIC); an electronic circuit; a combinational logic circuit; a field-programmable gate array (FPGA); a processor (shared, dedicated, or group) that executes code; or other suitable hardware components that provide the functionality described above. The processor may be a microprocessor provided by Intel Corporation or a processor provided by IBM Corporation.
It should be noted that one or more of the above functions may be implemented by software or firmware stored in a memory and executed by a processor, or stored in a program memory and executed by a processor. Further, the software or firmware can be stored and/or transported within any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. In the context of this document, a "computer-readable medium" can be any medium that can contain or store the program for use by or in connection with the instruction execution system, apparatus, or device. The computer-readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device; a portable computer diskette (magnetic); a random access memory (RAM) (magnetic); a read-only memory (ROM) (magnetic); an erasable programmable read-only memory (EPROM) (magnetic); a portable optical disc such as a CD, CD-R, CD-RW, DVD-R, or DVD-RW; or a flash memory card, Secure Digital card, USB memory device, memory stick, and the like.
The various embodiments of the invention described above are merely preferred embodiments and are not intended to limit the scope of the invention; any modifications, equivalents, and improvements made without departing from the spirit and principles of the invention fall within its scope.
Claims (18)
1. A method of compressing stereoscopic video including left-view frames and right-view frames, the method comprising:
determining, by intra prediction, a texture saliency value of a first partition within the left view frame;
determining a motion saliency value of the first partition by inter-prediction motion estimation;
determining a disparity saliency value between the first partition and a respective second partition within the right view frame;
determining a quantization parameter according to the disparity saliency value, the texture saliency value, and the motion saliency value; and
quantizing the first partition according to the quantization parameter;
wherein a mixed stereo saliency value for the first partition is determined by a weighted summation of the disparity saliency value, the texture saliency value, and the motion saliency value, and the quantization parameter Q' is determined by:
Q′ = max(Q − ψ·(S − E_S), 0)
where Q is the original quantization parameter value, S is the mixed stereo saliency value, and ψ is a parameter for controlling the bit rate distribution between a saliency region and a non-saliency region.
2. The method of claim 1, further comprising:
determining the texture saliency value from an output of DC mode intra prediction for high efficiency video coding.
3. The method of claim 1, further comprising:
determining a motion saliency value for the first partition based on an output of a motion estimation for high efficiency video coding.
4. The method of claim 1, wherein the left view frame is partitioned into a plurality of non-overlapping partitions, and the motion estimation is based on partitions of the same size as the first partition.
5. The method of claim 4, further comprising:
quantizing the first partition according to the quantization parameter.
6. The method of claim 4, further comprising:
determining a hybrid stereo saliency map of the left view frame;
reducing the size of the hybrid stereo saliency map to match the size of a transform unit;
determining a second quantization parameter for the transform unit; and
quantizing the transform unit according to the second quantization parameter.
7. The method of claim 1, wherein the left-view frame and the right-view frame are rectified in a first direction, and further comprising:
searching for the disparity saliency value in a second direction perpendicular to the first direction.
8. The method of claim 7, wherein the disparity saliency value comprises a non-integer value.
9. The method of claim 8, further comprising:
the disparity saliency value is determined from 1/4 pixel samples generated by sub-pixel motion estimation for high efficiency video coding.
10. A non-transitory computer readable medium having stored thereon computer executable instructions that when executed by a processor perform the following method of compressing stereoscopic video comprising left view frames and right view frames, the method comprising:
determining, by intra prediction, a texture saliency value of a first partition within the left view frame;
determining a motion saliency value of the first partition by motion estimation;
determining a disparity saliency value between the first partition and a respective second partition within the right view frame;
determining a quantization parameter according to the disparity saliency value, the texture saliency value, and the motion saliency value; and
quantizing the first partition according to the quantization parameter;
wherein a hybrid stereo saliency value for the first partition is determined by a weighted summation of the disparity saliency value, the texture saliency value, and the motion saliency value, and the quantization parameter Q′ is determined by:
Q′ = max(Q - ψ·(S - E[S]), 0)
where Q is the original quantization parameter value, S is the hybrid stereo saliency value, E[S] is the mean (expected) saliency value, and ψ is a parameter for controlling the bit-rate distribution between salient and non-salient regions.
11. The computer-readable medium of claim 10, wherein the method further comprises:
determining the texture saliency value from an output of DC-mode intra prediction for High Efficiency Video Coding (HEVC).
12. The computer-readable medium of claim 10, wherein the method further comprises:
determining the motion saliency value of the first partition based on an output of motion estimation for High Efficiency Video Coding (HEVC).
13. The computer-readable medium of claim 10, wherein the left view frame is partitioned into a plurality of non-overlapping partitions, and the motion estimation is based on partitions of the same size as the first partition.
14. The computer-readable medium of claim 13, wherein the method further comprises:
quantizing the first partition according to the quantization parameter.
15. The computer-readable medium of claim 13, wherein the method further comprises:
determining a hybrid stereo saliency map of the left view frame;
reducing the size of the hybrid stereo saliency map to match the size of a transform unit;
determining a second quantization parameter for the transform unit; and
quantizing the transform unit according to the second quantization parameter.
16. The computer-readable medium of claim 10, wherein the left view frame and the right view frame are rectified in a first direction, and the method further comprises:
searching for the disparity saliency value in a second direction perpendicular to the first direction.
17. The computer-readable medium of claim 16, wherein the disparity saliency value comprises a non-integer value.
18. The computer-readable medium of claim 17, wherein the method further comprises:
determining the disparity saliency value from 1/4-pixel samples generated by sub-pixel motion estimation for High Efficiency Video Coding (HEVC).
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNPCT/CN2016/070712 | 2016-01-12 | ||
PCT/CN2016/070712 WO2017120776A1 (en) | 2016-01-12 | 2016-01-12 | Calibration method and apparatus for panoramic stereo video system |
CNPCT/CN2016/070823 | 2016-01-13 | ||
PCT/CN2016/070823 WO2017120802A1 (en) | 2016-01-12 | 2016-01-13 | Stitching method and apparatus for panoramic stereo video system |
PCT/CN2016/071238 WO2017120981A1 (en) | 2016-01-12 | 2016-01-18 | Compression method and apparatus for panoramic stereo video system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108886598A CN108886598A (en) | 2018-11-23 |
CN108886598B true CN108886598B (en) | 2020-08-25 |
Family
ID=59310625
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201680078559.3A Active CN109076200B (en) | 2016-01-12 | 2016-01-12 | Method and device for calibrating panoramic stereo video system |
CN201680078524.XA Active CN108886611B (en) | 2016-01-12 | 2016-01-13 | Splicing method and device of panoramic stereo video system |
CN201680078558.9A Active CN108886598B (en) | 2016-01-12 | 2016-01-18 | Compression method and device of panoramic stereo video system |
Family Applications Before (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201680078559.3A Active CN109076200B (en) | 2016-01-12 | 2016-01-12 | Method and device for calibrating panoramic stereo video system |
CN201680078524.XA Active CN108886611B (en) | 2016-01-12 | 2016-01-13 | Splicing method and device of panoramic stereo video system |
Country Status (4)
Country | Link |
---|---|
US (3) | US10636121B2 (en) |
EP (3) | EP3403403B1 (en) |
CN (3) | CN109076200B (en) |
WO (3) | WO2017120776A1 (en) |
Families Citing this family (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10694102B2 (en) * | 2016-07-22 | 2020-06-23 | Immervision, Inc. | Method to capture, store, distribute, share, stream and display panoramic image or video |
US20200005529A1 (en) * | 2017-01-23 | 2020-01-02 | Virtual Diamond Boutique Inc. | System, method and computer-accessible medium for displaying a three-dimensional digital version of an object |
JP7159057B2 (en) * | 2017-02-10 | 2022-10-24 | Panasonic Intellectual Property Corporation of America | Free-viewpoint video generation method and free-viewpoint video generation system |
CA3054162A1 (en) * | 2017-02-27 | 2018-08-30 | Thomson Licensing | Method, system and apparatus for visual effects |
CN108886616A (en) * | 2017-12-27 | 2018-11-23 | SZ DJI Technology Co., Ltd. | Method, apparatus, and computer system for video coding |
EP3752877A4 (en) * | 2018-02-17 | 2021-11-03 | Dreamvu, Inc. | System and method for capturing omni-stereo videos using multi-sensors |
JP7223978B2 (en) * | 2018-05-23 | 2023-02-17 | Panasonic IP Management Co., Ltd. | Calibration device and calibration method |
US10878276B2 (en) * | 2018-06-05 | 2020-12-29 | Hrl Laboratories, Llc | Method and system for detecting change of context in video streams |
US20200226787A1 (en) * | 2019-01-14 | 2020-07-16 | Sony Corporation | Information processing apparatus, information processing method, and program |
CN109887036A (en) * | 2019-01-21 | 2019-06-14 | Guangzhou Anxiao Technology Co., Ltd. | Semi-automatic calibration system and method for automotive surround-view panorama |
CN110796012B (en) * | 2019-09-29 | 2022-12-27 | Beijing Dajia Internet Information Technology Co., Ltd. | Image processing method and device, electronic equipment and readable storage medium |
CN110749994A (en) * | 2019-10-25 | 2020-02-04 | Jiangsu Rongceshi Technology Development Co., Ltd. | See-through HMD glasses |
CN111193920B (en) * | 2019-12-31 | 2020-12-18 | Chongqing Terminus Smart Technology Co., Ltd. | Video picture three-dimensional splicing method and system based on deep learning network |
CN111464805B (en) * | 2020-03-13 | 2021-08-10 | Ningbo University | Three-dimensional panoramic video rapid coding method based on panoramic saliency |
CN111935474A (en) * | 2020-08-17 | 2020-11-13 | Guangdong Shenyi Industrial Investment Co., Ltd. | Spread-type light-emitting turntable and image shooting method thereof |
US11677979B2 (en) | 2020-08-24 | 2023-06-13 | Tencent America LLC | Freeview video coding |
US11758101B2 (en) * | 2020-08-25 | 2023-09-12 | Samsung Electronics Co., Ltd. | Restoration of the FOV of images for stereoscopic rendering |
US11900561B2 (en) | 2020-11-25 | 2024-02-13 | Electronics And Telecommunications Research Institute | Deep learning-based image stitching method and apparatus |
CN112637601B (en) * | 2020-12-14 | 2023-01-03 | Tianjin Optoelectronic Communication Technology Co., Ltd. | Encoding method and device based on fisheye panoramic video |
CN112884844B (en) * | 2021-01-13 | 2023-02-03 | Shenzhen Horn Automotive Electronic Equipment Co., Ltd. | Method and device for calibrating panoramic image system and computer readable storage medium |
CN115550516A (en) * | 2021-06-30 | 2022-12-30 | Great Wall Information Co., Ltd. | Marker-based camera array calibration method, image splicing method, system, terminal and readable storage medium |
CN113573058B (en) * | 2021-09-23 | 2021-11-30 | Kangda Intercontinental Medical Devices Co., Ltd. | Inter-frame image coding method based on spatio-temporal saliency fusion |
CN113965698B (en) * | 2021-11-12 | 2024-03-08 | Baiyin Yinzhu Electric Power (Group) Co., Ltd. | Monitoring image calibration processing method, device and system for fire-fighting Internet of Things |
KR20230115819A (en) * | 2022-01-27 | 2023-08-03 | Samsung Electronics Co., Ltd. | Electronic device for generating panoramic image and method for operating thereof |
CN115209181B (en) * | 2022-06-09 | 2024-03-22 | MIGU Video Technology Co., Ltd. | Video synthesis method based on surrounding view angle, controller and storage medium |
FR3147652A1 (en) * | 2023-04-04 | 2024-10-11 | PSA Automobiles SA | Method and device for calibrating a non-parallel stereoscopic vision system on board a vehicle |
CN118333915B (en) * | 2024-06-11 | 2024-09-13 | Wuhan Jingli Electronic Technology Co., Ltd. | Wide-angle LMD distortion calibration method, correction method, device and equipment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8228560B2 (en) * | 2005-04-13 | 2012-07-24 | Acd Systems International Inc. | Image contrast enhancement |
CN103020985A (en) * | 2012-11-12 | 2013-04-03 | Huazhong University of Science and Technology | Video image saliency detection method based on field quantity analysis |
CN104247432A (en) * | 2011-11-11 | 2014-12-24 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Efficient multi-view coding using depth-map estimate update |
CN104255027A (en) * | 2012-03-22 | 2014-12-31 | Qualcomm Incorporated | Inter layer texture prediction for video coding |
CN104424642A (en) * | 2013-09-09 | 2015-03-18 | Huawei Software Technologies Co., Ltd. | Detection method and detection system for video salient regions |
CN104822058A (en) * | 2015-04-14 | 2015-08-05 | Ningbo University | Method for extracting saliency map of three-dimensional image |
CN105049850A (en) * | 2015-03-24 | 2015-11-11 | Shanghai University | HEVC (High Efficiency Video Coding) code rate control method based on region-of-interest |
Family Cites Families (58)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002065763A2 (en) | 2001-02-12 | 2002-08-22 | Carnegie Mellon University | System and method for manipulating the point of interest in a sequence of images |
US20040001138A1 (en) | 2002-06-27 | 2004-01-01 | Weerashinghe W.A. Chaminda P. | Stereoscopic panoramic video generation system |
US7119816B2 (en) | 2003-03-31 | 2006-10-10 | Microsoft Corp. | System and method for whiteboard scanning to obtain a high resolution image |
EP1679659A1 (en) | 2005-01-06 | 2006-07-12 | Thomson Licensing | Method and device for selecting quantization parameters in a picture using side information |
KR100739730B1 (en) | 2005-09-03 | 2007-07-13 | Samsung Electronics Co., Ltd. | Apparatus and method for processing 3D dimensional picture |
US7697839B2 (en) | 2006-06-30 | 2010-04-13 | Microsoft Corporation | Parametric calibration for panoramic camera systems |
CN104768031B (en) | 2009-01-26 | 2018-02-09 | Thomson Licensing | Device for video decoding |
US20110255589A1 (en) | 2009-08-03 | 2011-10-20 | Droplet Technology, Inc. | Methods of compressing data and methods of assessing the same |
EP2481209A1 (en) | 2009-09-22 | 2012-08-01 | Tenebraex Corporation | Systems and methods for correcting images in a multi-sensor system |
US8711204B2 (en) * | 2009-11-11 | 2014-04-29 | Disney Enterprises, Inc. | Stereoscopic editing for video production, post-production and display adaptation |
US10095953B2 (en) * | 2009-11-11 | 2018-10-09 | Disney Enterprises, Inc. | Depth modification for display applications |
US9445072B2 (en) * | 2009-11-11 | 2016-09-13 | Disney Enterprises, Inc. | Synthesizing views based on image domain warping |
US20110235706A1 (en) | 2010-03-25 | 2011-09-29 | Texas Instruments Incorporated | Region of interest (roi) video encoding |
US20120154518A1 (en) * | 2010-12-17 | 2012-06-21 | Microsoft Corporation | System for capturing panoramic stereoscopic video |
CN102055982B (en) | 2011-01-13 | 2012-06-27 | Zhejiang University | Coding and decoding methods and devices for three-dimensional video |
CN102592124B (en) * | 2011-01-13 | 2013-11-27 | Hanwang Technology Co., Ltd. | Geometric correction method and device for text images, and binocular stereoscopic vision system |
US20120236934A1 (en) | 2011-03-18 | 2012-09-20 | Qualcomm Incorporated | Signaling of multiview video plus depth content with a block-level 4-component structure |
US8581961B2 (en) | 2011-03-31 | 2013-11-12 | Vangogh Imaging, Inc. | Stereoscopic panoramic video capture system using surface identification and distance registration technique |
WO2012136388A1 (en) | 2011-04-08 | 2012-10-11 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Capturing panoramic or semi-panoramic 3d scenes |
CN102243432A (en) | 2011-06-28 | 2011-11-16 | Zhejiang University of Technology | Panoramic three-dimensional photographing device |
US9153025B2 (en) | 2011-08-19 | 2015-10-06 | Adobe Systems Incorporated | Plane detection and tracking for structure from motion |
KR101638173B1 (en) * | 2011-09-06 | 2016-07-12 | Electronics and Telecommunications Research Institute | Method and apparatus for providing automated detection of calibration |
US8928729B2 (en) | 2011-09-09 | 2015-01-06 | Disney Enterprises, Inc. | Systems and methods for converting video |
RU2580439C2 (en) * | 2011-11-01 | 2016-04-10 | Koninklijke Philips N.V. | Saliency-based disparity mapping |
US20130176390A1 (en) | 2012-01-06 | 2013-07-11 | Qualcomm Incorporated | Multi-hypothesis disparity vector construction in 3d video coding with depth |
CN104169965B (en) * | 2012-04-02 | 2018-07-03 | 英特尔公司 | For system, the method and computer program product adjusted during the operation of anamorphose parameter in more filming apparatus systems |
US8942422B2 (en) * | 2012-04-06 | 2015-01-27 | Adobe Systems Incorporated | Nonlinear self-calibration for structure from motion (SFM) techniques |
CN103516995A (en) | 2012-06-19 | 2014-01-15 | Central South University | A real-time panoramic video splicing method and apparatus based on ORB features |
WO2014020529A1 (en) * | 2012-08-02 | 2014-02-06 | Earthmine, Inc. | Three-dimensional plane panorama creation through hough-based line detection |
US9025880B2 (en) * | 2012-08-29 | 2015-05-05 | Disney Enterprises, Inc. | Visual saliency estimation for images and video |
US20140098185A1 (en) | 2012-10-09 | 2014-04-10 | Shahram Davari | Interactive user selected video/audio views by real time stitching and selective delivery of multiple video/audio sources |
JP6107081B2 (en) * | 2012-11-21 | 2017-04-05 | Fujitsu Limited | Image processing apparatus, image processing method, and program |
TWI602152B (en) * | 2013-02-06 | 2017-10-11 | Altek Semiconductor Corp. | Image capturing device and image processing method thereof |
CN103108187B (en) | 2013-02-25 | 2016-09-28 | Tsinghua University | Encoding method, decoding method, and encoder for three-dimensional video |
CN103179405B (en) | 2013-03-26 | 2016-02-24 | Tianjin University | Multi-view video encoding method based on multi-level regions of interest |
US20140300691A1 (en) * | 2013-04-04 | 2014-10-09 | Panasonic Corporation | Imaging system |
US9398215B2 (en) | 2013-04-16 | 2016-07-19 | Eth Zurich | Stereoscopic panoramas |
US9542773B2 (en) * | 2013-05-23 | 2017-01-10 | Google Inc. | Systems and methods for generating three-dimensional models using sensed position data |
US9667990B2 (en) | 2013-05-31 | 2017-05-30 | Qualcomm Incorporated | Parallel derived disparity vector for 3D video coding with neighbor-based disparity vector derivation |
US9509979B2 (en) * | 2013-11-26 | 2016-11-29 | Mobileye Vision Technologies Ltd. | Stereo auto-calibration from structure-from-motion |
CA2933704A1 (en) | 2013-12-13 | 2015-06-18 | 8702209 Canada Inc. | Systems and methods for producing panoramic and stereoscopic videos |
KR101537174B1 (en) * | 2013-12-17 | 2015-07-15 | The Catholic University of Korea Industry-Academic Cooperation Foundation | Method for extracting salient object from stereoscopic video |
US9552061B2 (en) * | 2014-03-26 | 2017-01-24 | Microsoft Technology Licensing, Llc | Eye gaze tracking using binocular fixation constraints |
US10027948B2 (en) * | 2014-05-20 | 2018-07-17 | Nextvr Inc. | Methods and apparatus including or for use with one or more cameras |
US9911454B2 (en) | 2014-05-29 | 2018-03-06 | Jaunt Inc. | Camera array including camera modules |
JP6308449B2 (en) * | 2014-06-26 | 2018-04-11 | Huawei Technologies Co., Ltd. | Method and apparatus for reducing computational load in high efficiency video coding |
US9973694B1 (en) * | 2014-08-21 | 2018-05-15 | Jaunt Inc. | Image stitching to form a three dimensional panoramic image |
US9870617B2 (en) * | 2014-09-19 | 2018-01-16 | Brain Corporation | Apparatus and methods for saliency detection based on color occurrence analysis |
US10547825B2 (en) | 2014-09-22 | 2020-01-28 | Samsung Electronics Company, Ltd. | Transmission of three-dimensional video |
CN104580933A (en) | 2015-02-09 | 2015-04-29 | Shanghai Anviz Technology Co., Ltd. | Multi-scale real-time monitoring video stitching device and method based on feature points |
US9877016B2 (en) * | 2015-05-27 | 2018-01-23 | Google Llc | Omnistereo capture and render of panoramic virtual reality content |
WO2017092007A1 (en) * | 2015-12-03 | 2017-06-08 | SZ DJI Technology Co., Ltd. | System and method for video processing |
WO2017098090A1 (en) * | 2015-12-12 | 2017-06-15 | Aalto University Foundation | Calibration technique for capturing panoramic images |
US20190019299A1 (en) * | 2016-01-03 | 2019-01-17 | Humaneyes Technologies Ltd. | Adaptive stitching of frames in the process of creating a panoramic frame |
WO2017188714A1 (en) * | 2016-04-26 | 2017-11-02 | LG Electronics Inc. | Method for transmitting 360-degree video, method for receiving 360-degree video, apparatus for transmitting 360-degree video, apparatus for receiving 360-degree video |
US10447993B2 (en) | 2016-09-27 | 2019-10-15 | Laduma, Inc. | Stereoscopic 360 degree digital camera systems |
CN109587203A (en) * | 2017-09-29 | 2019-04-05 | Sony Corporation | Information processing equipment and method, electronic device and computer-readable medium |
US10902556B2 (en) * | 2018-07-16 | 2021-01-26 | Nvidia Corporation | Compensating for disparity variation when viewing captured multi video image streams |
2016
- 2016-01-12 US US16/069,181 patent/US10636121B2/en active Active
- 2016-01-12 CN CN201680078559.3A patent/CN109076200B/en active Active
- 2016-01-12 EP EP16884339.9A patent/EP3403403B1/en active Active
- 2016-01-12 WO PCT/CN2016/070712 patent/WO2017120776A1/en active Application Filing
- 2016-01-13 CN CN201680078524.XA patent/CN108886611B/en active Active
- 2016-01-13 WO PCT/CN2016/070823 patent/WO2017120802A1/en active Application Filing
- 2016-01-13 EP EP16884365.4A patent/EP3403400B1/en active Active
- 2016-01-13 US US16/069,183 patent/US10489886B2/en active Active
- 2016-01-18 US US16/069,185 patent/US10643305B2/en active Active
- 2016-01-18 WO PCT/CN2016/071238 patent/WO2017120981A1/en active Application Filing
- 2016-01-18 CN CN201680078558.9A patent/CN108886598B/en active Active
- 2016-01-18 EP EP16884543.6A patent/EP3403401B1/en active Active
Non-Patent Citations (1)
Title |
---|
Saliency Prediction on Stereoscopic Videos; Kim, Haksub et al.; IEEE Transactions on Image Processing; 2014-04-30; pp. 1476-1490 *
Also Published As
Publication number | Publication date |
---|---|
EP3403401A1 (en) | 2018-11-21 |
US10489886B2 (en) | 2019-11-26 |
EP3403403A4 (en) | 2019-08-21 |
US10636121B2 (en) | 2020-04-28 |
US10643305B2 (en) | 2020-05-05 |
CN109076200A (en) | 2018-12-21 |
EP3403400B1 (en) | 2022-04-20 |
CN108886611B (en) | 2021-07-09 |
WO2017120981A1 (en) | 2017-07-20 |
CN108886598A (en) | 2018-11-23 |
WO2017120802A1 (en) | 2017-07-20 |
EP3403403A1 (en) | 2018-11-21 |
WO2017120776A1 (en) | 2017-07-20 |
CN109076200B (en) | 2021-04-23 |
EP3403403B1 (en) | 2023-06-07 |
US20190028693A1 (en) | 2019-01-24 |
EP3403401B1 (en) | 2021-10-13 |
EP3403400A4 (en) | 2019-10-09 |
EP3403400A1 (en) | 2018-11-21 |
US20190035055A1 (en) | 2019-01-31 |
CN108886611A (en) | 2018-11-23 |
US20190028707A1 (en) | 2019-01-24 |
EP3403401A4 (en) | 2019-06-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108886598B (en) | Compression method and device of panoramic stereo video system | |
EP2491722B1 (en) | Depth map generation techniques for conversion of 2d video data to 3d video data | |
JP7036599B2 (en) | A method of synthesizing a light field with compressed omnidirectional parallax using depth information | |
US8488870B2 (en) | Multi-resolution, multi-window disparity estimation in 3D video processing | |
US9525858B2 (en) | Depth or disparity map upscaling | |
JP5970609B2 (en) | Method and apparatus for unified disparity vector derivation in 3D video coding | |
EP3100454B1 (en) | Method for low-latency illumination compensation process | |
US20140063188A1 (en) | Apparatus, a Method and a Computer Program for Image Processing | |
WO2011014419A1 (en) | Methods, systems, and computer-readable storage media for creating three-dimensional (3d) images of a scene | |
US9451233B2 (en) | Methods and arrangements for 3D scene representation | |
Ma et al. | Low complexity adaptive view synthesis optimization in HEVC based 3D video coding | |
JP6307152B2 (en) | Image encoding apparatus and method, image decoding apparatus and method, and program thereof | |
JP6039178B2 (en) | Image encoding apparatus, image decoding apparatus, method and program thereof | |
Pourazad et al. | Generating the depth map from the motion information of H.264-encoded 2D video sequence |
Farid et al. | Panorama view with spatiotemporal occlusion compensation for 3D video coding | |
Daribo et al. | Arbitrarily shaped sub-block motion prediction in texture map compression using depth information | |
WO2016078162A1 (en) | Hevc-based 3d video fast coding method | |
JP2015019326A (en) | Encoding device, encoding method, decoding device, and decoding method | |
Li et al. | Fast frame-rate up-conversion of depth video via video coding | |
Farid et al. | A panoramic 3D video coding with directional depth aided inpainting | |
Pang et al. | Multiview video coding using projective rectification-based view extrapolation and synthesis bias correction | |
JP6232117B2 (en) | Image encoding method, image decoding method, and recording medium | |
Zhang et al. | A super resolution reconstruction scheme for mixed spatio-temporal stereo video | |
Shimizu et al. | Backward view synthesis prediction using virtual depth map for multiview video plus depth map coding | |
Tizon et al. | Multi-view acquisition and advanced depth map processing techniques |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||