CN108886598B - Compression method and device of panoramic stereo video system - Google Patents

Publication number
CN108886598B
Authority
CN
China
Prior art keywords
value
saliency
determining
partition
saliency value
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201680078558.9A
Other languages
Chinese (zh)
Other versions
CN108886598A (en)
Inventor
虞晶怡
马毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ShanghaiTech University
Original Assignee
ShanghaiTech University

Classifications

    • H04N19/17: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an image region, e.g. an object
    • G06T3/18: Image warping, e.g. rearranging pixels individually
    • G06T3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038: Image mosaicing, e.g. composing plane images from plane sub-images
    • G06T7/337: Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
    • G06T7/74: Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G06T7/85: Stereo camera calibration
    • H04N13/106: Processing image signals
    • H04N13/128: Adjusting depth or disparity
    • H04N13/239: Image signal generators using stereoscopic image cameras using two 2D image sensors having a relative position equal to or related to the interocular distance
    • H04N13/243: Image signal generators using stereoscopic image cameras using three or more 2D image sensors
    • H04N13/246: Calibration of cameras
    • H04N13/282: Image signal generators for generating image signals corresponding to three or more geometrical viewpoints, e.g. multi-view systems
    • H04N19/124: Quantisation
    • H04N19/126: Details of normalisation or weighting functions, e.g. normalisation matrices or variable uniform quantisers
    • H04N19/139: Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
    • H04N19/14: Coding unit complexity, e.g. amount of activity or edge presence estimation
    • H04N19/159: Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/176: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an image region, e.g. an object, the region being a block, e.g. a macroblock
    • H04N19/523: Motion estimation or motion compensation with sub-pixel accuracy
    • H04N19/597: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H04N23/698: Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture
    • G06T2207/10021: Stereoscopic video; Stereoscopic image sequence
    • H04N2013/0081: Depth or disparity estimation from stereoscopic image signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Studio Devices (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

There is provided a method of compressing stereoscopic video including left-view frames and right-view frames, the method comprising: determining a texture saliency value of a first partition within the left view frame by intra prediction (1101); determining a motion saliency value (1102) for the first partition by motion estimation; determining a disparity saliency value between the first partition and a respective second partition within the right view frame (1103); determining a quantization parameter (1104) from the disparity saliency value, the texture saliency value, and the motion saliency value; and quantizing (1105) the first partition according to the quantization parameter.

Description

Compression method and device of panoramic stereo video system
Cross Reference to Related Applications
This application claims the benefit of and priority to International Patent Application No. PCT/CN2016/070712, entitled 'Calibration method and device for a panoramic stereo video system', filed on January 12, 2016, and International Patent Application No. PCT/CN2016/070823, entitled 'Splicing method and device for the panoramic stereo video system', filed on January 13, 2016. The entire disclosures of both applications are incorporated herein by reference.
Technical Field
The present invention relates to a panoramic stereoscopic video system for photographing, processing, compressing, and displaying a 3D panoramic stereoscopic video, and more particularly, to a method and apparatus for 3D panoramic stereoscopic video compression in a panoramic stereoscopic video system.
Background
The panoramic stereoscopic video system proposed in the above-mentioned applications achieves an immersive 3D experience by displaying stereoscopic panoramic video on a Head Mounted Display (HMD). The resolution and persistence of the stereoscopic video are two main features that determine the user experience. The system stitches together images taken by 16 high-definition (HD) cameras to generate the stereoscopic video, and the resolution of each view is at least 3840 × 2160 (4K). Because the system runs at a frame rate of 50 fps, motion blur and flicker are greatly reduced. On the other hand, the ultra-high resolution and high refresh rate generate a huge amount of video data, which poses a challenge for 3D video services and broadcasting.
The coding efficiency of existing hybrid video coding methods such as H.264, VC-1, and HEVC has improved significantly over the last decade, and temporal and spatial redundancies in video sequences have been greatly reduced through dense spatio-temporal prediction. Recent 3D extensions such as MV-HEVC and 3D-HEVC have further investigated disparity prediction between different views. However, to achieve better compression performance for stereoscopic panoramic video, subjective video quality must also be improved by taking human visual characteristics and panorama-specific characteristics into account.
In general, a 360 degree panoramic image contains an elongated field of view, and most of the field of view is likely to be background only. However, the user may only be interested in a small portion of the field of view where the color, texture, motion, or depth contrast is significant.
The basic principle of the compression method based on human visual features is that only a small number of selected regions of interest with high priority are encoded to obtain high subjective video quality, while regions of less interest are processed with low priority to save bits. To achieve this, a region that the user is likely to pay attention to is generally predicted using an attention prediction method.
Existing 2D image saliency calculations primarily consider contrast of features such as color, shape, orientation, texture, curvature, etc. In image sequences or video, region of interest detection focuses on motion information that can distinguish the foreground from the background. However, since the existing video compression method does not consider stereoscopic contrast in stereoscopic video, it is not suitable for stereoscopic video. In addition, when the salient object has no visual uniqueness on a spatial level and no motion occurs on a temporal level, the existing method has difficulty in detecting the attention area thereof.
Therefore, there is a need to provide a new stereo video compression method that simultaneously uses texture, motion and stereo contrast for saliency analysis.
Disclosure of Invention
To solve the problems in the prior art, embodiments of the present invention provide a new method for compressing stereoscopic video that simultaneously uses texture, motion, and stereoscopic contrast for saliency analysis. In particular, by employing block-based stereo vision detection, depth cues are further provided that play an important role in human vision.
According to an embodiment of the present invention, there is provided a method for compressing stereoscopic video including a left-view frame and a right-view frame, the method including: determining, by intra prediction, a texture saliency value of a first partition within the left-view frame; determining, by motion estimation, a motion saliency value of the first partition; determining a disparity saliency value between the first partition and a corresponding second partition within the right-view frame; and determining a quantization parameter according to the disparity saliency value, the texture saliency value, and the motion saliency value.
Preferably, the method further comprises: quantizing the first partition according to the quantization parameter.
Preferably, the method further comprises: determining a hybrid stereo saliency map of the left view frame; reducing the size of the hybrid stereo saliency map to match the size of a Transform Unit (TU); determining a second quantization parameter for the transform unit; and quantizing the transform unit according to the second quantization parameter.
Preferably, the method further comprises: determining the texture saliency value from a DC mode intra prediction output of High Efficiency Video Coding (HEVC).
Preferably, the method further comprises: determining a motion saliency value for the first partition from a motion estimation output of High Efficiency Video Coding (HEVC).
Preferably, the method further comprises: determining a mixed stereo saliency value for the first partition by superimposing the disparity saliency value, texture saliency value, and motion saliency value with a weighting parameter.
Preferably, the left-view frame and the right-view frame are rectified in a first direction, and the method further comprises: searching for the disparity saliency value in a second direction perpendicular to the first direction.
Preferably, the disparity saliency value comprises a non-integer value.
Preferably, the method further comprises: determining the disparity saliency value from 1/4-pixel samples generated by the sub-pixel motion estimation of High Efficiency Video Coding (HEVC).
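The disparity search described above can be sketched as a one-dimensional block-matching search along the rows of a rectified image pair, evaluated at quarter-pixel steps. This is a minimal sketch under stated assumptions, not the encoder's actual procedure: the function names `sample` and `block_disparity`, the sum-of-absolute-differences cost, and the linear interpolation (standing in for HEVC's sub-pixel interpolation filters) are all illustrative choices that the text does not specify.

```python
def sample(row, x):
    """Linearly interpolate `row` at fractional position x (clamped to the row).

    Assumption: linear interpolation replaces HEVC's longer interpolation filters.
    """
    x = min(max(x, 0.0), len(row) - 1.0)
    i = int(x)
    j = min(i + 1, len(row) - 1)
    frac = x - i
    return (1.0 - frac) * row[i] + frac * row[j]


def block_disparity(left, right, x0, y0, size, max_disp, step=0.25):
    """Return the horizontal disparity (in pixels, possibly non-integer) of the
    size x size block at (x0, y0) in `left`, searched along rows of `right`.

    `left` and `right` are lists of rows of luma values and are assumed to be
    rectified, so corresponding points lie on the same row.
    """
    best_d, best_cost = 0.0, float("inf")
    for k in range(int(max_disp / step) + 1):
        d = k * step
        # Sum of absolute differences between the left block and the shifted right block.
        cost = 0.0
        for y in range(y0, y0 + size):
            for x in range(x0, x0 + size):
                cost += abs(left[y][x] - sample(right[y], x - d))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

Because the frames are rectified, the search only needs to scan one direction, which is why the cost loop shifts `x` and never `y`.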
According to another embodiment of the present invention, there is provided a non-transitory computer-readable medium having stored thereon computer-executable instructions for performing a method of compressing stereoscopic video comprising a left-view frame and a right-view frame, the method comprising: determining, by intra prediction, a texture saliency value of a first partition within the left-view frame; determining, by motion estimation, a motion saliency value of the first partition; determining a disparity saliency value between the first partition and a corresponding second partition within the right-view frame; and determining a quantization parameter according to the disparity saliency value, the texture saliency value, and the motion saliency value.
Preferably, the method further comprises: quantizing the first partition according to the quantization parameter.
Preferably, the method further comprises: determining a hybrid stereo saliency map of the left view frame; reducing the size of the hybrid stereo saliency map to match the size of a Transform Unit (TU); determining a second quantization parameter for the transform unit; and quantizing the transform unit according to the second quantization parameter.
Preferably, the method further comprises: determining the texture saliency value from an output of DC mode intra prediction for High Efficiency Video Coding (HEVC).
Preferably, the method further comprises: determining a motion saliency value for the first partition from an output of a motion estimation for High Efficiency Video Coding (HEVC).
Preferably, the method further comprises: determining a mixed stereo saliency value for the first partition by superimposing the disparity saliency value, texture saliency value, and motion saliency value with a weighting parameter.
Preferably, the left-view frame and the right-view frame are rectified in a first direction, and the method further comprises: searching for the disparity saliency value in a second direction perpendicular to the first direction.
Preferably, the disparity saliency value comprises a non-integer value.
Preferably, the method further comprises: determining the disparity saliency value from 1/4-pixel samples generated by the sub-pixel motion estimation of High Efficiency Video Coding (HEVC).
According to an embodiment of the present invention, a region-of-interest based video coding scheme with a visual-attention based bit allocation method is employed. In particular, spatial, temporal, and stereo cues are all taken into account in video attention prediction. The spatial and temporal contrast features are extracted directly from the existing video coding process without introducing additional computation. In addition, sub-pixel disparity intensity estimation is employed to improve the accuracy of visual saliency in stereoscopic systems. Thus, the stereo video is compressed efficiently without affecting the perceived quality for the end user.
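The weighted superposition of the three saliency values, the reduction of the saliency map to transform-unit size, and the derivation of a quantization parameter can be illustrated as follows. This is only a sketch under assumptions: the weights, the base QP, the maximum QP offset, the block-averaging used for downscaling, and the linear saliency-to-QP mapping are hypothetical values and choices, since this part of the text gives none of them. Saliency values are assumed to be normalized to [0, 1].

```python
def hybrid_saliency(texture, motion, disparity, weights=(0.4, 0.3, 0.3)):
    """Weighted superposition of texture, motion, and disparity saliency.

    The default weights are illustrative assumptions, not values from the text.
    """
    w_t, w_m, w_d = weights
    return w_t * texture + w_m * motion + w_d * disparity


def downscale(saliency_map, block):
    """Reduce a 2D saliency map by averaging each block x block tile,
    so that one value corresponds to one transform unit."""
    rows, cols = len(saliency_map), len(saliency_map[0])
    out = []
    for r in range(0, rows, block):
        out_row = []
        for c in range(0, cols, block):
            cells = [saliency_map[y][x]
                     for y in range(r, min(r + block, rows))
                     for x in range(c, min(c + block, cols))]
            out_row.append(sum(cells) / len(cells))
        out.append(out_row)
    return out


def saliency_to_qp(saliency, base_qp=32, max_delta=6):
    """Map saliency in [0, 1] to a QP; salient regions get a lower QP,
    i.e. finer quantization. The linear mapping is an assumption."""
    saliency = min(max(saliency, 0.0), 1.0)
    qp = base_qp + round(max_delta * (1.0 - 2.0 * saliency))
    return min(max(qp, 0), 51)  # QP range of 8-bit HEVC
```

With this mapping, a block of average saliency keeps the base QP, while the most and least salient blocks move `max_delta` steps below and above it, which shifts bits toward the regions a viewer is likely to attend to.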
Drawings
In order to better explain technical features of embodiments of the present invention, various embodiments of the present invention will be briefly described below with reference to the accompanying drawings.
Fig. 1 is an exemplary schematic diagram of a panoramic stereoscopic video system according to an embodiment of the present invention.
Fig. 2 is an exemplary schematic diagram of a camera array of a panoramic stereoscopic video system according to an embodiment of the present invention.
Fig. 3 is an exemplary schematic diagram of a data processing unit of a panoramic stereoscopic video system according to an embodiment of the present invention.
Fig. 4 is an exemplary flowchart of a panoramic stereo video stitching method according to an embodiment of the present invention.
Fig. 5 is an exemplary flowchart of a panoramic stereoscopic video display method according to an embodiment of the present invention.
Fig. 6 is an exemplary diagram of HEVC spatial prediction mode according to an embodiment of the present invention.
Fig. 7 is an exemplary diagram of block-based motion estimation using motion vector prediction according to an embodiment of the present invention.
Fig. 8 is an exemplary diagram of a motion intensity map obtained by motion estimation according to an embodiment of the present invention.
Fig. 9 is an exemplary diagram of block-based disparity estimation for stereoscopic video coding according to an embodiment of the present invention.
Fig. 10 is an exemplary schematic diagram of a stereoscopic video compression system based on a mixed region of interest according to an embodiment of the present invention.
Fig. 11 is an exemplary flowchart of a method for compressing stereoscopic video based on a mixed region of interest according to an embodiment of the present invention.
Detailed Description
To better illustrate the objects, technical features and advantages of the embodiments of the present invention, various embodiments of the present invention are further described below with reference to the accompanying drawings. It is to be understood that the drawings are for purposes of illustrating exemplary embodiments of the invention and that other drawings may be devised by those skilled in the art without departing from the principles of the invention.
According to an embodiment of the present invention, there is provided a panoramic stereoscopic video system having multi-camera video capture, data processing, stereoscopic video encoding, transmission, and 3D display functions. The panoramic stereo video system adopts real-time multi-view video capture, image rectification and preprocessing, and stereo video compression based on a region of interest (ROI). After transmission and decoding, the left and right views are displayed using a Head Mounted Display (HMD) headset.
1. Overview of the System
Fig. 1 is an exemplary schematic diagram of a panoramic stereoscopic video system according to an embodiment of the present invention. The panoramic stereo video system shoots a 3D panoramic video by adopting a camera array, and displays the shot 3D panoramic video on a 3D television or a head-mounted virtual reality display device. As shown in fig. 1, the panoramic stereoscopic video system includes a data acquisition unit 200, a data processing unit 300, and a data display unit 400. The data acquisition unit 200 includes a plurality of cameras within a camera array 210 and a camera calibration unit 220. The data processing unit 300 includes a data preprocessing unit 310 and an advanced stereoscopic video transcoding unit 320. The data display unit 400 includes a decoding unit 410 and a display headset 420.
2. Data acquisition unit
As shown in fig. 1, the data acquisition unit 200 includes a plurality of cameras in a camera array 210, and a camera calibration unit 220 that calibrates the camera array 210.
2.1 Camera array
Fig. 2 is an exemplary schematic diagram of a camera array in a panoramic stereoscopic video system according to an embodiment of the invention.
As shown in fig. 2, the camera array 210 has 16 high-definition cameras c1 to c16 mounted on a mounting frame in the shape of a regular octagon, with a pair of cameras mounted on each side of the octagon. The two cameras on each side, e.g., c1 and c2, have parallel optical axes and are spaced a distance d from each other. Raw video data collected by the camera array 210 is sent over a cable to a computer for further processing. The camera parameters are listed in table 1 below.
TABLE 1 (camera parameters; provided as an image in the original publication and not reproduced here)
It should be noted that although the camera array is shown as a regular octagon in fig. 2, the camera array may be provided in other shapes in other embodiments of the present invention. In particular, in one embodiment of the invention, each camera is mounted on a rigid frame such that the relative position between the plurality of cameras is substantially constant. In a further embodiment of the invention, the cameras are arranged substantially in the same plane, for example on each side of a polygon.
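The octagonal layout described above can be made concrete with a short geometric sketch. It assumes that each camera pair is centered on its side of the octagon and that the optical axes point outward along the side's normal; the function name and the parameters `apothem` (distance from the center of the frame to each side) and `spacing` (the distance d between paired cameras) are hypothetical, and no actual dimensions are taken from table 1.

```python
import math


def octagon_camera_poses(apothem, spacing, sides=8):
    """Return a list of ((x, y), (axis_x, axis_y)) for the cameras of the array.

    Each side of the regular polygon carries two cameras, `spacing` apart,
    whose optical axes are parallel and point along the side's outward normal.
    """
    poses = []
    for k in range(sides):
        angle = 2.0 * math.pi * k / sides
        nx, ny = math.cos(angle), math.sin(angle)   # outward normal = optical axis
        tx, ty = -ny, nx                            # direction along the side
        cx, cy = apothem * nx, apothem * ny         # midpoint of the side
        for offset in (-0.5, 0.5):
            position = (cx + offset * spacing * tx, cy + offset * spacing * ty)
            poses.append((position, (nx, ny)))
    return poses
```

For example, `octagon_camera_poses(0.2, 0.065)` yields 16 poses in which consecutive pairs share an optical axis, matching the description of c1 and c2.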
2.2 Camera calibration
In order to stitch together the images taken by the cameras and generate a 3D effect, it is necessary to obtain both the internal and external parameters of the cameras. The external parameters include rotation and translation between the cameras so that images taken by different cameras can be corrected and aligned in the horizontal direction. In addition, the images captured by the cameras may be distorted, and in order to obtain an undistorted image, it is necessary to know the distortion parameters of the cameras. These parameters may be obtained during calibration of the camera.
2.2.1 internal and distortion parameter calibration
The internal and distortion parameters of each camera can be obtained by various methods, such as the calibration method proposed by Zhengyou Zhang. Tools such as MATLAB can also be used to obtain these parameters.
2.2.2 external parameter calibration
After the internal parameters of each camera are obtained, the rotation and translation between the cameras are obtained with a structure-from-motion (SfM) based method. The method has the following advantages:
Efficiency: The cameras do not need to be calibrated pair by pair. Instead, all cameras capture the scene simultaneously during calibration, and the external parameters of all cameras are obtained at the same time.
Accuracy: In pattern-based calibration methods, a pattern must be captured by two adjacent cameras, which often reduces the resolution of the pattern and the calibration accuracy. In the structure-from-motion based method of the present invention, the motion of each camera is estimated independently to obtain the above parameters, and adjacent cameras do not need to have overlapping fields of view. Each camera can therefore be placed closer to the scene to be captured, which yields higher accuracy.
Scalability: Since adjacent cameras do not need overlapping fields of view, the method is applicable even to cameras placed back to back.
2.3 data acquisition method
Data from the 16 cameras are stored by software after being collected and then provided to the data processing unit. Image data of each frame captured by each camera may be collected by software such as FFmpeg and DirectShow (or DShow). The frames captured by each camera are compressed and saved as a video file. Since the number of cameras is large, it is necessary to synchronize frames captured by the cameras using a time stamp or the like. For example, each frame captured by each camera may be placed in a queue after the time stamp is added, so that it is synchronized with other frames having the same time stamp. The synchronized frames are encoded into a video stream and stored locally or transmitted simultaneously over a network.
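The timestamp-based queueing described above can be sketched as follows. This is only an illustrative sketch: the class name `FrameSynchronizer` and its `push` method are hypothetical, and it assumes that frames captured at the same instant carry exactly equal timestamps (a real system would likely quantize timestamps or match them within a tolerance).

```python
from collections import defaultdict


class FrameSynchronizer:
    """Groups frames from several cameras by timestamp (hypothetical helper)."""

    def __init__(self, num_cameras):
        self.num_cameras = num_cameras
        # Maps timestamp -> {camera_id: frame} for frame sets still incomplete.
        self.pending = defaultdict(dict)

    def push(self, camera_id, timestamp, frame):
        """Queue one frame.

        Returns the complete frame set (ordered by camera id) once every
        camera has delivered a frame with this timestamp, otherwise None.
        """
        group = self.pending[timestamp]
        group[camera_id] = frame
        if len(group) == self.num_cameras:
            del self.pending[timestamp]
            return [group[cam] for cam in sorted(group)]
        return None
```

A complete frame set returned by `push` would then be handed to the encoder; incomplete sets simply wait in the queue until the missing cameras deliver their frames.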
3. Data processing unit
As shown in fig. 1, the data processing unit 300 includes a data preprocessing unit 310 and an advanced stereoscopic video transcoding unit 320.
Fig. 3 is an exemplary schematic diagram of a data processing unit in a panoramic stereoscopic video system according to an embodiment of the present invention. As shown in fig. 3, the data preprocessing unit 310 includes: a timeline synchronization unit 311 for synchronizing the images captured by the respective cameras; several decoders 312 for decoding the original video streams; several rectifiers 313 for rectifying the original video; an encoder 314 for video processing, including noise reduction and editing; and a stitching unit for stitching the videos to generate a panoramic video. The data preprocessing unit 310 outputs the left-eye video and the right-eye video to the advanced stereoscopic video transcoding unit 320. The advanced stereoscopic video transcoding unit 320 generates a motion map 321 and a texture map 322 of the video. A mixed region of interest (ROI) generation unit 323 identifies a region of interest in the video from the motion map 321 and the texture map 322. A bit allocation unit 324 allocates bits according to the identified region of interest, and an HEVC coding unit 325 encodes the video. An H.265 packer 326 packs the encoded video for transmission.
Fig. 4 is an exemplary flowchart of a panoramic stereo video stitching method according to an embodiment of the present invention.
3.1 distortion correction and preprocessing
The frames captured by each camera are warped according to the distortion parameters obtained during calibration so as to obtain distortion-free frames. To improve image alignment and stitching accuracy, each frame is first filtered to reduce noise.
3.2 image alignment
Image alignment is performed for each pair of cameras arranged on each side of the octagon, so that the images captured by each pair of cameras are aligned in the horizontal direction. According to one embodiment of the present invention, each frame captured by a pair of cameras is warped to a plane parallel to the optical axes of that pair of cameras.
4. Panoramic video stitching
The camera array has 8 pairs of cameras. The frames captured by all the left cameras are projected onto a cylinder and then stitched into a panoramic image. Repeating these steps for all frames captured by the left cameras yields a panoramic video. Processing the frames captured by the right cameras in the same manner yields another panoramic video. The two panoramic videos form a panoramic stereo video.
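The text does not give the projection formula. As a hedged illustration only, the sketch below uses the cylindrical mapping commonly used in panorama stitching, assuming a pinhole camera with focal length `focal` (in pixels) and image coordinates measured from the principal point. The function name `to_cylinder` is hypothetical and the mapping is not taken from the patent.

```python
import math


def to_cylinder(x, y, focal):
    """Map image-plane coordinates to cylindrical coordinates.

    x, y  : pixel offsets from the principal point (assumed undistorted).
    focal : focal length in pixels, also used as the cylinder radius.
    Returns (u, v), the position on the unrolled cylinder.
    """
    theta = math.atan2(x, focal)       # angle around the cylinder axis
    height = y / math.hypot(x, focal)  # height on a unit-radius cylinder
    return focal * theta, focal * height
```

After every pixel of every left-camera frame has been mapped in this way, adjacent frames differ approximately by a horizontal shift on the unrolled cylinder, which is what makes stitching into a single panorama tractable.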
5. Data display unit
As shown in fig. 1, the data display unit 400 includes a decoding unit 410 and a display headset 420. After passing through the codec system, the panoramic stereo video is played on the display headset 420, which may be a wearable Virtual Reality (VR) device, such as one provided by Oculus VR. The panoramic stereo video is rendered on the left-eye display and the right-eye display of the Oculus device, respectively. The display area of the panoramic stereo video can be adjusted according to the detected movement of the device so as to simulate the change of viewing angle in virtual reality.
Fig. 5 is an exemplary flowchart of a panoramic stereoscopic video display method according to an embodiment of the present invention. As shown in fig. 5, in step 501, the encoded video stream is first decoded into YUV. In step 502, position calculation and field of view selection are performed based on the Oculus sensor data. In step 503, the left-eye and right-eye images are rendered separately. In step 504, the rendered images are displayed on the Oculus display headset.
6. Stereo video compression
In the stereoscopic panoramic video system, the video processing module stitches the left and right ultra-high-resolution videos, but the resulting volume of video data makes compression and transmission difficult. According to an embodiment of the present invention, there is provided a region-of-interest-based video coding scheme that employs a visual attention-based bit allocation method. Spatial, temporal, and stereo cues are all taken into account in video attention prediction. Spatial and temporal contrast features are extracted directly from the video encoding process without introducing additional computation. In addition, the visual saliency accuracy of the stereoscopic system is improved by sub-pixel disparity intensity estimation. Reusing sub-pixel samples and using block-based matching allow the algorithm of the present invention to perform detection in real time with good performance. Overall, this scheme greatly improves the video compression rate without affecting the quality perceived by the end user.
6.1 region of interest detection
6.1.1 extraction of spatial features by Intra prediction
In the HEVC coding standard, intra prediction (or spatial prediction) is used to encode blocks that need to be compressed independently of previously encoded frames, and the spatial correlation at the pixel level is obtained from neighboring samples of previously encoded and reconstructed blocks. After this, the prediction samples are subtracted from the original pixel values to obtain residual blocks. This residual, obtained from intra prediction, contains texture contrast information and is used to generate a spatial saliency map.
In HEVC video coding, spatial prediction includes 33 directional prediction modes (only 8 such modes in h.264) for Prediction Unit (PU) selection, a DC prediction mode (ensemble averaging) and a planar (surface-fitting) prediction mode. Fig. 6 is an exemplary diagram of HEVC spatial prediction mode according to an embodiment of the present invention. All 35 prediction modes are shown in fig. 6. The size of the HEVC prediction unit is selected from 64 × 64 to 8 × 8, and all 35 modes can achieve the best block partition and the best residual. To reduce complexity, in one embodiment, the block-based residual mapping reuses the results of performing DC mode prediction on fixed 8 × 8 blocks. The residual of block k is calculated as follows:
[Equation (1), image not available]
where $C_{ij}$ and $R_{ij}$ are the $(i, j)$-th elements of the current original block $C$ and the reconstructed block $R$, respectively. A texture saliency value $S_T$ can then be calculated for each block from the residual of that block and normalized to the range [0, 1]:
[Equation (2), image not available]
where $N$ is the number of partitions in a frame. Since the spatial residual detection of 8 × 8 partitions is performed by the HEVC intra prediction method, no additional computation needs to be introduced.
In other embodiments: each frame may be partitioned into different sized non-overlapping blocks of 64 x 64 or 16 x 16 pixels, etc.; and may compute texture saliency maps from results of other video coding methods similar or comparable to intra prediction methods; and may be compressed according to other coding standards such as h.264/AVC or AVS. Preferably, the intra-prediction or other video processing is based on same-size partitions partitioned from the frame.
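The block-wise texture saliency described in this section can be illustrated with the sketch below. Equations (1) and (2) are not reproduced here, so two assumptions are made and should be treated as such: the residual of a block is taken as the sum of absolute differences between original and reconstructed samples, and normalization divides by the largest residual in the frame. The function names are hypothetical, and the reconstructed frame is assumed to be supplied by the encoder's DC-mode intra prediction.

```python
def block_residuals(original, reconstructed, block=8):
    """Per-block residual in raster order.

    ASSUMPTION: the residual is the sum of absolute differences between
    the original and reconstructed samples of each block.
    Frames are lists of rows of integer sample values.
    """
    height, width = len(original), len(original[0])
    residuals = []
    for by in range(0, height, block):
        for bx in range(0, width, block):
            total = 0
            for i in range(by, min(by + block, height)):
                for j in range(bx, min(bx + block, width)):
                    total += abs(original[i][j] - reconstructed[i][j])
            residuals.append(total)
    return residuals


def normalize(values):
    """Scale values to [0, 1] by the frame maximum (ASSUMED normalization)."""
    peak = max(values)
    if peak == 0:
        return [0.0 for _ in values]
    return [v / peak for v in values]
```

Calling `normalize(block_residuals(original, reconstructed))` yields one texture saliency value per 8 × 8 block, with flat regions near 0 and strongly textured regions near 1.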
6.1.2 extracting temporal features by motion estimation
A fast moving object can be of visual interest. However, since a video sequence is captured by a camera in motion, there is global motion therein. Therefore, the local motion saliency needs to be measured by estimating the Motion Vector Difference (MVD) with HEVC inter-prediction motion estimation.
Motion estimation techniques in most video coding standards are generally based on block matching, where the motion vectors are represented by a 2D translational pattern and each block is matched to all candidate locations within a predetermined search area. Since motion vectors of neighboring blocks are typically highly correlated, in the motion vector prediction technique employed in HEVC, the motion vector of the current block is predicted from the motion vectors of nearby coded blocks.
Fig. 7 is an exemplary diagram of block-based motion estimation using Motion Vector (MV) prediction according to an embodiment of the present invention. As shown in fig. 7, for a current block 711 in a current frame 710, a vector $mv_{pred}$ 712 is predicted from the motion vectors of neighboring blocks, and the corresponding block 721 is matched against all candidate locations within the predetermined search area 725. Finally, the difference between the best vector $mv_{best}$ 723 and the predicted vector $mv_{pred}$ 712 is coded and transmitted.
In one embodiment, the motion vector differences generated by 8 × 8 block motion estimation are used. The magnitude of the motion vector difference may be defined as:
$$\mathrm{MVD}_k = \lVert mv_{best}(k) - mv_{pred}(k) \rVert \quad (3)$$
A motion saliency map $S_M$ may then be calculated by normalizing the motion vector differences within the same frame:
[Equation (4), image not available]
The motion saliency map may be computed from the results of motion estimation, which is the main process of HEVC video coding. Therefore, the method can extract the motion feature without introducing any additional processing. Fig. 8 is an exemplary diagram of a motion intensity map obtained by motion estimation according to an embodiment of the present invention.
In other embodiments: each frame may be partitioned into non-overlapping blocks of a different size, such as 64 × 64 or 16 × 16 pixels; the motion saliency map may be computed from the results of other video coding methods similar or comparable to inter-prediction motion estimation; and the video may be compressed according to other coding standards such as H.264/AVC or AVS. Preferably, the motion estimation or other video processing is based on equally sized blocks partitioned from the frame.
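Equation (3) and the subsequent normalization can be sketched as follows. The norm in equation (3) is not specified in the text, so the Euclidean norm is assumed here; the normalization in equation (4) is not reproduced, so division by the frame maximum is likewise an assumption. The function names are hypothetical, and the vectors are assumed to come from the encoder's motion estimation.

```python
import math


def motion_vector_difference(mv_best, mv_pred):
    """Magnitude of the difference between the best and predicted vectors.

    ASSUMPTION: Euclidean norm; vectors are (dx, dy) tuples.
    """
    return math.hypot(mv_best[0] - mv_pred[0], mv_best[1] - mv_pred[1])


def motion_saliency(best_vectors, predicted_vectors):
    """Per-block motion saliency in [0, 1].

    ASSUMPTION: each MVD is divided by the largest MVD in the frame.
    """
    mvd = [motion_vector_difference(b, p)
           for b, p in zip(best_vectors, predicted_vectors)]
    peak = max(mvd, default=0.0)
    if peak == 0:
        return [0.0] * len(mvd)
    return [d / peak for d in mvd]
```

Because the predicted vector follows the motion of neighboring blocks, a block that merely moves with the global camera motion has a small difference, while a block with distinct local motion stands out.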
6.1.3 disparity estimation by disparity prediction
We also use stereo vision in the saliency analysis to provide depth cues, which play an important role in stereo panoramic video. To this end, a block-based disparity estimation method is used to build the disparity map.
Fig. 9 is a schematic diagram illustrating block-based disparity estimation for stereo video encoding according to an embodiment of the present invention. As shown in fig. 9, in the high resolution video system, both the left field of view 910 and the right field of view 920 of the stereoscopic image are well rectified. Each field of view is divided into non-overlapping blocks of 8 × 8 pixels, and all pixels within a block are assumed to have the same disparity. Thus, the left field of view block matching a right field of view block can be expected to lie on the same scan line, and the disparity 922 becomes a one-dimensional vector (the vertical component equals zero). The disparity matching scheme is similar to motion estimation in inter prediction. Specifically, the search area 925 is limited to a range of ± 32 in the horizontal direction. The initial search position is set to the position of the corresponding block 921 of the right field of view 920, and the Sum of Absolute Differences (SAD) is used as the matching criterion.
To achieve better prediction accuracy, we take non-integer disparity values into account as well, and use the HEVC 7/8-tap filter to interpolate sub-pixel intensities. Since sub-pixel sample interpolation is one of the most complex operations, the sub-pixel disparity search of the present invention directly uses the 1/4-pixel samples generated by HEVC sub-pixel motion estimation. By reusing the 7-tap interpolation of HEVC, the computational complexity can be reduced significantly. A block-wise disparity map is then generated from the block disparity values $d_k$:
[Equation (5), image not available]
In other embodiments: each frame may be partitioned into non-overlapping blocks of a different size, such as 64 × 64 or 16 × 16 pixels; the disparity map may be computed from the results of other video coding methods similar or comparable to motion estimation; and the video may be compressed according to other coding standards such as H.264/AVC or AVS. Preferably, the motion estimation process is based on equally sized blocks partitioned from the frame.
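The integer-pixel part of the disparity search in this section (same scan line, horizontal range of ± 32, SAD as the matching criterion) can be sketched as below. This is a simplified illustration under stated assumptions: the views are assumed to be rectified, the function names are hypothetical, the block is taken from the left view and searched for in the right view, and the quarter-pixel refinement that reuses HEVC interpolated samples is not shown.

```python
def sad(left, right, y0, x_left, x_right, block=8):
    """Sum of absolute differences between two blocks on the same rows."""
    total = 0
    for i in range(y0, y0 + block):
        for j in range(block):
            total += abs(left[i][x_left + j] - right[i][x_right + j])
    return total


def block_disparity(left, right, y0, x0, block=8, search=32):
    """Horizontal disparity of the left-view block at (y0, x0).

    Tests every integer offset d in [-search, search] for which the
    candidate block lies inside the right view, and returns (d, cost)
    with the smallest SAD. Candidates are visited in order of
    increasing |d| so that ties favour the smallest shift.
    """
    width = len(right[0])
    best_d, best_cost = 0, None
    for d in sorted(range(-search, search + 1), key=abs):
        x = x0 + d
        if x < 0 or x + block > width:
            continue  # candidate block would fall outside the frame
        cost = sad(left, right, y0, x0, x, block)
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost
```

Starting the search at zero offset mirrors the initial search position described above, and restricting it to one row of candidates is what keeps the cost well below that of a full two-dimensional motion search.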
6.1.4 hybrid region of interest determination
In one embodiment, the spatio-temporal features are combined with the disparity feature; that is, the texture contrast $S_T$ in equation (2), the motion contrast $S_M$ in equation (4), and the disparity intensity $S_D$ in equation (5) are combined to detect the region of interest. Although each feature has its own advantages and disadvantages, the best results are achieved by combining all of them. First, each feature map is normalized to the range [0, 1]. Second, $S_M$, $S_T$ and $S_D$ are superimposed to form a mixed stereo saliency map $S$:
$$S(b_i) = \lambda_T S_T + \lambda_M S_M + \lambda_D S_D \quad (6)$$
where $\lambda_T$, $\lambda_M$ and $\lambda_D$ are weighting parameters.
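The weighted superposition of equation (6) can be written as a short sketch. The function name `mixed_saliency` is hypothetical, and the default weights of 1/3 each are an assumption made for illustration; the text does not state which weight values are used.

```python
def mixed_saliency(s_t, s_m, s_d, lam_t=1 / 3, lam_m=1 / 3, lam_d=1 / 3):
    """Block-wise weighted sum of texture, motion and disparity saliency.

    s_t, s_m, s_d : per-block saliency lists, each already normalized
                    to [0, 1] and listed in the same block order.
    lam_*         : weighting parameters (default values are ASSUMED).
    """
    return [lam_t * t + lam_m * m + lam_d * d
            for t, m, d in zip(s_t, s_m, s_d)]
```

For example, `mixed_saliency([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], 0.5, 0.25, 0.25)` weights texture twice as heavily as the other two cues. If the weights sum to 1, the result stays within [0, 1].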
Fig. 10 is an exemplary schematic diagram of a stereoscopic video compression system based on a mixed region of interest according to an embodiment of the present invention. As shown in fig. 10, the stereo video compression system has a spatial prediction module 1001, a temporal prediction module 1102 and a disparity prediction module 1103. The results generated by the spatial prediction module 1001, the temporal prediction module 1102 and the disparity prediction module 1103 are input to the mixed region of interest generation module 1004, and after the salient region is identified, corresponding bit allocation is performed. The transform and quantization module 1105 performs quantization according to the bit allocation determined by the hybrid region of interest generation module 1004, and the entropy coding module 1106 operates to encode each frame to generate a compressed frame 1006.
6.2 region of interest based stereoscopic video coding
One of the concepts of region-of-interest based compression is that bit allocation favors the salient regions. The hybrid region of interest detection method can generate high quality, high accuracy saliency maps. In addition, to improve video compression performance, the advanced video coding standard HEVC is selected for its high compression efficiency.
Since the region of interest detection of the present invention is based on 8 × 8 partitions, the estimated saliency map needs to be downscaled to match the size of the current transform unit, which may be 32 × 32, 16 × 16 or 8 × 8. The new QP value may be calculated as follows:
$$Q' = \max\bigl(Q - \psi \cdot (S - ES),\ 0\bigr) \quad (8)$$
where $Q$ is the original QP value selected by the x265 encoder, and $S$ is the downscaled saliency map of the current frame.
In this way, the QP value of coding units containing a salient region is reduced, while the QP value of coding units not containing a salient region is increased. The parameter $\psi$ can be selected by the user and controls the bit rate distribution between salient and non-salient regions: the higher the value of $\psi$, the more bits are allocated to the salient regions.
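The QP adjustment can be illustrated with the sketch below. Two points are assumptions rather than statements of the patent: the term ES in equation (8) is interpreted as the mean saliency of the frame, which is consistent with the QP decreasing for salient units and increasing for non-salient ones, and the saliency map is downscaled by averaging neighboring 8 × 8 values. The function names are hypothetical. Rounding to an integer QP and any upper limit imposed by the encoder are omitted because equation (8) does not specify them.

```python
def downscale(saliency_grid, factor):
    """Reduce a 2-D grid of 8x8-block saliency values by averaging.

    factor=2 maps to 16x16 units, factor=4 to 32x32 units.
    ASSUMPTION: averaging is used; the text only says the map is reduced.
    """
    rows, cols = len(saliency_grid), len(saliency_grid[0])
    out = []
    for r in range(0, rows, factor):
        row = []
        for c in range(0, cols, factor):
            cells = [saliency_grid[i][j]
                     for i in range(r, min(r + factor, rows))
                     for j in range(c, min(c + factor, cols))]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out


def adjust_qp(base_qp, saliency, psi):
    """Apply Q' = max(Q - psi * (S - ES), 0) to a list of saliency values.

    ASSUMPTION: ES is taken to be the mean of the saliency values.
    """
    mean_s = sum(saliency) / len(saliency)
    return [max(base_qp - psi * (s - mean_s), 0) for s in saliency]
```

With `base_qp=30`, `psi=10` and saliency values `[0.0, 0.5, 1.0]`, the least salient unit is quantized more coarsely than the base QP and the most salient unit more finely, while the unit at the mean keeps the base QP.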
Fig. 11 is an exemplary flowchart of a method for compressing stereoscopic video based on a mixed region of interest according to an embodiment of the present invention. As shown in fig. 11, the compression method includes the steps of:
step 1101: a texture saliency value for a first partition within a left view frame is determined by intra prediction. Preferably, the texture saliency value is determined from an output of DC mode intra prediction for High Efficiency Video Coding (HEVC).
Step 1102: determining a motion saliency value of the first partition by motion estimation. Preferably, the motion saliency value is determined from an output of a motion estimation of High Efficiency Video Coding (HEVC). Step 1103: determining a disparity between the first partition and a corresponding second partition within a right view frame. Preferably, the left-view frame and the right-view frame are first corrected in a first direction, and then a parallax search is performed in a second direction perpendicular to the first direction.
Step 1104: and determining a quantization parameter according to the parallax, the texture significance value and the motion significance value. Preferably, a mixed stereo saliency value is determined by superimposing the disparity, texture saliency values and motion saliency values with weighting parameters.
Step 1105: and quantizing the first block according to the quantization parameter. Wherein if the size of the block is different from the size of the current transform unit, the size of the mixed stereoscopic saliency map is reduced to match the size of the current transform unit, and a new quantization parameter is calculated.
According to an embodiment of the present invention, a region-of-interest based video coding scheme is employed that employs a visual attention based bit allocation method. Among these, spatial, temporal, and stereo cues are specifically taken into account in video attention prediction. The spatial and temporal contrast features are extracted directly from the video encoding process without introducing additional computations. In addition, sub-pixel disparity intensity estimation is also used to improve visual saliency accuracy. Thus, the perception quality of the end user is not affected while the stereo video is efficiently compressed.
The various modules, units and components described above may be implemented as: an Application Specific Integrated Circuit (ASIC); an electronic circuit; a combinational logic circuit; a Field Programmable Gate Array (FPGA); a processor (shared, dedicated, or group) that executes code; or other suitable hardware components that provide the functionality described above. The processor may be a microprocessor from Intel corporation or a mainframe computer from IBM corporation.
It should be noted that one or more of the above functions may be implemented by software or firmware stored in a memory and executed by a processor, or stored in a program memory and executed by a processor. Further, the software or firmware can be stored and/or transmitted within any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. In the context of this document, a "computer-readable medium" can be any medium that can contain or store the program for use by or in connection with the instruction execution system, apparatus, or device. The computer readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, a portable computer diskette (magnetic), a Random Access Memory (RAM) (magnetic), a read-only memory (ROM) (magnetic), an erasable programmable read-only memory (EPROM) (magnetic), a portable optical disk such as a CD, CD-R, CD-RW, DVD-R, or DVD-RW, or flash memory cards, secured digital cards, USB memory devices, memory sticks, and the like.
The various embodiments of the invention described above are merely preferred embodiments and are not intended to limit the scope of the invention, which encompasses any modification, equivalents, and improvements, without departing from the spirit and principles of the invention.

Claims (18)

1. A method of compressing stereoscopic video including left-view frames and right-view frames, the method comprising:
determining, by intra prediction, a texture saliency value of a first partition within the left view frame;
determining a motion saliency value of the first partition by motion estimation of inter prediction;
determining a disparity saliency value between the first partition and a respective second partition within the right view frame;
determining a quantization parameter according to the disparity saliency value, the texture saliency value, and the motion saliency value; and
quantizing the first partition according to the quantization parameter;
wherein a mixed stereo saliency value for the first partition is determined by a weighted summation of the disparity saliency value, the texture saliency value, and the motion saliency value, and the quantization parameter Q' is determined by:
Q′=max(Q-ψ·(S-ES),0)
where Q is the original quantization parameter value, S is the mixed stereo saliency value, and ψ is a parameter for controlling the bit rate distribution between a saliency region and a non-saliency region.
2. The method of claim 1, further comprising:
determining the texture saliency value from an output of DC mode intra prediction for high efficiency video coding.
3. The method of claim 1, further comprising:
determining a motion saliency value for the first partition based on an output of a motion estimation for high efficiency video coding.
4. The method of claim 1, wherein the left view frame is partitioned into a plurality of non-overlapping partitions, and the motion estimation is based on partitions of the same size as the first partition.
5. The method of claim 4, further comprising:
quantizing the first partition according to the quantization parameter.
6. The method of claim 4, further comprising:
determining a hybrid stereo saliency map of the left view frame;
reducing the size of the hybrid stereo saliency map to match the size of a transform unit;
determining a second quantization parameter for the transform unit; and
quantizing the transform unit according to the second quantization parameter.
7. The method of claim 1, wherein the left-view frame and the right-view frame are rectified in a first direction, and further comprising:
searching for the disparity saliency value in a second direction perpendicular to the first direction.
8. The method of claim 7, wherein the disparity saliency value comprises a non-integer value.
9. The method of claim 8, further comprising:
determining the disparity saliency value from 1/4-pixel samples generated by sub-pixel motion estimation for high efficiency video coding.
10. A non-transitory computer readable medium having stored thereon computer executable instructions that when executed by a processor perform the following method of compressing stereoscopic video comprising left view frames and right view frames, the method comprising:
determining, by intra prediction, a texture saliency value of a first partition within the left view frame;
determining a motion saliency value of the first partition by motion estimation;
determining a disparity saliency value between the first partition and a respective second partition within the right view frame;
determining a quantization parameter according to the disparity saliency value, the texture saliency value, and the motion saliency value; and
quantizing the first partition according to the quantization parameter;
wherein a mixed stereo saliency value for the first partition is determined by a weighted summation of the disparity saliency value, the texture saliency value, and the motion saliency value, and the quantization parameter Q' is determined by:
Q′=max(Q-ψ·(S-ES),0)
where Q is the original quantization parameter value, S is the mixed stereo saliency value, and ψ is a parameter for controlling the bit rate distribution between a saliency region and a non-saliency region.
11. The computer-readable medium of claim 10, wherein the method further comprises:
determining the texture saliency value from an output of DC mode intra prediction for high efficiency video coding.
12. The computer-readable medium of claim 10, wherein the method further comprises:
determining a motion saliency value for the first partition based on an output of a motion estimation for high efficiency video coding.
13. The computer-readable medium of claim 10, wherein the left view frame is partitioned into a plurality of non-overlapping partitions, and the motion estimation is based on partitions of the same size as the first partition.
14. The computer-readable medium of claim 13, wherein the method further comprises:
quantizing the first partition according to the quantization parameter.
15. The computer-readable medium of claim 13, wherein the method further comprises:
determining a hybrid stereo saliency map of the left view frame;
reducing the size of the hybrid stereo saliency map to match the size of a transform unit;
determining a second quantization parameter for the transform unit; and
quantizing the transform unit according to the second quantization parameter.
16. The computer-readable medium of claim 10, wherein the left-view frame and the right-view frame are rectified in a first direction, and the method further comprises:
searching for the disparity saliency value in a second direction perpendicular to the first direction.
17. The computer-readable medium of claim 16, wherein the disparity saliency value comprises a non-integer value.
18. The computer-readable medium of claim 17, wherein the method further comprises:
determining the disparity saliency value from 1/4-pixel samples generated by sub-pixel motion estimation for high efficiency video coding.
CN201680078558.9A 2016-01-12 2016-01-18 Compression method and device of panoramic stereo video system Active CN108886598B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
CNPCT/CN2016/070712 2016-01-12
PCT/CN2016/070712 WO2017120776A1 (en) 2016-01-12 2016-01-12 Calibration method and apparatus for panoramic stereo video system
CNPCT/CN2016/070823 2016-01-13
PCT/CN2016/070823 WO2017120802A1 (en) 2016-01-12 2016-01-13 Stitching method and apparatus for panoramic stereo video system
PCT/CN2016/071238 WO2017120981A1 (en) 2016-01-12 2016-01-18 Compression method and apparatus for panoramic stereo video system

Publications (2)

Publication Number Publication Date
CN108886598A CN108886598A (en) 2018-11-23
CN108886598B true CN108886598B (en) 2020-08-25

Family

ID=59310625

Family Applications (3)

Application Number Title Priority Date Filing Date
CN201680078559.3A Active CN109076200B (en) 2016-01-12 2016-01-12 Method and device for calibrating panoramic stereo video system
CN201680078524.XA Active CN108886611B (en) 2016-01-12 2016-01-13 Splicing method and device of panoramic stereo video system
CN201680078558.9A Active CN108886598B (en) 2016-01-12 2016-01-18 Compression method and device of panoramic stereo video system

Family Applications Before (2)

Application Number Title Priority Date Filing Date
CN201680078559.3A Active CN109076200B (en) 2016-01-12 2016-01-12 Method and device for calibrating panoramic stereo video system
CN201680078524.XA Active CN108886611B (en) 2016-01-12 2016-01-13 Splicing method and device of panoramic stereo video system

Country Status (4)

Country Link
US (3) US10636121B2 (en)
EP (3) EP3403403B1 (en)
CN (3) CN109076200B (en)
WO (3) WO2017120776A1 (en)

Family Cites Families (58)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002065763A2 (en) 2001-02-12 2002-08-22 Carnegie Mellon University System and method for manipulating the point of interest in a sequence of images
US20040001138A1 (en) 2002-06-27 2004-01-01 Weerashinghe W.A. Chaminda P. Stereoscopic panoramic video generation system
US7119816B2 (en) 2003-03-31 2006-10-10 Microsoft Corp. System and method for whiteboard scanning to obtain a high resolution image
EP1679659A1 (en) 2005-01-06 2006-07-12 Thomson Licensing Method and device for selecting quantization parameters in a picture using side information
KR100739730B1 (en) 2005-09-03 2007-07-13 삼성전자주식회사 Apparatus and method for processing 3D dimensional picture
US7697839B2 (en) 2006-06-30 2010-04-13 Microsoft Corporation Parametric calibration for panoramic camera systems
CN104768031B (en) 2009-01-26 2018-02-09 汤姆森特许公司 Device for video decoding
US20110255589A1 (en) 2009-08-03 2011-10-20 Droplet Technology, Inc. Methods of compressing data and methods of assessing the same
EP2481209A1 (en) 2009-09-22 2012-08-01 Tenebraex Corporation Systems and methods for correcting images in a multi-sensor system
US8711204B2 (en) * 2009-11-11 2014-04-29 Disney Enterprises, Inc. Stereoscopic editing for video production, post-production and display adaptation
US10095953B2 (en) * 2009-11-11 2018-10-09 Disney Enterprises, Inc. Depth modification for display applications
US9445072B2 (en) * 2009-11-11 2016-09-13 Disney Enterprises, Inc. Synthesizing views based on image domain warping
US20110235706A1 (en) 2010-03-25 2011-09-29 Texas Instruments Incorporated Region of interest (roi) video encoding
US20120154518A1 (en) * 2010-12-17 2012-06-21 Microsoft Corporation System for capturing panoramic stereoscopic video
CN102055982B (en) 2011-01-13 2012-06-27 浙江大学 Coding and decoding methods and devices for three-dimensional video
CN102592124B (en) * 2011-01-13 2013-11-27 汉王科技股份有限公司 Geometrical correction method, device and binocular stereoscopic vision system of text image
US20120236934A1 (en) 2011-03-18 2012-09-20 Qualcomm Incorporated Signaling of multiview video plus depth content with a block-level 4-component structure
US8581961B2 (en) 2011-03-31 2013-11-12 Vangogh Imaging, Inc. Stereoscopic panoramic video capture system using surface identification and distance registration technique
WO2012136388A1 (en) 2011-04-08 2012-10-11 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Capturing panoramic or semi-panoramic 3d scenes
CN102243432A (en) 2011-06-28 2011-11-16 浙江工业大学 Panoramic three-dimensional photographing device
US9153025B2 (en) 2011-08-19 2015-10-06 Adobe Systems Incorporated Plane detection and tracking for structure from motion
KR101638173B1 (en) * 2011-09-06 2016-07-12 한국전자통신연구원 Method and apparatus for providing automated detection of calibration
US8928729B2 (en) 2011-09-09 2015-01-06 Disney Enterprises, Inc. Systems and methods for converting video
RU2580439C2 (en) * 2011-11-01 2016-04-10 Конинклейке Филипс Н.В. Based on significance of disparity
US20130176390A1 (en) 2012-01-06 2013-07-11 Qualcomm Incorporated Multi-hypothesis disparity vector construction in 3d video coding with depth
CN104169965B (en) * 2012-04-02 2018-07-03 英特尔公司 System, method, and computer program product for runtime adjustment of image warping parameters in a multi-camera system
US8942422B2 (en) * 2012-04-06 2015-01-27 Adobe Systems Incorporated Nonlinear self-calibration for structure from motion (SFM) techniques
CN103516995A (en) 2012-06-19 2014-01-15 中南大学 Real-time panoramic video stitching method and apparatus based on ORB features
WO2014020529A1 (en) * 2012-08-02 2014-02-06 Earthmine, Inc. Three-dimensional plane panorama creation through hough-based line detection
US9025880B2 (en) * 2012-08-29 2015-05-05 Disney Enterprises, Inc. Visual saliency estimation for images and video
US20140098185A1 (en) 2012-10-09 2014-04-10 Shahram Davari Interactive user selected video/audio views by real time stitching and selective delivery of multiple video/audio sources
JP6107081B2 (en) * 2012-11-21 2017-04-05 富士通株式会社 Image processing apparatus, image processing method, and program
TWI602152B (en) * 2013-02-06 2017-10-11 聚晶半導體股份有限公司 Image capturing device and image processing method thereof
CN103108187B (en) 2013-02-25 2016-09-28 清华大学 3D video encoding method, decoding method, and encoder
CN103179405B (en) 2013-03-26 2016-02-24 天津大学 Multi-view video encoding method based on multi-level regions of interest
US20140300691A1 (en) * 2013-04-04 2014-10-09 Panasonic Corporation Imaging system
US9398215B2 (en) 2013-04-16 2016-07-19 Eth Zurich Stereoscopic panoramas
US9542773B2 (en) * 2013-05-23 2017-01-10 Google Inc. Systems and methods for generating three-dimensional models using sensed position data
US9667990B2 (en) 2013-05-31 2017-05-30 Qualcomm Incorporated Parallel derived disparity vector for 3D video coding with neighbor-based disparity vector derivation
US9509979B2 (en) * 2013-11-26 2016-11-29 Mobileye Vision Technologies Ltd. Stereo auto-calibration from structure-from-motion
CA2933704A1 (en) 2013-12-13 2015-06-18 8702209 Canada Inc. Systems and methods for producing panoramic and stereoscopic videos
KR101537174B1 (en) * 2013-12-17 2015-07-15 가톨릭대학교 산학협력단 Method for extracting salient object from stereoscopic video
US9552061B2 (en) * 2014-03-26 2017-01-24 Microsoft Technology Licensing, Llc Eye gaze tracking using binocular fixation constraints
US10027948B2 (en) * 2014-05-20 2018-07-17 Nextvr Inc. Methods and apparatus including or for use with one or more cameras
US9911454B2 (en) 2014-05-29 2018-03-06 Jaunt Inc. Camera array including camera modules
JP6308449B2 (en) * 2014-06-26 2018-04-11 ホアウェイ・テクノロジーズ・カンパニー・リミテッド Method and apparatus for reducing computational load in high efficiency video coding
US9973694B1 (en) * 2014-08-21 2018-05-15 Jaunt Inc. Image stitching to form a three dimensional panoramic image
US9870617B2 (en) * 2014-09-19 2018-01-16 Brain Corporation Apparatus and methods for saliency detection based on color occurrence analysis
US10547825B2 (en) 2014-09-22 2020-01-28 Samsung Electronics Company, Ltd. Transmission of three-dimensional video
CN104580933A (en) 2015-02-09 2015-04-29 上海安威士科技股份有限公司 Multi-scale real-time monitoring video stitching device based on feature points and multi-scale real-time monitoring video stitching method
US9877016B2 (en) * 2015-05-27 2018-01-23 Google Llc Omnistereo capture and render of panoramic virtual reality content
WO2017092007A1 (en) * 2015-12-03 2017-06-08 SZ DJI Technology Co., Ltd. System and method for video processing
WO2017098090A1 (en) * 2015-12-12 2017-06-15 Aalto University Foundation Calibration technique for capturing panoramic images
US20190019299A1 (en) * 2016-01-03 2019-01-17 Humaneyes Technologies Ltd. Adaptive stitching of frames in the process of creating a panoramic frame
WO2017188714A1 (en) * 2016-04-26 2017-11-02 엘지전자 주식회사 Method for transmitting 360-degree video, method for receiving 360-degree video, apparatus for transmitting 360-degree video, apparatus for receiving 360-degree video
US10447993B2 (en) 2016-09-27 2019-10-15 Laduma, Inc. Stereoscopic 360 degree digital camera systems
CN109587203A (en) * 2017-09-29 2019-04-05 索尼公司 Information processing equipment and method, electronic device and computer-readable medium
US10902556B2 (en) * 2018-07-16 2021-01-26 Nvidia Corporation Compensating for disparity variation when viewing captured multi video image streams

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Saliency Prediction on Stereoscopic Videos;KIM,Haksub et al.;《IEEE TRANSACTIONS ON IMAGE PROCESSING》;20140430;pp. 1476-1490 *

Also Published As

Publication number Publication date
EP3403401A1 (en) 2018-11-21
US10489886B2 (en) 2019-11-26
EP3403403A4 (en) 2019-08-21
US10636121B2 (en) 2020-04-28
US10643305B2 (en) 2020-05-05
CN109076200A (en) 2018-12-21
EP3403400B1 (en) 2022-04-20
CN108886611B (en) 2021-07-09
WO2017120981A1 (en) 2017-07-20
CN108886598A (en) 2018-11-23
WO2017120802A1 (en) 2017-07-20
EP3403403A1 (en) 2018-11-21
WO2017120776A1 (en) 2017-07-20
CN109076200B (en) 2021-04-23
EP3403403B1 (en) 2023-06-07
US20190028693A1 (en) 2019-01-24
EP3403401B1 (en) 2021-10-13
EP3403400A4 (en) 2019-10-09
EP3403400A1 (en) 2018-11-21
US20190035055A1 (en) 2019-01-31
CN108886611A (en) 2018-11-23
US20190028707A1 (en) 2019-01-24
EP3403401A4 (en) 2019-06-12

Similar Documents

Publication Publication Date Title
CN108886598B (en) Compression method and device of panoramic stereo video system
EP2491722B1 (en) Depth map generation techniques for conversion of 2d video data to 3d video data
JP7036599B2 (en) A method of synthesizing a light field with compressed omnidirectional parallax using depth information
US8488870B2 (en) Multi-resolution, multi-window disparity estimation in 3D video processing
US9525858B2 (en) Depth or disparity map upscaling
JP5970609B2 (en) Method and apparatus for unified disparity vector derivation in 3D video coding
EP3100454B1 (en) Method for low-latency illumination compensation process
US20140063188A1 (en) Apparatus, a Method and a Computer Program for Image Processing
WO2011014419A1 (en) Methods, systems, and computer-readable storage media for creating three-dimensional (3d) images of a scene
US9451233B2 (en) Methods and arrangements for 3D scene representation
Ma et al. Low complexity adaptive view synthesis optimization in HEVC based 3D video coding
JP6307152B2 (en) Image encoding apparatus and method, image decoding apparatus and method, and program thereof
JP6039178B2 (en) Image encoding apparatus, image decoding apparatus, method and program thereof
Pourazad et al. Generating the depth map from the motion information of H. 264-encoded 2D video sequence
Farid et al. Panorama view with spatiotemporal occlusion compensation for 3D video coding
Daribo et al. Arbitrarily shaped sub-block motion prediction in texture map compression using depth information
WO2016078162A1 (en) Hevc-based 3d video fast coding method
JP2015019326A (en) Encoding device, encoding method, decoding device, and decoding method
Li et al. Fast frame-rate up-conversion of depth video via video coding
Farid et al. A panoramic 3D video coding with directional depth aided inpainting
Pang et al. Multiview video coding using projective rectification-based view extrapolation and synthesis bias correction
JP6232117B2 (en) Image encoding method, image decoding method, and recording medium
Zhang et al. A super resolution reconstruction scheme for mixed spatio-temporal stereo video
Shimizu et al. Backward view synthesis prediction using virtual depth map for multiview video plus depth map coding
Tizon et al. Multi-view acquisition and advanced depth map processing techniques

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant