WO2017125639A1 - Stereoscopic video encoding - Google Patents

Stereoscopic video encoding

Info

Publication number
WO2017125639A1
Authority
WO
WIPO (PCT)
Prior art keywords
view image
image
overlapping region
pixel
disparity
Application number
PCT/FI2016/050024
Other languages
French (fr)
Inventor
Payman Aflaki Beni
Original Assignee
Nokia Technologies Oy
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Priority to PCT/FI2016/050024
Publication of WO2017125639A1


Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 - Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 - Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 - Processing image signals
    • H04N13/161 - Encoding, multiplexing or demultiplexing different image signal components
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/115 - Selection of the code volume for a coding unit prior to coding
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/167 - Position within a video image, e.g. region of interest [ROI]
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 - Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 - Image signal generators
    • H04N13/204 - Image signal generators using stereoscopic image cameras
    • H04N13/254 - Image signal generators using stereoscopic image cameras in combination with electromagnetic radiation sources for illuminating objects
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 - Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 - Image signal generators
    • H04N13/271 - Image signal generators wherein the generated image signals comprise depth maps or disparity maps
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 - Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N2013/0074 - Stereoscopic image analysis
    • H04N2013/0081 - Depth or disparity estimation from stereoscopic image signals

Definitions

  • the present invention relates to video encoding, and in particular, but not exclusively to stereoscopic video encoding.
  • the Human Visual System then forms a 3D view of the scene when the images corresponding to the left and right perspectives are each presented to the respective left and right eyes.
  • this technology can have the drawback that the viewing area, such as film screen or television, only occupies part of the field of vision, and thus the experience of 3D view can be somewhat limited.
  • devices occupying a larger viewing area of the total field of view can be used, such as a stereo viewing mask or goggles which are worn on the head so that the viewing arc of the eyes is covered.
  • Each eye is then presented with the respective image via an individual small screen and lens arrangement.
  • Such technologies have the additional advantage that they can be used in a small space, and even on the move, compared to fairly large TV sets commonly used for 3D viewing.
  • a method comprising: capturing a first view image and a second view image by a camera arrangement; capturing ranging information for the first view image and the second view image; determining a disparity map image for the first view image from the ranging information for the first view image and the second view image; partitioning the disparity map image for the first view image into a non-overlapping region between the first view image and the second view image and an overlapping region between the first view image and the second view image; mapping the non-overlapping region and overlapping region to the first view image; and encoding the non-overlapping region of the first view image at a lower coding rate than the overlapping region of the first view image.
  • the partitioning of the disparity map image for the first view image into the non-overlapping region between the first view image and the second view image and the overlapping region between the first view image and the second view image may be determined based on a disparity value and a position index of at least one pixel of the disparity map image.
  • the partitioning of the disparity map image for the first view image into the non-overlapping region between the first view image and the second view image and the overlapping region between the first view image and the second view image may comprise: determining, for the non-overlapping region, that at least one pixel in a row of the disparity map image has an equivalent position index value less than a position index for the at least one pixel in the row of the disparity map image, wherein the equivalent position index is based on a disparity value of the at least one pixel at a location given by the position index in the row of the disparity map image; and determining, for the overlapping region, at least one further pixel in the row of the disparity map image that has a further equivalent position index value greater than a further position index in the row of the disparity map image for the at least one further pixel, wherein the further equivalent position index is based on a disparity value of the at least one further pixel at a location given by the further position index in the row of the disparity map image.
  • the determining for the non-overlapping region and the determining for the overlapping region may be performed on a row by row basis of the disparity map image.
  • the determining for the non-overlapping region and the determining for the overlapping region may be performed for at least a two row by at least two row basis, wherein the disparity value is the mean or median of the disparity value of the at least one pixel of a first row of the at least two rows and of a disparity value of the at least one pixel of a second row of the at least two rows.
  • the determining for the non-overlapping region and the determining for the overlapping region may be performed on a block of pixels by block of pixels basis, wherein the at least one pixel and the at least one further pixel of the disparity map image is at least one block of pixels and at least one block of further pixels respectively, and wherein the disparity value of the at least one pixel and the disparity value of the at least one further pixel is respectively a disparity value of a mean or median of disparity values of pixels within the at least one block of pixels and the disparity value of a mean or median of disparity values of pixels within the at least one further block of pixels.
  • the method may further comprise: determining a disparity map image for the second view image from the ranging information for the first view image and the second view image; partitioning the disparity map image for the second view image into a non-overlapping region between the first view image and the second view image and an overlapping region between the first view image and the second view image; mapping the non-overlapping region of the disparity map image for the second view image and the overlapping region of the disparity map image for the second view image to the second view image; and encoding the non-overlapping region of the second view image at a lower coding rate than the overlapping region of the second view image.
  • the first view image may be a left view image
  • the second view image may be a right view image.
  • the non-overlapping region may be a two dimensional region, and wherein the overlapping region may be a three dimensional region.
  • the camera arrangement may be a stereoscopic camera arrangement.
  • an apparatus comprising: at least one processor; and at least one memory including computer program code, which when executed by the at least one processor, causes the apparatus to: capture a first view image and a second view image by a camera arrangement; capture ranging information for the first view image and the second view image; determine a disparity map image for the first view image from the ranging information for the first view image and the second view image; partition the disparity map image for the first view image into a non-overlapping region between the first view image and the second view image and an overlapping region between the first view image and the second view image; map the non-overlapping region and overlapping region to the first view image; and encode the non-overlapping region of the first view image at a lower coding rate than the overlapping region of the first view image.
  • the apparatus may be caused to partition the disparity map image for the first view image into the non-overlapping region between the first view image and the second view image and the overlapping region between the first view image and the second view image based on a disparity value and a position index of at least one pixel of the disparity map image.
  • the apparatus caused to partition the disparity map image for the first view image into the non-overlapping region between the first view image and the second view image and the overlapping region between the first view image and the second view image may be further caused to: determine, for the non-overlapping region, that at least one pixel in a row of the disparity map image has an equivalent position index value less than a position index for the at least one pixel in the row of the disparity map image, wherein the equivalent position index is based on a disparity value of the at least one pixel at a location given by the position index in the row of the disparity map image; and determine, for the overlapping region, at least one further pixel in the row of the disparity map image that has a further equivalent position index value greater than a further position index in the row of the disparity map image for the at least one further pixel, wherein the further equivalent position index is based on a disparity value of the at least one further pixel at a location given by the further position index in the row of the disparity map image.
  • the equivalent position index may be given by a function which translates a disparity value of a pixel located at a position index in a row of the disparity map image to a position index based value.
  • the apparatus may be caused to perform the determining for the non-overlapping region and the overlapping region, on a row by row basis of the disparity map image.
  • the apparatus may be caused to perform the determining for the non-overlapping region and the overlapping region for at least a two row by at least two row basis, wherein the disparity value is the mean or median of the disparity value of the at least one pixel of a first row of the at least two rows and of a disparity value of the at least one pixel of a second row of the at least two rows.
  • the apparatus may be caused to perform the determining for the non-overlapping region and the determining for the overlapping region on a block of pixels by block of pixels basis, wherein the at least one pixel and the at least one further pixel of the disparity map image is at least one block of pixels and at least one block of further pixels respectively, and wherein the disparity value of the at least one pixel and the disparity value of the at least one further pixel is respectively a disparity value of a mean or median of disparity values of pixels within the at least one block of pixels and the disparity value of a mean or median of disparity values of pixels within the at least one further block of pixels.
  • the apparatus may be further caused to: determine a disparity map image for the second view image from the ranging information for the first view image and the second view image; partition the disparity map image for the second view image into a non-overlapping region between the first view image and the second view image and an overlapping region between the first view image and the second view image; map the non-overlapping region of the disparity map image for the second view image and the overlapping region of the disparity map image for the second view image to the second view image; and encode the non-overlapping region of the second view image at a lower coding rate than the overlapping region of the second view image.
  • the first view image may be a left view image
  • the second view image may be a right view image.
  • the non-overlapping region may be a two dimensional region, and wherein the overlapping region may be a three dimensional region.
  • the camera arrangement may be a stereoscopic camera arrangement.
  • an apparatus configured to: capture a first view image and a second view image by a camera arrangement; capture ranging information for the first view image and the second view image; determine a disparity map image for the first view image from the ranging information for the first view image and the second view image; partition the disparity map image for the first view image into a non-overlapping region between the first view image and the second view image and an overlapping region between the first view image and the second view image; map the non-overlapping region and overlapping region to the first view image; and encode the non-overlapping region of the first view image at a lower coding rate than the overlapping region of the first view image.
  • the apparatus may be configured to partition the disparity map image for the first view image into the non-overlapping region between the first view image and the second view image and the overlapping region between the first view image and the second view image based on a disparity value and a position index of at least one pixel of the disparity map image.
  • the apparatus configured to partition the disparity map image for the first view image into the non-overlapping region between the first view image and the second view image and the overlapping region between the first view image and the second view image may be further configured to: determine, for the non-overlapping region, that at least one pixel in a row of the disparity map image has an equivalent position index value less than a position index for the at least one pixel in the row of the disparity map image, wherein the equivalent position index is based on a disparity value of the at least one pixel at a location given by the position index in the row of the disparity map image; and determine, for the overlapping region, at least one further pixel in the row of the disparity map image that has a further equivalent position index value greater than a further position index in the row of the disparity map image for the at least one further pixel, wherein the further equivalent position index is based on a disparity value of the at least one further pixel at a location given by the further position index in the row of the disparity map image.
  • the equivalent position index is given by a function which translates a disparity value of a pixel located at a position index in a row of the disparity map image to a position index based value.
  • the apparatus may be configured to perform the determining for the non-overlapping region and the overlapping region on a row by row basis of the disparity map image.
  • the apparatus may be configured to perform the determining for the non-overlapping region and the overlapping region for at least a two row by at least two row basis, wherein the disparity value is the mean or median of the disparity value of the at least one pixel of a first row of the at least two rows and of a disparity value of the at least one pixel of a second row of the at least two rows.
  • the apparatus may be configured to perform the determining for the non-overlapping region and the determining for the overlapping region on a block of pixels by block of pixels basis, wherein the at least one pixel and the at least one further pixel of the disparity map image is at least one block of pixels and at least one block of further pixels respectively, and wherein the disparity value of the at least one pixel and the disparity value of the at least one further pixel is respectively a disparity value of a mean or median of disparity values of pixels within the at least one block of pixels and the disparity value of a mean or median of disparity values of pixels within the at least one further block of pixels.
  • the apparatus may be further caused to determine a disparity map image for the second view image from the ranging information for the first view image and the second view image; partition the disparity map image for the second view image into a non-overlapping region between the first view image and the second view image and an overlapping region between the first view image and the second view image; map the non-overlapping region of the disparity map image for the second view image and the overlapping region of the disparity map image for the second view image to the second view image; and encode the non-overlapping region of the second view image at a lower coding rate than the overlapping region of the second view image.
  • the first view image may be a left view image
  • the second view image may be a right view image.
  • the non-overlapping region may be a two dimensional region, and wherein the overlapping region may be a three dimensional region.
  • the camera arrangement may be a stereoscopic camera arrangement.
  • a computer-readable medium having computer-readable code stored thereon, the computer-readable code, when executed by at least one processor, causing performance of at least: capturing a first view image and a second view image by a camera arrangement; capturing ranging information for the first view image and the second view image; determining a disparity map image for the first view image from the ranging information for the first view image and the second view image; partitioning the disparity map image for the first view image into a non-overlapping region between the first view image and the second view image and an overlapping region between the first view image and the second view image; mapping the non-overlapping region and overlapping region to the first view image; and encoding the non-overlapping region of the first view image at a lower coding rate than the overlapping region of the first view image.
  • Figures 1a, 1b, 1c and 1d show schematically an arrangement for capturing and displaying a stereo image to a user
  • Figure 2a shows a system and apparatus for stereo viewing
  • Figure 2b shows a stereo camera device for stereo viewing
  • Figure 2c shows a head-mounted display for stereo viewing
  • Figure 2d illustrates a camera arrangement for capturing stereoscopic video
  • Figure 2e illustrates a further camera arrangement for capturing stereoscopic video
  • Figure 3 shows schematically an electronic device in which embodiments can be deployed
  • Figure 4 shows schematically the partition of a stereoscopic image into 2D and 3D regions
  • Figure 5 shows a flow diagram illustrating the process of encoding a stereoscopic image according to embodiments
  • Figure 6 shows a flow diagram illustrating the process of determining the 2D/3D boundary for a stereoscopic image according to embodiments.
  • Figure 7 shows schematically the partition of a stereoscopic image into 2D and 3D regions according to embodiments in which there are objects located in the foreground at the boundary between 2D and 3D regions.
  • the following describes in more detail how the disparity between the left and right image, especially at the peripheries of the scene, can be exploited in order to improve the encoding efficiency of stereoscopic imaging and video systems.
  • Figures 1a, 1b, 1c and 1d show a schematic block diagram of an exemplary configuration for forming a stereo image for a user.
  • In Figure 1a a situation is shown where a human being is viewing two spheres A1 and A2 using both eyes E1 and E2.
  • the sphere A1 is closer to the viewer than the sphere A2, the respective distances to the first eye E1 being LE1,A1 and LE1,A2.
  • the different objects reside in space at their respective (x,y,z) coordinates, defined by the coordinate system SX, SY and SZ.
  • the distance d12 between the eyes of a human being may be approximately 62-64 mm on average, varying from person to person between 55 and 74 mm. This distance is referred to as the parallax, on which the stereoscopic view of human vision is based.
  • the viewing directions (optical axes) DIR1 and DIR2 are typically essentially parallel, possibly having a small deviation from being parallel, and define the field of view for the eyes.
  • In Figure 1b a setup is shown where the eyes have been replaced by cameras C1 and C2, positioned at the location where the eyes were in Figure 1a.
  • the distances and directions of the setup are otherwise the same.
  • the purpose of the setup of Figure 1b is to be able to take a stereo image of the spheres A1 and A2.
  • the two images resulting from image capture are FC1 and FC2.
  • the "left eye" image FC1 shows the image SA2 of the sphere A2 partly visible on the left side of the image SA1 of the sphere A1.
  • the "right eye" image FC2 shows the image SA2 of the sphere A2 partly visible on the right side of the image SA1 of the sphere A1.
  • This difference between the right and left images is called disparity, and this disparity, being the basic mechanism with which the human visual system determines depth information and creates a 3D view of the scene, can be used to create an illusion of a 3D image.
  • the camera pair C1 and C2 has a natural parallax, that is, it has the property of creating natural disparity in the two images of the cameras. Natural disparity may be understood to be created even though the distance between the two cameras forming the stereo camera pair is somewhat smaller or larger than the normal distance (parallax) between the human eyes, e.g. essentially between 40 mm and 100 mm or even 30 mm and 120 mm.
  • In Figure 1c the creation of this 3D illusion is shown.
  • the images FC1 and FC2 captured by the cameras C1 and C2 are displayed to the eyes E1 and E2, using displays D1 and D2, respectively.
  • the disparity between the images is processed by the human visual system so that an understanding of depth is created. That is, when the left eye sees the image SA2 of the sphere A2 on the left side of the image SA1 of sphere A1, and respectively the right eye sees the image of A2 on the right side, the human visual system creates an understanding that there is a sphere V2 behind the sphere V1 in a three-dimensional world.
  • the images FC1 and FC2 can also be synthetic, that is, created by a computer. If they carry the disparity information, synthetic images will also be seen as three-dimensional by the human visual system. That is, a pair of computer-generated images can be formed so that they can be used as a stereo image.
  • Figure 1d illustrates how the principle of displaying stereo images to the eyes can be used to create 3D movies or virtual reality scenes having an illusion of being three-dimensional.
  • the images FX1 and FX2 are either captured with a stereo camera or computed from a model so that the images have the appropriate disparity.
  • when a large number of such image pairs (e.g. 30 per second) is displayed, the human visual system will create a cognition of a moving, three-dimensional image.
  • Figure 2a shows a system and apparatuses for stereo viewing, that is, for 3D video and 3D audio digital capture and playback.
  • the task of the system is that of capturing sufficient visual and auditory information from a specific location such that a convincing reproduction of the experience, or presence, of being in that location can be achieved by one or more viewers physically located in different locations and optionally at a time later in the future.
  • Such reproduction requires more information than can be captured by a single camera or microphone, in order that a viewer can determine the distance and location of objects within the scene using their eyes and their ears.
  • two camera sources are used to create a pair of images with disparity.
  • In a similar manner, for the human auditory system to be able to sense the direction of sound, at least two microphones are used (the commonly known stereo sound is created by recording two audio channels). The human auditory system can detect cues, e.g. the timing difference of the audio signals, to detect the direction of sound.
  • the system of Figure 2a may consist of three main parts: image sources, a server and a rendering device.
  • a video capture device SRC1 comprises multiple (for example, 8) cameras CAM1, CAM2, ..., CAMN with overlapping fields of view so that regions of the view around the video capture device are captured by at least two cameras.
  • the device SRC1 may comprise multiple microphones to capture the timing and phase differences of audio originating from different directions.
  • the device may comprise a high resolution orientation sensor so that the orientation (direction of view) of the plurality of cameras can be detected and recorded.
  • the device SRC1 comprises or is functionally connected to a computer processor PROC1 and memory MEM1 , the memory comprising computer program PROGR1 code for controlling the capture device.
  • the image stream captured by the device may be stored on a memory device MEM2 for use in another device, e.g. a viewer, and/or transmitted to a server using a communication interface COMM1. It needs to be understood that although an 8-camera cubical setup is described here as part of the system, another camera device including a different number of cameras and/or a different arrangement of camera locations may be used instead as part of the system.
  • one or more sources SRC2 of synthetic images may be present in the system.
  • Such sources of synthetic images may use a computer model of a virtual world to compute the various image streams it transmits.
  • the source SRC2 may compute N video streams corresponding to N virtual cameras located at a virtual viewing position.
  • the viewer may see a three-dimensional virtual world, as explained earlier for Figure 1 d.
  • the device SRC2 comprises or is functionally connected to a computer processor PROC2 and memory MEM2, the memory comprising computer program PROGR2 code for controlling the synthetic source device SRC2.
  • the image stream captured by the device may be stored on a memory device MEM5 (e.g. memory card CARD1 ) for use in another device, e.g. a viewer, or transmitted to a server or the viewer using a communication interface COMM2.
  • a server SERV or a plurality of servers storing the output from the capture device SRC1 or computation device SRC2.
  • the device comprises or is functionally connected to a computer processor PROC3 and memory MEM3, the memory comprising computer program PROGR3 code for controlling the server.
  • the server may be connected by a wired or wireless network connection, or both, to sources SRC1 and/or SRC2, as well as the viewer devices VIEWER1 and VIEWER2 over the communication interface COMM3.
  • the devices may have a rendering module and a display module, or these functionalities may be combined in a single device.
  • the devices may comprise or be functionally connected to a computer processor PROC4 and memory MEM4, the memory comprising computer program PROGR4 code for controlling the viewing devices.
  • the viewer (playback) devices may consist of a data stream receiver for receiving a video data stream from a server and for decoding the video data stream. The data stream may be received over a network connection through communications interface COMM4, or from a memory device MEM6 like a memory card CARD2.
  • the viewer devices may have a graphics processing unit for processing of the data to a suitable format for viewing as described with Figures 1 c and 1 d.
  • the viewer VIEWER1 comprises a high-resolution stereo-image head-mounted display for viewing the rendered stereo video sequence.
  • the head-mounted device may have an orientation sensor DET1 and stereo audio headphones.
  • the viewer VIEWER2 comprises a display enabled with 3D technology (for displaying stereo video), and the rendering device may have a head-orientation detector DET2 connected to it.
  • Any of the devices (SRC1 , SRC2, SERVER, RENDERER, VIEWER1 , VIEWER2) may be a computer or a portable computing device, or be connected to such.
  • Such rendering devices may have computer program code for carrying out methods according to various examples described in this text.
  • Figure 2b shows a camera device for adjustable stereo viewing.
  • the camera device comprises three or more cameras that are configured into camera pairs for creating the left and right eye images, or that can be arranged to such pairs.
  • the distance between cameras may correspond to the usual distance between the human eyes.
  • the cameras may be arranged so that they have significant overlap in their field-of- view. For example, wide-angle lenses of 180 degrees or more may be used, and there may be 3, 4, 5, 6, 7, 8, 9, 10, 12, 16 or 20 cameras.
  • the cameras may be regularly or irregularly spaced across the whole sphere of view, or they may cover only part of the whole sphere. For example, there may be three cameras arranged in a triangle and having different directions of view towards one side of the triangle such that all three cameras cover an overlap area in the middle of the directions of view.
  • In Figure 2b three stereo camera pairs are shown.
  • Figure 2e shows a further camera device for adjustable stereo viewing in which there are 8 cameras having wide-angle lenses and arranged regularly at the corners of a virtual cube and covering the whole sphere such that the whole or essentially whole sphere is covered at all directions by at least 3 or 4 cameras.
  • Camera devices with other types of camera layouts may be used.
  • a camera device with all the cameras in one hemisphere may be used.
  • the number of cameras may be e.g. 3, 4, 6, 8, 12, or more.
  • the cameras may be placed to create a central field of view where stereo images can be formed from image data of two or more cameras, and a peripheral (extreme) field of view where one camera covers the scene and only a normal non-stereo image can be formed. Examples of different camera devices that may be used in the system are described also later in this description.
  • Figure 2c shows a head-mounted display for stereo viewing.
  • the head-mounted display contains two screen sections or two screens DISP1 and DISP2 for displaying the left and right eye images.
  • the displays are close to the eyes, and therefore lenses are used to make the images easily viewable and for spreading the images to cover as much as possible of the eyes' field of view.
  • the device is attached to the head of the user so that it stays in place even when the user turns his head.
  • the device may have an orientation detecting module ORDET1 for determining the head movements and direction of the head. It is to be noted here that in this type of a device, tracking the head movement may be done, but since the displays cover a large area of the field of view, eye movement detection is not necessary.
  • the head orientation may be related to real, physical orientation of the user's head, and it may be tracked by a sensor for determining the real orientation of the user's head.
  • head orientation may be related to virtual orientation of the user's view direction, controlled by a computer program or by a computer input device such as a joystick. That is, the user may be able to change the determined head orientation with an input device, or a computer program may change the view direction (e.g. in gaming, the game program may control the determined head orientation instead of, or in addition to, the real head orientation).
  • Figure 2d illustrates a camera CAM1 .
  • the camera has a camera detector CAMDET1 , comprising a plurality of sensor elements for sensing intensity of the light hitting the sensor element.
  • the camera has a lens OBJ1 (or a lens arrangement of a plurality of lenses), the lens being positioned so that the light hitting the sensor elements travels through the lens to the sensor elements.
  • the camera detector CAMDET1 has a nominal center point CP1 that is a middle point of the plurality of sensor elements, for example for a rectangular sensor the crossing point of the diagonals.
  • the lens has a nominal center point PP1 , as well, lying for example on the axis of symmetry of the lens.
  • the direction of orientation of the camera is defined by the line passing through the center point CP1 of the camera sensor and the center point PP1 of the lens.
  • the direction of the camera is a vector along this line pointing in the direction from the camera sensor to the lens.
  • the optical axis of the camera is understood to be this line CP1 -PP1 .
  • Time-synchronized video, audio and orientation data is first recorded with the capture device. This can consist of multiple concurrent video and audio streams as described above. These are then transmitted immediately or later to the storage and processing network for processing and conversion into a format suitable for subsequent delivery to playback devices. The conversion can involve post-processing steps to the audio and video data in order to improve the quality and/or reduce the quantity of the data while preserving the quality at a desired level.
  • each playback device receives a stream of the data from the network, and renders it into a stereo viewing reproduction of the original location which can be experienced by a user with the head mounted display and headphones.
  • embodiments of the invention can be embodied as an algorithm on an electronic device 10 such as that depicted by way of example in Figure 3.
  • the electronic device 10 can be configured to execute an algorithm for exploiting the disparity between the left and right images in order to improve the encoding efficiency of stereoscopic imaging systems such as those depicted in Figures 1b to 1d. Furthermore, said electronic device 10 may be a component of a stereoscopic imaging system such as the systems depicted in Figures 1b to 1d and Figures 2a to 2d.
  • the electronic device or apparatus 10 in some embodiments comprises a processor 21 which is linked to a memory 22.
  • the processor 21 can in some embodiments be configured to execute various program codes.
  • the implemented program codes in some embodiments comprise a multichannel or stereo encoding or decoding code as described herein.
  • the implemented program codes 23 can in some embodiments be stored for example in the memory 22 for retrieval by the processor 21 whenever needed.
  • the memory 22 could further provide a section 24 for storing data, for example data that has been encoded in accordance with the application.
  • the processor is further linked to an I/O (Input/Output) port 11 for receiving and transmitting digital data.
  • the port 11 may be arranged to connect to the digital port of a camera or camera module.
  • the processor 21 is further linked to a transceiver (RX/TX) 13 and to a user interface (UI) 15.
  • the apparatus 10 may additionally include an integrated camera module comprising a camera having a lens for focusing an image on to a digital image capture means such as a charge-coupled device (CCD).
  • the digital image capture means may be any suitable image capturing device such as complementary metal oxide semiconductor (CMOS) image sensor.
  • the user interface 15 enables a user to input commands to the electronic device 10, for example via a keypad, and/or to obtain information from the electronic device 10, for example via a display.
  • a touch screen may provide both input and output functions for the user interface.
  • the apparatus 10 in some embodiments comprises a transceiver 13 suitable for enabling communication with other apparatus, for example via a wireless communication network.
  • the encoding code in embodiments can be implemented in hardware and/or firmware.
  • Figure 4 shows schematically the concept for embodiments as described herein, in which different areas of a stereoscopic image or video frame are depicted, formed by combining a left and right image as taken by a left camera and right camera arranged as a horizontal parallax pair such as the arrangement depicted in Figure 1b. It should be noted that a similar approach is applicable to vertical parallax pairs too. It can be seen that the combined image comprises three regions. The region 401 depicts the stereo (or 3D) part of the image where the right image and left image making up the combined image have a common view of the scene.
  • the region 401 is the result of combining sections of the left and right image which share a common view of a scene, in which each image is taken from the perspective of the left and right eye respectively.
  • the viewer's HVS can combine the left and right images such that the image perceived by the viewer is a three dimensional image creating the depth perception.
  • a viewer's HVS will naturally cause the combined 3D region 401 to be perceived as the dominant region of the image, thereby drawing the majority of the viewer's cognitive attention.
  • the regions 402 and 403 which depict the regions of the combined image which do not overlap with a common scene.
  • the region 402 in Figure 4 can represent the region of the combined image taken by a left camera which does not overlap with the region of the combined image taken by the right camera. Thus, region 402 does not have a representation in the right view.
  • the region 403 in Figure 4 can represent the region of the combined image taken by the right camera which does not overlap with the region of the combined image as taken by the left camera. Thus, region 403 does not have a representation in the left view.
  • the regions 402 and 403 of the combined image will be perceived by the viewer's HVS as two dimensional regions as each of these regions does not have a corresponding view between the left and right images. Furthermore, as a direct consequence these areas tend to be found at the periphery of the combined image, as depicted in Figure 4.
  • the regions 402 and 403 may be perceived subconsciously by the viewer as contributing less to the overall viewing experience than overlapping regions such as 401 . This is attributed to the fact that there is no depth perception achieved by these regions as only one image representing those areas is available to the viewer's HVS.
  • a stereoscopic image comprising the combination of a right and left image may therefore be more efficiently encoded. For instance, more coding bandwidth (or bits) may be expended on the overlapping region (depicted as 401 in Figure 4) than on the peripheral or non-overlapping regions (depicted as 402 and 403 in Figure 4) without any noticeable loss in depth perception or perceived quality in the combined image.
  • Figure 5 shows the operation of an encoder deploying such a strategy, which may be implemented on an electronic device such as that depicted as 10 in Figure 3.
  • the device or entity arranged to perform the encoding of the stereoscopic image may be configured to receive the left and right images from a camera pair such as the camera arrangement depicted in Figure 1 b as C1 and C2.
  • the step of receiving the left and right images from a camera pair arranged to capture a stereoscopic image is shown as processing step 501 .
  • the depth values for each pixel may be determined.
  • a texture map may also be produced for each left and right image view.
  • the depth value for each pixel location may be obtained by obtaining range data using the time-of-flight (TOF) principle for example by using a camera which may be provided with a light source, for example an infrared emitter, for illuminating the scene.
  • Such an illuminator may be arranged to produce an intensity modulated electromagnetic emission at a frequency between e.g. 10 and 100 MHz, which may require LEDs or laser diodes to be used.
  • Infrared light may be used to make the illumination unobtrusive.
  • the light reflected from objects in the scene is detected by an image sensor, which may be modulated synchronously at the same frequency as the illuminator.
  • the image sensor may be provided with optics; a lens gathering the reflected light and an optical band pass filter for passing only the light with the same wavelength as the illuminator, thus helping to suppress background light.
  • the image sensor may measure for each pixel the time the light has taken to travel from the illuminator to the object and back.
  • the distance to the object may be represented as a phase shift in the illumination modulation, which can be determined from the sampled data simultaneously for each pixel in the scene.
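As an editor's illustration of the relationship described in the bullet above (not part of the patent text), the standard continuous-wave TOF relation converts the measured per-pixel phase shift into a distance; the array `phase`, the modulation frequency `f_mod` and the function name below are assumptions.

```python
# Editor's sketch (not from the patent): the usual continuous-wave TOF
# relation between a measured per-pixel phase shift and distance.
import numpy as np

C = 299_792_458.0  # speed of light, m/s

def tof_phase_to_distance(phase: np.ndarray, f_mod: float) -> np.ndarray:
    """distance = c * phase / (4 * pi * f_mod); the 4*pi accounts for the
    round trip of the light (out and back) within one modulation cycle."""
    return C * phase / (4.0 * np.pi * f_mod)

# Example: a 50 MHz modulation and a phase shift of pi/2 gives roughly 0.75 m.
print(tof_phase_to_distance(np.array([np.pi / 2]), 50e6))
```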
  • the range data or depth values may be obtained using a structured light approach which may operate for example approximately as follows.
  • a light emitter such as an infrared laser emitter or an infrared LED emitter, may emit light that may have a certain direction in a 3D space (e.g. follow a raster-scan or a pseudo-random scanning order) and/or position within an array of light emitters as well as a certain pattern, e.g. a certain wavelength and/or amplitude pattern.
  • the emitted light is reflected back from objects and may be captured using a sensor, such as an infrared image sensor.
  • the image/signals obtained by the sensor may be processed in relation to the direction of the emitted light as well as the pattern of the emitted light to detect a correspondence between the received signal and the direction/position of the emitted light as well as the pattern of the emitted light, for example using a triangulation principle. From this correspondence a distance and a position of a pixel may be concluded.
  • in addition to the above depth sensing methods, it is possible to estimate the depth values taking into account only the available images, using stereo matching algorithms.
  • stereo matching algorithms are usually based on cost aggregation methods, where an input 3D cost volume is calculated from the pixel dissimilarity measured between the given left and right images considering a number of depth hypotheses.
  • L and R are left and right images
  • the whole image can be swept in order to compute the cost volume, where the depth hypotheses are sampled according to the desired rule.
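To illustrate the cost-volume idea described in the bullets above, here is an editor's sketch (not the patent's algorithm): it builds a per-pixel absolute-difference volume over a set of disparity hypotheses and picks the lowest-cost hypothesis per pixel. Grayscale float images and the function names are assumptions.

```python
# Editor's sketch: a minimal cost volume built from pixel dissimilarity
# between the left and right images over `max_disp` disparity hypotheses.
import numpy as np

def cost_volume(left: np.ndarray, right: np.ndarray, max_disp: int) -> np.ndarray:
    """Return a (max_disp, H, W) volume where entry d holds |L(y,x) - R(y,x-d)|."""
    h, w = left.shape
    volume = np.full((max_disp, h, w), np.inf, dtype=np.float32)
    for d in range(max_disp):
        # Shift the right image by the hypothesis d and compare with the left image.
        volume[d, :, d:] = np.abs(left[:, d:] - right[:, :w - d])
    return volume

def winner_takes_all(volume: np.ndarray) -> np.ndarray:
    """Pick, per pixel, the disparity hypothesis with the lowest matching cost."""
    return np.argmin(volume, axis=0)
```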
  • the step of obtaining range data and determining the depth maps for each of the left and right image views is shown as processing step 503 in Figure 5.
  • the device may then be arranged to measure the disparity between the left and right views of the received stereoscopic image pair in order to classify the areas which form the 2D and 3D regions.
  • a measure of the disparity between the left and right views with respect to the left view image can be determined by using the depth value for each pixel of the left view image and then applying the following equation to each pixel of the left view image to give a disparity map image for the left view image.
  • D_L = f x l x ( d_L/(2^N - 1) x (1/Z_near - 1/Z_far) + 1/Z_far )
  • D_L is the disparity value for a pixel of the left view image
  • d_L is the depth value for a pixel of the left view image
  • f is the focal length of the capturing camera and l is the separation (baseline) between the two capturing cameras
  • N is the number of bits representing the depth values
  • Z_near and Z_far are the respective distances of the closest and the furthest objects in the scene to the camera.
  • a measure of the disparity between the right and left views with respect to the right view image can be determined by using the depth value for each pixel of the right view image and then applying the following equation to each pixel of the right view image to give a disparity map image for the right view image.
  • D_R = f x l x ( d_R/(2^N - 1) x (1/Z_near - 1/Z_far) + 1/Z_far )
  • D_R is the disparity value for a pixel of the right view image
  • d_R is the depth value for a pixel of the right view image
  • N, Z_near, Z_far, f and l are as above.
  • depth estimation and sensing methods are provided as non-limiting examples and embodiments may be realized with the described or any other depth estimation and sensing methods and apparatuses.
  • the step of determining the disparity map image for the left view from the depth value of each pixel of the left image and the disparity map image for the right view from the depth value of each pixel of the right image is shown as processing step 505 in Figure 5.
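To make the depth-to-disparity conversion above concrete, here is an editor's sketch that applies the formula to an N-bit depth map of one view (processing step 505). The parameter values in the commented call (focal length in pixel units, baseline, depth range) are illustrative assumptions, not values from the patent.

```python
# Editor's sketch: per-pixel depth-to-disparity conversion for one view,
# using the formula given above.
import numpy as np

def depth_to_disparity(depth: np.ndarray, n_bits: int, f: float, l: float,
                       z_near: float, z_far: float) -> np.ndarray:
    """D = f * l * ( d/(2^N - 1) * (1/Z_near - 1/Z_far) + 1/Z_far )."""
    d = depth.astype(np.float64)
    inv_z = d / (2 ** n_bits - 1) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return f * l * inv_z

# Illustrative call (assumed parameters): focal length in pixel units,
# baseline in metres, nearest/farthest scene distances in metres.
# disp_left = depth_to_disparity(depth_left, n_bits=8, f=1000.0, l=0.065,
#                                z_near=0.5, z_far=50.0)
```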
  • each disparity map image can be partitioned into two regions, where one region of the disparity map image corresponds to the two dimensional (2D) image portion which covers the parts presented only in the current view, and the other region, when combined with the other parallax image, forms the three dimensional (3D) image portion.
  • the disparity map image for the left view image can be partitioned into a first region classified as a 2D region and a second region classified as a 3D region.
  • the above regions may be visualised as regions 402 and 401 respectively.
  • the disparity map image may be partitioned into a 2D region and a 3D region along a boundary contour comprising pixels running from the top to the bottom of the disparity map image.
  • This may be viewed as forming a 2D band or stripe from the perspective of the left or right view image, since the 2D region is a relatively narrow region compared to the corresponding 3D region.
  • the 2D band or stripe can be classified as the pixels which lie to the left of the boundary contour in the corresponding left view disparity map image.
  • the 2D band or stripe can comprise the pixels to the right of the boundary contour in the corresponding right view disparity map image.
  • An example of this particular format of 2D and 3D region segregation is shown in Figure 4, where it can be seen that the demarcation between the two regions is a column of pixels, and that the corresponding 2D regions in the right and left view images each form a band or stripe on either the left or right extremity of the view image.
  • the demarcation or pixel boundary contour between the 2D and 3D regions may be determined by taking into account the disparity value at the sides of the image. This may be realized by positioning the pixel boundary such that its position from the edge of the respective view disparity map image may be linearly proportional to the disparity values of the pixels in a prospective pixel boundary for a particular row. The position of the pixel boundary from an edge of a respective disparity map image may be selected on an iterative basis row by row.
  • Figure 6 shows an example process by which a pixel boundary pixel position can be determined from a respective view disparity map image.
  • a pixel from the first row may be selected for a column within the vicinity of the left side edge of the left view image as the initial starting point.
  • an initial pixel may be selected from the first row and first column of the left view disparity map image, i.e. the top left hand pixel position.
  • embodiments may select other pixel positions as an initial starting point.
  • The step of selecting the initial starting pixel boundary position for a row of the left view disparity map image is shown as processing step 601 in Figure 6.
  • the disparity value D of the actual pixel location L for a row of the left view disparity map image may then be used to determine a measure which translates (or maps) the disparity value D into an equivalent pixel location value EPL
  • EPL may be viewed as an equivalent pixel location value for the disparity value D.
  • EPL = a x D + b, where a and b are constants.
  • the step of reading the disparity D at the location of the selected pixel of the left view disparity map image is shown as the processing step 603 in Figure 6.
  • the step of determining the equivalent pixel location value EPL for the disparity value D is shown as the processing step 605 in Figure 6.
  • the value EPL may then be compared against the actual location value L of the selected pixel of the left disparity map image.
  • if EPL is determined to be less than the location value L of the current selected pixel, then the next pixel along the row towards the centre of the image can be selected as a potential boundary pixel. The steps of determining EPL for the next pixel and testing the value of EPL against the next pixel location value L can then be repeated.
  • if EPL is determined to be equal to or greater than the location value L of the current selected pixel, the feedback loop can be terminated and the current pixel is selected to be part of the boundary contour between the 2D and 3D regions. This step is represented in Figure 6 as the processing step 608.
  • the steps of Figure 6 may be repeated for the next row and all subsequent rows of the left view disparity map image.
  • the processing steps of Figure 6 can be performed for the right view disparity map image in order to obtain the boundary contour with respect to the right view image.
  • the effect of the processing steps in Figure 6 can be that the partition between the 2D and 3D regions can be portrayed as a band or stripe as shown by Figure 4. This can be due to the fact that the objects in the right and left image views are within the same depth range.
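The row-by-row search of Figure 6 can be summarised with the following editor's sketch for the left view disparity map. It assumes the linear mapping EPL = a x D + b introduced above; the constants a and b and the scan starting column are implementation choices, not values given in the patent.

```python
# Editor's sketch of the Figure 6 loop for the left view disparity map:
# scan each row from the left edge towards the centre and stop at the first
# pixel whose equivalent position index EPL reaches its actual column L.
import numpy as np

def boundary_contour_left(disparity: np.ndarray, a: float, b: float,
                          start_col: int = 0) -> np.ndarray:
    """Per-row boundary column: pixels to its left form the 2D band,
    pixels to its right belong to the 3D (overlapping) region."""
    rows, cols = disparity.shape
    boundary = np.zeros(rows, dtype=np.int32)
    for r in range(rows):
        col = start_col                        # step 601: initial pixel of the row
        while col < cols - 1:
            epl = a * disparity[r, col] + b    # steps 603/605: read D, map to EPL
            if epl >= col:                     # compare EPL with the location L
                break                          # step 608: boundary pixel found
            col += 1                           # otherwise move towards the centre
        boundary[r] = col
    return boundary

# For the right view the same scan can be run from the right-hand edge with
# the comparison mirrored, as the text notes.
```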
  • Figure 7 depicts the effect of the processing steps in Figure 6 in which the boundary between the 2D and 3D regions follows the contour of objects located at the extremities of each of the left and right view images. This case is due to there being foreground objects located at the extremities of the respective image views.
  • each iteration of the steps 601 to 608 of Figure 6 can be applied for N rows (N>1 ) at the same time.
  • the value L is as defined and used in the previous embodiment.
  • the disparity value D at each column location L is a function of the N disparity values associated with the N rows.
  • the function of D can return a single disparity value D associated with all the selected N rows.
  • the calculated boundary pixel contour width as provided by step 608 will be applied to all selected N rows.
  • the function by which the single disparity value D can be given by one of the mean, median, maximum, minimum, or a weighted average of the available N disparity values associated with the N rows.
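A small editor's sketch of the N-row aggregation just described; the function names are assumptions, and the aggregation options mirror the mean/median/max/min choices listed above (a weighted average could be added analogously).

```python
# Editor's sketch: collapse the disparity values of N consecutive rows at a
# given column into the single value D used by the boundary search, so that
# one boundary column can be applied to all N rows.
import numpy as np

_AGGREGATORS = {"mean": np.mean, "median": np.median, "max": np.max, "min": np.min}

def aggregate_disparity(disparity: np.ndarray, row_start: int, n_rows: int,
                        col: int, how: str = "median") -> float:
    window = disparity[row_start:row_start + n_rows, col]
    return float(_AGGREGATORS[how](window))
```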
  • the steps in Figure 6 can be applied over square blocks of pixels where the size of each side of the square block may be given as N, with N being greater than 1 .
  • all the processing steps of Figure 6 can be applied on a block by block basis in order to find the boundary contour between the 2D and 3D regions of the respective image view.
  • the location value L is similar to the previous embodiment except for the calculation and incrementing of the column location L. This value can be calculated as a function of the block size (N) and the left side location of the block, Lstart (the starting location of the block). L can be calculated based on each of the following functions:
  • Disparity value D may then be calculated and used in a similar manner as the previous embodiment.
  • the calculated boundary pixel contour width in step 608 will then be applied on all rows of the selected block.
  • a single disparity (D_all) value is calculated for each image as a whole. This value can be calculated by using one of the mean, median, max, min, or a weighted average over all disparity values of the image map. Following this, one fixed value of EPL can be calculated from D_all, for example using the mapping described above (EPL = a x D_all + b).
  • This value will be used as the boundary pixel contour width for all of the rows in the image and hence replaces the row by row steps illustrated in Figure 6. It is to be further understood that other embodiments may perform the above processing steps for either the left view disparity map image or the right view disparity map image. In these embodiments the position of the boundary pixel column can be found for either the left view or right view, and applied to both views.
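For the whole-image variant, an editor's sketch of computing one aggregate disparity D_all and one fixed band width; reusing the linear mapping EPL = a x D_all + b is an assumption consistent with the mapping introduced earlier, not a formula quoted from the patent.

```python
# Editor's sketch: a single aggregate disparity value for the whole disparity
# map gives one fixed 2D-band width that is applied to every row.
import numpy as np

def fixed_boundary_width(disparity: np.ndarray, a: float, b: float,
                         how: str = "median") -> int:
    agg = {"mean": np.mean, "median": np.median,
           "max": np.max, "min": np.min}[how]
    d_all = float(agg(disparity))         # one disparity value for the whole map
    return int(round(a * d_all + b))      # fixed boundary column for all rows
```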
  • the overall step of processing the left view disparity image map or right view disparity image map in order to locate the boundary between the 2D and 3D regions for each respective left and right image views is shown as processing step 507 in Figure 5.
  • the result of processing step 507 can be a boundary on the left view image separating the 2D and 3D regions, and a boundary on the right view image separating the 2D and 3D regions.
  • the output of the processing step 507 can be the boundary between the 2D and 3D regions for the left and/or right view images.
  • the 2D/3D boundary from the left view disparity image map can then be mapped onto the left view image and the 2D/3D boundary from the right view disparity image map can be mapped onto the right view image. This step is shown as processing step 509 in Figure 5.
  • a different encoding regime can be applied to the 2D and the 3D regions in each of the left and right images. This is primarily as a consequence of the 2D region being viewed at the periphery of the combined image with the result that this region is not perceived in 3D by the viewer's HVS.
  • the 2D region can be encoded at a greater coding efficiency than the 3D region, or in other words the 2D region for the left and right image views can be encoded with fewer bits per pixel than the corresponding 3D region.
  • the encoding between the 2D and 3D regions of both the left and right view images may differ due to the 2D region incurring the extra step of low pass filtering. This has the effect of reducing the high frequency components in the 2D region, thereby reducing the relative number of bits required on a per pixel basis to encode the region when compared to encoding the 3D region with the same encoding scheme.
  • the encoding between the 2D and 3D regions may differ by using a coarser block transform coefficient quantization step for the 2D region, on the basis that a hybrid coding mechanism is used for both regions.
  • the pixel sample values for the 2D and 3D regions in the left and right image views can be quantized directly.
  • the encoding efficiency may be achieved by quantizing 2D region pixel values at a different step size to 3D region pixel values.
  • some embodiments may cater for a smooth transition of encoding between the 2D and 3D regions.
  • the pixel blocks or pixel values around the boundary between the 2D and 3D regions may be gradually encoded with a higher rate as the pixels or coding blocks to be encoded approach the boundary.
  • This approach may be implemented by varying the particular encoding parameter which affects the coding rate; a sketch of one such scheme is given after this list.
  • the step size may be gradually reduced as the position of the pixel to be quantized approaches the boundary from the 2D to 3D region, and vice versa.
  • The step of encoding the 2D and 3D regions, each at different encoding rates, for the left view image and the right view image is depicted as processing step 511 in Figure 5.
  • User equipment may comprise a stereoscopic video capture and recording device or module such as those described in embodiments of the application above.
  • user equipment is intended to cover any suitable type of wireless user equipment, such as mobile telephones, portable data processing devices or portable web browsers.
  • elements of a public land mobile network may also comprise elements of a stereoscopic video capture and recording device as described above.
  • the various embodiments of the application may be implemented in hardware or special purpose circuits, software, logic or any combination thereof.
  • some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto.
  • While various aspects of the application may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
  • any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions.
  • the memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.
  • the data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
  • Embodiments of the application may be practiced in various components such as integrated circuit modules.
  • the design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
  • Programs can automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules.
  • the resultant design in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
  • the term 'circuitry' refers to all of the following:
  • hardware-only circuit implementations (such as implementations in only analog and/or digital circuitry) and
  • combinations of circuits and software such as: (i) to a combination of processor(s) or (ii) to portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions and
  • circuits such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
  • This definition of 'circuitry' applies to all uses of this term in this application, including any claims.
  • the term 'circuitry' would also cover an implementation of merely a processor (or multiple processors) or portion of a processor and its (or their) accompanying software and/or firmware.
  • the term 'circuitry' would also cover, for example and if applicable to the particular claim element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, or another network device.
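As a purely illustrative sketch of the gradual rate transition mentioned in the list above (it is not taken from the application itself), the following Python fragment assigns a per-block quantisation parameter that is coarse deep inside the 2D (non-overlapping) region and ramps down towards the 3D (overlapping) region's value near the boundary. The names qp_3d, qp_2d and ramp_width are assumptions, and a real encoder would map the returned value onto whatever rate-control interface it exposes.

```python
def block_quant_step(block_col, boundary_col, qp_3d=28, qp_2d=40, ramp_width=8):
    """Return a quantisation parameter for a coding block whose column index is
    block_col, given the 2D/3D boundary column for the same rows.

    Blocks in the 3D region (at or beyond the boundary) use qp_3d.
    Blocks far inside the 2D region use the coarser qp_2d.
    Blocks within ramp_width blocks of the boundary are interpolated so that the
    coding rate increases gradually as the boundary is approached.
    """
    if block_col >= boundary_col:          # inside the 3D (overlapping) region
        return qp_3d
    distance = boundary_col - block_col    # how deep inside the 2D region
    if distance >= ramp_width:             # deep in the 2D region: coarsest step
        return qp_2d
    # Linear ramp between the 3D and 2D quantisation parameters.
    return qp_3d + (qp_2d - qp_3d) * distance / ramp_width
```

The linear ramp is only one possible transition profile; any monotonic schedule that raises the rate towards the boundary would satisfy the smooth-transition behaviour described above.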

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

There is disclosed, inter alia, a method comprising: determining a disparity map image for a first view image from a stereo camera arrangement; partitioning the disparity map image for the first view image into a non-overlapping region between the first view image and a second view image and an overlapping region between the first view image and the second view image; mapping the non-overlapping region and overlapping region to the first view image; and encoding the non-overlapping region of the first view image at a lower coding rate than the overlapping region of the first view image.

Description

Stereoscopic video encoding
Field
The present invention relates to video encoding and, in particular but not exclusively, to stereoscopic video encoding.
Background
Digital or stereoscopic viewing of still and moving images has become more commonplace with the availability of equipment to view 3D (three dimensional) films. Theatres are offering 3D movies based on viewing the movie with special glasses that ensure the viewing of different images for the left and right eye for each frame of the film. The same approach has been brought to home use with 3D-capable players and television sets. In practice the film consists of two separate views of the same scene, in which each view is taken from a different perspective corresponding to the left eye and right eye. These views are typically created with a camera configured to record the scene as two separate views, with each view taken from slightly different perspectives. The Human Visual System (HVS) then forms a 3D view of the scene when the images corresponding to the left and right perspectives are each presented to the respective left and right eyes. However, this technology can have the drawback that the viewing area, such as a film screen or television, only occupies part of the field of vision, and thus the experience of the 3D view can be somewhat limited.
For a more realistic experience, devices occupying a larger viewing area of the total field of view can be used, such as a stereo viewing mask or goggles which are worn on the head in order that the viewing arc of the eyes is covered. Each eye is then presented with the respective image via an individual small screen and lens arrangement. Such technologies have the additional advantage that they can be used in a small space, and even on the move, compared to the fairly large TV sets commonly used for 3D viewing.
There is, therefore, a need for solutions that provide for efficient encoding of digital images and video for the purpose of viewing 3D video.
Summary
There is provided according to the application a method comprising: capturing a first view image and a second view image by a camera arrangement; capturing ranging information for the first view image and the second view image; determining a disparity map image for the first view image from the ranging information for the first view image and the second view image; partitioning the disparity map image for the first view image into a non-overlapping region between the first view image and the second view image and an overlapping region between the first view image and the second view image; mapping the non-overlapping region and overlapping region to the first view image; and encoding the non-overlapping region of the first view image at a lower coding rate than the overlapping region of the first view image. The partitioning of the disparity map image for the first view image into the non-overlapping region between the first view image and the second view image and the overlapping region between the first view image and the second view image may be determined based on a disparity value and a position index of at least one pixel of the disparity map image.
The partitioning of the disparity map image for the first view image into the non-overlapping region between the first view image and the second view image and the overlapping region between the first view image and the second view image may comprise: determining, for the non-overlapping region, that at least one pixel in a row of the disparity map image has an equivalent position index value less than a position index for the at least one pixel in the row of the disparity map image, wherein the equivalent position index is based on a disparity value of the at least one pixel at a location given by the position index in the row of the disparity map image; and determining, for the overlapping region, at least one further pixel in the row of the disparity map image that has a further equivalent position index value greater than a further position index in the row of the disparity map image for the at least one further pixel, wherein the further equivalent position index is based on a disparity value of the at least one further pixel at a location given by the further position index in the row of the disparity map image. The equivalent position index may be given by a function which translates a disparity value of a pixel located at a position index in a row of the disparity map image to a position index based value.
The determining for the non-overlapping region and the determining for the overlapping region may be performed on a row by row basis of the disparity map image.
Alternatively, the determining for the non-overlapping region and the determining for the overlapping region may be performed for at least a two row by at least two row basis, wherein the disparity value is the mean or median of the disparity value of the at least one pixel of a first row of the at least two rows and of a disparity value of the at least one pixel of a second row of the at least two rows.
Alternatively, the determining for the non-overlapping region and the determining for the overlapping region may be performed on a block of pixels by a block of pixels basis, wherein the at least one pixel and the at least one further pixel of the disparity map image is at least one block of pixels and at least one block of further pixels respectively, and wherein the disparity value of the at least one pixel and the disparity value of the at least one further pixel is respectively a disparity value of a mean or median of disparity values of pixels within the at least one block of pixels and the disparity value of a mean or median of disparity values of pixels within the at least one further block of pixels.
The method may further comprise: determining a disparity map image for the second view image from the ranging information for the first view image and the second view image; partitioning the disparity map image for the second view image into a non-overlapping region between the first view image and the second view image and an overlapping region between the first view image and the second view image; mapping the non-overlapping region of the disparity map image for the second view image and the overlapping region of the disparity map image for the second view image to the second view image; and encoding the non-overlapping region of the second view image at a lower coding rate than the overlapping region of the second view image. The first view image may be a left view image, and the second view image may be a right view image.
The non-overlapping region may be a two dimensional region, and wherein the overlapping region may be a three dimensional region.
The camera arrangement may be a stereoscopic camera arrangement.
According to a further aspect there is provided an apparatus comprising: at least one processor; and at least one memory including computer program code, which when executed by the at least one processor, causes the apparatus to: capture a first view image and a second view image by a camera arrangement; capture ranging information for the first view image and the second view image; determine a disparity map image for the first view image from the ranging information for the first view image and the second view image; partition the disparity map image for the first view image into a non-overlapping region between the first view image and the second view image and an overlapping region between the first view image and the second view image; map the non-overlapping region and overlapping region to the first view image; and encode the non-overlapping region of the first view image at a lower coding rate than the overlapping region of the first view image. The apparatus may be caused to partition the disparity map image for the first view image into the non-overlapping region between the first view image and the second view image and the overlapping region between the first view image and the second view image based on a disparity value and a position index of at least one pixel of the disparity map image.
The apparatus caused to partition the disparity map image for the first view image into the non-overlapping region between the first view image and the second view image and the overlapping region between the first view image and the second view image may be further caused to: determine, for the non-overlapping region, that at least one pixel in a row of the disparity map image has an equivalent position index value less than a position index for the at least one pixel in the row of the disparity map image, wherein the equivalent position index is based on a disparity value of the at least one pixel at a location given by the position index in the row of the disparity map image; and determine, for the overlapping region, at least one further pixel in the row of the disparity map image that has a further equivalent position index value greater than a further position index in the row of the disparity map image for the at least one further pixel, wherein the further equivalent position index is based on a disparity value of the at least one further pixel at a location given by the further position index in the row of the disparity map image.
The equivalent position index may be given by a function which translates a disparity value of a pixel located at a position index in a row of the disparity map image to a position index based value. The apparatus may be caused to perform the determining for the non-overlapping region and the overlapping region, on a row by row basis of the disparity map image. The apparatus may be caused to perform the determining for the non-overlapping region and the overlapping region for at least a two row by at least two row basis, wherein the disparity value is the mean or median of the disparity value of the at least one pixel of a first row of the at least two rows and of a disparity value of the at least one pixel of a second row of the at least two rows.
The apparatus may be caused to perform the determining for the non-overlapping region and the determining for the overlapping region on a block of pixels by a block of pixels basis, wherein the at least one pixel and the at least one further pixel of the disparity map image is at least one block of pixels and at least one block of further pixels respectively, and wherein the disparity value of the at least one pixel and the disparity value of the at least one further pixel is respectively a disparity value of a mean or median of disparity values of pixels within the at least one block of pixels and the disparity value of a mean or median of disparity values of pixels within the at least one further block of pixels.
The apparatus may be further caused to: determine a disparity map image for the second view image from the ranging information for the first view image and the second view image; partition the disparity map image for the second view image into a non-overlapping region between the first view image and the second view image and an overlapping region between the first view image and the second view image; map the non-overlapping region of the disparity map image for the second view image and the overlapping region of the disparity map image for the second view image to the second view image; and encode the non-overlapping region of the second view image at a lower coding rate than the overlapping region of the second view image.
The first view image may be a left view image, and the second view image may be a right view image. The non-overlapping region may be a two dimensional region, and wherein the overlapping region may be a three dimensional region.
The camera arrangement may be a stereoscopic camera arrangement.
According to another aspect there is provided an apparatus configured to: capture a first view image and a second view image by a camera arrangement; capture ranging information for the first view image and the second view image; determine a disparity map image for the first view image from the ranging information for the first view image and the second view image; partition the disparity map image for the first view image into a non-overlapping region between the first view image and the second view image and an overlapping region between the first view image and the second view image; map the non-overlapping region and overlapping region to the first view image; and encode the non-overlapping region of the first view image at a lower coding rate than the overlapping region of the first view image.
The apparatus may be configured to partition the disparity map image for the first view image into the non-overlapping region between the first view image and the second view image and the overlapping region between the first view image and the second view image based on a disparity value and a position index of at least one pixel of the disparity map image.
The apparatus configured to partition the disparity map image for the first view image into the non-overlapping region between the first view image and the second view image and the overlapping region between the first view image and the second view image may be further configured to: determine, for the non-overlapping region, that at least one pixel in a row of the disparity map image has an equivalent position index value less than a position index for the at least one pixel in the row of the disparity map image, wherein the equivalent position index is based on a disparity value of the at least one pixel at a location given by the position index in the row of the disparity map image; and determine, for the overlapping region, at least one further pixel in the row of the disparity map image that has a further equivalent position index value greater than a further position index in the row of the disparity map image for the at least one further pixel, wherein the further equivalent position index is based on a disparity value of the at least one further pixel at a location given by the further position index in the row of the disparity map image.
The equivalent position index is given by a function which translates a disparity value of a pixel located at a position index in a row of the disparity map image to a position index based value.
The apparatus may be configured to perform the determining for the non-overlapping region and the overlapping region, on a row by row basis of the disparity map image. The apparatus may be configured to perform the determining for the non-overlapping region and the overlapping region for at least a two row by at least two row basis, wherein the disparity value is the mean or median of the disparity value of the at least one pixel of a first row of the at least two rows and of a disparity value of the at least one pixel of a second row of the at least two rows.
The apparatus may be configured to perform the determining for the non-overlapping region and the determining for the overlapping region on a block of pixels by a block of pixels basis, wherein the at least one pixel and the at least one further pixel of the disparity map image is at least one block of pixels and at least one block of further pixels respectively, and wherein the disparity value of the at least one pixel and the disparity value of the at least one further pixel is respectively a disparity value of a mean or median of disparity values of pixels within the at least one block of pixels and the disparity value of a mean or median of disparity values of pixels within the at least one further block of pixels. The apparatus may be further configured to determine a disparity map image for the second view image from the ranging information for the first view image and the second view image; partition the disparity map image for the second view image into a non-overlapping region between the first view image and the second view image and an overlapping region between the first view image and the second view image; map the non-overlapping region of the disparity map image for the second view image and the overlapping region of the disparity map image for the second view image to the second view image; and encode the non-overlapping region of the second view image at a lower coding rate than the overlapping region of the second view image.
The first view image may be a left view image, and the second view image may be a right view image. The non-overlapping region may be a two dimensional region, and wherein the overlapping region may be a three dimensional region.
The camera arrangement may be a stereoscopic camera arrangement. According to a yet further aspect there is provided a computer-readable medium having computer-readable code stored thereon, the computer readable code, when executed by at least one processor, causing performance of at least: capturing a first view image and a second view image by a camera arrangement; capturing ranging information for the first view image and the second view image; determining a disparity map image for the first view image from the ranging information for the first view image and the second view image; partitioning the disparity map image for the first view image into a non-overlapping region between the first view image and the second view image and an overlapping region between the first view image and the second view image; mapping the non-overlapping region and overlapping region to the first view image; and encoding the non-overlapping region of the first view image at a lower coding rate than the overlapping region of the first view image.
Brief Description of Drawings
For better understanding of the present application and as to how the same may be carried into effect, reference will now be made by way of example to the accompanying drawings in which:
Figures 1a, 1b, 1c and 1d show schematically an arrangement for capturing and displaying a stereo image to a user;
Figure 2a shows a system and apparatus for stereo viewing;
Figure 2b shows a stereo camera device for stereo viewing;
Figure 2c shows a head-mounted display for stereo viewing;
Figure 2d illustrates a camera arrangement for capturing stereoscopic video;
Figure 2e illustrates a further camera arrangement for capturing stereoscopic video;
Figure 3 shows schematically an electronic device in which embodiments can be deployed;
Figure 4 shows schematically the partition of a stereoscopic image into 2D and 3D regions;
Figure 5 shows a flow diagram illustrating the process of encoding a stereoscopic image according to embodiments;
Figure 6 shows a flow diagram illustrating the process of determining the 2D/3D boundary for a stereoscopic image according to embodiments; and
Figure 7 shows schematically the partition of a stereoscopic image into 2D and 3D regions according to embodiments in which there are objects located in the foreground at the boundary between 2D and 3D regions.
Description of Some Embodiments
With the use of 3D viewing devices such as television screens and stereo viewing masks there exist areas at the periphery of the left and right views which lie outside the 3D image formed by the viewer's HVS. These areas are perceived by the viewer as 2D images on the periphery of the scene due to the left image not having a corresponding region or area in the right image and vice versa. Therefore, since these peripheral areas do not contribute to the main 3D image and depth perception, the encoding of these regions or areas can be exploited in order to improve the overall encoding efficiency for stereoscopic video.
The following describes in more detail how the disparity between the left and right image, especially at the peripheries of the scene, can be exploited in order to improve the encoding efficiency of stereoscopic imaging and video systems.
In this regard reference is first made to Figures 1a, 1b, 1c and 1d which show a schematic block diagram of an exemplary configuration for forming a stereo image to a user. In Figure 1a, a situation is shown where a human being is viewing two spheres A1 and A2 using both eyes E1 and E2. The sphere A1 is closer to the viewer than the sphere A2, the respective distances to the first eye E1 being LE1,A1 and LE1,A2. The different objects reside in space at their respective (x,y,z) coordinates, defined by the coordinate system SX, SY and SZ. The distance d12 between the eyes of a human being may be approximately 62-64 mm on average, varying from person to person between 55 and 74 mm. This distance is referred to as the parallax, on which the stereoscopic view of human vision is based. The viewing directions (optical axes) DIR1 and DIR2 are typically essentially parallel, possibly having a small deviation from being parallel, and define the field of view for the eyes.
In Figure 1b, there is a setup shown, where the eyes have been replaced by cameras C1 and C2, positioned at the location where the eyes were in Figure 1a. The distances and directions of the setup are otherwise the same. Naturally, the purpose of the setup of Figure 1b is to be able to take a stereo image of the spheres A1 and A2. The two images resulting from image capture are FC1 and FC2. The "left eye" image FC1 shows the image SA2 of the sphere A2 partly visible on the left side of the image SA1 of the sphere A1. The "right eye" image FC2 shows the image SA2 of the sphere A2 partly visible on the right side of the image SA1 of the sphere A1. This difference between the right and left images is called disparity, and this disparity, being the basic mechanism with which the human visual system determines depth information and creates a 3D view of the scene, can be used to create an illusion of a 3D image.
In this setup of Figure 1b, where the inter-eye distances correspond to those of the eyes in Figure 1a, the camera pair C1 and C2 has a natural parallax, that is, it has the property of creating natural disparity in the two images of the cameras. Natural disparity may be understood to be created even though the distance between the two cameras forming the stereo camera pair is somewhat smaller or larger than the normal distance (parallax) between the human eyes, e.g. essentially between 40 mm and 100 mm or even 30 mm and 120 mm.
In Figure 1c, the creating of this 3D illusion is shown. The images FC1 and FC2 captured by the cameras C1 and C2 are displayed to the eyes E1 and E2, using displays D1 and D2, respectively. The disparity between the images is processed by the human visual system so that an understanding of depth is created. That is, when the left eye sees the image SA2 of the sphere A2 on the left side of the image SA1 of sphere A1, and respectively the right eye sees the image of A2 on the right side, the human visual system creates an understanding that there is a sphere V2 behind the sphere V1 in a three-dimensional world. Here, it needs to be understood that the images FC1 and FC2 can also be synthetic, that is, created by a computer. If they carry the disparity information, synthetic images will also be seen as three-dimensional by the human visual system. That is, a pair of computer-generated images can be formed so that they can be used as a stereo image.
Figure 1d illustrates how the principle of displaying stereo images to the eyes can be used to create 3D movies or virtual reality scenes having an illusion of being three-dimensional. The images FX1 and FX2 are either captured with a stereo camera or computed from a model so that the images have the appropriate disparity. By displaying a large number (e.g. 30) of frames per second to both eyes using displays D1 and D2 so that the images between the left and the right eye have disparity, the human visual system will create a cognition of a moving, three-dimensional image.
Figure 2a shows a system and apparatuses for stereo viewing, that is, for 3D video and 3D audio digital capture and playback. The task of the system is that of capturing sufficient visual and auditory information from a specific location such that a convincing reproduction of the experience, or presence, of being in that location can be achieved by one or more viewers physically located in different locations and optionally at a time later in the future. Such reproduction requires more information than can be captured by a single camera or microphone, in order that a viewer can determine the distance and location of objects within the scene using their eyes and their ears. As explained in the context of Figures 1a to 1d, to create a pair of images with disparity, two camera sources are used. In a similar manner, for the human auditory system to be able to sense the direction of sound, at least two microphones are used (the commonly known stereo sound is created by recording two audio channels). The human auditory system can detect cues, e.g. the timing difference of the audio signals, to detect the direction of sound.
The system of Figure 2a may consist of three main parts: image sources, a server and a rendering device. A video capture device SRC1 comprises multiple (for example, 8) cameras CAM1, CAM2, ..., CAMN with overlapping fields of view so that regions of the view around the video capture device are captured from at least two cameras. The device SRC1 may comprise multiple microphones to capture the timing and phase differences of audio originating from different directions. The device may comprise a high resolution orientation sensor so that the orientation (direction of view) of the plurality of cameras can be detected and recorded. The device SRC1 comprises or is functionally connected to a computer processor PROC1 and memory MEM1, the memory comprising computer program PROGR1 code for controlling the capture device. The image stream captured by the device may be stored on a memory device MEM2 for use in another device, e.g. a viewer, and/or transmitted to a server using a communication interface COMM1. It needs to be understood that although an 8-camera cubical setup is described here as part of the system, another camera device including a different number of cameras and/or a different location adjustment of cameras may be used instead as part of the system.
Alternatively or in addition to the video capture device SRC1 creating an image stream, or a plurality of such, one or more sources SRC2 of synthetic images may be present in the system. Such sources of synthetic images may use a computer model of a virtual world to compute the various image streams it transmits. For example, the source SRC2 may compute N video streams corresponding to N virtual cameras located at a virtual viewing position. When such a synthetic set of video streams is used for viewing, the viewer may see a three-dimensional virtual world, as explained earlier for Figure 1 d. The device SRC2 comprises or is functionally connected to a computer processor PROC2 and memory MEM2, the memory comprising computer program PROGR2 code for controlling the synthetic source device SRC2. The image stream captured by the device may be stored on a memory device MEM5 (e.g. memory card CARD1 ) for use in another device, e.g. a viewer, or transmitted to a server or the viewer using a communication interface COMM2.
There may be a storage, processing and data stream serving network in addition to the capture device SRC1 . For example, there may be a server SERV or a plurality of servers storing the output from the capture device SRC1 or computation device SRC2. The device comprises or is functionally connected to a computer processor PROC3 and memory MEM3, the memory comprising computer program PROGR3 code for controlling the server. The server may be connected by a wired or wireless network connection, or both, to sources SRC1 and/or SRC2, as well as the viewer devices VIEWER1 and VIEWER2 over the communication interface COMM3. For viewing the captured or created video content, there may be one or more viewer devices VIEWER1 and VIEWER2. These devices may have a rendering module and a display module, or these functionalities may be combined in a single device. The devices may comprise or be functionally connected to a computer processor PROC4 and memory MEM4, the memory comprising computer program PROGR4 code for controlling the viewing devices. The viewer (playback) devices may consist of a data stream receiver for receiving a video data stream from a server and for decoding the video data stream. The data stream may be received over a network connection through communications interface COMM4, or from a memory device MEM6 like a memory card CARD2. The viewer devices may have a graphics processing unit for processing of the data to a suitable format for viewing as described with Figures 1 c and 1 d. The viewer VIEWER1 comprises a high- resolution stereo-image head-mounted display for viewing the rendered stereo video sequence. The head-mounted device may have an orientation sensor DET1 and stereo audio headphones. The viewer VIEWER2 comprises a display enabled with 3D technology (for displaying stereo video), and the rendering device may have a head-orientation detector DET2 connected to it. Any of the devices (SRC1 , SRC2, SERVER, RENDERER, VIEWER1 , VIEWER2) may be a computer or a portable computing device, or be connected to such. Such rendering devices may have computer program code for carrying out methods according to various examples described in this text.
Figure 2b shows a camera device for adjustable stereo viewing. The camera device comprises three or more cameras that are configured into camera pairs for creating the left and right eye images, or that can be arranged into such pairs. The distance between cameras may correspond to the usual distance between the human eyes. The cameras may be arranged so that they have significant overlap in their field-of-view. For example, wide-angle lenses of 180 degrees or more may be used, and there may be 3, 4, 5, 6, 7, 8, 9, 10, 12, 16 or 20 cameras. The cameras may be regularly or irregularly spaced across the whole sphere of view, or they may cover only part of the whole sphere. For example, there may be three cameras arranged in a triangle and having different directions of view towards one side of the triangle such that all three cameras cover an overlap area in the middle of the directions of view. In Figure 2b, three stereo camera pairs are shown. Figure 2e shows a further camera device for adjustable stereo viewing in which there are 8 cameras having wide-angle lenses and arranged regularly at the corners of a virtual cube and covering the whole sphere such that the whole or essentially whole sphere is covered in all directions by at least 3 or 4 cameras.
Camera devices with other types of camera layouts may be used. For example, a camera device with all the cameras in one hemisphere may be used. The number of cameras may be e.g. 3, 4, 6, 8, 12, or more. The cameras may be placed to create a central field of view where stereo images can be formed from image data of two or more cameras, and a peripheral (extreme) field of view where one camera covers the scene and only a normal non-stereo image can be formed. Examples of different camera devices that may be used in the system are described also later in this description.
Figure 2c shows a head-mounted display for stereo viewing. The head-mounted display contains two screen sections or two screens DISP1 and DISP2 for displaying the left and right eye images. The displays are close to the eyes, and therefore lenses are used to make the images easily viewable and for spreading the images to cover as much as possible of the eyes' field of view. The device is attached to the head of the user so that it stays in place even when the user turns his head. The device may have an orientation detecting module ORDET1 for determining the head movements and direction of the head. It is to be noted here that in this type of a device, tracking the head movement may be done, but since the displays cover a large area of the field of view, eye movement detection is not necessary. The head orientation may be related to real, physical orientation of the user's head, and it may be tracked by a sensor for determining the real orientation of the user's head. Alternatively or in addition, head orientation may be related to virtual orientation of the user's view direction, controlled by a computer program or by a computer input device such as a joystick. That is, the user may be able to change the determined head orientation with an input device, or a computer program may change the view direction (e.g. in gaming, the game program may control the determined head orientation instead of or in addition to the real head orientation).
Figure 2d illustrates a camera CAM1. The camera has a camera detector CAMDET1, comprising a plurality of sensor elements for sensing the intensity of the light hitting the sensor element. The camera has a lens OBJ1 (or a lens arrangement of a plurality of lenses), the lens being positioned so that the light hitting the sensor elements travels through the lens to the sensor elements. The camera detector CAMDET1 has a nominal center point CP1 that is a middle point of the plurality of sensor elements, for example for a rectangular sensor the crossing point of the diagonals. The lens has a nominal center point PP1 as well, lying for example on the axis of symmetry of the lens. The direction of orientation of the camera is defined by the line passing through the center point CP1 of the camera sensor and the center point PP1 of the lens. The direction of the camera is a vector along this line pointing in the direction from the camera sensor to the lens. The optical axis of the camera is understood to be this line CP1-PP1.
The system described above may function as follows. Time-synchronized video, audio and orientation data is first recorded with the capture device. This can consist of multiple concurrent video and audio streams as described above. These are then transmitted immediately or later to the storage and processing network for processing and conversion into a format suitable for subsequent delivery to playback devices. The conversion can involve post-processing steps to the audio and video data in order to improve the quality and/or reduce the quantity of the data while preserving the quality at a desired level. Finally, each playback device receives a stream of the data from the network, and renders it into a stereo viewing reproduction of the original location which can be experienced by a user with the head mounted display and headphones. In general, embodiments of the invention can be embodied as an algorithm on an electronic device 10 such as that depicted by way of example in Figure 3. The electronic device 10 can be configured to execute an algorithm for exploiting the disparity between the left and right images in order to improve the encoding efficiency of stereoscopic imaging systems such as those depicted in Figures 1b to 1d. Furthermore said electronic device 10 may be a component of a stereoscopic imaging system such as the systems depicted by Figures 1b to 1d and Figures 2a to 2d.
The electronic device or apparatus 10 in some embodiments comprises a processor 21 which is linked to a memory 22. The processor 21 can in some embodiments be configured to execute various program codes. The implemented program codes in some embodiments comprise a multichannel or stereo encoding or decoding code as described herein. The implemented program codes 23 can in some embodiments be stored for example in the memory 22 for retrieval by the processor 21 whenever needed. The memory 22 could further provide a section 24 for storing data, for example data that has been encoded in accordance with the application. The processor is further linked to an I/O (Input/Output) port 11 for receiving and transmitting digital data. For instance, the port 11 may be arranged to connect to the digital port of a camera or camera module. Additionally, the processor 21 is further linked to a transceiver (RX/TX) 13 and to a user interface (UI) 15.
In some embodiments the apparatus 10 may additionally include an integrated camera module comprising a camera having a lens for focusing an image onto a digital image capture means such as a charge coupled device (CCD). In other embodiments the digital image capture means may be any suitable image capturing device such as a complementary metal oxide semiconductor (CMOS) image sensor.
The user interface 15 enables a user to input commands to the electronic device 10, for example via a keypad, and/or to obtain information from the electronic device 10, for example via a display. In some embodiments a touch screen may provide both input and output functions for the user interface. The apparatus 10 in some embodiments comprises a transceiver 13 suitable for enabling communication with other apparatus, for example via a wireless communication network.
The encoding code in embodiments can be implemented in hardware and/or firmware.
Figure 4 shows schematically the concept for embodiments as described herein, in which there are depicted different areas of a stereoscopic image or video frame formed by combining a left and right image as taken by a left camera and right camera arranged as a horizontal parallax pair such as the arrangement depicted in Figure 1b. It should be noted that a similar approach is applicable to vertical parallax pairs too. It can be seen that the combined image comprises three regions. The region 401 depicts the stereo (or 3D) part of the image where the right image and left image making up the combined image have a common view of the scene. In other words the region 401 is the result of combining sections of the left and right image which share a common view of a scene, in which each image is taken from the perspective of the left and right eye respectively. In this region of the combined image the viewer's HVS can combine the left and right images such that the image perceived by the viewer is a three dimensional image creating the depth perception. Further it is to be appreciated that a viewer's HVS will naturally cause the combined 3D region 401 to be perceived as the dominant region of the image, thereby drawing the majority of the viewer's cognitive attention. Also shown in Figure 4 are the regions 402 and 403 which depict the regions of the combined image which do not overlap with a common scene. The region 402 in Figure 4 can represent the region of the combined image taken by the left camera which does not overlap with the region of the combined image taken by the right camera. Thus, region 402 does not have a representation in the right view. Correspondingly, the region 403 in Figure 4 can represent the region of the combined image taken by the right camera which does not overlap with the region of the combined image as taken by the left camera. Thus, region 403 does not have a representation in the left view. The regions 402 and 403 of the combined image will be perceived by the viewer's HVS as two dimensional regions as each of these regions does not have a corresponding view between the left and right images. Furthermore, as a direct consequence these areas may tend to be found at the periphery of the combined image, as depicted in Figure 4. Consequently, the regions 402 and 403 may be perceived subconsciously by the viewer as contributing less to the overall viewing experience than overlapping regions such as 401. This is attributed to the fact that there is no depth perception achieved by these regions as only one image representing those areas is available to the viewer's HVS.
As a result of the above effect, a stereoscopic image comprising the combination of a right and left image may be more efficiently encoded. For instance more coding bandwidth (or bits) may be expended over the overlapping region (depicted as 401 in Figure 4) than the peripheral or non-overlapping regions (depicted as 402 and 403 in Figure 4) without any noticeable loss in depth perception or perceived quality in the combined image. In other words there may be an unequal distribution of coding bandwidth between overlapping and non-overlapping regions of the combined image, in which the overlapping region is assigned a higher proportion of the encoding bandwidth per pixel than the non-overlapping peripheral areas.
The concept for embodiments as described herein is to encode the overlapping regions between the left and right images of a stereoscopic image at a higher encoding rate, on a comparative per pixel basis, than the non-overlapping regions. In that respect Figure 5 shows the operation of an encoder deploying such a strategy, which may be implemented on an electronic device such as that depicted as 10 in Figure 3.
Initially the device or entity arranged to perform the encoding of the stereoscopic image, such as the exemplary electronic device depicted in Figure 3, may be configured to receive the left and right images from a camera pair such as the camera arrangement depicted in Figure 1b as C1 and C2.
The step of receiving the left and right images from a camera pair arranged to capture a stereoscopic image is shown as processing step 501 . For each of the left and right image views the depth values for each pixel may be determined. Additionally a texture map may also be produced for each left and right image view. The depth value for each pixel location may be obtained by obtaining range data using the time-of-flight (TOF) principle for example by using a camera which may be provided with a light source, for example an infrared emitter, for illuminating the scene. Such an illuminator may be arranged to produce an intensity modulated electromagnetic emission for a frequency between e.g. 10-100 MHz, which may require LEDs or laser diodes to be used. Infrared light may be used to make the illumination unobtrusive. The light reflected from objects in the scene is detected by an image sensor, which may be modulated synchronously at the same frequency as the illuminator. The image sensor may be provided with optics; a lens gathering the reflected light and an optical band pass filter for passing only the light with the same wavelength as the illuminator, thus helping to suppress background light. The image sensor may measure for each pixel the time the light has taken to travel from the illuminator to the object and back. The distance to the object may be represented as a phase shift in the illumination modulation, which can be determined from the sampled data simultaneously for each pixel in the scene.
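As a minimal illustration of the time-of-flight relationship described above (the function name and the single-modulation-frequency assumption are illustrative, not taken from the application), the distance to an object can be recovered from the phase shift between the emitted and received modulated light:

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def tof_depth(phase_shift_rad, modulation_freq_hz):
    """Distance implied by the phase shift between emitted and received
    modulated light: d = c * phi / (4 * pi * f_mod).

    The factor of two in the denominator (relative to a one-way path) accounts
    for the light travelling to the object and back. Distances beyond the
    unambiguous range c / (2 * f_mod) alias back into it (phase wrapping).
    """
    return SPEED_OF_LIGHT * phase_shift_rad / (4.0 * math.pi * modulation_freq_hz)

# Example: a quarter-cycle phase shift at 20 MHz modulation corresponds
# to roughly 1.87 m.
print(tof_depth(math.pi / 2, 20e6))
```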
Alternatively or in addition to the above-described TOF-principle depth sensing, the range data or depth values may be obtained using a structured light approach which may operate for example approximately as follows. A light emitter, such as an infrared laser emitter or an infrared LED emitter, may emit light that may have a certain direction in a 3D space (e.g. follow a raster-scan or a pseudo-random scanning order) and/or position within an array of light emitters as well as a certain pattern, e.g. a certain wavelength and/or amplitude pattern. The emitted light is reflected back from objects and may be captured using a sensor, such as an infrared image sensor. The image/signals obtained by the sensor may be processed in relation to the direction of the emitted light as well as the pattern of the emitted light to detect a correspondence between the received signal and the direction/position of the emitted light as well as the pattern of the emitted light, for example using a triangulation principle. From this correspondence a distance and a position of a pixel may be concluded. Alternatively or in addition to the above-described depth sensing methods, it is possible to estimate the depth values only taking into account the available images, using stereo matching algorithms. Such depth estimation (stereo-matching) techniques are usually based on cost aggregation methods, where an input 3D cost volume is calculated from the pixel dissimilarity measured between the given left and right images considering a number of depth hypotheses.
C(x, y, d) = ||L(x, y) - R(x - d, y)||
Where
C denotes the resulting cost volume (associated with the left image)
L and R are the left and right images
(x, y) are spatial coordinates within the image
d is a disparity hypothesis
For unrectified images, the whole image can be swept in order to compute the cost volume, where depth hypotheses are sampled according to the desired rule. The step of obtaining range data and determining the depth maps for each of the left and right image views is shown as processing step 503 in Figure 5.
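A minimal sketch of the cost volume computation above for the simpler rectified case, using the absolute difference as the norm for greyscale images (the array shapes and the winner-takes-all step in the comment are illustrative assumptions, not part of the application):

```python
import numpy as np

def cost_volume(left, right, max_disparity):
    """Per-pixel dissimilarity C(x, y, d) = ||L(x, y) - R(x - d, y)|| for
    rectified greyscale images, following the cost expression above.

    left, right : 2-D float arrays of equal shape (rows, cols)
    Returns an array of shape (max_disparity + 1, rows, cols). Pixels whose
    match would fall outside the right image are given an infinite cost so
    they are never selected.
    """
    rows, cols = left.shape
    volume = np.full((max_disparity + 1, rows, cols), np.inf, dtype=np.float64)
    for d in range(max_disparity + 1):
        # R(x - d, y): columns d..cols-1 of the left image are compared with
        # columns 0..cols-d-1 of the right image.
        volume[d, :, d:] = np.abs(left[:, d:] - right[:, :cols - d])
    return volume

# A winner-takes-all disparity estimate (before any cost aggregation):
# disparity = np.argmin(cost_volume(left, right, 64), axis=0)
```

A full stereo-matching pipeline would aggregate these costs before selecting a disparity, as the text notes; the winner-takes-all line is shown only to indicate how the volume is used.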
The device may then be arranged to measure the disparity between the left and right views of the received stereoscopic image pair in order to classify the areas which form the 2D and 3D regions.
In an embodiment a measure of the disparity between the left and right views with respect to the left view image can be determined by using the depth value for each pixel of the left view image and then applying the following equation to each pixel of the left view image to give a disparity map image for the left view image:

DL = f × l × ( (dL / (2^N - 1)) × (1/znear - 1/zfar) + 1/zfar )
where DL is the disparity value for a pixel of the left view image,
f is the focal length of the capturing camera,
l is the translational difference between the cameras,
dL is the depth value for a pixel of the left view image,
N is the number of bits representing the depth values,
znear and zfar are the distances of the closest and the furthest objects in the scene to the camera, respectively.
A measure of the disparity between the right and left views with respect to the right view image can be determined by using the depth value for each pixel of the right view image and then applying the following equation to each pixel of the right view image to give a disparity map image for the right view image:
DR = f × l × ( (dR / (2^N - 1)) × (1/znear - 1/zfar) + 1/zfar )
where DR is the disparity value for a pixel of the right view image,
dR is the depth value for a pixel of the right view image, and
N, znear, zfar, f and l are as above.
It is to be understood that the above-described depth estimation and sensing methods are provided as non-limiting examples and embodiments may be realized with the described or any other depth estimation and sensing methods and apparatuses.
The step of determining the disparity map image for the left view from the depth value of each pixel of the left image and the disparity map image for the right view from the depth value of each pixel of the right image is shown as processing step 505 in Figure 5.
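A minimal sketch of the depth-to-disparity conversion given by the equations above, assuming the depth map stores quantised values in [0, 2^N - 1] and that the focal length is expressed in pixels so that the returned disparity is also in pixels (the function and parameter names are illustrative):

```python
import numpy as np

def disparity_from_depth(depth_map, focal_length, baseline, n_bits, z_near, z_far):
    """Convert a quantised depth map (integer values in [0, 2**n_bits - 1])
    into a disparity map, following the equation given above:

        D = f * l * ( (d / (2**N - 1)) * (1/z_near - 1/z_far) + 1/z_far )

    The same routine can be applied to the left view depth map (giving DL)
    and to the right view depth map (giving DR).
    """
    d = depth_map.astype(np.float64)
    inv_z = (d / (2 ** n_bits - 1)) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return focal_length * baseline * inv_z

# Example (hypothetical values): an 8-bit depth map, a 10 cm baseline and a
# scene spanning 0.5 m to 10 m:
# disparity_left = disparity_from_depth(depth_left, focal_length=1000.0,
#                                       baseline=0.1, n_bits=8,
#                                       z_near=0.5, z_far=10.0)
```

Units are assumed consistent: with the focal length in pixels and the baseline in the same length units as znear and zfar, the returned disparities are in pixels.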
In embodiments each disparity map image can be partitioned into two regions, where one region of the disparity map image corresponds to the two dimensional (2D) image portion which covers the parts only present in the current view, and the other region, which when combined with the other parallax image forms the three dimensional (3D) image portion. For example in terms of the left image, the disparity map image for the left view image can be partitioned into a first region classified as a 2D region and a second region classified as a 3D region. With reference to Figure 4 the above regions may be visualised as regions 402 and 401 respectively.
In an embodiment the disparity map image may be partitioned into a 2D region and a 3D region along a boundary contour comprising pixels running from the top to the bottom of the disparity map image. This may be viewed as forming a 2D band or stripe from the perspective of the left or right view image since the 2D region is a relatively narrow region compared to the corresponding 3D region. The 2D band or stripe can be classified as the pixels which lie to the left of the boundary contour in the corresponding left view disparity map image. Conversely for the right view image, the 2D band or stripe can comprise the pixels to the right of the boundary contour in the corresponding right view disparity map image. An example of this particular format of 2D and 3D region segregation is shown in Figure 4, where it can be seen that the demarcation between the two regions is a column of pixels, and that the corresponding 2D regions in the right and left view images each form a band or stripe on either the left or right extremity of the view image.
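Once a boundary column has been located for every row (the row-by-row search itself is described in the following paragraphs and in Figure 6), the 2D band and the 3D region of a view image can be marked with per-pixel masks. A minimal sketch follows; the Boolean mask convention and the treatment of the right view (boundary counted from the right-hand edge, with the exact off-by-one handling of the boundary pixel left open) are illustrative assumptions rather than the application's definition:

```python
import numpy as np

def region_masks(boundary_cols, image_shape, view="left"):
    """Build Boolean masks marking the 2D (non-overlapping) and 3D
    (overlapping) regions of a view image from per-row boundary columns.

    boundary_cols : 1-D integer array, one boundary column per row
    image_shape   : (rows, cols) of the view image
    For the left view the 2D band lies to the left of the boundary contour;
    for the right view it lies to the right (mirrored convention assumed).
    """
    rows, cols = image_shape
    col_index = np.arange(cols)[np.newaxis, :]           # shape (1, cols)
    boundary = np.asarray(boundary_cols)[:, np.newaxis]  # shape (rows, 1)
    if view == "left":
        mask_2d = col_index < boundary
    else:
        mask_2d = col_index >= cols - boundary
    return mask_2d, ~mask_2d   # (2D region, 3D region)
```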
In the first embodiment the demarcation or pixel boundary contour between the 2D and 3D regions (from a disparity map image) may be determined by taking into account the disparity value at the sides of the image. This may be realized by positioning the pixel boundary such that its position from the edge of the respective view disparity map image may be linearly proportional to the disparity values of the pixels in a prospective pixel boundary for a particular row. The position of the pixel boundary from an edge of a respective disparity map image may be selected on an iterative basis row by row. In this respect Figure 6 shows an example process by which a boundary pixel position can be determined from a respective view disparity map image.
The following process will be described in terms of locating the pixel boundary position from the perspective of the left view image. However, it is to be understood that the process can be performed for either the left or right view disparity map image. A pixel from the first row may be selected for a column within the vicinity of the left side edge of the left view image as the initial starting point. In other words an initial pixel may be selected from the first row and first column of the left view disparity map image, i.e. the top left hand pixel position. However it is to be appreciated that embodiments may select other pixel positions as an initial starting point.
The step of selecting the initial starting pixel boundary position for a row of the left view disparity map image is shown as processing step 601 in Figure 6.
The disparity value D of the actual pixel location L for a row of the left view disparity map image may then be used to determine a measure which translates (or maps) the disparity value D into an equivalent pixel location value EPL. In other words, EPL may be viewed as an equivalent pixel location value for the disparity value D.
In embodiments EPL may be expressed as
EPL = a x D + b

where a and b are constants. The step of reading the disparity D at the location of the selected pixel of the left view disparity map image is shown as processing step 603 in Figure 6. The step of determining the equivalent pixel location value EPL for the disparity value D is shown as processing step 605 in Figure 6. The value EPL may then be compared against the actual location value L of the selected pixel of the left view disparity map image.
If EPL is determined to be less than the location value L of the current selected pixel then the next pixel along the row towards the centre of the image can be selected as a potential boundary pixel. The steps of determining EPL for the next pixel and testing the value of EPL against the next pixel location value L can then be repeated.
In Figure 6, the step of testing EPL against the current location value L is depicted as decision step 606, and the step of selecting the next pixel along the row of the left view disparity map image (L = L + 1) in the case that EPL is less than the current location value L is shown as the feedback path 607.
If, however, EPL is determined to be equal to or greater than the location value L of the current selected pixel, then the feedback loop can be terminated and the current pixel is selected to be part of the boundary contour between the 2D and 3D regions. This step is represented in Figure 6 as the processing step 608.
It is to be appreciated that in the above process an initial location value of L = 1 can be assigned if the initial selected pixel is drawn from the leftmost position of the row.
The steps of Figure 6 may be repeated for the next row and all subsequent rows of the left view disparity map image. As stated above, the processing steps of Figure 6 can also be performed for the right view disparity map image in order to obtain the boundary contour with respect to the right view image. The effect of the processing steps of Figure 6 can be that the partition between the 2D and 3D regions appears as a band or stripe as shown in Figure 4. This can be due to the objects in the right and left image views being within the same depth range. Figure 7, however, depicts the effect of the processing steps of Figure 6 in which the boundary between the 2D and 3D regions follows the contour of objects located at the extremities of each of the left and right view images. This case is due to there being foreground objects located at the extremities of the respective image views.
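By way of a non-limiting sketch, the row-by-row search of Figure 6 can be written as follows, assuming the disparity map is given as rows of disparity values and that the constants a and b have been chosen beforehand; the function names and the 1-based location convention are illustrative only.

def find_boundary_column(disparity_row, a, b):
    # Follows steps 601 to 608 as described: start at the leftmost pixel
    # (L = 1), advance towards the image centre while EPL = a * D + b is less
    # than the current location L, and stop at the first pixel where
    # EPL >= L, which becomes part of the 2D/3D boundary contour.
    width = len(disparity_row)
    location = 1  # 1-based column location of the currently selected pixel
    while location < width:
        epl = a * disparity_row[location - 1] + b
        if epl >= location:   # decision step 606 / termination step 608
            break
        location += 1         # feedback path 607: select the next pixel
    return location

def find_boundary_contour(disparity_map, a, b):
    # One boundary column per row of the (left view) disparity map image.
    return [find_boundary_column(row, a, b) for row in disparity_map]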
In another embodiment, each iteration of steps 601 to 608 of Figure 6 can be applied to N rows (N > 1) at the same time. In this embodiment the value L is as defined and used in the previous embodiment, but the disparity value D at each column location L is a function of the N disparity values associated with the N rows. The function returns a single disparity value D for all of the selected N rows, and the boundary pixel contour width provided by step 608 is applied to all selected N rows. The single disparity value D can be given by one of the mean, median, maximum, minimum, or a weighted average of the available N disparity values associated with the N rows. This embodiment can be used in implementations where limited processing resources are available, thereby requiring a less complex algorithm.

In another embodiment, the steps in Figure 6 can be applied over square blocks of pixels where the size of each side of the square block is given as N, with N greater than 1. As above, all the processing steps of Figure 6 can be applied on a block by block basis in order to find the boundary contour between the 2D and 3D regions of the respective image view. However, in this embodiment the location value L is determined differently: the calculation and incrementing of the column location L is a function of the block size N and the left side location of the block, Lstart (the starting location of the block). L can be calculated based on each of the following functions:
L = Lstart (left side of the block)
L = Lstart + N - 1 (right side of the block)
L = Lstart + round(N/2) (middle location of the block)
where round is a function returning the closest integer value.
The disparity value D may then be calculated and used in a similar manner to the previous embodiment, and the boundary pixel contour width calculated in step 608 is then applied to all rows of the selected block.
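A brief sketch of the block-based variant follows; the aggregation choices and the three alternative block locations mirror the text above, while the function names and the round(N/2) middle location are assumptions for illustration.

import numpy as np

def block_disparity(block, agg="median"):
    # Single disparity value D for an N x N block of the disparity map image.
    funcs = {"mean": np.mean, "median": np.median, "max": np.max, "min": np.min}
    return funcs[agg](np.asarray(block))

def block_location(l_start, n, mode="left"):
    # Column location L used for the whole block (the three alternatives above).
    if mode == "left":
        return l_start                 # L = Lstart
    if mode == "right":
        return l_start + n - 1         # L = Lstart + N - 1
    return l_start + round(n / 2)      # middle of the block (assumed N/2)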
In yet another embodiment, a single disparity value (Dall) is calculated for each image as a whole. This value can be calculated by using one of the mean, median, maximum, minimum, or a weighted average over all disparity values of the disparity map image. Following this, one fixed value of EPL is calculated as follows:
EPL = a x Dall + b
This value is used as the boundary pixel contour width for all of the rows in the image and hence replaces the row by row steps illustrated in Figure 6. It is to be further understood that other embodiments may perform the above processing steps for either the left view disparity map image or the right view disparity map image; in these embodiments the position of the boundary pixel column can be found for either the left view or the right view and applied to both views.
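For completeness, this single-value variant reduces to a couple of lines; the aggregation function is whichever of the listed statistics is chosen, and the names are illustrative.

import numpy as np

def fixed_boundary_width(disparity_map, a, b, agg=np.median):
    # One disparity value Dall for the whole map, one EPL for every row.
    d_all = agg(np.asarray(disparity_map))
    return a * d_all + b               # EPL = a x Dall + b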
The overall step of processing the left view disparity map image or right view disparity map image in order to locate the boundary between the 2D and 3D regions for the respective left and right image views is shown as processing step 507 in Figure 5. The result of processing step 507 can be a boundary on the left view image separating the 2D and 3D regions, and a boundary on the right view image separating the 2D and 3D regions. In other words, the output of processing step 507 can be the boundary between the 2D and 3D regions for the left and/or right view images. The 2D/3D boundary from the left view disparity map image can then be mapped onto the left view image, and the 2D/3D boundary from the right view disparity map image can be mapped onto the right view image. This step is shown as processing step 509 in Figure 5.
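One possible way to express the mapping of step 509 is as a boolean mask over the view image, where each row's 2D band width comes from the boundary search sketched earlier; the width-per-row convention and the parameter names are assumptions for illustration.

import numpy as np

def region_mask(height, width, band_widths, side="left"):
    # True for pixels inside the 2D (non-overlapping) band of the view image.
    # band_widths holds, per row, the width of the 2D band measured from the
    # image edge the band is attached to (left edge for the left view,
    # right edge for the right view).
    cols = np.arange(1, width + 1)[None, :]        # 1-based column indices
    band = np.asarray(band_widths).reshape(height, 1)
    if side == "left":
        return cols <= band
    return cols > (width - band)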
As stated before, a different encoding regime can be applied to the 2D and the 3D regions in each of the left and right images. This is primarily as a consequence of the 2D region being viewed at the periphery of the combined image with the result that this region is not perceived in 3D by the viewer's HVS. The 2D region can be encoded at a greater coding efficiency than the 3D region, or in other words the 2D region for the left and right image views can be encoded with less bits per pixel than the corresponding 3D region.
In embodiments the encoding between the 2D and 3D regions of both the left and right view images may differ due to the 2D region incurring the extra step of low pass filtering. This has the effect of reducing the high frequency components in the 2D region, thereby reducing the relative number of bits required on a per pixel basis to encode the region when compared to encoding the 3D region with the same encoding scheme. Alternatively or additionally in other embodiments the encoding between the 2D and 3D regions may differ by using a coarser block transform coefficient quantization step for the 2D region, on the basis that a hybrid coding mechanism is used for both regions.
In other embodiments the pixel sample values for the 2D and 3D regions in the left and right image views can be quantized directly. With this approach the encoding efficiency may be achieved by quantizing 2D region pixel values at a different step size to 3D region pixel values.
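The following sketch combines the low pass filtering of the previous paragraph with the direct sample quantization described here, using the 2D-band mask sketched above, before the frames are passed to whatever codec follows; the filter strength and the step sizes are illustrative values only, not parameters of the embodiment.

import numpy as np
from scipy.ndimage import gaussian_filter

def preprocess_view(view_image, mask_2d, sigma=1.5, step_2d=16, step_3d=4):
    # view_image: single-channel (e.g. luma) array; mask_2d: boolean 2D-band mask.
    # Low pass filter the 2D band (removing high-frequency detail) and apply a
    # coarser sample quantization step there than in the 3D region.
    img = view_image.astype(np.float64)
    blurred = gaussian_filter(img, sigma=sigma)
    out = np.where(mask_2d, blurred, img)
    step = np.where(mask_2d, float(step_2d), float(step_3d))
    return np.round(out / step) * step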
In some embodiments there may be a combination of all or part of the above when encoding the 2D and 3D regions.
Furthermore, some embodiments may cater for a smooth transition of encoding between the 2D and 3D regions. For example the pixel blocks or pixel values around the boundary between the 2D and 3D regions may be gradually encoded with a higher rate as the pixels or coding blocks to be encoded approach the boundary. This approach may be implemented by varying the particular encoding parameter which affects the coding rate. For example in the case of directly quantizing the pixel values, the step size may be gradually reduced as the position of the pixel to be quantized approaches the boundary from the 2D to 3D region, and vice versa.
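As a sketch of such a smooth transition, the per-pixel quantization step below ramps between the 2D and 3D step sizes over a few columns either side of the boundary; the ramp width, the step sizes, and the assumption that the 2D band is attached to the left edge are illustrative choices only.

import numpy as np

def graded_step(mask_2d, step_2d=16.0, step_3d=4.0, ramp=8):
    # Per-pixel quantization step: equal to the midpoint of the two steps at
    # the 2D/3D boundary and ramping towards each region's nominal step with
    # increasing distance from the boundary (assumes a left-edge 2D band).
    width = mask_2d.shape[1]
    cols = np.arange(width)[None, :]
    band = mask_2d.sum(axis=1, keepdims=True)      # 2D band width per row
    dist = np.abs(cols - band)                     # column distance to boundary
    t = np.clip(dist / float(ramp), 0.0, 1.0)      # 0 at boundary, 1 far away
    nominal = np.where(mask_2d, step_2d, step_3d)  # step far from the boundary
    midpoint = 0.5 * (step_2d + step_3d)
    return midpoint + t * (nominal - midpoint)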
The step of encoding the 2D and 3D regions each at different encoding rates for the left view image and right view image is depicted as processing step 511 in Figure 5.
Although the above examples describe embodiments of the application operating within an apparatus 10, it would be appreciated that the invention as described above may be implemented as part of any stereoscopic capture apparatus.
User equipment may comprise a stereoscopic video capture and recording device or module such as those described in embodiments of the application above.
It shall be appreciated that the term user equipment is intended to cover any suitable type of wireless user equipment, such as mobile telephones, portable data processing devices or portable web browsers. Furthermore elements of a public land mobile network (PLMN) may also comprise elements of a stereoscopic video capture and recording device as described above.
In general, the various embodiments of the application may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the application may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
The embodiments of this application may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions.
The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples. Embodiments of the application may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
Programs can automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication. As used in this application, the term 'circuitry' refers to all of the following:
(a) hardware-only circuit implementations (such as implementations in only analog and/or digital circuitry) and
(b) to combinations of circuits and software (and/or firmware), such as: (i) to a combination of processor(s) or (ii) to portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions and
(c) to circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
This definition of 'circuitry' applies to all uses of this term in this application, including any claims. As a further example, as used in this application, the term 'circuitry' would also cover an implementation of merely a processor (or multiple processors) or a portion of a processor and its (or their) accompanying software and/or firmware. The term 'circuitry' would also cover, for example and if applicable to the particular claim element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, or other network device.
The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.

Claims
1. A method comprising:
capturing a first view image and a second view image by a camera arrangement;
capturing ranging information for the first view image and the second view image;
determining a disparity map image for the first view image from the ranging information for the first view image and the second view image;
partitioning the disparity map image for the first view image into a non-overlapping region between the first view image and the second view image and an overlapping region between the first view image and the second view image;
mapping the non-overlapping region and overlapping region to the first view image; and
encoding the non-overlapping region of the first view image at a lower coding rate than the overlapping region of the first view image.
2. The method as claimed in Claim 1, wherein the partitioning of the disparity map image for the first view image into the non-overlapping region between the first view image and the second view image and the overlapping region between the first view image and the second view image is determined based on a disparity value and a position index of at least one pixel of the disparity map image.
3. The method as claimed in Claims 1 and 2, wherein the partitioning of the disparity map image for the first view image into the non-overlapping region between the first view image and the second view image and the overlapping region between the first view image and the second view image comprises:
determining, for the non-overlapping region, that at least one pixel in a row of the disparity map image has an equivalent position index value less than a position index for the at least one pixel in the row of the disparity map image, wherein the equivalent position index is based on a disparity value of the at least one pixel at a location given by the position index in the row of the disparity map image; and determining, for the overlapping region, at least one further pixel in the row of the disparity map image that has a further equivalent position index value greater than a further position index in the row of the disparity map image for the at least one further pixel, wherein the further equivalent position index is based on a disparity value of the at least one further pixel at a location given by the further position index in the row of the disparity map image.
4. The method as claimed in Claim 3, wherein the equivalent position index is given by a function which translates a disparity value of a pixel located at a position index in a row of the disparity map image to a position index based value.
5. The method as claimed in Claims 3 and 4, wherein the determining for the non-overlapping region and the determining for the overlapping region is performed on a row by row basis of the disparity map image.
6. The method as claimed in Claims 3 and 4, wherein the determining for the non-overlapping region and the determining for the overlapping region is performed for at least a two row by at least two row basis, wherein the disparity value is the mean or median of the disparity value of the at least one pixel of a first row of the at least two rows and of a disparity value of the at least one pixel of a second row of the at least two rows.
7. The method as claimed in Claims 3 and 4, wherein the determining for the non-overlapping region and the determining for the overlapping region is performed on a block of pixel by a block of pixel basis, wherein the at least one pixel and the at least one further pixel of the disparity map image is at least one block of pixels and at least one block of further pixels respectively, and wherein the disparity value of the at least one pixel and the disparity value of the at least one further pixel is respectively a disparity value of a mean or median of disparity values of pixels within the at least one block of pixels and the disparity value of a mean or median of disparity values of pixels within the at least one further block of pixels.
8. The method as claimed in Claims 1 to 7, further comprising:
determining a disparity map image for the second view image from the ranging information for the first view image and the second view image;
partitioning the disparity map image for the second view image into a non-overlapping region between the first view image and the second view image and an overlapping region between the first view image and the second view image;
mapping the non-overlapping region of the disparity map image for the second view image and the overlapping region of the disparity map image for the second view image to the second view image; and
encoding the non-overlapping region of the second view image at a lower coding rate than the overlapping region of the second view image.
9. The method as claimed in Claims 1 to 8, wherein the first view image is a left view image, and the second view image is a right view image.
10. The method as claimed in Claims 1 to 9, wherein the non-overlapping region is a two dimensional region, and wherein the overlapping region is a three dimensional region.
11. The method as claimed in Claims 1 to 10, wherein the camera arrangement is a stereoscopic camera arrangement.
12. An apparatus comprising:
at least one processor; and
at least one memory including computer program code, which when executed by the at least one processor, causes the apparatus to:
capture a first view image and a second view image by a camera arrangement;
capture ranging information for the first view image and the second view image;
determine a disparity map image for the first view image from the ranging information for the first view image and the second view image;
partition the disparity map image for the first view image into a non-overlapping region between the first view image and the second view image and an overlapping region between the first view image and the second view image;
map the non-overlapping region and overlapping region to the first view image; and
encode the non-overlapping region of the first view image at a lower coding rate than the overlapping region of the first view image.
13. The apparatus as claimed in Claim 12, wherein the apparatus is caused to partition the disparity map image for the first view image into the non-overlapping region between the first view image and the second view image and the overlapping region between the first view image and the second view image based on a disparity value and a position index of at least one pixel of the disparity map image.
14. The apparatus as claimed in Claims 12 and 13, wherein the apparatus caused to partition the disparity map image for the first view image into the non-overlapping region between the first view image and the second view image and the overlapping region between the first view image and the second view image is further caused to:
determine, for the non-overlapping region, that at least one pixel in a row of the disparity map image has an equivalent position index value less than a position index for the at least one pixel in the row of the disparity map image, wherein the equivalent position index is based on a disparity value of the at least one pixel at a location given by the position index in the row of the disparity map image; and
determine, for the overlapping region, at least one further pixel in the row of the disparity map image that has a further equivalent position index value greater than a further position index in the row of the disparity map image for the at least one further pixel, wherein the further equivalent position index is based on a disparity value of the at least one further pixel at a location given by the further position index in the row of the disparity map image.
15. The apparatus as claimed in Claim 14, wherein the equivalent position index is given by a function which translates a disparity value of a pixel located at a position index in a row of the disparity map image to a position index based value.
16. The apparatus as claimed in Claims 14 and 15, wherein the apparatus is caused to perform the determining for the non-overlapping region and the overlapping region, on a row by row basis of the disparity map image.
17. The apparatus as claimed in Claims 14 and 15, wherein the apparatus is caused to perform the determining for the non-overlapping region and the overlapping region for at least a two row by at least two row basis, wherein the disparity value is the mean or median of the disparity value of the at least one pixel of a first row of the at least two rows and of a disparity value of the at least one pixel of a second row of the at least two rows.
18. The apparatus as claimed in Claims 14 and 15, wherein the apparatus is caused to perform the determining for the non-overlapping region and the determining for the overlapping region on a block of pixel by a block of pixel basis, wherein the at least one pixel and the at least one further pixel of the disparity map image is at least one block of pixels and at least one block of further pixels respectively, and wherein the disparity value of the at least one pixel and the disparity value of the at least one further pixel is respectively a disparity value of a mean or median of disparity values of pixels within the at least one block of pixels and the disparity value of a mean or median of disparity values of pixels within the at least one further block of pixels.
19. The apparatus as claimed in Claims 12 to 18, further caused to: determine a disparity map image for the second view image from the ranging information for the first view image and the second view image;
partition the disparity map image for the second view image into a non-overlapping region between the first view image and the second view image and an overlapping region between the first view image and the second view image;
map the non-overlapping region of the disparity map image for the second view image and the overlapping region of the disparity map image for the second view image to the second view image; and
encode the non-overlapping region of the second view image at a lower coding rate than the overlapping region of the second view image.
20. The apparatus as claimed in Claims 12 to 19, wherein the first view image is a left view image, and the second view image is a right view image.
21. The apparatus as claimed in Claims 12 to 20, wherein the non-overlapping region is a two dimensional region, and wherein the overlapping region is a three dimensional region.
22. The apparatus as claimed in Claims 12 to 21, wherein the camera arrangement is a stereoscopic camera arrangement.
23. An apparatus configured to:
capture a first view image and a second view image by a camera arrangement;
capture ranging information for the first view image and the second view image;
determine a disparity map image for the first view image from the ranging information for the first view image and the second view image;
partition the disparity map image for the first view image into a non-overlapping region between the first view image and the second view image and an overlapping region between the first view image and the second view image; map the non-overlapping region and overlapping region to the first view image; and
encode the non-overlapping region of the first view image at a lower coding rate than the overlapping region of the first view image.
24. The apparatus as claimed in Claim 23, wherein the apparatus is configured to partition the disparity map image for the first view image into the non-overlapping region between the first view image and the second view image and the overlapping region between the first view image and the second view image based on a disparity value and a position index of at least one pixel of the disparity map image.
25. The apparatus as claimed in Claims 23 and 24, wherein the apparatus configured to partition the disparity map image for the first view image into the non-overlapping region between the first view image and the second view image and the overlapping region between the first view image and the second view image is further configured to:
determine, for the non-overlapping region, that at least one pixel in a row of the disparity map image has an equivalent position index value less than a position index for the at least one pixel in the row of the disparity map image, wherein the equivalent position index is based on a disparity value of the at least one pixel at a location given by the position index in the row of the disparity map image; and
determine, for the overlapping region, at least one further pixel in the row of the disparity map image that has a further equivalent position index value greater than a further position index in the row of the disparity map image for the at least one further pixel, wherein the further equivalent position index is based on a disparity value of the at least one further pixel at a location given by the further position index in the row of the disparity map image.
26. The apparatus as claimed in Claim 25, wherein the equivalent position index is given by a function which translates a disparity value of a pixel located at a position index in a row of the disparity map image to a position index based value.
27. The apparatus as claimed in Claims 25 and 26, wherein the apparatus is configured to perform the determining for the non-overlapping region and the overlapping region, on a row by row basis of the disparity map image.
28. The apparatus as claimed in Claims 25 and 26, wherein the apparatus is configured to perform the determining for the non-overlapping region and the overlapping region for at least a two row by at least two row basis, wherein the disparity value is the mean or median of the disparity value of the at least one pixel of a first row of the at least two rows and of a disparity value of the at least one pixel of a second row of the at least two rows.
29. The apparatus as claimed in Claims 25 and 26, wherein the apparatus is configured to perform the determining for the non-overlapping region and the determining for the overlapping region on a block of pixel by a block of pixel basis, wherein the at least one pixel and the at least one further pixel of the disparity map image is at least one block of pixels and at least one block of further pixels respectively, and wherein the disparity value of the at least one pixel and the disparity value of the at least one further pixel is respectively a disparity value of a mean or median of disparity values of pixels within the at least one block of pixels and the disparity value of a mean or median of disparity values of pixels within the at least one further block of pixels.
30. The apparatus as claimed in Claims 23 to 29, further configured to:
determine a disparity map image for the second view image from the ranging information for the first view image and the second view image;
partition the disparity map image for the second view image into a non-overlapping region between the first view image and the second view image and an overlapping region between the first view image and the second view image; map the non-overlapping region of the disparity map image for the second view image and the overlapping region of the disparity map image for the second view image to the second view image; and
encode the non-overlapping region of the second view image at a lower coding rate than the overlapping region of the second view image.
31. The apparatus as claimed in Claims 23 to 30, wherein the first view image is a left view image, and the second view image is a right view image.
32. The apparatus as claimed in Claims 23 to 31, wherein the non-overlapping region is a two dimensional region, and wherein the overlapping region is a three dimensional region.
33. The apparatus as claimed in Claims 23 to 32, wherein the camera arrangement is a stereoscopic camera arrangement.
34. A computer-readable medium having computer-readable code stored thereon, the computer-readable code, when executed by at least one processor, causing performance of at least:
capturing a first view image and a second view image by a camera arrangement;
capturing ranging information for the first view image and the second view image;
determining a disparity map image for the first view image from the ranging information for the first view image and the second view image;
partitioning the disparity map image for the first view image into a non-overlapping region between the first view image and the second view image and an overlapping region between the first view image and the second view image;
mapping the non-overlapping region and overlapping region to the first view image; and encoding the non-overlapping region of the first view image at a lower coding rate than the overlapping region of the first view image.
PCT/FI2016/050024 2016-01-20 2016-01-20 Stereoscopic video encoding WO2017125639A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/FI2016/050024 WO2017125639A1 (en) 2016-01-20 2016-01-20 Stereoscopic video encoding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/FI2016/050024 WO2017125639A1 (en) 2016-01-20 2016-01-20 Stereoscopic video encoding

Publications (1)

Publication Number Publication Date
WO2017125639A1 true WO2017125639A1 (en) 2017-07-27

Family

ID=55262826

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2016/050024 WO2017125639A1 (en) 2016-01-20 2016-01-20 Stereoscopic video encoding

Country Status (1)

Country Link
WO (1) WO2017125639A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109274813A (en) * 2018-08-14 2019-01-25 Oppo广东移动通信有限公司 Optimization of rate method, apparatus, electronic equipment and storage medium
EP4294010A1 (en) * 2022-06-16 2023-12-20 Axis AB Camera system and method for encoding two video image frames captured by a respective one of two image sensors

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013022401A2 (en) * 2011-08-10 2013-02-14 Telefonaktiebolaget L M Ericsson (Publ) Method and apparatus for creating a disocclusion map used for coding a three-dimensional video
US20140218473A1 (en) * 2013-01-07 2014-08-07 Nokia Corporation Method and apparatus for video coding and decoding

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HOMAYOUNI MARYAM ET AL: "Perception aware coding of stereoscopic video", 2015 INTERNATIONAL CONFERENCE ON 3D IMAGING (IC3D), IEEE, 14 December 2015 (2015-12-14), pages 1 - 6, XP032856692, DOI: 10.1109/IC3D.2015.7391821 *
KRISHNA RAO VIJAYANAGAR ET AL: "Compression of residual layers of layered depth video using hierarchical block truncation coding", 3DTV-CONFERENCE: THE TRUE VISION - CAPTURE, TRANSMISSION AND DISPLAY OF 3D VIDEO (3DTV-CON), 2012, IEEE, 15 October 2012 (2012-10-15), pages 1 - 4, XP032275919, ISBN: 978-1-4673-4904-8, DOI: 10.1109/3DTV.2012.6365476 *
PINTO LUIS ET AL: "Asymmetric 3D video coding using regions of perceptual relevance", 2012 INTERNATIONAL CONFERENCE ON 3D IMAGING (IC3D), IEEE, 3 December 2012 (2012-12-03), pages 1 - 6, XP032491671, DOI: 10.1109/IC3D.2012.6615124 *
YUN ZHANG ET AL: "Stereoscopic Visual Attention-Based Regional Bit Allocation Optimization for Multiview Video Coding", EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, HINDAWI PUBLISHING CORP, US, vol. 2010, 29 June 2010 (2010-06-29), pages 848713 - 1, XP002687175, ISSN: 1687-6172, DOI: 10.1155/2010/848713 *

Similar Documents

Publication Publication Date Title
US11575876B2 (en) Stereo viewing
KR101944050B1 (en) Capture and render panoramic virtual reality content
Yamanoue et al. Geometrical analysis of puppet-theater and cardboard effects in stereoscopic HDTV images
US20170280133A1 (en) Stereo image recording and playback
US10631008B2 (en) Multi-camera image coding
Schmeing et al. Depth image based rendering: A faithful approach for the disocclusion problem
Domański et al. A practical approach to acquisition and processing of free viewpoint video
US10404964B2 (en) Method for processing media content and technical equipment for the same
Tang et al. A universal optical flow based real-time low-latency omnidirectional stereo video system
WO2017125639A1 (en) Stereoscopic video encoding
KR20150047604A (en) Method for description of object points of the object space and connection for its implementation
EP3267682A1 (en) Multiview video encoding
WO2019008233A1 (en) A method and apparatus for encoding media content
Zilly et al. Generic content creation for 3D displays
US20230281910A1 (en) Methods and apparatus rendering images using point clouds representing one or more objects
Steurer et al. 3d holoscopic video imaging system
Yamada et al. Multimedia ambience communication based on actual moving pictures in a steroscopic projection display environment
KR20220117288A (en) Augmenting the view of the real environment using the view of the volumetric video object

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16701979

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16701979

Country of ref document: EP

Kind code of ref document: A1