WO2021249955A1 - Metadata to describe color filter on top on camera sensors - Google Patents

Metadata to describe color filter on top on camera sensors Download PDF

Info

Publication number
WO2021249955A1
WO2021249955A1 (PCT application PCT/EP2021/065188)
Authority
WO
WIPO (PCT)
Prior art keywords
content
data
image
view
color component
Prior art date
Application number
PCT/EP2021/065188
Other languages
French (fr)
Inventor
Benoit Vandame
Frédéric Babon
Didier Doyen
Original Assignee
Interdigital Vc Holdings France, Sas
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Interdigital Vc Holdings France, Sas filed Critical Interdigital Vc Holdings France, Sas
Publication of WO2021249955A1 publication Critical patent/WO2021249955A1/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4015Demosaicing, e.g. colour filter array [CFA], Bayer pattern

Definitions

  • the present disclosure generally relates to color filter arrays and more specifically to techniques using metadata to describe color filters used with camera sensors.
  • BACKGROUND Most consumer devices today can digitally capture an image or video with a single sensor which integrates light at the pixel level. Most sensors capture the visible light spectrum and additional colored filters must be placed in front of the sensor to capture specific colors/information.
  • Many color filters produce a mosaic that provides information about the intensity of light in red, green, and blue (RGB) wavelength regions. The raw image data is often captured by the image sensor and then converted to a full-color image such as by a demosaicing algorithm.
  • A Bayer filter is a set of 4 filters (2 green, 1 red and 1 blue) associated with 4 pixels of the sensor on a 2-by-2 basis. It produces a pattern that is well designed and takes into account the predominance of green information in human vision. [0004] Unfortunately, Bayer filters only provide (at the pixel level) a part of the information (the red, the green or the blue one). The RGB information for each pixel requires interpolation of the RAW data (raw data provided by the camera or other camera-like instruments that may be particular to the camera device and is usually provided or "dumped" as raw data by the device). A 444 RGB signal corresponds to a full resolution of R, G and B information, but only 1/3 of this information comes from real captured data.
  • the rest consists only of interpolated values.
  • the goal of this invention is to provide the information captured with the Bayer filter along with the content, to allow the processing workflow to work only with real captured data (the RAW data that have not been interpolated). If the structure and the positioning of the Bayer pattern are not transmitted with the 444 signal, there is no way to apply specific processing to only the RAW data as soon as the 444 interpolation has been performed. [0005] Consequently, techniques are needed that can use raw data (or RAW data) as soon as the interpolation has been performed without a need to transmit the positioning of the filter pattern.
  • An apparatus and a method are provided.
  • the method comprises receiving data including corresponding metadata information about a content captured by a camera and analyzing it to determine information about the positioning and respective color of each pixel of the content. Subsequently at least two views of the content are determined, each having a plurality of different pixels. Finally, a view synthesis is generated from the views by constructing a single synthetized pixel using said plurality of different pixels of each view.
  • another method and device are provided to obtain an image that includes at least one color component, the at least one color component including interpolated data and non-interpolated data, and to obtain metadata indicating one or more locations in the at least one color component that have the non-interpolated data.
  • FIG.1A and 1B are illustrations of a Bayer filter and a Bayer filter provided in front of a sensor;
  • FIG.2A and 2B are illustrations of the workings of a plenoptic camera;
  • FIG.3 is an illustration of a View synthesis principle according to one embodiment;
  • FIG. 4 is an illustration of six consecutive slices of a virtual color cube according to one embodiment;
  • FIG. 5 is an illustration of the application of the view synthesis algorithm according to one embodiment
  • FIG.6 is a table illustrating an Auxiliary Video Information (AVI) Infoframe
  • FIG. 7 is a table providing an extension of the Auxiliary Video Information (AVI) Infoframe of the table provided in FIG.6
  • FIG 8 is an illustration of an interpolated pixel coordinate and the distances between it and the three colors of a Bayer filter
  • FIG 9 is an illustration of a comparison made between two versions of slice of a virtual color cube
  • FIG 10 is a flow chart illustration of one embodiment
  • FIG 11 is an illustration of a system that can be used in conjunction with one or more embodiments.
  • FIG.1A shows a color filter.
  • FIG. 2A and 2B present schematically a main component of a plenoptic camera that enables the acquisition of light field data on which the present technique can be applied.
  • plenoptic cameras as shown in FIG. 2A comprise a main lens referenced 101, and an image sensor (i.e. an array of pixel sensors (for example a sensor based on CMOS technology)), referenced 104. Between the main lens 101 and the image sensor 104, a microlens array (i.e. a set of micro-lens) referenced 102, is provided with a set of micro lenses referenced 103 as positioned.
  • some spacers can optionally be provided and can be located between the micro-lens array around each lens and the image sensor, to prevent light from one lens from overlapping with the light of other lenses at the image sensor side.
  • the main lens 101 can be a more complex optical system.
  • the workings of plenoptic cameras are like those of any other conventional camera. Hence, plenoptic cameras can be viewed as a conventional camera plus a micro-lens array set just in front of the image sensor as illustrated in FIG. 1A and 1B. The light rays passing through a micro-lens cover a part of the image sensor that records the radiance of these light rays. The recording by this part of the image sensor defines a micro-lens image.
  • FIG.2B presents what the image sensor 104 records. Indeed, in such a view, it appears that the image sensor 104 comprises a set of pixels, referenced 201. The light rays passing through a micro-lens cover a number of pixels 201, and these pixels record the energy value of light rays that are incident/received.
  • the image sensor 104 of a plenoptic camera records an image which comprises a collection of 2D small images (i.e. the micro-lens images referenced 202) arranged within a 2D image (which is also named a raw 4D light-field image).
  • each small image (i.e. each micro-lens image) is produced by a micro-lens; the micro-lens can be identified by coordinates (i,j) from the array of lenses.
  • the pixels of the light- field are associated with 4 coordinates (x,y,i,j).
  • L(x,y,i,j) denotes the 4D light-field recorded by the image sensor, i.e. the image which is recorded by the image sensor.
  • Each micro-lens produces a micro-image represented by a circle (the shape of the small image depends on the shape of the micro-lenses which is typically circular). Pixel coordinates (in the image sensor) are labelled (x, y).
  • Micro-lenses are chosen such that p, the distance between two consecutive micro-images, is larger than the pixel size δ.
  • Micro-lens images are referenced by their coordinate (i,j). Each micro-lens image samples the pupil of the main-lens with the (u, v) coordinate system. Some pixels might not receive any photons from any micro-lens especially if the shape of the micro-lenses is circular.
  • FIG. 1B provides a sensor 150 and a Color Filter Array (CFA) referenced as 160.
  • CFA Color Filter Array
  • Metadata describing the CFA can be added to the content.
  • the CFA is mounted on top of the camera sensor and its description can thus become available with the data.
  • a Bayer filter is used but other types of filters can be used in other alternative embodiments.
  • most CFAs are positioned on the image sensor and are commonly used to sample various colors, with each pixel performing a single measurement.
  • the most common CFA pattern is the Bayer pattern made of 2 by 2 elements (i.e. the representation by the matrix B mentioned previously).
  • the FIG. 1B presents a CFA on the sensor 150, which is made of the repetition of the matrix 160, and where the diameter of the micro-images determines the sub-aperture images obtained from the micro-images.
  • the CFA embodiment can be described (shown at 162) by the following elements: 1. the size of the CFA, i.e. the number of pixels in x and y covered by the CFA, noted (Cx, Cy); in case of a Bayer CFA the size is 2 × 2 pixels. 2. for each cell F(i,j) of the CFA, the color filter is described as a flux-scale absorption for the Red, Green and Blue components. For instance, a cell which lets all the red photons enter the pixel corresponds to a red filter, while a cell for which all the photons are collected into the pixel corresponds to a "white" color component.
  • more simply, the filter can be described by the corresponding color, such as Red, Green or Blue.
  • by construction, a CFA is replicated over the complete sensor.
  • the filter F(i,j) of a CFA of size (Cx, Cy) is applied to the pixels (x, y) having the coordinates i = x mod Cx and j = y mod Cy. FIG. 1B illustrates the common Bayer pattern mounted on a sensor.
  • the Bayer pattern is described by its 2 × 2 cells, one Red, two Green and one Blue. In case of a multi-view content, each view will be transmitted with its corresponding description of the CFA.
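For illustration, the following minimal Python sketch models the CFA description above as a small data structure and applies the modulo rule to find the filter of an arbitrary sensor pixel. The class name, field names and the flux-scale triplets chosen for the Bayer example are illustrative assumptions, not notation from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class CFADescriptor:
    """Illustrative CFA metadata: size (Cx, Cy) and per-cell RGB flux scales."""
    size: Tuple[int, int]                           # (Cx, Cy), e.g. (2, 2) for a Bayer CFA
    cells: List[List[Tuple[float, float, float]]]   # cells[j][i] = (R, G, B) flux scale

    def filter_at(self, x: int, y: int) -> Tuple[float, float, float]:
        # The CFA is replicated over the whole sensor: the cell is found
        # by taking the pixel coordinate modulo the CFA size.
        cx, cy = self.size
        return self.cells[y % cy][x % cx]

# A 2x2 Bayer pattern expressed as assumed flux scales:
# (1,0,0) = Red, (0,1,0) = Green, (0,0,1) = Blue.
BAYER = CFADescriptor(size=(2, 2),
                      cells=[[(1, 0, 0), (0, 1, 0)],
                             [(0, 1, 0), (0, 0, 1)]])

print(BAYER.filter_at(4, 0))   # (1, 0, 0): a red pixel
print(BAYER.filter_at(5, 1))   # (0, 0, 1): a blue pixel
```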
  • Figure 6 provides a table that shows one embodiment where the Auxiliary Video Information (AVI) InfoFrame can be used to convey color filter information.
  • AVI Auxiliary Video Information
  • the CFA metadata can be added to the AVI InfoFrame so that it accompanies the color information received by a processor.
  • Various types of auxiliary data can be carried from the Source to the DTV Monitor using InfoFrames.
  • the Table shown in FIG. 6 provides one example to ease understanding. This table gives the list and the related information of AVI InfoFrame data that can be transmitted, in this example using the CEA861-D standard. It may be possible to enlarge the amount of data to be transmitted (the length of the AVI InfoFrame would be increased) and to add Data Byte information at the end of the table to include new data (e.g. the CFA metadata).
  • FIG.7 shows another table, as per an alternate embodiment where the AVI infoframe in CEA861-D standard example as discussed above can be extended.
  • the embodiment provides a proposed modification, shown in yellow, that inserts the color filter definition.
  • the maximum size of the filter is limited to 4 × 4 pixels.
  • 4 bits are allocated to define the filter as a combination of Red, Green and Blue (R, G, B hereinafter).
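As a rough, hypothetical sketch of how such an extension could carry the CFA: the code below packs a filter of at most 4 × 4 cells into extra data bytes using one 4-bit color code per cell, as suggested above. The byte layout, the color-code values and the size encoding are assumptions for illustration only; they are not the actual field layout of the table in FIG. 7.

```python
# Hypothetical 4-bit color codes; the actual code assignment of FIG. 7 may differ.
COLOR_CODES = {"R": 0x1, "G": 0x2, "B": 0x3, "W": 0x4}

def pack_cfa_payload(size, cells):
    """Pack a CFA (max 4x4) into data bytes appended to an AVI InfoFrame (assumed layout).

    size  -- (Cx, Cy), each between 1 and 4
    cells -- row-major list of color letters, e.g. ["R", "G", "G", "B"]
    """
    cx, cy = size
    assert 1 <= cx <= 4 and 1 <= cy <= 4, "filter size limited to 4x4 pixels"
    assert len(cells) == cx * cy
    # First byte: 2 bits for Cx-1, 2 bits for Cy-1 (assumed encoding).
    payload = bytearray([((cx - 1) & 0x3) << 2 | ((cy - 1) & 0x3)])
    # Then two 4-bit color codes per byte.
    codes = [COLOR_CODES[c] for c in cells]
    for hi, lo in zip(codes[0::2], codes[1::2] + [0] * (len(codes) % 2)):
        payload.append((hi << 4) | lo)
    return bytes(payload)

print(pack_cfa_payload((2, 2), ["R", "G", "G", "B"]).hex())  # '051223' with this assumed layout
```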
  • FIG.3 provides one embodiment of the invention where a view synthesis is performed using a multi-view content. To illustrate this process, in which RAW data is compared to RGB 444 contents, a view synthesis application is presented.
  • the goal is to use a multi-view input 310 to synthetize one view at any physical position between input views.
  • a camera array can acquire a set of input views. These input views are sent to the hardware platform 323 that will perform the view synthesis. (Cables connecting the cameras to the HW platform can be compatible with what is needed, such as the CEA861-D standard, which has been updated to convey CFA information, as shown at 320.)
  • the view synthetizer can use a subset or all of the views; in this example only 4 views are used. On the right-hand side the synthetized view is shown at 350. Having the CFA information will allow the synthetizer to only use RAW data of each view to perform the synthesis.
  • the information of the CFA is then used by some embodiments using a view synthesis algorithm.
  • the following sub-section describes a possible implementation of a view synthesis algorithm which uses the CFA information.
  • Other algorithms which could benefit from the CFA description could be used (algorithms such as multi-image demosaicing, multi-image super-resolution, etc.).
  • the view synthesis denotes the computation of an image from a virtual camera which is located close to the matrix of cameras from which the MVD (Multi-View plus Depth) has been observed/computed. In this case the following steps can be used: 1. Consensus cube – With this step, a cube per input image is computed. It quantifies, for many sampled depths (slices), how well all the depth-maps match from the viewing point of the selected input camera.
  • 2. Soft Visibility cube – This cube is computed by integrating the consensus cube. The soft visibility cube quantifies, for a camera viewing point, how much an object is visible from a given pixel. The visibility is said to be "soft" because the depth-map estimations are error prone. As for the consensus cube, the soft visibility is comparable to a probability. 3. Virtual Colour cube estimation – Knowing the consensus and visibility cubes of the input images, a virtual colour cube is estimated from a virtual camera. 4. Virtual image computation from the virtual Colour cube – The virtual colour cube is stacked to form a single virtual image.
  • Consensus denotes how well the depth-maps agree with one given depth-map.
  • a consensus cube Ci is computed for each input image Ii made of (Nx, Ny) pixels and its corresponding depth-map Di.
  • the cube Ci is made of (Nx, Ny, S) pixels where S denotes the number of slices.
  • each slice s ∈ [1, S] is associated with a distance z which varies inversely proportionally between zmin and zmax.
  • the minimum and maximum distances are defined depending on the scene content; they are typically set to the same minimum and maximum distances used to compute the depth-maps.
  • a pulse function π(a, b, c) is defined (equation (1)), and a Heaviside function H(a, b) is also defined (equation (2)).
  • the value of the consensus at pixel (x, y) for the camera i at the slice s associated with the distance zs is given by equation (3).
  • M is the set of cameras which are used to compute the consensus of camera i.
  • the computation of the consensus C i is noisy especially when most of the images are occulted beyond a certain distance. In this case, the denominator of equation (3) tends to zero.
  • One option is to set a minimum value for the denominator. This minimum value is experimentally set to N′/4 where N′ is the number of cameras sharing almost the same field of view.
  • the consensus Ci can be smoothed in order to improve its signal-to-noise ratio.
  • Denoising is performed slice per slice by so-called guided denoising algorithms.
  • a local smoothing kernel is computed with surrounding pixels around Ci(x, y, s) from the consensus at slice s and around pixels from the observed image Ii(x, y).
  • the Soft Visibility is computed for a given image Ii by integrating its consensus Ci through slices according to equation (4). The visibility is equal to 1 for the first slice and decreases toward 0. When the visibility decreases toward 0, this means that beyond a given slice, the image Ii is occulted by an object visible at pixel Ii(x, y).
  • the max(·) in equation (4) prevents the visibility from decreasing below 0.
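A minimal sketch of the consensus and soft-visibility computations described above, under simplifying assumptions: the pulse and Heaviside tests of equations (1)-(3) are replaced by a single depth-agreement tolerance, and the projection of a pixel into another camera is abstracted behind an assumed project() helper. It is an approximation of the idea, not the patent's exact equations.

```python
import numpy as np

def consensus_cube(depth_i, depth_maps, slice_depths, project, tol, min_denom):
    """Sketch of a consensus cube Ci(x, y, s) for camera i.

    depth_i      -- (Ny, Nx) depth-map of camera i (used here only for the pixel grid)
    depth_maps   -- {k: (Ny, Nx) depth-map of camera k}, the set M
    slice_depths -- list of S distances zs sampled between zmin and zmax
    project      -- project(k, x, y, z) -> (x'k, y'k), assumed helper using the
                    intrinsic/extrinsic parameters of camera k
    tol          -- depth-agreement tolerance standing in for the pulse function
    min_denom    -- clamp on the denominator (e.g. N'/4) to limit noise
    """
    ny, nx = depth_i.shape
    cube = np.zeros((ny, nx, len(slice_depths)))
    for s, z in enumerate(slice_depths):
        for y in range(ny):
            for x in range(nx):
                agree, visible = 0, 0
                for k, d_k in depth_maps.items():
                    xk, yk = project(k, x, y, z)
                    xi = min(max(int(round(xk)), 0), nx - 1)
                    yi = min(max(int(round(yk)), 0), ny - 1)
                    z_k = d_k[yi, xi]
                    if z_k >= z - tol:               # camera k still sees beyond z
                        visible += 1
                        if abs(z_k - z) <= tol:      # camera k agrees an object lies at z
                            agree += 1
                cube[y, x, s] = agree / max(visible, min_denom)
    return cube

def soft_visibility(cube):
    # Visibility is 1 at the first slice and decreases as consensus accumulates
    # over the preceding slices, clamped so it never drops below 0 (cf. equation (4)).
    prior = np.concatenate([np.zeros_like(cube[..., :1]),
                            np.cumsum(cube, axis=2)[..., :-1]], axis=2)
    return np.clip(1.0 - prior, 0.0, None)
```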
  • the Virtual Color cube estimation is the estimation of a virtual image seen from a virtual camera position; it is computed with a set of M' observed images Ik such that k ∈ M'.
  • the set M' can be defined as simply as the 4 real cameras closest to the virtual camera.
  • a virtual color cube Colorsynth(x, y, z) is preliminarily computed.
  • the color cube is in the coordinate system of the virtual camera which is characterized with intrinsic and extrinsic camera parameters.
  • Each slice of this virtual cube is computed as an average of the M' images weighted by the corresponding soft visibility.
  • (x'k, y'k, zk) denotes the re-projected coordinate (x, y, z) from the virtual camera to the real camera k.
  • the great advantage of this approach is that the integer coordinates (x, y, s) of the virtual color cube are computed with a backward warping approach, which is made possible thanks to the sampling of z by the cube.
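The backward-warping idea can be sketched as follows: each integer coordinate of a virtual slice is re-projected into every real camera of M' and the observed colours are averaged with soft-visibility weights, in the spirit of equation (5). The reproject() and sample() helpers and the camera dictionary layout are assumptions.

```python
import numpy as np

def virtual_color_slice(z_s, cameras, reproject, sample, shape):
    """One slice Colorsynth(., ., z_s) of the virtual colour cube (cf. equation (5)).

    cameras   -- the set M' of real cameras, each a dict with an 'image' (Ny, Nx, 3)
                 and a 'softvis' callable giving the soft visibility at (x'k, y'k, zk)
    reproject -- reproject(cam, x, y, z) -> (x'k, y'k, zk), assumed helper
    sample    -- sample(image, x, y) -> interpolated pixel value, assumed helper
    shape     -- (Ny, Nx) of the virtual view
    """
    ny, nx = shape
    colors = np.zeros((ny, nx, 3))
    weights = np.zeros((ny, nx, 1))
    for cam in cameras:
        for y in range(ny):
            for x in range(nx):
                xk, yk, zk = reproject(cam, x, y, z_s)
                w = cam['softvis'](xk, yk, zk)            # soft-visibility weight
                colors[y, x] += w * sample(cam['image'], xk, yk)
                weights[y, x] += w
    # Visibility-weighted average over the M' cameras.
    return colors / np.maximum(weights, 1e-6)
```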
  • FIG. 4 provides an illustration of 6 consecutive slices of a virtual color cube (top-left: 3 foreground slices; bottom-right: 3 background slices).
  • the virtual color cube is similar to a focal-stack where only objects lying at the given slice are visible. The foreground objects have been removed.
  • Virtual image computation can be provided by stacking the virtual color cube.
  • the virtual color cube is merged to form a unique virtual colour image. It is first required to compute the consensus cube Consensussynth(x, y, z) and the visibility cube SoftVissynth(x, y, z) associated with the virtual colour images. Similarly to equation (5), the computation is done by averaging the M' initial consensus or visibility cubes (equations (6) and (7)). Both cubes defined above are combined into CC(x, y, z) = min(Consensussynth(x, y, z), SoftVissynth(x, y, z)) (equation (8)).
  • the CC is a kind of probability which varies between 0 and 1.
  • the typical values are: - If a given CC(x,y,z) is equal to 1, this means that all cameras agree that an object is lying at the distance z from the virtual camera, and is seen at the coordinate (x, y) within the virtual camera.
  • a high value CC > 50% is rare; it corresponds to objects where the depth estimation was accurate (textured areas) and positioned exactly on a slice of the virtual camera and quite close to the slices of the real cameras.
  • CC values are mostly equal to 0 since many slices do not match any object.
  • for objects with few details, the depth-maps extracted from the raw images do not agree and the raw consensus is low; it can be as low as 1/N where N is the number of cameras. In this case the CC is also low, with values around 1/N.
  • CC values can be lower than 1/N for objects which lie between 2 slices. So CC values equal to a few percent are common.
  • the color slices are then weighted by consensus and accumulated until the ray visibility reaches zero (equation (9)).
  • the virtual color cube is saved with pixels made of 4 values: Red, Green, Blue and Alpha (RGBA).
  • the RGB encodes the colour as computed by equation (5).
  • the alpha encodes the CC(x, y, z) component as computed by equation (8).
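A compact sketch of the last two steps: the synthesized consensus and visibility cubes are combined into CC via equation (8), and the colour slices are accumulated front to back, weighted by CC, until the ray visibility is spent. The exact accumulation rule of equation (9) is not reproduced in the text above, so the loop below is only an assumed approximation of it.

```python
import numpy as np

def stack_virtual_cube(color_cube, consensus_synth, softvis_synth):
    """Collapse the virtual colour cube into a single image.

    color_cube      -- (Ny, Nx, S, 3) virtual colour slices (equation (5) or (12))
    consensus_synth -- (Ny, Nx, S) synthesized consensus cube
    softvis_synth   -- (Ny, Nx, S) synthesized soft-visibility cube
    """
    cc = np.minimum(consensus_synth, softvis_synth)     # equation (8)
    ny, nx, S, _ = color_cube.shape
    image = np.zeros((ny, nx, 3))
    remaining = np.ones((ny, nx))                        # ray visibility left to spend
    for s in range(S):                                   # front (near) to back (far)
        w = np.minimum(cc[..., s], remaining)            # cannot exceed remaining visibility
        image += w[..., None] * color_cube[:, :, s, :]
        remaining = np.maximum(remaining - w, 0.0)
    # Normalise by the accumulated weight where anything was accumulated.
    acc = 1.0 - remaining
    return image / np.maximum(acc[..., None], 1e-6)
```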
  • Figure 3 shows an application of the view synthesis.
  • 4 left images are observed by 4 central cameras.
  • the right image is a synthetic view of the virtual camera located at the middle of the 4 central ones.
  • An algorithm can be applied to the images captured with a matrix of 4 x 4 cameras.
  • 4 consensus and visibility cubes are computed with 128 slices for the 4 central cameras. All depth-maps are contributing to compute the consensus and visibility cubes: the set M is made of 15 cameras.
  • the synthetic colour cube is computed with the 4 central cameras: the set M' is made of 4 cameras.
  • Figure 5 illustrates a detailed view of the 4 original images (4 images on the left), and the synthetized image (right image). This algorithm can produce very accurate results even with scenes made of complex occlusions.
  • an algorithm can also be provided that allows for view synthesis with CFA information provided.
  • the View synthesis is computed by merging several input pixels Ik(xk', yk') into a single pixel of the synthetic view slice as described in equation (5).
  • the coordinate (xk', yk') is not an integer coordinate and interpolation is required to estimate the pixel value at that coordinate.
  • the coordinate (xk', yk') is decomposed into an integer part [.], which is the nearest-pixel rounding value, and a fractional part {.}.
  • the pixel value of a non-integer coordinate is estimated using the 2 x 2 or 4 x 4 real pixels surrounding that coordinate.
  • the interpolation defines the weight associated with the surrounding pixels as a function of the fractional part of the coordinate. For instance, with a bi-linear interpolation the interpolated pixel value is computed with 4 weights associated with the 4 surrounding pixels (equation (10)).
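For reference, a generic bilinear interpolation sketch matching the description of equation (10); it uses the floor convention for the integer part and is standard practice rather than code from the patent.

```python
import math
import numpy as np

def bilinear(image, xf, yf):
    """Bilinearly interpolate image (Ny, Nx, C) at a non-integer in-bounds coordinate (xf, yf)."""
    x0, y0 = math.floor(xf), math.floor(yf)        # integer part [.]
    fx, fy = xf - x0, yf - y0                      # fractional part {.}
    x1 = min(x0 + 1, image.shape[1] - 1)
    y1 = min(y0 + 1, image.shape[0] - 1)
    # Weights are products of the fractional distances to the 4 surrounding pixels.
    return ((1 - fx) * (1 - fy) * image[y0, x0] +
            fx * (1 - fy) * image[y0, x1] +
            (1 - fx) * fy * image[y1, x0] +
            fx * fy * image[y1, x1])
```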
  • the input images I k are assumed to be colour images which have been computed by demosaicing from the raw images recorded by the pixel sensor.
  • the interpolation given in equation (10) is applied equally for the Red Green and Blue components.
  • if the values {xk'} and {yk'} are close to 0 or 1, the interpolated pixel value Ik(xk', yk') is almost equal to one of the surrounding input pixels.
  • by contrast, {xk'} and {yk'} may be close to 0.5, in which case the surrounding pixels contribute comparable weights.
  • the 4 surrounding pixel values listed in the previous equation are associated to given colours of the CFA mounted on top of the pixel sensor.
  • the view synthesis algorithm can be improved as follows.
  • the 3 distances dR,G,B(xk', yk') between the coordinate (xk', yk') and the 3 closest colour components observed are used to weight the interpolated pixel value Ik(xk', yk').
  • the coordinate (Cx{xk'}, Cy{yk'}) is the coordinate of the interpolated pixel within the CFA.
  • in case of a Bayer CFA, (2{xk'}, 2{yk'}) is the coordinate of the projected pixel within the Bayer matrix. This coordinate is used to compute the closest distance dR,G,B to each color component of the CFA.
  • the distance is computed for each pixel of the CFA.
  • Equation (12) is applied for the 3 color components.
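The distance-based weighting can be illustrated as follows, under assumptions: the Bayer cell layout of FIG. 8 is taken as (0,0)=R, (1,0)=G, (0,1)=G, (1,1)=B, the per-cell distance of equation (11) is assumed Euclidean, and the mapping from distance to weight is an arbitrary decreasing function standing in for the actual form of equation (12).

```python
import math

# Bayer cell layout assumed as in FIG. 8: (0,0)=R, (1,0)=G, (0,1)=G, (1,1)=B.
BAYER_CELLS = {(0, 0): "R", (1, 0): "G", (0, 1): "G", (1, 1): "B"}

def color_distances(fx, fy):
    """Distances from the projected fractional coordinate ({xk'}, {yk'}),
    scaled to the 2x2 Bayer cell, to the closest cell of each colour."""
    px, py = 2 * fx, 2 * fy                  # coordinate within the Bayer matrix
    d = {"R": math.inf, "G": math.inf, "B": math.inf}
    for (i, j), c in BAYER_CELLS.items():
        # Equation (11), assumed Euclidean distance to the cell origin.
        d[c] = min(d[c], math.hypot(px - i, py - j))
    return d

def cfa_weights(fx, fy, sigma=0.5):
    """Illustrative per-colour weights: closer to a real sample of colour C means a
    larger weight for that colour (the exact form of equation (12) is assumed)."""
    d = color_distances(fx, fy)
    return {c: math.exp(-(dist / sigma) ** 2) for c, dist in d.items()}

print(cfa_weights(0.1, 0.1))   # red dominates: the point falls almost on the red cell
```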
  • Figure 9 provides one example of an embodiment that compares a slice of the virtual color cube.
  • the picture on the right is estimated with the original version, while the one on the left is the version that uses the CFA information.
  • Figure 9 also illustrates one slice of the virtual color cube corresponding to one object lying at that slice.
  • the right image is showing the original version as computed by equation (5), and left image is showing the same slice computed with equation (12).
  • Equation (12) is applied for the 3 color components
  • FIG. 9 provides an example of a result that can be obtained.
  • a slice of a virtual color cube is compared with the version with the CFA information.
  • Reference 910 which appears in the right image is showing the original version as computed by equation (5), and left image (905) is showing the same slice computed with equation (12).
  • FIG 10 is a flowchart illustration according to one embodiment.
  • in step 1010, data including corresponding metadata information about a content captured by a camera is received and in step 1020 this is analyzed to determine information about the positioning and respective color of each pixel of said content.
  • Step 1030 deals with determining at least two views of the content, each having a plurality of different pixels.
  • a view synthesis is generated from said views by constructing a single synthetized pixel using said plurality of different pixels of each view.
  • a final rendering can optionally be provided.
  • Figure 11 schematically illustrates a general overview of an encoding and decoding system according to one or more embodiments.
  • the system of Figure 11 is configured to perform one or more functions and can have a pre-processing module 1100 to prepare a received content (including one or more images or videos) for encoding by an encoding device 11400.
  • the pre-processing module 11300 may perform multi-image acquisition, merging of the acquired multiple images in a common space and the like, acquiring of an omnidirectional video in a particular format and other functions to allow preparation of a format more suitable for encoding.
  • Another implementation might combine the multiple images into a common space having a point cloud representation.
  • Encoding device 11400 packages the content in a form suitable for transmission and/or storage for recovery by a compatible decoding device 11700.
  • the encoding device 11400 provides a degree of compression, allowing the common space to be represented more efficiently (i.e., using less memory for storage and/or less bandwidth required for transmission).
  • the 2D frame is effectively an image that can be encoded by any of a number of image (or video) codecs.
  • the encoding device 11400 may provide point cloud compression, which is well known, e.g., by octree decomposition.
  • the data is sent to a network interface 11500, which may typically be implemented in any network interface, for instance one present in a gateway.
  • the data can then be transmitted through a communication network 11500, such as the Internet, but any other network may be foreseen. The data is then received via network interface 11600, which may be implemented in a gateway or in a device. After reception, the data are sent to a decoding device 11700. Decoded data are then processed by the device 11800, which can also be in communication with sensors or user input data.
  • the decoder 11700 and the device 11800 may be integrated in a single device (e.g., a smartphone, a game console, a STB, a tablet, a computer, etc.). In another embodiment, a rendering device 11900 may also be incorporated.
  • the decoding device 11700 can be used to obtain an image that includes at least one color component, the at least one color component including interpolated data and non-interpolated data and obtaining metadata indicating one or more locations in the at least one color component that have the non-interpolated data.
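As an illustration of what a decoder could do with such metadata, the sketch below derives, for each colour component, a boolean mask of the pixel positions that carry non-interpolated (directly captured) samples, using the modulo rule described earlier. Function and variable names are illustrative.

```python
import numpy as np

def non_interpolated_masks(height, width, cfa_size, cfa_colors):
    """Per-colour masks of positions carrying real (non-interpolated) samples.

    cfa_size   -- (Cx, Cy) of the CFA, e.g. (2, 2) for a Bayer filter
    cfa_colors -- dict {(i, j): "R" | "G" | "B"} describing the CFA cells
    """
    cx, cy = cfa_size
    masks = {c: np.zeros((height, width), dtype=bool) for c in ("R", "G", "B")}
    ys, xs = np.mgrid[0:height, 0:width]
    for (i, j), color in cfa_colors.items():
        # A pixel (x, y) carries a real sample of this colour when it maps to cell (i, j).
        masks[color] |= (xs % cx == i) & (ys % cy == j)
    return masks

bayer = {(0, 0): "R", (1, 0): "G", (0, 1): "G", (1, 1): "B"}
masks = non_interpolated_masks(4, 4, (2, 2), bayer)
print(masks["G"].astype(int))   # green samples form a checkerboard pattern
```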
  • a number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this disclosure and are within the scope of this disclosure.

Abstract

An apparatus and a method are provided. The method comprises receiving data including corresponding metadata information about a content captured by a camera and analyzing it to determine information about the positioning and respective color of each pixel of the content. Subsequently at least two views of the content are determined, each having a plurality of different pixels. Finally, a view synthesis is generated from the views by constructing a single synthetized pixel using said plurality of different pixels of each view.

Description

METADATA TO DESCRIBE COLOR FILTER ON TOP ON CAMERA SENSORS

TECHNICAL FIELD

[0001] The present disclosure generally relates to color filter arrays and more specifically to techniques using metadata to describe color filters used with camera sensors.

BACKGROUND

[0002] Most consumer devices today can digitally capture an image or video with a single sensor which integrates light at the pixel level. Most sensors capture the visible light spectrum and additional colored filters must be placed in front of the sensor to capture specific colors/information.

[0003] Many color filters produce a mosaic that provides information about the intensity of light in red, green, and blue (RGB) wavelength regions. The raw image data is often captured by the image sensor and then converted to a full-color image such as by a demosaicing algorithm. One popular filter that is used is the Bayer filter. A Bayer filter is a set of 4 filters (2 green, 1 red and 1 blue) associated with 4 pixels of the sensor on a 2-by-2 basis. It produces a pattern that is well designed and takes into account the predominance of green information in human vision.

[0004] Unfortunately, Bayer filters only provide (at the pixel level) a part of the information (the red, the green or the blue one). The RGB information for each pixel requires interpolation of the RAW data (raw data provided by the camera or other camera-like instruments that may be particular to the camera device and is usually provided or "dumped" as raw data by the device). A 444 RGB signal corresponds to a full resolution of R, G and B information, but only 1/3 of this information comes from real captured data. The rest consists only of interpolated values. The goal of this invention is to provide the information captured with the Bayer filter along with the content, to allow the processing workflow to work only with real captured data (the RAW data that have not been interpolated). If the structure and the positioning of the Bayer pattern are not transmitted with the 444 signal, there is no way to apply specific processing to only the RAW data as soon as the 444 interpolation has been performed.

[0005] Consequently, techniques are needed that can use raw data (or RAW data) as soon as the interpolation has been performed without a need to transmit the positioning of the filter pattern.

SUMMARY

[0006] Additional features and advantages are realized through similar techniques, and other embodiments and aspects are described in detail herein and are considered a part of the claimed invention. For a better understanding of the invention with advantages and features, refer to the description and to the drawings.

[0007] An apparatus and a method are provided. The method comprises receiving data including corresponding metadata information about a content captured by a camera and analyzing it to determine information about the positioning and respective color of each pixel of the content. Subsequently at least two views of the content are determined, each having a plurality of different pixels. Finally, a view synthesis is generated from the views by constructing a single synthetized pixel using said plurality of different pixels of each view.
[0008] In a different embodiment, another method and device are provided to obtain an image that includes at least one color component, the at least one color component including interpolated data and non-interpolated data, and to obtain metadata indicating one or more locations in the at least one color component that have the non-interpolated data.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] The teachings of the present disclosure can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:

[0010] FIG. 1A and 1B are illustrations of a Bayer filter and a Bayer filter provided in front of a sensor;

[0011] FIG. 2A and 2B are illustrations of the workings of a plenoptic camera;

[0012] FIG. 3 is an illustration of a view synthesis principle according to one embodiment;

[0013] FIG. 4 is an illustration of six consecutive slices of a virtual color cube according to one embodiment;

[0014] FIG. 5 is an illustration of the application of the view synthesis algorithm according to one embodiment;

[0015] FIG. 6 is a table illustrating an Auxiliary Video Information (AVI) InfoFrame;

[0016] FIG. 7 is a table providing an extension of the Auxiliary Video Information (AVI) InfoFrame of the table provided in FIG. 6;

[0017] FIG. 8 is an illustration of an interpolated pixel coordinate and the distances between it and the three colors of a Bayer filter;

[0018] FIG. 9 is an illustration of a comparison made between two versions of a slice of a virtual color cube;

[0019] FIG. 10 is a flow chart illustration of one embodiment; and

[0020] FIG. 11 is an illustration of a system that can be used in conjunction with one or more embodiments.

[0021] It should be understood that the drawings are for purposes of illustrating the concepts of the invention and are not necessarily the only possible configuration for illustrating the invention. To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

[0022] FIG. 1A shows a color filter. For ease of understanding a Bayer filter is used here, but in alternate embodiments other filters having alternate patterns can be used. The Bayer filter is used on most consumer digital cameras, which is why it is used in this embodiment.

[0023] FIG. 2A and 2B present schematically the main components of a plenoptic camera that enables the acquisition of light field data on which the present technique can be applied. The embodiments provided herein can be used with any camera, including plenoptic cameras; since plenoptic cameras are more sophisticated than most regular digital cameras, they are presented to illustrate the scope of applicability.

[0024] Most plenoptic cameras as shown in FIG. 2A comprise a main lens referenced 101, and an image sensor (i.e. an array of pixel sensors (for example a sensor based on CMOS technology)), referenced 104. Between the main lens 101 and the image sensor 104, a microlens array (i.e. a set of micro-lenses) referenced 102 is provided, with a set of micro lenses referenced 103 as positioned. In some embodiments, some spacers can optionally be provided and can be located between the micro-lens array around each lens and the image sensor to prevent light from one lens from overlapping with the light of other lenses at the image sensor side. In some embodiments, the main lens 101 can be a more complex optical system.
The workings of plenoptic cameras are like those of any other conventional camera. Hence, plenoptic cameras can be viewed as a conventional camera plus a micro-lens array set just in front of the image sensor as illustrated in FIG. 1A and 1B. The light rays passing through a micro-lens cover a part of the image sensor that records the radiance of these light rays. The recording by this part of the image sensor defines a micro-lens image.

[0025] FIG. 2B presents what the image sensor 104 records. Indeed, in such a view, it appears that the image sensor 104 comprises a set of pixels, referenced 201. The light rays passing through a micro-lens cover a number of pixels 201, and these pixels record the energy value of light rays that are incident/received. Hence the image sensor 104 of a plenoptic camera records an image which comprises a collection of 2D small images (i.e. the micro-lens images referenced 202) arranged within a 2D image (which is also named a raw 4D light-field image). Indeed, each small image (i.e. each micro-lens image) is produced by a micro-lens (the micro-lens can be identified by coordinates (i,j) from the array of lenses). Hence, the pixels of the light-field are associated with 4 coordinates (x,y,i,j). L(x,y,i,j) denotes the 4D light-field recorded by the image sensor, i.e. the image which is recorded by the image sensor. Each micro-lens produces a micro-image represented by a circle (the shape of the small image depends on the shape of the micro-lenses, which is typically circular). Pixel coordinates (in the image sensor) are labelled (x, y). p is the distance between 2 consecutive micro-images; p is not necessarily an integer value in general (however, in the present disclosure, we consider that p is an integer; for example, in FIG. 2, we have p=4). Micro-lenses are chosen such that p is larger than the pixel size δ. Micro-lens images are referenced by their coordinates (i,j). Each micro-lens image samples the pupil of the main lens with the (u, v) coordinate system. Some pixels might not receive any photons from any micro-lens, especially if the shape of the micro-lenses is circular. In this case, the inter micro-lens space is masked out to prevent photons from passing outside a micro-lens, resulting in some dark areas in the micro-images. It should be noted that micro-images can be re-organized into so-called sub-aperture images. A sub-aperture image collects all 4D light-field pixels (i.e. the pixels that are positioned on the image sensor plane located behind the micro-lens) having the same (u, v) coordinates (the (u, v) coordinates correspond to coordinates on the main lens pupil).

[0026] Referring back to FIG. 1, FIG. 1B provides a sensor 150 and a Color Filter Array (CFA) referenced as 160. In one embodiment, metadata describing the CFA can be added to the content. In the example shown, the CFA is mounted on top of the camera sensor and its description can become available with the data. In this case, for ease of understanding a Bayer filter is used, but other types of filters can be used in other alternative embodiments.

[0027] It should be noted that most CFAs (positioned on the image sensor) are commonly used to sample various colors with pixels performing a single measure. The most common CFA pattern is the Bayer pattern made of 2-by-2 elements (i.e. the representation by the matrix B mentioned previously).
For example, FIG. 1B presents a CFA on the sensor 150, which is made of the repetition of the matrix 160, and where the diameter of the micro-images determines the sub-aperture images obtained from the micro-images. It appears that all the sub-aperture images are monochromatic.

[0028] In the case where a Bayer filter is used as shown in FIG. 1, the CFA embodiment can be described (shown at 162) by the following elements:
1. The size of the CFA: the number of pixels in x and y covered by the CFA, noted (Cx, Cy). In case of a Bayer CFA the size is 2 × 2 pixels.
2. For each cell F(i,j) of the CFA, the color filter is described as a flux-scale absorption for the Red, Green and Blue components. For instance, a cell which lets all the red photons enter the pixel corresponds to a red filter, while a cell for which all the photons are collected into the pixel corresponds to a "white" color component. More simply, the filter can be described by the corresponding color, such as Red, Green or Blue.
By construction a CFA is replicated over the complete sensor. The filter F(i,j) of a CFA of size (Cx, Cy) is applied to the pixels (x, y) having the coordinates i = x mod Cx and j = y mod Cy. FIG. 1B illustrates the common Bayer pattern mounted on a sensor. The Bayer pattern is described by its 2 × 2 cells, one Red, two Green and one Blue. In case of a multi-view content, each view will be transmitted with its corresponding description of the CFA.

[0029] Figure 6 provides a table that shows one embodiment where the Auxiliary Video Information (AVI) InfoFrame can be used to convey color filter information. The CFA metadata can be added to the AVI InfoFrame (such as in the CEA861-D standard, as one example) so that it accompanies the color information received by a processor. Various types of auxiliary data can be carried from the Source to the DTV Monitor using InfoFrames. The Table shown in FIG. 6 provides one example to ease understanding. This table gives the list and the related information of AVI InfoFrame data that can be transmitted, in this example using the CEA861-D standard. It may be possible to enlarge the amount of data to be transmitted (the length of the AVI InfoFrame would be increased) and to add Data Byte information at the end of the table to include new data (e.g. the CFA metadata).

[0030] FIG. 7 shows another table, as per an alternate embodiment, where the AVI InfoFrame in the CEA861-D standard example discussed above can be extended. In the Table of FIG. 7, the proposed modification that inserts the color filter definition is shown in yellow. The maximum size of the filter is limited to 4 × 4 pixels. In this example, 4 bits are allocated to define the filter as a combination of Red, Green and Blue (R, G, B hereinafter).

[0031] FIG. 3 provides one embodiment of the invention where a view synthesis is performed using a multi-view content. To illustrate this process, in which RAW data is compared to RGB 444 contents, a view synthesis application is presented. The goal is to use a multi-view input 310 to synthetize one view at any physical position between input views. In one embodiment, a camera array can acquire a set of input views. These input views are sent to the hardware platform 323 that will perform the view synthesis. (Cables connecting the cameras to the HW platform can be compatible with what is needed, such as the CEA861-D standard, which has been updated to convey CFA information, as shown at 320.)

[0032] In the example of Figure 3, the view synthetizer can use a subset or all of the views; in this example only 4 views are used. On the right-hand side the synthetized view is shown at 350. Having the CFA information will allow the synthetizer to only use RAW data of each view to perform the synthesis.

[0033] The information of the CFA is then used by some embodiments using a view synthesis algorithm. The following sub-section describes a possible implementation of a view synthesis algorithm which uses the CFA information. Other algorithms which could benefit from the CFA description could be used (algorithms such as multi-image demosaicing, multi-image super-resolution, etc.).

[0034] The view synthesis denotes the computation of an image from a virtual camera which is located close to the matrix of cameras from which the MVD (Multi-View plus Depth) has been observed/computed. In this case the following steps can be used: 1. Consensus cube – With this step, a cube per input image is computed. It quantifies, for many sampled depths (slices), how well all the depth-maps match from the viewing point of the selected input camera. 2. Soft Visibility cube – This cube is computed by integrating the consensus cube. The soft visibility cube quantifies, for a camera viewing point, how much an object is visible from a given pixel. The visibility is said to be "soft" because the depth-map estimations are error prone.
As for the consensus cube, the soft visibility is comparable to a probability. 3. Virtual Colour cube estimation – Knowing the consensus and visibility cubes of the input images, a virtual colour cube is estimated from a virtual camera. 4. Virtual image computation from the virtual Colour cube – The virtual colour cube is stacked to form a single virtual image.

[0035] These four steps will now be discussed in detail below. First is a discussion of the consensus cube. Consensus denotes how well the depth-maps agree with one given depth-map. For each input image Ii made of (Nx, Ny) pixels and its corresponding depth-map Di, a consensus cube Ci is computed. The cube Ci is made of (Nx, Ny, S) pixels where S denotes the number of slices. Each slice s ∈ [1, S] is associated with a distance z which varies inversely proportionally between zmin and zmax. The minimum and maximum distances are defined depending on the scene content; they are typically set to the same minimum and maximum distances used to compute the depth-maps.

[0036] To define the consensus cube, a pulse function π(a, b, c) is defined (equation (1)), and a Heaviside function H(a, b) is also defined (equation (2)). The value of the consensus at pixel (x, y) for the camera i at the slice s associated with the distance zs is given by equation (3).

[0037] In equation (3), M is the set of cameras which are used to compute the consensus of camera i. For a precise computation, M is chosen equal to all cameras except camera i. Dk(x'k, y'k) is the distance given by the depth-map associated with camera k at pixel coordinate (x'k, y'k). This coordinate is computed by: 1/ de-projecting the pixel coordinate (x, y) from camera i into the WCS at (x, y, z) knowing z = Di(x, y); and 2/ projecting the WCS point at (x, y, z) into camera k at coordinate (x'k, y'k). Projection and de-projection are computed with the intrinsic and extrinsic camera parameters.

[0038] The consensus is defined as the ratio between the number of cameras which agree that an object is distant from the camera by z = Di(x, y) and the total number of cameras which can still see beyond distance z from the camera.

[0039] The computation of the consensus Ci is noisy, especially when most of the images are occulted beyond a certain distance. In this case, the denominator of equation (3) tends to zero. One option is to set a minimum value for the denominator. This minimum value is experimentally set to N'/4 where N' is the number of cameras sharing almost the same field of view.

[0040] The consensus Ci can be smoothed in order to improve its signal-to-noise ratio. Denoising is performed slice per slice by so-called guided denoising algorithms. A local smoothing kernel is computed with surrounding pixels around Ci(x, y, s) from the consensus at slice s and around pixels from the observed image Ii(x, y).

[0041] The Soft Visibility is computed for a given image Ii by integrating its consensus Ci through slices according to equation (4). The visibility is equal to 1 for the first slice, and decreases until 0. When the visibility is decreasing toward 0, this means that beyond a given slice, the image Ii is occulted by an object visible at pixel Ii(x, y). The max(·) in equation (4) prevents the visibility from decreasing below 0. This occurs frequently because the consensus is the agreement between all cameras which are able to see beyond occulted objects from the view i. Potentially the integrated consensus can be equal to N, the number of cameras used to compute Ci.

[0042] The Virtual Color cube estimation is the estimation of a virtual image seen from a virtual camera position; it is computed with a set of M' observed images Ik such that k ∈ M'. The set M' can be defined as simply as the 4 real cameras closest to the virtual camera. To estimate a virtual image seen from a virtual camera position, a virtual color cube Colorsynth(x, y, z) is preliminarily computed. The color cube is in the coordinate system of the virtual camera, which is characterized with intrinsic and extrinsic camera parameters. Each slice of this virtual cube is computed as an average of the M' images weighted by the corresponding soft visibility (equation (5)).

[0043] Similarly to equation (3), (x'k, y'k, zk) denotes the re-projected coordinate (x, y, z) from the virtual camera to the real camera k. The great advantage of this approach is that the integer coordinates (x, y, s) of the virtual color cube are computed with a backward warping approach, which is made possible thanks to the sampling of z by the cube.
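Paragraph [0037] relies on de-projecting a pixel of camera i to the world coordinate system and re-projecting it into camera k. The sketch below shows the textbook pinhole version of that step with intrinsic matrices and camera-to-world poses; it is a generic illustration under the assumption of undistorted pinhole cameras, not an excerpt of the patent.

```python
import numpy as np

def reproject(x, y, z, K_i, pose_i, K_k, pose_k):
    """Map pixel (x, y) of camera i, at depth z, to a coordinate (x'k, y'k) in camera k.

    K_i, K_k       -- 3x3 intrinsic matrices
    pose_i, pose_k -- 4x4 camera-to-world extrinsic matrices
    """
    # 1/ de-project (x, y) from camera i into the world coordinate system at depth z
    ray = np.linalg.inv(K_i) @ np.array([x, y, 1.0])
    cam_pt = np.append(ray * z, 1.0)              # homogeneous point in the frame of camera i
    world = pose_i @ cam_pt
    # 2/ project the world point into camera k
    cam_k = np.linalg.inv(pose_k) @ world
    pix = K_k @ cam_k[:3]
    return pix[0] / pix[2], pix[1] / pix[2]
```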
[0044] An example is shown in FIG. 4. FIG. 4 provides an illustration of 6 consecutive slices of a virtual color cube (top-left: 3 foreground slices; bottom-right: 3 background slices). The virtual color cube is similar to a focal stack where only objects lying at the given slice are visible. The foreground objects have been removed.

[0045] Virtual image computation can be provided by stacking the virtual color cube. The virtual color cube is merged to form a unique virtual colour image. It is first required to compute the consensus cube Consensussynth(x, y, z) and the visibility cube SoftVissynth(x, y, z) associated with the virtual colour images. Similarly to equation (5), the computation is done by averaging the M' initial consensus or visibility cubes (equations (6) and (7)). Both cubes defined above are combined into CC(x, y, z):

CC(x, y, z) = min(Consensussynth(x, y, z), SoftVissynth(x, y, z))    (8)

[0046] The CC is a kind of probability which varies between 0 and 1. The typical values are:
- If a given CC(x, y, z) is equal to 1, this means that all cameras agree that an object is lying at the distance z from the virtual camera, and is seen at the coordinate (x, y) within the virtual camera.
- A high value CC > 50% is rare; it corresponds to objects where the depth estimation was accurate (textured areas) and positioned exactly on a slice of the virtual camera and quite close to the slices of the real cameras.
- CC values are mostly equal to 0 since many slices do not match any object.
- For objects with few details, the depth-maps extracted from the raw images do not agree and the raw consensus is low; it can be as low as 1/N where N is the number of cameras. In this case the CC is also low, with values around 1/N.
- CC values can be lower than 1/N for objects which lie between 2 slices. So CC values equal to a few percent are common.
The color slices are then weighted by consensus and accumulated until the ray visibility reaches zero (equation (9)).

[0047] In practice, the virtual color cube is saved with pixels made of 4 values: Red, Green, Blue and Alpha (RGBA). The RGB encodes the colour as computed by equation (5). The alpha encodes the CC(x, y, z) component as computed by equation (8).
[0048] Figure 3 shows an application of the view synthesis. As shown, the 4 left images are observed by 4 central cameras. The right image is a synthetic view of the virtual camera located at the middle of the 4 central ones. An algorithm can be applied to the images captured with a matrix of 4 x 4 cameras. 4 consensus and visibility cubes are computed with 128 slices for the 4 central cameras. All depth-maps contribute to computing the consensus and visibility cubes: the set M is made of 15 cameras. The synthetic colour cube is computed with the 4 central cameras: the set M' is made of 4 cameras. Figure 5 illustrates a detailed view of the 4 original images (4 images on the left), and the synthetized image (right image). This algorithm can produce very accurate results even with scenes made of complex occlusions. It requires a large amount of memory for the M' consensus and visibility cubes. Decreasing the memory occupation can be performed by applying the complete process slice per slice. But care must be taken since a slice of the virtual color cube will intersect several slices of the consensus and visibility cubes associated with the original images. Slice-per-slice computation is not feasible for matrices of cameras where the cameras are not roughly located on the same plane and pointing in the same orientation.
[0049] In one embodiment, an algorithm can also be provided that allows for view synthesis with CFA information provided. In this embodiment, the view synthesis is computed by merging several input pixels Ik(xk', yk') into a single pixel of the synthetic view slice as described in equation (5).

[0050] The coordinate (xk', yk') is not an integer coordinate and interpolation is required to estimate the pixel value at that coordinate. The coordinate (xk', yk') is decomposed into an integer part [.], which is the nearest-pixel rounding value, and a fractional part {.}. The pixel value at a non-integer coordinate is estimated using the 2 x 2 or 4 x 4 real pixels surrounding that coordinate. The interpolation defines the weight associated with the surrounding pixels as a function of the fractional part of the coordinate. For instance, with a bi-linear interpolation the interpolated pixel value is computed with 4 weights associated with the 4 surrounding pixels (equation (10)).

[0051] The input images Ik are assumed to be colour images which have been computed by demosaicing from the raw images recorded by the pixel sensor. The interpolation given in equation (10) is applied equally for the Red, Green and Blue components. One notices that if the values {xk'} and {yk'} are close to 0 or 1 then the interpolated pixel value Ik(xk', yk') is almost equal to one of the surrounding input pixels. By contrast, {xk'} and {yk'} may be close to 0.5, in which case the surrounding pixels contribute comparable weights. In the case of bi-linear interpolation, the 4 surrounding pixel values listed in the previous equation are associated with given colours of the CFA mounted on top of the pixel sensor. If one keeps only the interpolated pixel values Ik(xk', yk') such that ({xk'}, {yk'}) is small, then Ik(xk', yk') ≈ Ik([xk'], [yk']). The input pixels are associated with one colour recorded according to the CFA. The Ik(xk', yk') input pixels are averaged into a single pixel of the synthetic view according to equation (5). By keeping only the real colour component as observed by the input pixels, the demosaicing is performed naturally. The strategy is to cumulate only colour components which have been observed, and discard the colour components estimated by demosaicing; this idea is developed in [2].

[0052] The view synthesis algorithm can be improved as follows. The 3 distances dR,G,B(xk', yk') between the coordinate (xk', yk') and the 3 closest colour components observed are used to weight the interpolated pixel value Ik(xk', yk'). The coordinate (Cx{xk'}, Cy{yk'}) is the coordinate of the interpolated pixel within the CFA. In case of a Bayer CFA, (2{xk'}, 2{yk'}) is the coordinate of the projected pixel within the Bayer matrix. This coordinate is used to compute the closest distance dR,G,B to each color component of the CFA. FIG. 8 provides the distance between the interpolated pixel coordinate and the three colors of a Bayer CFA. It should be noted that the distance di,j(xk', yk') between the projected coordinate (xk', yk') and the centre of the filter F(i,j) is computed by equation (11).

[0053] The distance is computed for each pixel of the CFA. In case of the Bayer matrix, one computes 4 distances di,j. The distance to a given color C is computed by taking the minimum value of the distances to that colour. For instance, for the Bayer matrix illustrated in FIG. 8, dR = d0,0, dG = min(d1,0, d0,1) and dB = d1,1.

[0054] The view synthesis algorithm is updated by modifying equation (5), which becomes equation (12).

[0055] In equation (12), dC is the distance between the projected coordinate (xk', yk') and the colour C. Equation (12) is applied for the 3 color components.
[0056] Figure 9 provides one example of an embodiment that compares a slice of the virtual color cube. The picture on the right is estimated with the original version, while the one on the left is the version that uses the CFA information.

[0057] In order to motivate the need of having the CFA information associated with an MVD content, a view synthesis using the CFA information introduced above has been partially implemented. Figure 9 also illustrates one slice of the virtual color cube corresponding to one object lying at that slice. The right image shows the original version as computed by equation (5), and the left image shows the same slice computed with equation (12).
[0058] The view synthesis algorithm is updated by modifying equation (5), which becomes equation (12), where dC is the distance between the projected coordinate (xk', yk') and the colour C. Equation (12) is applied for the 3 color components.
[0059] FIG. 9 provides an example of a result that can be obtained. In FIG. 9, a slice of a virtual color cube is compared with the version using the CFA information. Reference 910, which appears in the right image, shows the original version as computed by equation (5), and the left image (905) shows the same slice computed with equation (12).
[0060] Figure 10 is a flowchart illustration according to one embodiment. As shown, in step 1010 data including corresponding metadata information about a content captured by a camera is received, and in step 1020 this is analyzed to determine information about the positioning and respective color of each pixel of said content. Step 1030 deals with determining at least two views of the content, each having a plurality of different pixels. In step 1040 a view synthesis is generated from said views by constructing a single synthetized pixel using said plurality of different pixels of each view. In step 1050, a final rendering can optionally be provided.

[0061] Figure 11 schematically illustrates a general overview of an encoding and decoding system according to one or more embodiments. The system of Figure 11 is configured to perform one or more functions and can have a pre-processing module 11300 to prepare a received content (including one or more images or videos) for encoding by an encoding device 11400. The pre-processing module 11300 may perform multi-image acquisition, merging of the acquired multiple images in a common space and the like, acquiring of an omnidirectional video in a particular format and other functions to allow preparation of a format more suitable for encoding. Another implementation might combine the multiple images into a common space having a point cloud representation. Encoding device 11400 packages the content in a form suitable for transmission and/or storage for recovery by a compatible decoding device 11700. In general, though not strictly required, the encoding device 11400 provides a degree of compression, allowing the common space to be represented more efficiently (i.e., using less memory for storage and/or less bandwidth required for transmission). In the case of a 3D sphere mapped onto a 2D frame, the 2D frame is effectively an image that can be encoded by any of a number of image (or video) codecs. In the case of a common space having a point cloud representation, the encoding device 11400 may provide point cloud compression, which is well known, e.g., by octree decomposition. After being encoded, the data is sent to a network interface 11500, which may typically be implemented in any network interface, for instance one present in a gateway. The data can then be transmitted through a communication network 11500, such as the Internet, but any other network may be foreseen. The data is then received via network interface 11600, which may be implemented in a gateway or in a device. After reception, the data are sent to a decoding device 11700. Decoded data are then processed by the device 11800, which can also be in communication with sensors or user input data. The decoder 11700 and the device 11800 may be integrated in a single device (e.g., a smartphone, a game console, a STB, a tablet, a computer, etc.). In another embodiment, a rendering device 11900 may also be incorporated.

[0062] In one embodiment, the decoding device 11700 can be used to obtain an image that includes at least one color component, the at least one color component including interpolated data and non-interpolated data, and to obtain metadata indicating one or more locations in the at least one color component that have the non-interpolated data.

[0063] A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations.
Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this disclosure and are within the scope of this disclosure.
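To make the role of the CFA metadata more concrete, the following is a minimal sketch in Python with NumPy. The names CfaMetadata, pattern, offset and non_interpolated_mask are illustrative assumptions rather than identifiers from this disclosure; the sketch only shows how a small pattern descriptor plus a phase offset is enough to recover, for a given color component, which samples of a 4:4:4 image are real captured data and which are interpolated, in the spirit of paragraph [0062] and claims 4 and 13.

import numpy as np
from dataclasses import dataclass

@dataclass
class CfaMetadata:
    # Hypothetical descriptor of the color filter array:
    # 'pattern' is a small 2D array of color indices (0 = R, 1 = G, 2 = B),
    # for example a 2x2 Bayer cell; 'offset' is the (row, col) phase of that
    # cell relative to the top-left pixel of the transmitted 4:4:4 image.
    pattern: np.ndarray
    offset: tuple = (0, 0)

def non_interpolated_mask(meta, shape, color):
    # Boolean mask of the pixels whose 'color' component is real captured
    # (non-interpolated) data, for an image of size shape = (height, width).
    h, w = shape
    ph, pw = meta.pattern.shape
    rows = (np.arange(h)[:, None] + meta.offset[0]) % ph
    cols = (np.arange(w)[None, :] + meta.offset[1]) % pw
    return meta.pattern[rows, cols] == color

# Example: a GRBG Bayer cell; the red component is non-interpolated at
# exactly one pixel out of every 2x2 block.
bayer = CfaMetadata(pattern=np.array([[1, 0],
                                      [2, 1]]))
red_mask = non_interpolated_mask(bayer, (4, 6), color=0)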
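Continuing the same assumption-laden sketch, the view-synthesis step of Figure 10 (steps 1030-1040) could then restrict itself to real captured data: for a scene point visible in several views, only the samples flagged by the CFA masks are combined into the single synthesized sample. The function below is a hypothetical illustration (synthesize_sample is not a name from this disclosure), and it assumes the per-view coordinates of the scene point are already known from a separate registration or disparity step.

import numpy as np

def synthesize_sample(views, masks, coords, color):
    # views  : list of HxWx3 arrays (demosaiced 4:4:4 images), one per view
    # masks  : list of HxW boolean arrays, e.g. from non_interpolated_mask()
    # coords : list of (row, col) positions of the same scene point per view
    # color  : color component index (0 = R, 1 = G, 2 = B)
    samples = []
    for image, mask, (r, c) in zip(views, masks, coords):
        if mask[r, c]:                        # keep real captured data only
            samples.append(image[r, c, color])
    if not samples:
        return None                           # caller may fall back to interpolated data
    return float(np.mean(samples))

Averaging here is only a placeholder; the point is that the masks derived from the metadata let the synthesis operate on non-interpolated data, as recited in claims 1 and 4.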

Claims

CLAIMS: 1. A method comprising: obtaining data including corresponding metadata information about a content captured by a camera; analyzing said data to determine information about positioning and respective color of one or more pixels of said content; determining at least two views of said content, each having a plurality of different pixels; and generating a view synthesis from said views by constructing a single synthesized pixel using said plurality of different pixels of each view.
2. An apparatus comprising at least one processor configured to: obtain data including corresponding information about a content; analyze said data to determine information about positioning and respective color of one or more pixels of said content; determine at least two views of said content, each having a plurality of different pixels; generate a view synthesis from said views by constructing a single synthesized pixel using said plurality of different pixels of each view; and render said content using view synthesis.
3. The method of claim 1 or apparatus of claim 2, wherein said first view includes a first image captured using a filter and a second image that includes at least one color component, the at least one color component including non-interpolated data from the first image and interpolated data based on the first image.
4. The method of claim 1 or 3 or the apparatus of claim 2 or 3, wherein said metadata indicates one or more locations of the non-interpolated data and one or more colors applied by the filter at the one or more locations.
5. The method of any of claims 1 or 3-4 further comprising providing a final rendering of said content using view synthesis.
6. The apparatus of any of claims 2-4 further configured to render said content using view synthesis.
7. The method of any of claims 1 or 3-5 or apparatus of any of claims 2-4 or 6, wherein said data is RAW data.
8. The method of claim 7 or apparatus of claim 7 wherein said raw data comprises data at least partially captured by a color filter array (CFA) disposed on a camera.
9. The method of claim 8 or apparatus of claim 8, wherein said CFA is a Bayer filter.
10. The method of any of claims 1 or 3-5 or 7-9 or apparatus of any of claims 2-4 or 6-9, wherein said content was captured by a plenoptic camera providing multiple views.
11. The method of any of claims 1 or 3-9 or apparatus of any of claims 2-9, wherein said metadata includes an Auxiliary Video Information (AVI) InfoFrame transmitted with said content.
12. The method of claim 11 or apparatus of claim 11, wherein said AVI InfoFrame conforms to CEA-861-D, as used by HDMI to transmit information about said content.
13. A method comprising: obtaining an image that includes at least one color component, the at least one color component including interpolated data and non-interpolated data; and obtaining metadata indicating one or more locations in the at least one color component that have the non-interpolated data.
14. A device comprising: at least one processor configured to obtain an image that includes at least one color component, the at least one color component including interpolated data and non-interpolated data; and obtain metadata indicating one or more locations in the at least one color component that have the non-interpolated data.
15. The method of claim 13 or device of claim 14 wherein said metadata includes non-interpolated data.
16. A non-transitory computer-readable medium storing computer-executable instructions executable to perform the method of any of claims 1, 3-5 and 7-13 or 15.
PCT/EP2021/065188 2020-06-09 2021-06-07 Metadata to describe color filter on top on camera sensors WO2021249955A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP20305621.3 2020-06-09
EP20305621 2020-06-09

Publications (1)

Publication Number Publication Date
WO2021249955A1 true WO2021249955A1 (en) 2021-12-16

Family

ID=71579528

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2021/065188 WO2021249955A1 (en) 2020-06-09 2021-06-07 Metadata to describe color filter on top on camera sensors

Country Status (1)

Country Link
WO (1) WO2021249955A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130076931A1 (en) * 2011-09-22 2013-03-28 John Norvold Border Plenoptic lens unit providing refocusable imaging mode
US20140092281A1 (en) * 2012-09-28 2014-04-03 Pelican Imaging Corporation Generating Images from Light Fields Utilizing Virtual Viewpoints
EP3065394A1 (en) * 2015-03-05 2016-09-07 Thomson Licensing Light field metadata
US9497380B1 (en) * 2013-02-15 2016-11-15 Red.Com, Inc. Dense field imaging

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21730598

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21730598

Country of ref document: EP

Kind code of ref document: A1