GB2524956A - Method and device for encoding a plenoptic image or video - Google Patents

Method and device for encoding a plenoptic image or video

Info

Publication number
GB2524956A
GB2524956A GB1405854.9A GB201405854A GB2524956A GB 2524956 A GB2524956 A GB 2524956A GB 201405854 A GB201405854 A GB 201405854A GB 2524956 A GB2524956 A GB 2524956A
Authority
GB
United Kingdom
Prior art keywords
image
pixels
sub
plenoptic
micro
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB1405854.9A
Other versions
GB201405854D0 (en)
GB2524956B (en)
Inventor
Hervé Le Floch
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to GB1405854.9A priority Critical patent/GB2524956B/en
Publication of GB201405854D0 publication Critical patent/GB201405854D0/en
Publication of GB2524956A publication Critical patent/GB2524956A/en
Application granted granted Critical
Publication of GB2524956B publication Critical patent/GB2524956B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/115Selection of the code volume for a coding unit prior to coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136Incoming video signal characteristics or properties
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/167Position within a video image, e.g. region of interest [ROI]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/80Camera processing pipelines; Components thereof

Abstract

A plenoptic image, 600, is encoded by initially obtaining sub-aperture images, 601, from the plenoptic image and then estimating a value representative of the disparity between each corresponding pixel of at least two sub-aperture images (for example by using a motion vector defining the disparity between corresponding pixels of two sub-aperture images). Pixels are then classified, 603, 604, into groups based on the disparity value, and regions of pixels are defined based on their classification, 605, 607. The plenoptic image is then encoded, 609, using encoding parameters, at least one of which varies according to the defined image regions. This may involve encoding the different regions using different compression rates. The method may be used to encode a plenoptic video composed of plenoptic images. The method enables more efficient encoding of plenoptic images by allowing some regions of the image (for example those that can be accurately refocused) to be encoded at a different compression rate from other regions, such as those that cannot be accurately refocused. The invention also relates to a device comprising processing means and encoding means configured to perform the method.

Description

Method and device for encoding a plenoptic image or video
FIELD OF THE INVENTION
The invention relates to a method and device for processing light field images, also called plenoptic images. The invention relates more particularly to improved encoding of a digital plenoptic image, and may in particular be used to improve compression of a plenoptic image.
The processed image may notably be a plenoptic digital photograph, or an image of a plenoptic video sequence.
BACKGROUND OF THE INVENTION
Plenoptic images are 2D images captured by an optical system different from that of conventional cameras. A plenoptic image can be refocused in a range of virtual focus planes after it is taken.
In a plenoptic system such as a plenoptic digital camera, an array of micro-lenses is located between the sensor and the main lens of the camera.
Depending on the system, the array of micro-lenses may be placed at the focal plane of said main lens or so that the micro-lenses are focused on the focal plane of the main lens.
A given number of pixels of the sensor are located underneath each micro-lens. Through this micro-lens array, the sensor captures pixel values that are related to the location and the orientation of light rays inside the main lens. By processing the captured plenoptic images comprising such information, the displacement or "disparity" of image parts that are not in focus can be analyzed and depth information can be extracted. This makes it possible to change the focusing plane of the 2D image after the capture of the image, and thus to refocus the image, i.e. virtually change the focal plane of the image and/or extend or shorten the focal depth of the image. By changing the focusing point or plane, sharpness and blur on the objects located at different depths in the 3D scene can be modified on the 2D image.
This refocusing provides the advantage of generating different 2D images with different focusing points. It enables different camera parameters to be simulated, namely the lens aperture and the focal plane.
Theoretical aspects of plenoptic imaging are set out for example in the document "Digital Light Field Photography, a dissertation submitted to the department of computer science and the committee on graduate studies of Stanford University in partial fulfillment of the requirements for the degree of doctor of philosophy" by Ren Ng, dated July 2006.
The document "Full Resolution Lightfield Rendering" by Andrew Lumsdaine and Todor Georgiev (January 2008, Adobe Systems Inc.) discloses an advanced plenoptic system.
The present invention relates to a method and device for processing, and more particularly for compressing plenoptic images.
Compression of light fields or plenoptic images is described in several documents. In particular, it is described in the documents US6476805, US5103111, and US8155456. These documents relate respectively to use of inter-coding, use of LZW compression, and block coding, in the context of plenoptic imaging.
However, the processing and compression of plenoptic images remains to be optimized.
SUMMARY OF THE INVENTION
The applicant has noticed that it is not always possible to refocus the image on any focusing plane, especially when the number of pixels underneath each micro-lens is insufficient. When the main lens aperture used during the image capture is wide (e.g. f/1.4, f/2.8), the refocusing is efficient only in a range situated around the focal plane of the main lens used to capture the image.
In other words, it is possible to refocus the image only for the three-dimensional regions of the scene that are situated at a depth close to the depth of the three-dimensional regions already in focus at the time of the image capture.
Consequently, in two-dimensional images, only some regions of the image can be refocused efficiently, depending on the focal plane and aperture of the main lens used when the image is taken.
A method and device are provided herein that take this feature of a plenoptic image into account to afford improved encoding.
According to a first aspect of the invention, there is provided a method for encoding a plenoptic image, said plenoptic image comprising pixels generated via an array of micro-lenses and comprising a plurality of micro-lens images, each micro-lens image formed by an associated micro-lens. The method comprises the steps of:
- obtaining sub-aperture images from the plenoptic image, each sub-aperture image being composed of the pixels located at the same position in each micro-lens image;
- estimating a value representative of the disparity between the corresponding pixels of at least two sub-aperture images;
- classifying pixels into groups based on the value representative of the disparity;
- defining image regions based on the classification of pixels; and
- encoding the plenoptic image using encoding parameters, wherein at least one encoding parameter varies according to the defined image regions.
Thus, the method enables different regions of the image to be encoded according to different parameters, e.g. to compress with different compression rates, or to compress the regions of the plenoptic image using different compression algorithms. More particularly, the regions of the plenoptic image that can be efficiently refocused are compressed in a different way from the regions that cannot be refocused correctly. For example, if the end user wishes to have an excellent resolution for the refocused areas of the plenoptic image, the regions that can be refocused by the end user have to be encoded using parameters that ensure a high quality of compression (i.e. with small or no losses). On the contrary, if the end user wishes to keep the smoothness of the out-of-focus regions, for example to maintain a pleasant appearance (bokeh) for the out-of-focus regions, the encoding parameters may be chosen to ensure the best quality of compression (i.e. with small or no losses) in these regions.
In such a method, each group of pixels may be associated with a range or ranges of the value representative of the disparity.
Classifying the pixels may comprise comparing the value representative of disparity associated with each pixel to a predefined threshold separating the values representative of disparity into two ranges, and classifying said pixels in a first group if the value is lower than or equal to the threshold, and in a second group if the value is higher than the threshold.
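By way of illustration only (the following sketch and its names are assumptions, not part of the claimed method), this two-range classification reduces to a simple comparison per pixel:

```python
import numpy as np

def classify_into_two_groups(disparity: np.ndarray, threshold: float) -> np.ndarray:
    """Label each pixel 0 if its value representative of disparity is lower
    than or equal to the threshold, and 1 if it is higher."""
    return (disparity > threshold).astype(np.uint8)
```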
The encoding parameters may in particular determine a compression rate, the compression rate being different for different encoded regions.
In a preferred embodiment, the compression rate applied to a region is correlated to the values or range of values representative of the disparity associated with the region.
In a method according to an embodiment of the invention, defining a region of pixels comprises modifying group boundaries to eliminate isolated pixels and/or to smooth the outline of the group. Modifying group boundaries may comprise applying an erosion filter and a dilation filter.
In a method according to an embodiment of the invention, the steps of classifying the pixels and forming regions may be implemented in the sub-aperture images. The plenoptic image may be encoded in the form of a succession of sub-aperture images.
In a preferred embodiment of the invention, a first region of pixels and a second region of pixels may be formed, the first region corresponding to a part of the image comprising the pixels situated in the focusing planes in which the image can be refocused with a predefined sharpness, and the second region corresponding to the remaining part of the image comprising the pixels situated in the planes in which the image cannot be refocused.
In a variant of the invention, the value representative of the disparity may be derived from a motion vector defining the disparity between the corresponding pixels of two sub-aperture images. In a particular embodiment, the value representative of the disparity is one of the components of the motion vector defining the disparity between the corresponding pixels of two sub-aperture images.
In another variant, the value representative of the disparity may be derived from an average motion vector defining the disparity between the corresponding pixels of three or more sub-aperture images. In a particular embodiment, the value representative of the disparity is one of the components of the average motion vector defining the disparity between the corresponding pixels of three or more sub-aperture images.
In a preferred embodiment of the invention, encoding uses HEVC or MV-HEVC.
According to a second aspect of the invention, there is provided a method for encoding a plenoptic video composed of plenoptic images, said plenoptic images comprising pixels generated via an array of micro-lenses, and comprising a plurality of micro-lens images, each micro-lens image formed by an associated micro-lens. The method comprises encoding each plenoptic image of the plenoptic video according to a method for encoding a plenoptic image according to the first aspect of the invention.
The invention also relates to a device for encoding a plenoptic image, said plenoptic image comprising pixels generated via an array of micro-lenses, and comprising a plurality of micro-lens images, each micro-lens image formed by an associated micro-lens. Said device comprises:
* processing means configured to obtain sub-aperture images from the plenoptic image, each sub-aperture image being composed of the pixels located at the same position in each micro-lens image;
* classifying means configured to estimate a value representative of the disparity between each corresponding pixel of at least two sub-aperture images, to classify the pixels into groups based on the value representative of the disparity, and to define image regions based on the classification of pixels; and
* encoding means configured to encode the plenoptic image using encoding parameters, wherein at least one encoding parameter varies according to the defined image regions.
In such a device, the encoding means may be configured to encode according to parameters giving rise to a compression rate, the compression rate being different for different encoded regions. Encoding may use HEVC or MV-HEVC.
The classifying means may be configured to form a first region of pixels and a second region of pixels, the first region corresponding to a part of the image comprising the pixels situated in the focusing planes in which the image can be refocused with a predefined sharpness; the second region corresponding to the remaining part of the image comprising the pixels situated in the planes in which the image cannot be refocused.
In a method or device according to the invention, the micro-lens images of the plenoptic image may have the same shape and number of pixels, the pixels having the same location in each micro-lens image.
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
Other particularities and advantages of the invention will also emerge from the following description.
In the accompanying drawings, given by way of non-limiting examples:
Figure 1 illustrates in a block diagram an example of processing implemented on a plenoptic video, as an example of processing in which the invention may be implemented;
Figure 2 illustrates the general principle of a plenoptic system;
Figure 3 illustrates schematically a plenoptic system;
Figure 4 illustrates the principle implemented for construction of sub-aperture images;
Figure 5 illustrates schematically the creation of a refocused image;
Figure 6 illustrates on a schematic diagram an example embodiment of a method according to the invention implemented on a still plenoptic image;
Figure 7 illustrates on a schematic diagram an example embodiment of a method according to the invention implemented on a plenoptic video;
Figure 8a illustrates on a schematic diagram an example of the main steps of one encoding scheme that may be implemented in the invention;
Figure 8b illustrates on a schematic diagram another example of the main steps of one encoding scheme that may be implemented in the invention;
Figure 9 illustrates in a block diagram an encoding method which may be implemented in an embodiment of the invention;
Figure 10 illustrates a variant embodiment of the method described in Figure 6;
Figure 11 illustrates in a block diagram an example of a method of classifying parts of a plenoptic image into two regions;
Figure 12 illustrates in a block diagram another example of a method of classifying parts of a plenoptic image into two regions;
Figure 13 illustrates schematically a device according to an embodiment of the invention.
Figure 1 illustrates in a block diagram an example of processing implemented on a plenoptic video. Such processing is an example of a process in which the invention may be implemented. Of course, the invention is not dedicated only to plenoptic video compression. The invention can also be implemented for still plenoptic image encoding.
At step 100, a plenoptic video is available. This plenoptic video is composed of several plenoptic images. At step 101, one image of this video is extracted for being compressed using an encoding algorithm.
At step 102, the plenoptic image extracted at step 101 is compressed by a lossy compression algorithm.
The elementary stream coming from this compression algorithm can be encapsulated, stored or transmitted on networks for subsequent decompression at step 103.
The result of the decompression performed at step 103 is a decompressed plenoptic image available at step 104. The decompressed plenoptic image has a lower quality than the original corresponding plenoptic image extracted at step 101, due to the losses induced by the compression algorithm and the possible resulting compression artefacts. The lower the compression ratio, the fewer compression artefacts are present in the decompressed image.
A post-processing algorithm, such as a refocusing algorithm, is implemented on the decompressed plenoptic image at step 105, to obtain a refocused image, available at step 106.
In the case of processing a still image such as a plenoptic photograph instead of a video, the above described process starts at the step where a single image is available, a lossy compression algorithm being applied to said image at step 102.
Figure 2 illustrates very schematically the general principle of a plenoptic camera. The illustrated plenoptic camera comprises a main lens 200. In real embodiments of a camera, the main lens 200 generally comprises several lenses one behind the other. The main lens 200, represented here as a single lens, may thus comprise a group or several groups of lenses. The plenoptic camera also comprises an array of micro-lenses 201 that is located between a sensor 202 and the camera main lens 200.
The distance between the array of micro-lenses and the sensor is equal to the focal length of the micro-lenses.
Plenoptic systems are commonly classified into two categories, generally called "Plenoptic camera 1.0" and "Plenoptic camera 2.0". In a Plenoptic camera 1.0, the array of micro-lenses is situated at the focal plane of the main lens. This enables good sampling of the orientation of the light field inside the camera. The counterpart of this high sampling quality in the light field orientation is a lower spatial resolution. In a Plenoptic camera 2.0, the array of micro-lenses is situated so that the micro-lenses are focused on the focusing plane of the main lens. This enables a higher spatial resolution.
Figure 3 illustrates a plenoptic system. As explained with reference to Figure 2, the system comprises an array of micro-lenses 300. One micro-lens is located at a given position 301 on the array of micro-lenses 300.
The sensor 303 comprises pixels. A group of pixels is located in a sensor area 302 situated under the micro-lens located at the given position 301.
The distance between the sensor plane and the micro-lens array plane equals the focal length of the micro-lenses.
A detailed view of the sensor area 302 is shown in the enlarged part of Figure 3. In the represented example, the sensor area 302 comprises 49 pixels (a 7x7 pixel array), located under a single micro-lens located at the given position 301 of the array of micro-lenses. More generally, the sensor comprises as many sensor areas as there are micro-lenses in the array of micro-lenses. Each sensor area has the same pixel count, and the pixels of each sensor area have the same distribution over the sensor area.
The number of pixels constituting a sensor area depends on the camera characteristics. For a given pixel density on the sensor, the higher the number of pixels constituting a sensor area, the better the refocusing capability of the camera, but the lower the spatial resolution.
Each micro-lens thus generates a micro-lens image on a corresponding sensor area, each micro-lens image having the same shape and comprising the same pixel count, said pixels having the same disposition in each micro-lens image.
Figure 4 illustrates the principle implemented for construction of sub-aperture images. In a general manner, a sub-aperture image corresponds to an image formed by extracting the same pixel under each micro-lens (i.e. the same pixel in each micro-lens image).
The sensor of the plenoptic camera comprises pixels. As explained in reference to Figure 3, the pixels are associated with micro-lens images, and a micro-lens image is an image composed of the pixels under the corresponding micro-lens.
A plenoptic image is composed of a set of adjacent micro-lens images.
The micro-lens images may be designated using their coordinates in a reference associated with the micro-lens array. For example, Ml(x,y) designates a micro-lens image whose coordinates (x,y) are the horizontal and vertical coordinates of the corresponding micro-lens over the micro-lens array.
For example, in Figure 4 a plenoptic image 400 is illustrated with 2 micro-lens images 401 (in this example Ml(0,0) and Ml(1,0)). In the schematic illustration of this example, the sensor areas corresponding to the two micro-lens images 401 appear slightly separated: this separation does not generally exist on the actual image and sensor, and is drawn only for explanation purposes.
Each micro-lens image Ml(x,y) is composed of several pixels (7x7 in the described example). The horizontal and vertical indices of the pixel for a given micro-lens may respectively be called (u,v). For example, the two given pixels 403 in Figure 4 may be respectively denoted Ml(0,0,1,2) and Ml(1,0,1,2).
A sub-aperture image extraction process 402 is implemented on the plenoptic image 400. Several sub-aperture images can be created by extraction from the plenoptic image. A sub-aperture image may be called SI (for "sub-image") and is built from all the pixels having the same coordinates (u,v) across each micro-lens image. For example, by extracting all the pixels (1,2) across the micro-lens images, the sub-aperture image 404 denoted SI(1,2) is created.
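By way of illustration, the extraction can be sketched as follows, assuming a monochrome plenoptic image stored as a 2-D array in which each micro-lens image is an aligned rectangular tile (real sensors add colour filtering and optical distortions that this sketch ignores):

```python
import numpy as np

def extract_subaperture(plenoptic: np.ndarray, u: int, v: int,
                        lens_w: int, lens_h: int) -> np.ndarray:
    """Build SI(u, v) by taking the pixel at position (u, v) inside every
    micro-lens image, i.e. every lens_w-th column and every lens_h-th row."""
    return plenoptic[v::lens_h, u::lens_w]

# With 7x7 pixels per micro-lens as in Figure 3, SI(1, 2) would be obtained by:
# si = extract_subaperture(plenoptic_image, 1, 2, 7, 7)
```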
Figure 5 illustrates the creation of a refocused image. Several sub-aperture images (500, 501, ..., 502) are used to implement a refocusing process 503, and obtain a refocused image 504. Refocusing is performed by shifting and adding sub-aperture images extracted from the plenoptic image as explained with reference to Figure 4. The shift depends on the coordinates (u,v) of the sub-aperture image SI(u,v) and on the chosen refocusing plane. More particularly, during the refocusing process 503, a given refocusing factor β is chosen and the sub-aperture images SI(u,v) are shifted and added with a shift equal to (u(1-1/β), v(1-1/β)).
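A minimal shift-and-add sketch of this refocusing, assuming integer rounding of the shifts (a practical implementation would interpolate sub-pixel shifts):

```python
import numpy as np

def refocus(sub_images: dict, beta: float) -> np.ndarray:
    """Shift every sub-aperture image SI(u, v) by (u(1 - 1/beta), v(1 - 1/beta))
    and average the results. `sub_images` maps (u, v) tuples to 2-D arrays."""
    acc = None
    for (u, v), image in sub_images.items():
        du = int(round(u * (1.0 - 1.0 / beta)))
        dv = int(round(v * (1.0 - 1.0 / beta)))
        shifted = np.roll(image.astype(np.float64), shift=(dv, du), axis=(0, 1))
        acc = shifted if acc is None else acc + shifted
    return acc / len(sub_images)
```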
The refocusing process 503 is a post-processing process that makes it possible to change the depth of the focusing plane of the image around the focal plane of the main lens of the plenoptic system. In other words, the end-user may choose which plane or planes of the image are sharp after the image has been taken.
However, not all depth planes can be in focus. The degree of possible refocusing depends on the number of pixels under each micro-lens and depends on the main lens aperture used when the image is taken. Consequently, the regions of the images that can be refocused (these regions may be called "Refocusable Regions") will vary according to these characteristics. The regions that cannot be refocused are generally blurred.
Figure 6 illustrates on a schematic diagram an example embodiment of a method according to the invention implemented on a still plenoptic image.
The process aims to encode the plenoptic image using different parameters, for instance different qualities of compression or compression rates for different regions of the plenoptic image. In the represented embodiment, two regions of the plenoptic image are defined:
* the first region, which can be refocused (i.e. the region corresponding to refocusing planes on which it is possible to refocus the image);
* the second region, which cannot be refocused.
The second region refers to image areas where, even though refocussing processes can be performed, a desired degree of focussing does not result because of the optical properties of the camera taking the image. For example, the end user may wish to have an excellent image quality in the region of the image that can be refocused. For the other regions, the end user may not be interested in a high image quality, because details of the image have already been lost when taking the photograph.
At step 600, a still plenoptic image is available. From this plenoptic image, sub-aperture images are extracted at step 601, according to a process as described with reference to Figure 4.
In the represented example, an array of 3x3 pixels is situated under each micro-lens. Each micro-lens image thus corresponds to 9 pixels. At step 602, the nine sub-aperture images denoted 1 to 9 resulting from the sub-aperture extraction are available.
Next, a classifying process 603, 604 is performed. Two regions of the plenoptic image are defined. A process for defining the two regions is described in more detail in Figures 11 and 12.
In order to define these two regions, at least two sub-aperture images are used. When two sub-aperture images are used, only the classifying process 603 can be performed. When more than two sub-aperture images are used, the classifying process 603 using two sub-aperture images is performed and the generalized classifying process 604 is performed using at least one of the other available sub-aperture images. One of the two regions is encoded with a higher quality than the other one, e.g. with a lower compression rate or using a compression algorithm which causes less loss.
The shape of the two defined regions is similar for all sub-aperture images. For example, after the classifying process 603, 604, the area 607 of the sub-aperture image denoted 1 corresponds to the region that can be refocused and thus has to be encoded with a high quality. The remaining area 605 of the sub-aperture image denoted 1 corresponds to the region that cannot be refocused.
Equally, the area 608 of the sub-aperture image denoted 9 corresponds to the region that can be refocused, while the remaining area 606 corresponds to the region that cannot be refocused. Areas 607 and 608 correspond to the same pixels in sub-aperture images denoted 1 and 9.
Of course, the same region exists in each sub-aperture image. Once the two regions have been defined, the encoding (comprising compression) is performed at step 609 for encoding the 9 sub-aperture images with different encoding parameters depending on the region of the image. The nine sub-aperture images are encoded at step 609, for example according to a predictive compression scheme. For example, HEVC or MV-HEVC ("High Efficiency Video Coding" and "Multi-view High Efficiency Video Coding") can be used. Any other predictive scheme may also be used for encoding the nine sub-aperture images.
This results in an elementary stream 610 that may be for example stored or sent over a communication network.
Figure 7 represents another embodiment of the invention. This embodiment is dedicated to plenoptic video. As in Figure 6, an array of 3x3 pixels is situated under each micro-lens. Thus, each video frame comprises micro-lens images each constituted by 9 pixels.
At step 700, a plenoptic video is available. The plenoptic video will next be processed as successive plenoptic images. At step 701, a plenoptic image of the video is selected for being encoded. The sub-aperture representation of the selected image is generated at step 702. This representation comprises 9 sub-aperture images (703).
At least two of these sub-aperture images are used (704) for determining two regions:
* The region that can be refocused. This region corresponds to the areas 708 and 709 of the sub-aperture images respectively denoted 1 and 9 in the represented example.
* The region that cannot be refocused. This region corresponds to the areas 706 and 707 of the sub-aperture images respectively denoted 1 and 9 in the represented example.
As previously explained with reference to Figure 6 these two regions have the same shape and correspond to the same pixels in each sub-aperture image.
During the encoding or compression process 710, the two regions are encoded using different encoding parameters. Any compression scheme can be used. For example, predictive compression schemes like HEVC or MV-HEVC can be used for compressing the sub-aperture images with different quality. One such compression scheme is roughly described in reference to Figures 8a and 8b.
The previous steps are performed for each image of the plenoptic video, resulting in encoded sub-aperture images integrated in an elementary stream 712.
Figure 8a describes the main steps of a compression method that can be used in the invention. In this example, nine sub-aperture images denoted 1 to 9 are illustrated. This corresponds for example to the use of a 3x3 array of pixels per micro-lens. Of course, the number of sub-aperture images could be different, depending on the number of pixels under each micro-lens, and could be greater than nine or less than nine.
Different predictive schemes can be used for compressing a sub-aperture image. For example, the sub-aperture images 1-9 can be encoded as INTRA images. In such a case, each sub-aperture image is self-sufficient and is encoded without references to other sub-aperture images.
Typically, the first sub-aperture image 1 can be encoded as an INTRA image at step 800. For INTRA coding, many encoding tools are known: Coding Tree Units and Coding Units representation, spatial prediction, quantization, entropy coding, and so on.
Sub-aperture images can also be encoded as predicted images using INTER coding, at steps 801 and 802. In such a coding mode, the sub-aperture image (e.g. sub-aperture image 2) is encoded with reference to another sub-aperture image (e.g. sub-aperture image 1). When encoding a video, this also enables the current frame to be encoded by using a temporal prediction in addition to the spatial prediction. The sub-aperture images can also be encoded by using several reference sub-aperture frames. An example of encoding using a multi-reference scheme is illustrated: sub-aperture images 4 and 5 are used for the compression of the sub-aperture image 6.
Another example is given for the encoding of sub-aperture image 8, which uses sub-aperture images 7 and 9.
Other compression schemes based on inter-layer prediction can also be used. For example a compression scheme called MV-HEVC (Multi-View High Efficiency Video Coding) may be used. The principle of such a compression scheme is illustrated in Figure 8b. The sub-aperture images can be organized into layers. For example, three layers are defined in Figure 8b:
* Layer 1 contains the sub-aperture images 1, 2 and 3;
* Layer 2 contains the sub-aperture images 4, 5 and 6;
* Layer 3 contains the sub-aperture images 7, 8 and 9.
Multi-view compression algorithms like MV-HEVC enable INTRA compression, temporal compression (sub-aperture image 2 is encoded with respect to sub-aperture image 1; sub-aperture image 9 is encoded with respect to sub-aperture image 8) and inter-layer compression (sub-aperture image 4 is encoded in reference to sub-aperture image 1, sub-aperture image 7 is encoded in reference to sub-aperture image 4).
Figure 9 illustrates in a block diagram an encoding method which may be implemented in an embodiment of the invention. In this example, a video compression algorithm like HEVC is used. However, any other similar compression scheme could be used.
At step 900 a plenoptic image that will be encoded is provided. This image can be an image of a plenoptic video or a still plenoptic image. This image is decomposed into sub-aperture images, as described in Figure 4. At step 901, the whole set of sub-aperture images corresponding to the plenoptic image is available. Based on this set of sub-aperture images, at least two sub-aperture images are used for defining at step 902 the region that can be refocused available at step 903 and the complementary region (corresponding to the areas that cannot be refocused correctly) available at step 904.
The classification of the regions may be performed as described in reference to Figure 6 or Figure 7, and detailed in Figure 11 and Figure 12. The segmentation/classification into two regions is the same for all the sub-aperture images.
At step 905, one of the sub-aperture images is selected for being encoded using, for example, one of the compression schemes described in Figure 8a or 8b.
The compression scheme may be HEVC. In HEVC, the image to be compressed is split at step 906 into a rectangular grid called a CTU (for Coding Tree Unit) partition. Each CTU is selected at step 907 for being compressed using a predictive/quantization scheme.
The intersection of the CTU with the two regions defined at step 902 is calculated at step 908. We recall that step 902 provides the first region 903, which is the region that can be correctly refocused, and the second region 904, in which no refocusing can be conducted.
If the CTU does not intersect the first region, a quantization level, called Q2, will be used for quantizing the prediction residual used in the compression scheme. If the intersection with the first region is not empty, a second quantization level, Q1, is selected to be used in this intersection. In HEVC, the quantization levels vary between 0 and 51. The lower the quantization level, the better the quality of the compressed CTU. If a better image quality is to be ensured in the region that can be refocused compared to the quality in the other regions, then Q1 is lower than Q2.
For example, Q1 could be calculated using the formula Q1 = Q2 - ST, with, for example, ST = 3. The relationship between Q1 and Q2 could also be determined at step 913, with a visual interface in which the end user can manually tune the quantization step or the difference between the quantization steps. For example, the plenoptic image is compressed according to steps 901 to 912 and the resulting elementary stream is decoded as a set of decoded sub-aperture images that are refocused as described in Figure 5. If the end user wishes to increase the differential quality between refocused regions and regions that cannot be refocused, he could decide to increase the value ST.
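The selection of the quantization level per CTU can thus be sketched as follows (the clamping to the HEVC range [0, 51] and the default ST = 3 are illustrative assumptions):

```python
def select_quantization(intersects_refocusable: bool, q2: int, st: int = 3) -> int:
    """Return Q1 = Q2 - ST for a CTU whose intersection with the refocusable
    region is not empty, and Q2 otherwise, clamped to the HEVC range [0, 51]."""
    level = q2 - st if intersects_refocusable else q2
    return max(0, min(51, level))
```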
Once the quantization level has been selected at step 909, the CTU is encoded at step 910. For example, the best Coding Unit partition, the best Prediction Unit partition and the best compression modes are selected according to the HEVC compression algorithm.
An entropy coding is used at step 911 for generating the elementary stream, available at final step 912.
All the CTUs associated with the sub-aperture images are encoded according to this scheme.
Figure 10 represents a variant of the algorithm described in Figure 6.
In this Figure, the plenoptic image available at step 1000 is directly compressed at step 1006 without using the sub-aperture representation of step 1001. This compression can be based on a still image compression algorithm like JPEG.
The JPEG compression algorithm splits the image into blocks and compresses each block independently. The sub-aperture representation is only used for classifying, at steps 1003 and 1004, the sub-aperture images into regions that can be refocused and regions that cannot be refocused.
Once this classification has been carried out on the sub-aperture images, the classification is transferred at step 1005 to the plenoptic image. For example, if a pixel of a sub-aperture image is classified as a "refocusable" pixel, its corresponding location in the plenoptic image is classified as "refocusable". This location can be easily determined: the relationship between pixel location in the sub-aperture image and pixel location in the plenoptic image is explained in Figure 4.
Next, during the encoding step 1006, a block will be compressed with a quantization level Q1 if at least one pixel of the block is classified as a refocusable pixel. If no pixel is classified as a refocusable pixel, then the block is compressed with a quantization level Q2.
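A sketch of this per-block rule, assuming the label map of Figure 11 in which label 0 marks refocusable pixels (the function name and array layout are illustrative):

```python
import numpy as np

def block_quantization(label_map: np.ndarray, x0: int, y0: int,
                       block_size: int, q1: int, q2: int) -> int:
    """Use Q1 if at least one pixel of the block is refocusable (label 0),
    and Q2 otherwise."""
    block = label_map[y0:y0 + block_size, x0:x0 + block_size]
    return q1 if np.any(block == 0) else q2
```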
Figure 11 illustrates in a block diagram an example of a method of classifying parts of a plenoptic image into two regions. This Figure gives more details on an example of a classifying method as used in Figure 6 at step 603, Figure 7 at step 704, and Figure 10 at step 1003. For illustrating this classification, two adjacent sub-aperture images are chosen at steps 1100 and 1101. This is not mandatory however. Taking non-adjacent sub-aperture images requires only changing the threshold value T (1109).
For example, if 9 pixels arranged in a 3x3 array are disposed under each micro-lens, the sub-aperture images SI(1,1) and SI(1,2) may be used. Any other pair of adjacent sub-aperture images may be considered: (SI(i,j); SI(i,j+1)) or (SI(i,j); SI(i+1,j)). A motion estimation algorithm is performed at step 1102 for estimating the motion between the two considered sub-aperture images. This motion between the sub-aperture images is also called disparity. The disparity may be defined by a motion vector. Different algorithms for determining the disparity vectors may be used, for example:
* a so-called optical flow method, which enables a pixel-based motion between two images to be calculated;
* a block-based motion estimation algorithm (quadtree decomposition and translation model per leaf of the quadtree).
The motion estimation algorithm can be simplified depending on the pair of sub-aperture images selected at steps 1100 and 1101. Indeed, if the pair (SI(i,j); SI(i,j+1)) is selected, the motion should be "vertical" (i.e. along the second coordinate). If the pair (SI(i,j); SI(i+1,j)) is selected, the motion should be "horizontal" (i.e. along the first coordinate), due to the properties of plenoptic images.
Therefore, the disparity estimation algorithm used may be simplified by estimating only the horizontal or vertical components of the motion.
Once the motion has been estimated at step 1102, a motion map is generated at step 1103. This pixel map has the same dimensions as the sub-aperture image. One motion vector is associated with each pixel. This motion map is simplified at step 1104, for example using a median filtering operation (of size 3x3 in this example). This is followed by the selection, for each pixel, of the maximum of the absolute values of the horizontal and vertical components of each motion vector. Once this operation has been carried out, a thresholding algorithm 1105 is performed. The value of each pixel is compared with a threshold T. In the described example, if the absolute value associated with the pixel is lower than (or equal to) T, then the pixel is labelled "0". If the pixel value is higher than the threshold T, the pixel is labelled "1". The result of the thresholding algorithm is a label map (binary picture), available at step 1106. The areas having adjacent pixels with the same label are called regions. These regions may correspond to the regions where the image can or cannot be refocused.
Further processing may be carried out on these regions for better results. For example, the label map or binary picture may be first eroded and then dilated at step 1107, using morphological filters. The erosion and dilation filters preferably use the same structuring element. The erosion filter allows deletion of the isolated labels "1" defining isolated regions, while the dilation restores the initial dimension of the regions defined by the label "1" that have not been deleted by the erosion. The result is a map, available at step 1108, in which the pixels labelled "0" define the "refocusable" region and the pixels labelled "1" define the "non-refocusable" region. Each region may comprise separated areas.
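The whole chain of Figure 11 (median filtering, selection of the dominant component, thresholding, erosion and dilation) can be sketched with SciPy; the function name and the 3x3 structuring element are assumptions:

```python
import numpy as np
from scipy import ndimage

def label_regions(motion_x: np.ndarray, motion_y: np.ndarray,
                  threshold: float) -> np.ndarray:
    """Return a label map with 0 for 'refocusable' pixels and 1 for the others."""
    mx = ndimage.median_filter(motion_x, size=3)   # step 1104: simplify the map
    my = ndimage.median_filter(motion_y, size=3)
    value = np.maximum(np.abs(mx), np.abs(my))     # dominant motion component
    labels = value > threshold                     # step 1105: threshold T
    struct = np.ones((3, 3), dtype=bool)
    labels = ndimage.binary_erosion(labels, structure=struct)    # step 1107
    labels = ndimage.binary_dilation(labels, structure=struct)
    return labels.astype(np.uint8)
```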
The threshold T may depend on two criteria:
* the two sub-aperture images used; and
* the number of pixels under each micro-lens.
If the coordinate difference between the two sub-aperture images equals 2 (e.g. (SI(i,j); SI(i,j+2))), the threshold used is 2T. The threshold is then defined as xT, where x corresponds to the difference in coordinates between the two sub-aperture images used. In other words, the sub-aperture images used when a threshold xT is used are (SI(i,j); SI(i,j+x)) or (SI(i,j); SI(i+x,j)).
The value of the threshold T can be set experimentally from a given set of images. Thus, to define T, several images may be taken with different camera parameters, and the refocusing algorithm is then performed on this set of images. A "training" operation is carried out, which consists in determining the minimum and maximum refocusing factor such that the refocusing algorithm described in Figure 5 is efficient, i.e. at least one part of the refocused image is sharp. As previously explained, refocusing is based on shifting (translating) and adding sub-aperture images. The translation is given by the formula (u(1-1/β), v(1-1/β)). The minimum and maximum factors βmin and βmax are calculated. Next, the threshold T is calculated according to the equation T = max(|1-1/βmin|, |1-1/βmax|).
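This training equation translates directly into a one-function sketch:

```python
def training_threshold(beta_min: float, beta_max: float) -> float:
    """T = max(|1 - 1/beta_min|, |1 - 1/beta_max|), where beta_min and beta_max
    are the extreme refocusing factors for which refocusing remains efficient."""
    return max(abs(1.0 - 1.0 / beta_min), abs(1.0 - 1.0 / beta_max))
```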
The higher the number of pixels under each micro-lens, the higher the value of T.
Figure 12 represents in a block diagram another example of a method of classifying parts of a plenoptic image into two regions. Indeed, the classification algorithm described in Figure 11 may be generalized by using more than two sub-aperture images. Figure 12 describes an example of a generalized classification method.
At steps 1200 and 1201, two sub-aperture images are first used. In this example, the two sub-aperture images are adjacent. For example, if 9 pixels arranged in a 3x3 array are disposed under each micro-lens, the sub-aperture images SI(1,1) and SI(1,2) may be used.
Any other pair of adjacent sub-aperture images may be considered: (SI(i,j); SI(i,j+1)) or (SI(i,j); SI(i+1,j)). A motion estimation algorithm is performed at step 1204 for estimating the disparity between the two considered sub-aperture images. The disparity may be defined by a motion vector. Different algorithms for determining the disparity vectors may be used, for example:
* a so-called optical flow method, which enables calculation of a pixel-based motion between two images;
* a block-based motion estimation algorithm (quadtree decomposition and translation model per leaf of the quadtree).
The motion estimation algorithm can be simplified depending on the pair of sub-aperture images selected at steps 1200 and 1201. Indeed, if the pair (SI(i,j); SI(i,j+1)) is selected, the motion should be "vertical" (i.e. along the second coordinate). If the pair (SI(i,j); SI(i+1,j)) is selected, the motion should be "horizontal" (i.e. along the first coordinate), due to the properties of plenoptic images.
Therefore, the disparity estimation algorithm used could be simplified by estimating only the horizontal or vertical components of the motion.
Once the motion has been estimated at step 1204, a first motion map is generated at step 1206. This pixel map has the same dimensions as the sub-aperture image. One motion vector is associated with each pixel.
At steps 1202 and 1203, two other sub-aperture images are used. For example, if 9 pixels arranged in a 3x3 array are disposed under each micro-lens, the sub-aperture images SI(2,1) and SI(2,2) may be used. Any other pair of adjacent sub-aperture images, different from the pair considered at steps 1200 and 1201, may be considered: (SI(i,j); SI(i,j+1)) or (SI(i,j); SI(i+1,j)). A motion estimation algorithm is performed at step 1205 for estimating the disparity between the two considered sub-aperture images. The disparity may be defined by a motion vector.
This motion estimation algorithm is the same as at step 1204.
The motion estimation algorithm can be simplified depending on the pair of sub-aperture images selected at steps 1202 and 1203. Indeed, if the pair (SI(i,j); SI(i,j+1)) is selected, the motion should be "vertical" (i.e. along the second coordinate). If the pair (SI(i,j); SI(i+1,j)) is selected, the motion should be "horizontal" (i.e. along the first coordinate), due to the properties of plenoptic images.
Once the motion has been estimated at step 1205, a second motion map is generated at step 1207. This pixel map has the same dimensions as the sub-aperture image. One motion vector is associated with each pixel.
The two generated motion maps are merged at step 1208, resulting in a merged motion map available at step 1209.
Several merging algorithms are possible. One example of an algorithm consists in taking for each pixel the average motion vector (average value of each component) between the two motion maps.
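A sketch of this averaging merge, assuming each motion map is stored as an H x W x 2 array holding one (horizontal, vertical) vector per pixel:

```python
import numpy as np

def merge_motion_maps(map_a: np.ndarray, map_b: np.ndarray) -> np.ndarray:
    """Average the two motion maps component-wise, pixel by pixel."""
    return (map_a.astype(np.float64) + map_b.astype(np.float64)) / 2.0
```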
In a variant, the motion estimation algorithm takes into account more than two sub-aperture images, for example four sub-aperture images are used. In such a case only one motion map is generated and no merging step is necessary.
An example of such an algorithm, based on a block-based approach, consists in:
* using the same block partition for the first pair of images and the second pair of images;
* estimating simultaneously the motion vectors between the sub-aperture images of the first pair and the sub-aperture images of the second pair.
This motion map is simplified at step 1210, for example using a median filtering operation (of size 3x3 in this example). It is followed by the selection, for each pixel, of the maximum of the absolute values of the horizontal and vertical components of each motion vector. Once this operation has been carried out, a thresholding algorithm 1211 is performed. The value of each pixel is compared with a threshold T. If the absolute value associated with the pixel is lower than (or equal to) T, the pixel is labelled "0". If the pixel value is higher than the threshold T, the pixel is labelled "1". The result of the thresholding algorithm is a label map (binary picture), available at step 1212. The areas having adjacent pixels with the same label are called regions. These regions may correspond to the regions where the image can or cannot be refocused.
Further processing may be carried out on these regions for better results. For example, the label map or binary picture may be first eroded and then dilated at step 1213, using morphological filters. The erosion and dilation filters preferably use the same structuring element. The erosion filter allows deletion of the isolated labels "1" defining isolated regions, while the dilation restores the initial dimension of the regions defined by the label "1" that have not been deleted by the erosion. The result is a map, available at step 1214, in which the pixels labelled "0" define the "refocusable" region and the pixels labelled "1" define the "non-refocusable" region. Each region may comprise separated areas.
Figure 13 schematically represents a device according to an embodiment of the invention.
In the illustrated embodiment, the device 1300 comprises a central processing unit (CPU) 1301 capable of executing instructions from program ROM 1303 on powering up of the device, and instructions relating to a software application from main memory 1302 after the powering up.
The main memory 1302 is for example of Random Access Memory (RAM) type. The memory capacity can be expanded by an optional RAM extension connected to an expansion port (not illustrated).
Instructions relating to the software application may be loaded into the main memory 1302 from the hard disk (HD) 1306 or the program ROM 1303, for example. Such a software application, when executed by the CPU 1301, causes an embodiment of a method for encoding a plenoptic image according to the invention to be performed.
A network interface 1304 may allow the connection of the device to a communication network. The software application when executed by the CPU may thus receive data from other devices through the network interface.
A user interface 1305 may allow information to be displayed to a user, and/or inputs to be received from a user.
The present invention thus provides a method and device that allows optimized encoding (and in particular compression) of plenoptic images or video, taking into account what the end-user wishes or the intended future use of the image or video. More particularly, different compression rates, or the best suited compression algorithms may be used for encoding the regions of the plenoptic images that can be refocused and the regions that cannot be refocused.
Although the present invention has been described hereinabove with reference to specific embodiments, the present invention is not limited to the specific embodiments, and modifications which lie within the scope of the present invention will be apparent to a person skilled in the art.
Many further modifications and variations will suggest themselves to those versed in the art upon making reference to the foregoing illustrative embodiments, which are given by way of example only and which are not intended to limit the scope of the invention, that being determined solely by the appended claims. In particular the different features from different embodiments may be interchanged, where appropriate.
In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. The mere fact that different features are recited in mutually different dependent claims does not indicate that a combination of these features cannot be advantageously used.

Claims (22)

  1. A method for encoding a plenoptic image, said plenoptic image comprising pixels generated via an array of micro-lenses and comprising a plurality of micro-lens images, each micro-lens image formed by an associated micro-lens, the method comprising the steps of: obtaining sub-aperture images from the plenoptic image, each sub-aperture image being composed of the pixels located at the same position in each micro-lens image; estimating a value representative of the disparity between the corresponding pixels of at least two sub-aperture images; classifying pixels into groups based on the value representative of the disparity; defining image regions based on the classification of pixels; and encoding the plenoptic image using encoding parameters, wherein at least one encoding parameter varies according to the defined image regions.
  2. A method according to claim 1, wherein each group of pixels is associated with a range or ranges of the value representative of the disparity.
  3. A method according to claim 1, wherein classifying the pixels comprises comparing the value representative of disparity associated with each pixel to a predefined threshold separating the values representative of disparity into two ranges, and classifying said pixels in a first group if the value is lower than or equal to the threshold, and in a second group if the value is higher than the threshold.
  4. A method according to any of the preceding claims, wherein the encoding parameters determine a compression rate, the compression rate being different for different encoded regions.
  5. A method according to claim 2 or claim 3, wherein the encoding parameters define a compression rate, the compression rate applied to a region being correlated to the values or range of values representative of the disparity associated with the region.
  6. A method according to any one of the preceding claims, wherein defining a region of pixels comprises modifying group boundaries to eliminate isolated pixels and/or to smooth the outline of the group.
  7. A method according to claim 6, wherein modifying group boundaries comprises applying an erosion filter and a dilation filter.
  8. A method according to any one of the preceding claims, wherein the steps of classifying the pixels and forming regions are implemented in the sub-aperture images.
  9. A method according to claim 8, wherein the plenoptic image is encoded in the form of a succession of sub-aperture images.
  10. A method according to any one of the preceding claims, wherein a first region of pixels and a second region of pixels are formed, the first region corresponding to a part of the image comprising the pixels situated in the focusing planes in which the image can be refocused with a predefined sharpness; the second region corresponding to the remaining part of the image comprising the pixels situated in the planes in which the image cannot be refocused.
11. A method according to any one of the preceding claims, wherein the value representative of the disparity is derived from a motion vector defining the disparity between the corresponding pixels of two sub-aperture images.
12. A method according to claim 11, wherein the value representative of the disparity is one of the components of the motion vector defining the disparity between the corresponding pixels of two sub-aperture images.
13. A method according to any one of claims 1 to 10, wherein the value representative of the disparity is derived from an average motion vector defining the disparity between the corresponding pixels of three or more sub-aperture images.
14. A method according to claim 13, wherein the value representative of the disparity is one of the components of the average motion vector defining the disparity between the corresponding pixels of three or more sub-aperture images.
15. A method according to any one of the preceding claims, wherein encoding uses HEVC or MV-HEVC.
16. A method for encoding a plenoptic video composed of plenoptic images, said plenoptic images comprising pixels generated via an array of micro-lenses and comprising a plurality of micro-lens images, each micro-lens image formed by an associated micro-lens, wherein the method comprises encoding each plenoptic image of the plenoptic video according to a method of any one of claims 1 to 15.
17. A device for encoding a plenoptic image, said plenoptic image comprising pixels generated via an array of micro-lenses and comprising a plurality of micro-lens images, each micro-lens image formed by an associated micro-lens, the device comprising:
- processing means configured to obtain sub-aperture images from the plenoptic image, each sub-aperture image being composed of the pixels located at the same position in each micro-lens image;
- classifying means configured to estimate a value representative of the disparity between each corresponding pixel of at least two sub-aperture images, to classify the pixels into groups based on the value representative of the disparity, and to define image regions based on the classification of pixels; and
- encoding means configured to encode the plenoptic image using encoding parameters, wherein at least one encoding parameter varies according to the defined image regions.
18. A device according to claim 17, wherein the encoding means is configured to encode according to parameters giving rise to a compression rate, the compression rate being different for different encoded regions.
19. A device according to claim 17 or claim 18, wherein encoding uses HEVC or MV-HEVC.
20. A device according to any one of claims 17 to 19, wherein the classifying means is configured to form a first region of pixels and a second region of pixels, the first region corresponding to a part of the image comprising the pixels situated in the focusing planes in which the image can be refocused with a predefined sharpness; the second region corresponding to the remaining part of the image comprising the pixels situated in the planes in which the image cannot be refocused.
21. A method or device according to any one of the preceding claims, wherein the micro-lens images of said plenoptic image have the same shape and number of pixels, the pixels having the same location in each micro-lens image.
22. A method or device for encoding a plenoptic image substantially as hereinbefore described and with reference to the accompanying drawings.
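By way of hedged illustration only, the sub-aperture decomposition recited in claim 1 can be sketched in Python as below; the function name, the NumPy array layout and the assumption of square micro-lens images of m x m pixels on a regular grid are all illustrative choices, not details taken from the patent.

```python
import numpy as np

def extract_sub_aperture_images(plenoptic, m):
    """Rearrange a plenoptic image into its sub-aperture images.

    Assumes square micro-lens images of m x m pixels on a regular grid,
    so the sensor holds (H*m) x (W*m) pixels. Sub-aperture image (u, v)
    collects the pixel located at position (u, v) of every micro-lens
    image, as recited in claim 1.
    """
    H, W = plenoptic.shape[0] // m, plenoptic.shape[1] // m
    subs = np.empty((m, m, H, W), dtype=plenoptic.dtype)
    for u in range(m):
        for v in range(m):
            # pixel (u, v) of every micro-lens image, across the grid
            subs[u, v] = plenoptic[u::m, v::m]
    return subs
```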
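Under the same assumptions, the two-group classification of claim 3 reduces to a single threshold comparison per pixel:

```python
import numpy as np

def classify_pixels(disparity, threshold):
    """Claim 3's classification: pixels whose value representative of
    the disparity is lower than or equal to the threshold fall in the
    first group (False), the remaining pixels in the second (True)."""
    return np.asarray(disparity) > threshold
```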
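For the region-dependent compression rate of claims 4 and 5, one plausible (assumed, HEVC-style) realisation maps the disparity magnitude of a region to a quantisation parameter, so that regions far from the refocusable planes are compressed more strongly; base_qp, step and max_qp are illustrative values, not values taken from the patent:

```python
def qp_for_region(mean_abs_disparity, base_qp=27, step=3, max_qp=51):
    """Map a region's mean absolute disparity to a quantisation
    parameter: larger disparity -> higher QP -> higher compression
    rate, as suggested by claims 4 and 5. All constants are assumed."""
    return min(max_qp, base_qp + step * int(round(mean_abs_disparity)))
```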
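The boundary clean-up of claims 6 and 7 is a morphological opening; a minimal sketch with SciPy, assuming a binary region mask and an illustrative 3 x 3 structuring element:

```python
import numpy as np
from scipy import ndimage

def smooth_region_mask(mask, size=3):
    """Claims 6 and 7: an erosion filter eliminates isolated pixels,
    and the subsequent dilation filter restores and smooths the
    outline of the group."""
    structure = np.ones((size, size), dtype=bool)
    eroded = ndimage.binary_erosion(mask, structure=structure)
    return ndimage.binary_dilation(eroded, structure=structure)
```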
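Encoding the plenoptic image "in the form of a succession of sub-aperture images" (claim 9) amounts to flattening the (u, v) grid of sub-aperture images into an ordered list of frames for a video encoder; the raster scan below is an assumed ordering, the patent does not mandate a particular scan:

```python
def sub_aperture_sequence(subs):
    """Order the sub-aperture images produced by
    extract_sub_aperture_images as a succession of frames (claim 9)."""
    m_u, m_v = subs.shape[0], subs.shape[1]
    return [subs[u, v] for u in range(m_u) for v in range(m_v)]
```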
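Finally, claims 11 to 14 derive the value representative of the disparity from motion vectors between sub-aperture images; a sketch assuming per-pixel motion vectors are already available, with the horizontal component chosen (arbitrarily, for illustration) as the retained component:

```python
import numpy as np

def disparity_from_motion_vectors(mvs):
    """mvs has shape (N, H, W, 2): for each of N pairs of sub-aperture
    images, the (dx, dy) motion vector at every pixel. Averaging over
    the N pairs gives the average motion vector of claim 13; one of
    its components is the value representative of the disparity
    (claim 14)."""
    avg = np.mean(mvs, axis=0)   # average motion vector per pixel
    return avg[..., 0]           # one component (dx) as the disparity value
```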
GB1405854.9A 2014-04-01 2014-04-01 Method and device for encoding a plenoptic image or video Active GB2524956B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1405854.9A GB2524956B (en) 2014-04-01 2014-04-01 Method and device for encoding a plenoptic image or video

Publications (3)

Publication Number Publication Date
GB201405854D0 GB201405854D0 (en) 2014-05-14
GB2524956A 2015-10-14
GB2524956B GB2524956B (en) 2017-02-08

Family

ID=50737816

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1405854.9A Active GB2524956B (en) 2014-04-01 2014-04-01 Method and device for encoding a plenoptic image or video

Country Status (1)

Country Link
GB (1) GB2524956B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11893668B2 (en) 2021-03-31 2024-02-06 Leica Camera Ag Imaging system and method for generating a final digital image via applying a profile to image information

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130077880A1 (en) * 2011-09-28 2013-03-28 Pelican Imaging Corporation Systems and methods for encoding light field image files

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2525851B (en) * 2014-04-30 2017-03-29 Canon Kk Method and device for encoding a sub-aperture image of a set of sub-aperture images obtained from a plenoptic image
US10003806B2 (en) 2015-02-16 2018-06-19 Canon Kabushiki Kaisha Optimized plenoptic image encoding
US20220114765A1 (en) * 2018-09-18 2022-04-14 Volume Graphics Gmbh Computer-implemented method for compressing measurement data from a measurement of a measurement volume
US11830226B2 (en) * 2018-09-18 2023-11-28 Volume Graphics Gmbh Computer-implemented method for compressing measurement data from a measurement of a measurement volume

Similar Documents

Publication Publication Date Title
Hou et al. Light field image compression based on bi-level view compensation with rate-distortion optimization
US20150319456A1 (en) Method and device for encoding a sub-aperture image of a set of sub-aperture images obtained from a plenoptic image
TWI724626B (en) Methods for full parallax compressed light field 3d imaging systems
CN106068527B (en) Depth perception for stereo data enhances
US10652577B2 (en) Method and apparatus for encoding and decoding light field based image, and corresponding computer program product
HUE026534T2 (en) Hybrid video coding supporting intermediate view synthesis
TW201220855A (en) Image encoding method and apparatus, image decoding method and apparatus, and programs therefor
US10785502B2 (en) Method and apparatus for encoding and decoding a light field based image, and corresponding computer program product
JP6154643B2 (en) Moving picture coding apparatus, depth intra prediction method and program for moving picture coding apparatus, moving picture decoding apparatus, depth intra prediction method and program for moving picture decoding apparatus
US20160241855A1 (en) Optimized plenoptic image encoding
Mieloch et al. Overview and efficiency of decoder-side depth estimation in MPEG immersive video
TWI508529B (en) Image encoding method and apparatus, image decoding method and apparatus, and programs therefor
Jantet et al. Object-based layered depth images for improved virtual view synthesis in rate-constrained context
GB2524956A (en) Method and device for encoding a plenoptic image or video
Santos et al. Lossless coding of light field images based on minimum-rate predictors
US20150016517A1 (en) Encoding device and encoding method, and decoding device and decoding method
Jia et al. Deep learning geometry compression artifacts removal for video-based point cloud compression
JP5395911B2 (en) Stereo image encoding apparatus and method
JP7257152B2 (en) Distance image coding device and its program, and distance image decoding device and its program
TW201428680A (en) Image processing apparatus and foreground extraction method for stereo videos
EP3357036A1 (en) Method and apparatus for reducing the coding artefact of a light field based image, and corresponding computer program product
WO2015056712A1 (en) Moving image encoding method, moving image decoding method, moving image encoding device, moving image decoding device, moving image encoding program, and moving image decoding program
Guillemot et al. Light field image processing: overview and research issues
JP2013150071A (en) Encoder, encoding method, program and storage medium
EP2966867A1 (en) Methods and devices for encoding and decoding a sequence of frames representing a 3D scene, and corresponding computer program products and computer-readable medium