GB2563037A - Method and apparatus for image compression


Info

Publication number
GB2563037A
Authority
GB
United Kingdom
Prior art keywords
area
image
chroma
different
areas
Legal status
Withdrawn
Application number
GB1708613.3A
Other versions
GB201708613D0 (en)
Inventor
Aflaki Beni Payman
Hannuksela Miska
Current Assignee
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Application filed by Nokia Technologies Oy
Priority to GB1708613.3A
Publication of GB201708613D0
Priority to PCT/FI2018/050292 (published as WO2018220260A1)
Publication of GB2563037A


Classifications

    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/124: Quantisation
    • H04N19/154: Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H04N19/162: User input
    • H04N19/167: Position within a video image, e.g. region of interest [ROI]
    • H04N19/186: Adaptive coding characterised by the coding unit being a colour or a chrominance component

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

An image comprising luma and chroma components (e.g. a YUV or YCbCr image) is processed to determine at least two different areas within the image, 704. The chroma component in each of the different areas is then encoded in a different way, 706-712. For example, different quantization parameters may be used for the different areas, or one of the areas may be subjected to chroma sub-sampling. Further different encoding methods include using different bit-depth representations for the different areas or performing high frequency filtering on one of the areas. The different areas within the image may be determined based on the eye gaze of a user, the presence of a human face or body, the distribution of cone cells in the retina, depth information, the quantity of high frequency components in the luma/chroma components or the presence of alpha-numeric characters. Also disclosed is a method of receiving an MPD file corresponding to an image containing at least two different areas, in which the representations of the first and second areas have different chroma properties.

Description

TECHNICAL FIELD [0001] The present invention relates to a method for compressing images, an apparatus for compressing images, and a computer program for compressing images.
BACKGROUND [0002] This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
[0003] In many codecs the input video has a so-called YUV format, which is basically a raw, uncompressed video format: a collection of raw pixel values in the YUV color space. A YUV video is composed of three components, namely one luma component, i.e. Y, and two chroma components, i.e. U and V. The format was developed for use with black-and-white televisions, where there was a need for a video signal transmission compatible with both color and black-and-white television infrastructures. In such a scheme, the luma component was already available in the broadcasting technology, and the addition of the U and V chroma components kept the technology compatible with both color and black-and-white receiver types.
[0004] Figures 2a—2d illustrate the YUV format. Figure 2a shows an example image as a gray scale image, the original being a colour image. Figure 2b shows the Y component of the image of Figure 2a, Figure 2c shows the U component, and Figure 2d shows the V component. Considering the nature of the luma and chroma components, and the fact that all components present the same scene, there is often a high correlation between the content of the different components. For example, several edges and contours are similar. However, if there is no distinct color difference on an object in the scene, then no contour is present in the chroma components, while details and contours can still be present on the same object in the luma component. An example is shown in Figures 2a—2d, where the luma component contains the shape of the text in the middle of the image while the corresponding area in the chroma components is flat.
[0005] Considering the ever increasing amount of data to be transmitted, it may often be required to encode images/video more efficiently. This is needed to meet broadcasting infrastructure limitations and to reduce the storage required for the same content. Moreover, the encoding/decoding complexity of large content (4K, several views) may be a problem for some encoders and transmission systems.
SUMMARY [0006] Various embodiments provide a method and apparatus for compressing images to decrease the bitrate required to encode them. In accordance with an embodiment, there is provided a method and an apparatus for compressing images unevenly at different spatial locations of the image. For example, based on the structure of the human visual system (the uneven distribution of cone cells in the retina), different chroma qualities may be used in encoding and/or transmitting different parts of an image.
[0007] Various aspects of examples of the invention are provided in the detailed description.
[0008] According to a first aspect, there is provided a method comprising: receiving an image presented by luma and chroma components; determining at least a first area and a second area in the image; and encoding the chroma component of each of the at least two areas differently.
[0009] According to a second aspect, there is provided an apparatus comprising at least one processor; and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following:
receive an image presented by luma and chroma components; determine at least a first area and a second area in the image; and encode the chroma component of each of the at least two areas differently.
[0010] According to a third aspect, there is provided an apparatus comprising:
means for receiving an image presented by luma and chroma components; means for determining at least a first area and a second area in the image; and means for encoding the chroma component of each of the at least two areas differently.
[0011] According to a fourth aspect, there is provided a computer readable storage medium stored with code thereon for use by an apparatus, which, when executed by a processor, causes the apparatus to perform:
receive an image presented by luma and chroma components; determine at least a first area and a second area in the image; and encode the chroma component of each of the at least two areas differently.
BRIEF DESCRIPTION OF THE DRAWINGS [0012] For a more complete understanding of example embodiments of the present invention, reference is now made to the following descriptions taken in connection with the accompanying drawings in which:
[0013] Figure 1a shows an example of a multi-camera unit as a simplified block diagram, in accordance with an embodiment;
[0014] Figure 1b shows a perspective view of a multi-camera unit, in accordance with an embodiment;
[0015] Figures 2a—2d illustrate the YUV colour format;
[0016] Figure 3a illustrates an example of the cone cell distribution in the retina;
[0017] Figure 3b illustrates the structure of an eye;
[0018] Figure 3c illustrates an example of the cone cell distribution in the retina and the respective chroma compression in an image, in accordance with an embodiment;
[0019] Figures 4a—4c show some alternative central image classifications, in accordance with an embodiment;
[0020] Figures 5a—5c illustrate some examples of layered zones of an image for gradually changing the quality of a chroma component, in accordance with an embodiment;
[0021] Figure 6 shows a schematic block diagram of an apparatus, in accordance with an embodiment;
[0022] Figure 7 shows a flowchart of a method, in accordance with an embodiment;
[0023] Figure 8 shows a schematic block diagram of an exemplary apparatus or electronic device;
[0024] Figure 9 shows an apparatus according to an example embodiment; and
[0025] Figure 10 shows an example of an arrangement for wireless communication comprising a plurality of apparatuses, networks and network elements.
DETAILED DESCRIPTION OF SOME EXAMPLE EMBODIMENTS [0026] The following embodiments are exemplary. Although the specification may refer to an, one, or some embodiment(s) in several locations, this does not necessarily mean that each such reference is to the same embodiment(s), or that the feature only applies to a single embodiment. Single features of different embodiments may also be combined to provide other embodiments.
[0027] Figure 1a illustrates an example of a multi-camera unit 100, which comprises two or more cameras 102. In this example the number of cameras 102 is eight, but it may also be less than eight or more than eight. Each camera 102 is located at a different location in the multi-camera unit 100 and may have a different orientation with respect to the other cameras 102. As an example, the cameras 102 may have an omnidirectional constellation so that the unit has a 360° viewing angle in 3D space. In other words, such a multi-camera unit 100 may be able to see every direction of a scene so that each spot of the scene around the multi-camera unit 100 can be viewed by at least one camera 102.
[0028] Without losing generality, any two cameras 102 of the multi-camera unit 100 may be regarded as a pair of cameras 102. Hence, a multi-camera unit of two cameras has only one pair of cameras, a multi-camera unit of three cameras has three pairs of cameras, a multi-camera unit of four cameras has six pairs of cameras, etc. Generally, a multi-camera unit 100 comprising N cameras 102, where N is an integer greater than one, has N(N-1)/2 pairs of cameras 102. Accordingly, images captured by the cameras 102 at a certain time may be considered as N(N-1)/2 pairs of captured images.
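As a quick illustration of the pair count (not part of the patent text), the following Python sketch enumerates the camera pairs and checks the N(N-1)/2 formula for the eight-camera example:

    # Minimal sketch: enumerating unordered camera pairs for N cameras.
    from itertools import combinations

    def camera_pairs(n_cameras):
        """Return all unordered camera index pairs; there are n*(n-1)/2 of them."""
        return list(combinations(range(n_cameras), 2))

    pairs = camera_pairs(8)          # the eight-camera unit of Figure 1a
    assert len(pairs) == 8 * 7 // 2  # 28 pairs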
[0029] The multi-camera unit 100 of Figure 1a may also comprise a processor 104 for controlling the operations of the multi-camera unit 100. There may also be a memory 106 for storing data and computer code to be executed by the processor 104, and a transceiver 108 for communicating with, for example, a communication network and/or other devices in a wireless and/or wired manner. The multi-camera unit 100 may further comprise a user interface (UI) 110 for displaying information to the user, for generating audible signals and/or for receiving user input. However, the multi-camera unit 100 need not comprise each feature mentioned above, or may comprise other features as well. For example, there may be electric and/or mechanical elements for adjusting and/or controlling optics of the cameras 102 (not shown).
[0030] Figure 1a also illustrates some operational elements which may be implemented, for example, as computer code in the software of the processor, in hardware, or both. A 2D to 3D converting element 116 may convert 2D images to 3D images and vice versa. A location determination unit 124 and an orientation determination unit 126 may provide the location and orientation information to the system. The location determination unit 124 and the orientation determination unit 126 may also be implemented as one unit. It should be noted that there may also be other operational elements in the multi-camera unit 100 than those depicted in Figure 1a and/or some of the above mentioned elements may be implemented in some other part of a system than the multi-camera unit 100.
[0031] Figure 1b shows a perspective view of an example of an apparatus comprising the multi-camera unit 100. In Figure 1b seven cameras 102a—102g can be seen, but the multi-camera unit 100 may comprise even more cameras which are not visible from this perspective. Figure 1b also shows two microphones 112a, 112b, but the apparatus may also comprise one or more than two microphones.
[0032] In accordance with an embodiment, the multi-camera unit 100 may be controlled by another device, wherein the multi-camera unit 100 and the other device may communicate with each other and a user may use a user interface of the other device for entering commands, parameters, etc. and the user may be provided information from the multi-camera unit 100 via the user interface of the other device.
[0033] The human visual system (HVS) will now be briefly explained with reference to Figures 3a, 3b and 3c. The functioning of a camera is often compared with the workings of the eye 300, of which a simplified diagram is shown in Figure 3b; both focus light from external objects in the visual field onto a light-sensitive surface. Analogously to a camera lens focusing incoming light onto film, the lens 302 in the eye refracts the light entering through the pupil onto the retina 304. Several optical and neural transformations are then performed to provide visual perception. The retina is made up of millions of specialized photoreceptors known as rods 306 and cones 308. Rods 306 are responsible for vision at low light levels (scotopic vision). They do not mediate color vision and have low spatial acuity and hence are generally ignored in human visual system modeling. An optic nerve 310 couples the cones and rods to the brain.
[0034] Cones 308 are active at higher light levels (photopic vision). They are capable of color vision and are responsible for high spatial acuity. There are three types of cones 308, generally categorized as the short-, middle-, and long-wavelength sensitive cones, i.e. S-cones, M-cones, and L-cones, respectively. As an approximation, these can be thought of as being sensitive to the blue, green, and red color components of the perceived light. Each photoreceptor type reacts to a wide range of spectral frequencies, with peak sensitivity at approximately 440 nm (blue) for S-cones, 550 nm (green) for M-cones, and 580 nm (red) for L-cones. The brain has the ability to derive the whole color spectrum from these three color components. This theory, known as trichromacy, allows one to construct a full-color display using only a set of three components. Despite the fact that perception at typical daytime light levels is dominated by cone photoreceptors, the total number of rods in the human retina (approx. 91 million) far exceeds the number of cones (roughly 4.5 million). Hence, the density of rods is much greater than that of cones throughout most of the retina. However, this ratio changes dramatically in the fovea, at the center of the projected image, which is a highly specialized region of the retina measuring about 1.2 millimeters in diameter.
The increased density of cones in the fovea is accompanied by a sharp decline in the density of rods. Figures 3a and 3c illustrate an example of the cone cell distribution 312 and the rod cell distribution 314 in the retina along a center line of the eye. Figure 3c also depicts the respective chroma compression in an image, in accordance with an embodiment (rectangles 316, 318 and 320).
[0035] Low-pass filtering of texture views targets removing the high frequency components (HFCs) while keeping the spatial resolution and general structure of the image untouched. This enables compression of the same content with a reduced number of bits, since less detail (high frequency components) needs to be encoded. In the case where videos are presented on polarized displays, downsampling with a ratio of 1/2 along the vertical direction may be applied to the content. This is because the vertical spatial resolution of the display is divided between the left and right views and hence each has half the vertical resolution. In such cases, depending on the display and content, a strong aliasing artifact might be introduced when perceiving the stereoscopic content. However, applying low-pass filtering may reduce such artifacts considerably, since the high frequency components responsible for the creation of aliasing are removed in a pre-processing stage.
[0036] Eye gaze tracking is a process of measuring or detecting either the point of gaze (where one is looking) or the motion of an eye relative to the head. An eye gaze tracker is a device for measuring eye positions and eye movements; it follows the movement of the eye's pupil to determine the point at which the user is looking. Eye gaze trackers may be used in research on the visual system and in subjective tests, enabling researchers to follow the eye movements of users for different presented content.
[0037] Eye gaze can, for example, be tracked using a camera that follows the movement of the pupil in the user's eye. The process can be done in real time and with relatively little processing and few resources.
[0038] An algorithm may be used that predicts the eye gaze based on characteristics of the content. The process may require a considerable number of operations per pixel and hence may not be usable in most hand-held devices due to the resulting power consumption. Such an algorithm may use the eye gaze movement to estimate the next locations to which the gaze may be directed, and may use both the content characteristics and the tracked movement of the eye gaze prior to the current time.
[0039] Scalable video coding may refer to a coding structure where one bitstream can contain multiple representations of the content, for example, at different bitrates, resolutions or frame rates. In these cases the receiver can extract the desired representation depending on its characteristics (e.g. the resolution that best matches the display device). Alternatively, a server or a network element can extract the portions of the bitstream to be transmitted to the receiver depending on e.g. the network characteristics or processing capabilities of the receiver. A meaningful decoded representation can be produced by decoding only certain parts of a scalable bit stream. A scalable bitstream typically consists of a “base layer” providing the lowest quality video available and one or more enhancement layers that enhance the video quality when received and decoded together with the lower layers. In order to improve coding efficiency for the enhancement layers, the coded representation of a layer typically depends on the lower layers. For example, the motion and mode information of the enhancement layer can be predicted from lower layers. Similarly, the pixel data of the lower layers can be used to create a prediction for the enhancement layer.
[0040] In some scalable video coding schemes, a video signal can be encoded into a base layer and one or more enhancement layers. An enhancement layer may enhance, for example, the temporal resolution (i.e., the frame rate), the spatial resolution, or simply the quality of the video content represented by another layer or part thereof. Each layer together with all its dependent layers is one representation of the video signal, for example, at a certain spatial resolution, temporal resolution and quality level. In this document, we refer to a scalable layer together with all of its dependent layers as a “scalable layer representation”. The portion of a scalable bitstream corresponding to a scalable layer representation can be extracted and decoded to produce a representation of the original signal at a certain fidelity.
[0041] In HEVC, a picture can be partitioned into tiles, which are rectangular and contain an integer number of LCUs. In HEVC, the partitioning to tiles forms a grid comprising one or more tile columns and one or more tile rows. A coded tile is byte-aligned, which may be achieved by adding byte-alignment bits at the end of the coded tile.
[0042] In HEVC, a slice is defined to be an integer number of coding tree units contained in one independent slice segment and all subsequent dependent slice segments (if any) that precede the next independent slice segment (if any) within the same access unit. In HEVC, a slice segment is defined to be an integer number of coding tree units ordered consecutively in the tile scan and contained in a single NAL unit. The division of each picture into slice segments is a partitioning. In HEVC, an independent slice segment is defined to be a slice segment for which the values of the syntax elements of the slice segment header are not inferred from the values for a preceding slice segment, and a dependent slice segment is defined to be a slice segment for which the values of some syntax elements of the slice segment header are inferred from the values for the preceding independent slice segment in decoding order. In HEVC, a slice header is defined to be the slice segment header of the independent slice segment that is a current slice segment or is the independent slice segment that precedes a current dependent slice segment, and a slice segment header is defined to be a part of a coded slice segment containing the data elements pertaining to the first or all coding tree units represented in the slice segment. The CUs are scanned in the raster scan order of LCUs within tiles or within a picture, if tiles are not in use. Within an LCU, the CUs have a specific scan order.
[0043] In HEVC, a tile contains an integer number of coding tree units, and may consist of coding tree units contained in more than one slice. Similarly, a slice may consist of coding tree units contained in more than one tile. In HEVC, all coding tree units in a slice belong to the same tile and/or all coding tree units in a tile belong to the same slice. Furthermore, in HEVC, all coding tree units in a slice segment belong to the same tile and/or all coding tree units in a tile belong to the same slice segment.
[0044] A motion-constrained tile set is such that the inter prediction process is constrained in encoding such that no sample value outside the motion-constrained tile set, and no sample value at a fractional sample position that is derived using one or more sample values outside the motion-constrained tile set, is used for inter prediction of any sample within the motion-constrained tile set.
[0045] In the following, an example of uneven compression of an image by an apparatus 600 of Figure 6 will be described in more detail with reference to the flow diagram of Figure 7. Images to be encoded may be received 702 or obtained otherwise by the apparatus 600 for compression. The images may have been captured by one camera or more than one camera, for example by the multi-camera unit 100. The multi-camera unit 100 may have been connected with the apparatus 600 via a camera interface 614 of the apparatus, for example, or the images may have been received by other means, such as via the communication interface 612. In the multi-camera situation, for images captured substantially at the same time a similar distribution of unevenly compressed areas may be used, or the determination of the compression ratios for different areas within each image may be independent of the other images. The image information may be received in the YUV format or the apparatus 600 may convert the received image information into the YUV format. For example, if the received images are in RGB (Red-Green-Blue) format, the apparatus 600 may perform RGB-to-YUV conversion. Hence, each pixel of the image will be represented by a luma component and two chroma components. Image information may be stored into a memory 604 which may include a frame memory 608, for example.
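The RGB-to-YUV conversion mentioned above is not specified further in this description; the following Python sketch shows one common choice, a full-range BT.601-style RGB-to-YCbCr conversion for 8-bit samples, purely as an illustration (other primaries, e.g. BT.709, use different weights):

    import numpy as np

    def rgb_to_ycbcr(rgb):
        """rgb: HxWx3 uint8 array -> (Y, Cb, Cr) float arrays in [0, 255]."""
        rgb = rgb.astype(np.float64)
        r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
        y = 0.299 * r + 0.587 * g + 0.114 * b   # luma
        cb = (b - y) * 0.564 + 128.0            # blue-difference chroma
        cr = (r - y) * 0.713 + 128.0            # red-difference chroma
        return y, cb, cr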
[0046] The apparatus 600 may comprise processing circuitry such as a processor 601 for controlling the operations of the apparatus 600, for performing compression of images etc. The apparatus 600 may further comprise at least a compression determination block 602, a compression block 606, and an encoder/decoder 610. The apparatus 600 may also comprise a user interface 616. It should be noted that at least some of the operational entities of the apparatus 600 may be implemented as a computer code to be executed by the processor 601, or as a circuitry, or as a combination of computer code and circuitry.
[0047] The compression determination block 602 of the apparatus may receive 704 or otherwise obtain information which relates to determining how to compress each area of the images. The compression determination block 602 may determine, for each area or, in pixel-wise compression, for each pixel, what kind of compression to use for that area or pixel. The compression determination block 602 may, for example, produce a compression map in which an indication of a compression factor, a compression ratio or a compression method for each area or each block is included. For example, if only two alternative compression factors will be used for an image, the indication may be, for example, one bit having either a logical state 0 or 1. If more than two compression alternatives are in use, more bits may be needed for that information for each area or pixel. Some examples of such information and how to obtain it will be presented later in this application.
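The following Python sketch illustrates one possible form of such a compression map, here a block-wise map with a single index per 16x16 block and an assumed rectangular central area spanning the full image height; the block size, the central fraction and the function name are illustrative assumptions, not values from this description:

    import numpy as np

    def build_compression_map(img_h, img_w, block=16, central_frac=0.5):
        """One compression index per block: 0 = keep chroma quality,
        1 = compress chroma more coarsely. central_frac is the assumed width
        of the central area as a fraction of the image width."""
        bh, bw = img_h // block, img_w // block
        cmap = np.ones((bh, bw), dtype=np.uint8)       # default: coarser chroma
        half = int(bw * central_frac / 2)
        cmap[:, bw // 2 - half : bw // 2 + half] = 0   # central columns: full quality
        return cmap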
[0048] Images may be processed in a pixel-by-pixel manner, or the image may be processed in larger entities (areas), wherein the compression may be similar for each pixel within the same area. As an example, images may be divided into macroblocks each having 16x16 pixels. Macroblocks may further be arranged in slices, and the slices in groups of slices, for example. Such an entity or pixel will also be called a coding entity in this specification.
[0049] In H.264/AVC and HEVC, a picture may either be a frame or a field. A frame comprises a matrix of luma samples and possibly the corresponding chroma samples. A field is a set of alternate sample rows of a frame and may be used as encoder input, when the source signal is interlaced. Chroma sample arrays may be absent (and hence monochrome sampling may be in use) or chroma sample arrays may be subsampled when compared to luma sample arrays. Chroma formats may be summarized as follows:
In monochrome sampling there is only one sample array, which may be nominally considered the luma array.
In 4:2:0 sampling, each of the two chroma arrays has half the height and half the width of the luma array.
In 4:2:2 sampling, each of the two chroma arrays has the same height and half the width of the luma array.
In 4:4:4 sampling when no separate color planes are in use, each of the two chroma arrays has the same height and width as the luma array.
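As an illustration of chroma subsampling, the following Python sketch converts a 4:4:4 chroma plane to 4:2:0 by averaging each 2x2 neighbourhood; real encoders may use other downsampling filters and chroma sample positions, so this is only one simple assumption:

    import numpy as np

    def downsample_420(chroma):
        """4:4:4 -> 4:2:0 by averaging each 2x2 block (a simple assumption;
        standards allow other filters and chroma sample positions)."""
        h, w = chroma.shape
        c = chroma[:h - h % 2, :w - w % 2].astype(np.float64)
        return (c[0::2, 0::2] + c[0::2, 1::2] + c[1::2, 0::2] + c[1::2, 1::2]) / 4.0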
[0050] In H.264/AVC and HEVC, it is possible to code sample arrays as separate color planes into the bitstream and respectively decode separately coded color planes from the bitstream. When separate color planes are in use, each one of them is separately processed (by the encoder and/or the decoder) as a picture with monochrome sampling.
[0051] When chroma subsampling is in use (e.g. 4:2:0 or 4:2:2 chroma sampling), the location of chroma samples with respect to luma samples may be determined in the encoder side (e.g. as a pre-processing step or as a part of encoding). The chroma sample positions with respect to luma sample positions may be pre-defined for example in a coding standard, such as H.264/AVC or HEVC, or may be indicated in the bitstream for example as part of VUI of H.264/AVC or HEVC.
[0052] A partitioning may be defined as a division of a set into subsets such that each element of the set is in exactly one of the subsets.
[0053] In H.264/AVC, a macroblock is a 16x16 block of luma samples and the corresponding blocks of chroma samples. For example, in the 4:2:0 sampling pattern, a macroblock contains one 8x8 block of chroma samples per each chroma component. In H.264/AVC, a picture is partitioned to one or more slice groups, and a slice group contains one or more slices. In H.264/AVC, a slice consists of an integer number of macroblocks ordered consecutively in the raster scan within a particular slice group.
[0054] In the following, it is assumed that pixels within an area will be processed in the same order in each area of the image, but it may also be possible to have different processing orders in different areas. If, for example, the macroblock-based processing is used, pixels within the macroblock may be processed in a so-called zig-zag processing order.
[0055] Based on the structure of the human visual system (the uneven distribution of cone cells in the retina), different chroma qualities may be used for different parts of an image when encoding and/or transmitting the image.
[0056] The compression may be performed by a compression block 606 of the apparatus 600. The compression block 606 selects 706 an area of pixels for processing and obtains 708, e.g. from the compression map, the compression information which corresponds to this area of pixels. The compression block 606 uses this information to select 710 an appropriate compression method for that area of pixels and performs the compression 712. After the compression it may be examined 714 whether the whole image has been examined and compressed. If so, the images may be encoded and transmitted 716, e.g. to a communication network, to be received and decoded by another apparatus. If the whole image has not yet been compressed, another coding entity may be selected and the above process repeated.
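The area-wise loop of steps 706 to 714 could be sketched as follows; this is a schematic illustration with hypothetical names (blocks, cmap, methods), not actual encoder code:

    def compress_image(blocks, cmap, methods):
        """blocks: dict mapping (row, col) -> block samples; cmap: per-block indices
        such as those from build_compression_map(); methods: list of callables,
        one per index. A schematic rendering of steps 706-714 of Figure 7."""
        out = {}
        for (r, c), block in blocks.items():    # 706: select an area of pixels
            idx = cmap[r, c]                    # 708: obtain its compression info
            out[(r, c)] = methods[idx](block)   # 710-712: select the method and compress
        return out                              # 714: all areas processed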
[0057] In the following, some examples of selecting a compression procedure will be explained in more detail.
[0058] The uneven distribution of cone cells in the retina, with the majority of those cells (which recognize the color information of the scene) concentrated in the center and only a very limited number of cells in the peripheral areas of the retina, may be taken into consideration to enhance compression efficiency.
[0059] In accordance with an embodiment, the image area may be considered to include a central part, wherein the central part may be compressed with the original quality while the rest of the image (e.g. the sides, the parts surrounding the central part) could be compressed with degraded quality. The distribution of high/low quality may depend on the cone cell distribution in the retina. Figures 4a—4c show some central image classifications, in accordance with an embodiment. In Figure 4a the central part of the image has a rectangular shape, in Figure 4b the central part of the image has an oval shape, and in Figure 4c the central part of the image has a rectangular shape. In the examples of Figures 4a—4c the central area extends in the vertical direction from the top of the image to the bottom of the image, but it may also be possible that the height of the central area is less than the height of the image.
[0060] The principle of classifying the image area into the central part and one or more other parts may be utilized so that coarser compression is applied to only one or both of the chroma components (U, V) of pixels located outside the central part. It may also be possible that all three components of the YUV format, i.e. the luma and the two chroma components, are compressed with degraded quality outside the central part. Furthermore, the compression need not be the same for each of the chroma components; the two chroma components may be compressed separately and individually.
[0061] In accordance with an embodiment, the viewer's eye gaze may be followed and further compression may be performed for the chroma component(s) in the areas which are not in the direct viewing direction of the user. In other words, stronger compression may be used in those areas of the image which are farther from the center of the viewer's gaze than areas near or at the center of the viewer's gaze. In this embodiment the direction of the gaze of the viewer may be determined and followed by using an appropriate eye gaze tracking method.
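One possible way to turn a tracked gaze point into the two areas is sketched below; the pixel radius of the full-quality region and the block size are illustrative assumptions rather than values given in this description:

    import numpy as np

    def gaze_compression_map(img_h, img_w, gaze_xy, radius_px=200, block=16):
        """0 = full chroma quality near the gaze point, 1 = coarser chroma elsewhere.
        radius_px is an assumed foveal radius in pixels."""
        bh, bw = img_h // block, img_w // block
        ys, xs = np.mgrid[0:bh, 0:bw]
        dx = xs * block + block / 2 - gaze_xy[0]
        dy = ys * block + block / 2 - gaze_xy[1]
        return (np.hypot(dx, dy) > radius_px).astype(np.uint8)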
[0062] In accordance with an embodiment, statistics of different viewers are collected, and these may specifically comprise the viewers' viewing directions in the same image or in different parts of the same video. These statistics may be stored and retrieved when playing back the same video for a current viewer. Hence, utilizing the statistics of viewers' viewing directions for the same video/image content, the chroma components may be compressed more coarsely in the areas that are, according to the statistics, unlikely to be in the direct viewing direction. Viewing direction may include, for example, the user's viewport orientation in 360-degree video/image content and/or the gaze direction. Viewport orientation may correspond, for example, to the head orientation when a head-mounted display is used.
[0063] In accordance with an embodiment, it may be possible to determine how details in different areas of the image are moving. In other words, motion in the image may be followed. Hence, those areas which have higher motion may be kept intact (less or no compression at all), and in the rest of the image the chroma components may be encoded more coarsely.
[0064] In another embodiment, an object tracking method may be used to track the moving objects in the scene, keeping the quality of the chroma components for those moving objects intact while the quality of the chroma components for other, static objects in the scene is reduced.
[0065] In accordance with an embodiment, it may be possible to consider the high frequency components (HFCs) in the luma component, wherein the areas with fewer high frequency components in the luma component may be encoded with lower quality of the chroma components.
[0066] In accordance with an embodiment, it may be possible to consider the high frequency components (HFCs) in the chroma components, wherein the areas with fewer high frequency components in the chroma components may be encoded with lower quality of the chroma components.
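A simple proxy for the amount of high frequency components in a block is sketched below, here the variance of a Laplacian-filtered block compared against an assumed threshold; many other measures (e.g. sums of high-frequency transform coefficients) could serve the same purpose:

    import numpy as np

    def high_frequency_energy(block):
        """Crude HFC measure: variance of a 4-neighbour Laplacian of the block."""
        b = block.astype(np.float64)
        lap = (-4 * b[1:-1, 1:-1] + b[:-2, 1:-1] + b[2:, 1:-1]
               + b[1:-1, :-2] + b[1:-1, 2:])
        return lap.var()

    def needs_coarser_chroma(block, threshold=25.0):
        """True when the block is flat enough (assumed threshold) that its
        chroma may be encoded with lower quality."""
        return high_frequency_energy(block) < threshold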
[0067] In accordance with an embodiment, it may also be possible to take depth information into account, wherein the chroma components related to the areas farther from the camera may be encoded more coarsely.
[0068] In accordance with another embodiment, it may also be possible to take depth information into account, wherein the chroma components related to areas where the depth value is changing are kept intact while the other areas may be subjected to chroma component quality degradation.
[0069] In accordance with an embodiment, objects or areas with alpha-numeric characters may be detected from the image and the other areas (i.e. areas with no alphanumeric characters) may be encoded with lower quality of chroma components.
[0070] In accordance with an embodiment, human faces and/or human bodies may be detected in the image, wherein other areas may be encoded with lower quality of chroma components.
[0071] In the following, some example methods which may enable further compression of the chroma components will be shortly described. The chroma components may be down-sampled to reduce the number of chroma samples. For example, downsampling may be chosen to result in a 4:2:0 chroma format in areas where further chroma compression is desired, while the 4:4:4 chroma format is used otherwise. It may also be possible to decrease the bit-depth allocated to the chroma components, or otherwise reduce the value range used to represent the chroma components. As another option, the quantization step for transform coefficients may be increased to encode the chroma components. Furthermore, the chroma components may be low-pass filtered to remove any potential high frequency components.
[0072] Considering the cone cell distributions, a function can be defined that controls the compression of the chroma components as the distance from the center of the image increases. For example, considering Figures 5a—5c and assuming that the field of view for a human is about 180 degrees, the central 20 degrees, where the concentration of cones is highest, could be encoded with the highest quality of chroma samples. This is illustrated with the dotted lines 502 in Figure 5a. The next 20 degrees may be encoded with a lower chroma quality. In other words, it may be possible to gradually change the chroma component quality going from the center part to the sides of the image. This is illustrated with the dotted lines 504 in Figure 5a. Figure 5a shows a rectangular center part, in accordance with an embodiment. Figure 5b shows an oval center part and the dotted lines 502, 504 illustrate how the chroma component quality could gradually change going from the center part to the sides of the image, in accordance with an embodiment. Figure 5c shows a diamond shaped center part and the dotted lines 502, 504 illustrate how the chroma component quality could gradually change going from the center part to the sides of the image, in accordance with an embodiment. In these Figures the chroma component quality decreases as the layered zones get closer to the edges of the image. A similar transition from high to low quality may be used to further reduce the chroma quality for the areas where there are fewer cone cells to perceive the color. For example, the rest of the viewing degrees (illustrated with the area 506 in Figures 5a—5c) may be encoded with the lowest chroma component quality.
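The mapping from eccentricity to a chroma quality layer could, for example, look like the following sketch; the layer boundaries are only one possible reading of the 20-degree steps described above:

    def chroma_layer(ecc_deg, bounds=(10.0, 20.0)):
        """Map horizontal eccentricity (degrees from the image centre) to a chroma
        quality layer (0 = best, len(bounds) = coarsest). The default boundaries
        are an assumed reading of the 20-degree steps of Figures 5a-5c."""
        e = abs(ecc_deg)
        for layer, bound in enumerate(bounds):
            if e <= bound:
                return layer
        return len(bounds)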
[0073] It should be noted that the layers in the image may be created so that the main direction of quality change is the horizontal direction, i.e. along the width of the image. This is because the uneven distribution of the cones in the retina affects the horizontal direction of the image more.
[0074] In some cases (e.g. when the chroma quality is controlled by selecting a quantization step size), the central part and the layered zones may be selected to follow coding unit boundaries.
[0075] In an embodiment, the sizes of the central part and the layered zones depend on the physical screen size and the viewing distance, i.e. the angular size of the image.
The viewing distance may be detected for example with a camera mounted on the screen pointing towards the user, and detecting eyes from the camera image and measuring inter-pupillary distance. In another example, the viewing distance may be detected with a depth sensing camera. In yet another example, the physical screen size and the viewing distance are approximately or exactly known. For example, when the content is viewed with a head-mounted display, the physical screen size and the viewing distance are determined by the optical setup of the head-mounted display. A receiver may indicate the information indicative of the physical screen size and the viewing distance to the encoder.
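The angular size of the image, and hence a pixels-per-degree figure that can drive the sizes of the central part and the layered zones, may be derived from the physical screen width and the viewing distance as sketched below; the numeric example in the comment is purely illustrative:

    import math

    def angular_size_deg(screen_width_m, viewing_distance_m):
        """Horizontal angular size of the image as seen by the viewer."""
        return 2.0 * math.degrees(math.atan(screen_width_m / (2.0 * viewing_distance_m)))

    def pixels_per_degree(image_width_px, screen_width_m, viewing_distance_m):
        """Approximate conversion used to express zone boundaries in pixels."""
        return image_width_px / angular_size_deg(screen_width_m, viewing_distance_m)

    # For example, a 0.6 m wide screen viewed from 1.5 m spans about 22.6 degrees,
    # so a 1920-pixel-wide image gives roughly 85 pixels per degree.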
[0076] The different chroma qualities can be achieved based, for example, on at least one of the following approaches. Different bit-depths (numbers of bits) may be assigned to represent the chroma content. In this scenario, the higher the number of bits used to represent the chroma components of any specific region of the image, the higher the quality of the color components in that region may be. The value range used to represent the chroma components may be reduced. For example, rather than using a nominal value range of 16 to 240 (for 8-bit chroma components), the chroma components could be rescaled to occupy a value range of 32 to 224. The chroma transform coefficients may be quantized with a coarser quantization parameter. The chroma components may also be low-pass filtered to remove any high frequency components and hence the required bitrate to encode the samples may be reduced. It may also be possible to use different color gamuts, e.g. ITU-R BT.2020 for the central part and ITU-R BT.709 for the other parts.
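Two of the mechanisms listed above, narrowing the chroma value range and reducing the chroma bit-depth, are sketched below for 8-bit samples; the exact mapping and the number of retained bits are assumptions made for illustration:

    import numpy as np

    def rescale_chroma_range(chroma, src=(16, 240), dst=(32, 224)):
        """Linearly map 8-bit chroma samples from the nominal range to a narrower
        one, as in the 16-240 -> 32-224 example above."""
        c = np.clip(chroma.astype(np.float64), src[0], src[1])
        scale = (dst[1] - dst[0]) / (src[1] - src[0])
        return np.round((c - src[0]) * scale + dst[0]).astype(np.uint8)

    def reduce_chroma_bit_depth(chroma, bits=6):
        """Drop the least significant bits of 8-bit chroma samples (assumed scheme)."""
        shift = 8 - bits
        return ((chroma.astype(np.uint8) >> shift) << shift)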
[0077] Furthermore, a different chroma format may be used for the central part than for the other parts. For example, 4:4:4 sampling may be used for the central part and 4:2:0 sampling for the other parts.
[0078] It should be noted that the further compression of the chroma components may or may not follow the compression decision made for the luma components. This means that if the compression decision for the luma components is to encode them more coarsely, this may translate directly into further compressing the chroma components too. However, in some embodiments, the encoding of the chroma components may be completely independent from the encoding of the luma components. This means that the chroma quality (level of compression) may be defined solely based on the function reflecting the number of cone cells in the retina rather than any other factor which may change the compression strength of the luma components. The compression of the chroma components may also be selected separately and differently for each of the two chroma components (U, V).
[0079] In an embodiment, multiple versions of the content are encoded with motion-constrained tile sets (or alike). The versions may have the same luma fidelity and different chroma fidelities compared to each other. Different chroma fidelity may be achieved by any of the above-described methods for achieving further compression of the chroma components, such as a downsampled chroma sample array size (compared to the luma sample array size), selecting the bit-depth and/or value range for the chroma samples, the quantization step size for chroma, or low-pass filtering of chroma. A tile stream may be extracted from each tile set position of the encoded bitstream. Tile streams from different versions may be selectively transmitted so that for the central part the chroma is represented with a better quality than in the adjacent areas.
[0080] In an embodiment, content is compressed with motion-constrained tile sets (or alike) and the chroma components are coded in a scalable manner. This may enable compression of the content in an even manner and trimming the stream at the time of transmission. The chroma components are selectively transmitted so that for the central part a greater number of scalable layers (or alike) are transmitted compared to the adjacent areas. The chroma component coding in a scalable manner may be achieved in any of the following ways.
[0081] Bit-depth scalability may be used for chroma components. Motion-constrained tile sets (or alike) may need to be used for the enhancement layer but not necessarily for the base layer.
[0082] Color gamut scalability may be used, wherein motion-constrained tile sets (or alike) may need to be used for the enhancement layer but not necessarily for the base layer.
[0083] Chroma format scalability may be used, wherein motion-constrained tile sets (or alike) may need to be used for the enhancement layer but not necessarily for the base layer.
[0084] The input sequence for the base layer encoding may be processed, e.g. by low-pass filtering the chroma components, compared to the input sequence for the enhancement layer encoding. Motion-constrained tile sets (or alike) may need to be used for the enhancement layer but not necessarily for the base layer.
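A minimal example of low-pass filtering a chroma plane before base-layer encoding is sketched below, using a separable [1, 2, 1]/4 filter; the choice of filter is an assumption, as no particular pre-processing filter is mandated here:

    import numpy as np

    def lowpass_chroma(chroma):
        """Separable [1, 2, 1]/4 smoothing of a chroma plane (edges replicated);
        just one simple choice of pre-processing filter."""
        c = np.pad(chroma.astype(np.float64), 1, mode="edge")
        h = (c[1:-1, :-2] + 2 * c[1:-1, 1:-1] + c[1:-1, 2:]) / 4.0   # horizontal pass
        h = np.pad(h, ((1, 1), (0, 0)), mode="edge")
        return (h[:-2, :] + 2 * h[1:-1, :] + h[2:, :]) / 4.0         # vertical pass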
[0085] Data partitioning may be used in creating more than one partition of chroma transform coefficients.
[0086] In an embodiment, rather than using motion-constrained tile sets, region-of-interest enhancement layers are encoded.
[0087] To enable client-driven selective streaming, tile streams and/or scalable layers in the above-described embodiments may be described in terms of their chroma properties in a streaming manifest, a media presentation description, or alike (hereafter jointly referred to as the streaming manifest). In an embodiment, a client parses the streaming manifest, and specifically the chroma properties of available tile streams and/or scalable layers. The client selects the tile streams and/or scalable layers in a manner that the chroma fidelity according to the parsed chroma properties for the central part is higher than that for the adjacent areas. The above-described embodiments for selectively encoding areas of the image with different chroma properties can be applied with the present embodiment by selectively requesting areas of the image with different chroma properties. The selection may be done for example by requesting the tile streams and/or scalable layers through their respective Uniform Resource Locators (URLs) parsed from the streaming manifest, e.g. using the HTTP protocol. The client may receive, decode, and play the selected tile streams and/or scalable layers.
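The client-side selection could be sketched as follows over a hypothetical in-memory representation of the parsed manifest entries; the TileStream structure and its field names are illustrative assumptions and do not correspond to actual MPD syntax or to any particular streaming library:

    from dataclasses import dataclass

    @dataclass
    class TileStream:          # hypothetical parsed-manifest entry, not MPD syntax
        position: tuple        # (row, col) of the motion-constrained tile set
        chroma_fidelity: int   # higher = better chroma quality
        url: str

    def select_tile_streams(streams, central_positions):
        """Per tile position, request the highest chroma fidelity for central tiles
        and the lowest fidelity elsewhere, mirroring the client behaviour above."""
        by_pos = {}
        for s in streams:
            by_pos.setdefault(s.position, []).append(s)
        urls = []
        for pos, candidates in by_pos.items():
            pick = max if pos in central_positions else min
            urls.append(pick(candidates, key=lambda s: s.chroma_fidelity).url)
        return urls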
[0088] Although the above examples described the operation in a multi-camera unit 100, similar principles may be applied to systems in which separate cameras are used and which are in a communication connection with a control element such as a server, wherein the control element may receive image information from the cameras and perform the image correction tasks using the principles presented above. The cameras may provide or the control unit may obtain in another way information on the location and pose of the cameras to determine overlapping areas.
[0089] The following describes in further detail suitable apparatus and possible mechanisms for implementing the embodiments of the invention. In this regard reference is first made to Figure 8 which shows a schematic block diagram of an exemplary apparatus or electronic device 50 depicted in Figure 9, which may incorporate a transmitter according to an embodiment of the invention.
[0090] The electronic device 50 may for example be a mobile terminal or user equipment of a wireless communication system. However, it would be appreciated that embodiments of the invention may be implemented within any electronic device or apparatus which may require transmission of radio frequency signals.
[0091] The apparatus 50 may comprise a housing 30 for incorporating and protecting the device. The apparatus 50 may further comprise a display 32 in the form of a liquid crystal display. In other embodiments of the invention the display may be any display technology suitable for displaying an image or video. The apparatus 50 may further comprise a keypad 34. In other embodiments of the invention any suitable data or user interface mechanism may be employed. For example the user interface may be implemented as a virtual keyboard or data entry system as part of a touch-sensitive display. The apparatus may comprise a microphone 36 or any suitable audio input which may be a digital or analogue signal input. The apparatus 50 may further comprise an audio output device which in embodiments of the invention may be any one of an earpiece 38, speaker, or an analogue audio or digital audio output connection. The apparatus 50 may also comprise a battery 40 (or in other embodiments of the invention the device may be powered by any suitable mobile energy device such as a solar cell, fuel cell or clockwork generator). The term battery discussed in connection with the embodiments may also be one of these mobile energy devices. Further, the apparatus 50 may comprise a combination of different kinds of energy devices, for example a rechargeable battery and a solar cell. The apparatus may further comprise an infrared port 41 for short range line of sight communication to other devices. In other embodiments the apparatus 50 may further comprise any suitable short range communication solution such as for example a Bluetooth wireless connection or a USB/FireWire wired connection.
[0092] The apparatus 50 may comprise a controller 56 or processor for controlling the apparatus 50. The controller 56 may be connected to memory 58 which in embodiments of the invention may store both data and/or may also store instructions for implementation on the controller 56. The controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and decoding of audio and/or video data or assisting in coding and decoding carried out by the controller 56.
[0093] The apparatus 50 may further comprise a card reader 48 and a smart card 46, for example a universal integrated circuit card (UICC) reader and a universal integrated circuit card for providing user information and being suitable for providing authentication information for authentication and authorization of the user at a network.
[0094] The apparatus 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals for example for communication with a cellular communications network, a wireless communications system or a wireless local area network. The apparatus 50 may further comprise an antenna 60 connected to the radio interface circuitry 52 for transmitting radio frequency signals generated at the radio interface circuitry 52 to other apparatus(es) and for receiving radio frequency signals from other apparatus(es).
[0095] In some embodiments of the invention, the apparatus 50 comprises a camera 42 capable of recording or detecting images.
[0096] With respect to Figure 10, an example of a system within which embodiments of the present invention can be utilized is shown. The system 10 comprises multiple communication devices which can communicate through one or more networks. The system 10 may comprise any combination of wired and/or wireless networks including, but not limited to a wireless cellular telephone network (such as a global systems for mobile communications (GSM), universal mobile telecommunications system (UMTS), long term evolution (LTE) based network, code division multiple access (CDMA) network etc.), a wireless local area network (WLAN) such as defined by any of the IEEE 802.x standards, a Bluetooth personal area network, an Ethernet local area network, a token ring local area network, a wide area network, and the Internet.
[0097] For example, the system shown in Figure 10 shows a mobile telephone network 11 and a representation of the internet 28. Connectivity to the internet 28 may include, but is not limited to, long range wireless connections, short range wireless connections, and various wired connections including, but not limited to, telephone lines, cable lines, power lines, and similar communication pathways.
[0098] The example communication devices shown in the system 10 may include, but are not limited to, an electronic device or apparatus 50, a combination of a personal digital assistant (PDA) and a mobile telephone 14, a PDA 16, an integrated messaging device (IMD) 18, a desktop computer 20, a notebook computer 22, a tablet computer. The apparatus 50 may be stationary or mobile when carried by an individual who is moving. The apparatus 50 may also be located in a mode of transport including, but not limited to, a car, a truck, a taxi, a bus, a train, a boat, an airplane, a bicycle, a motorcycle or any similar suitable mode of transport.
[0099] Some or further apparatus may send and receive calls and messages and communicate with service providers through a wireless connection 25 to a base station 24. The base station 24 may be connected to a network server 26 that allows communication between the mobile telephone network 11 and the internet 28. The system may include additional communication devices and communication devices of various types.
[0100] The communication devices may communicate using various transmission technologies including, but not limited to, code division multiple access (CDMA), global systems for mobile communications (GSM), universal mobile telecommunications system (UMTS), time divisional multiple access (TDMA), frequency division multiple access (FDMA), transmission control protocol-internet protocol (TCP-IP), short messaging service (SMS), multimedia messaging service (MMS), email, instant messaging service (IMS), Bluetooth, IEEE 802.11, Long Term Evolution wireless communication technique (LTE) and any similar wireless communication technology. A communications device involved in implementing various embodiments of the present invention may communicate using various media including, but not limited to, radio, infrared, laser, cable connections, and any suitable connection. In the following some example implementations of apparatuses utilizing the present invention will be described in more detail.
[0101] Although the above examples describe embodiments of the invention operating within a wireless communication device, it will be appreciated that the invention as described above may be implemented as part of any apparatus comprising circuitry in which radio frequency signals are transmitted and received. Thus, for example, embodiments of the invention may be implemented in a mobile phone, in a base station, or in a computer such as a desktop computer or a tablet computer comprising radio frequency communication means (e.g. wireless local area network, cellular radio, etc.).
[0102] In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits, or in any combination thereof. While various aspects of the invention may be illustrated and described as block diagrams or using some other pictorial representation, it is well understood that the blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controllers or other computing devices, or some combination thereof.
[0103] Embodiments of the invention may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
[0104] Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California, automatically route conductors and locate components on a semiconductor chip using well-established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like), may be transmitted to a semiconductor fabrication facility or fab for fabrication.
[0105] The foregoing description has provided, by way of exemplary and non-limiting examples, a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. Nevertheless, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention.
[0106] In the following some examples will be provided.
[0107] According to a first example, there is provided a method comprising: receiving an image presented by luma and chroma components; determining at least a first area and a second area in the image; and encoding the chroma component of each of the areas differently.
[0108] In some embodiments the method comprises:
using the distribution of cone cells in the retina of the human eye to determine at least the first area and the second area in the image.
[0109] In some embodiments the method comprises:
using an eye gaze of a user to determine at least the first area and the second area in the image.
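By way of a non-limiting illustration, the following sketch shows one way in which a gaze-based area determination of the kind described above might be realised. It is a minimal sketch only: the circular shape of the first area, the fixed radius ratio standing in for the fall-off of cone-cell density away from the fovea, and all function and variable names are assumptions made for this example rather than part of the embodiments.

```python
# Illustrative sketch: derive a first (foveated) area and a second
# (peripheral) area from an eye-gaze point. The radius loosely models the
# region over which cone-cell density, and hence chroma sensitivity, stays high.
import numpy as np

def split_areas_by_gaze(height, width, gaze_xy, fovea_radius_ratio=0.15):
    """Return a boolean mask: True marks the first (high-fidelity) area.

    gaze_xy is the gaze point in normalized coordinates (0..1, 0..1).
    """
    gx = gaze_xy[0] * (width - 1)
    gy = gaze_xy[1] * (height - 1)
    ys, xs = np.mgrid[0:height, 0:width]
    dist = np.sqrt((xs - gx) ** 2 + (ys - gy) ** 2)
    radius = fovea_radius_ratio * max(height, width)
    return dist <= radius

# Example: a 1080p frame with the viewer looking slightly left of centre.
first_area_mask = split_areas_by_gaze(1080, 1920, gaze_xy=(0.4, 0.5))
```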
[0110] In some embodiments the method comprises:
using a different quantization parameter for the second area than the first area to achieve different encoding for each of the areas.
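As a non-limiting sketch of the quantization-parameter embodiment above, the following assumes a simplified scalar quantizer whose step size roughly doubles every six QP units; the QP values, the step-size formula and the per-sample (rather than transform-domain) quantization are illustrative assumptions, not the quantizer of any particular coding standard.

```python
# Illustrative sketch: quantize chroma finely (low QP) in the first area
# and coarsely (high QP) in the second area.
import numpy as np

def quantize(values, qp):
    # Step size roughly doubles every 6 QP, mimicking block-based codecs.
    step = 2.0 ** ((qp - 4) / 6.0)
    return np.round(values / step) * step

def encode_chroma_plane(chroma, first_area_mask, qp_first=22, qp_second=34):
    """Return a reconstructed chroma plane quantized with two different QPs."""
    recon = np.empty_like(chroma, dtype=float)
    recon[first_area_mask] = quantize(chroma[first_area_mask], qp_first)
    recon[~first_area_mask] = quantize(chroma[~first_area_mask], qp_second)
    return np.clip(recon, 0, 255)

# Toy usage with a left/right split standing in for the two areas.
cb = np.random.randint(0, 256, (720, 1280)).astype(float)
mask = np.zeros(cb.shape, dtype=bool)
mask[:, :640] = True
cb_recon = encode_chroma_plane(cb, mask)
```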
[0111] In some embodiments the method further comprises:
using a different bit-depth representation of chroma components for the second area than the first area to achieve different encoding for each of the areas.
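A minimal sketch of the bit-depth embodiment above, assuming 8-bit input chroma and an arbitrary reduced depth of five bits for the second area; the bit depths and the rounding-by-bit-truncation scheme are assumptions made for illustration only.

```python
# Illustrative sketch: keep full 8-bit chroma in the first area and drop the
# lowest (8 - target_bits) bits in the second area, so both areas still fit
# in a single 8-bit plane for this example.
import numpy as np

def reduce_chroma_bit_depth(chroma8, second_area_mask, target_bits=5):
    out = chroma8.astype(np.uint8).copy()
    shift = 8 - target_bits
    out[second_area_mask] = (out[second_area_mask] >> shift) << shift
    return out

cb = np.random.randint(0, 256, (720, 1280), dtype=np.uint8)
second_mask = np.zeros(cb.shape, dtype=bool)
second_mask[:, 640:] = True          # assume the right half is the second area
cb_mixed_depth = reduce_chroma_bit_depth(cb, second_mask)
```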
[0112] In some embodiments the method further comprises:
removing at least part of high frequency components of the chroma components in the second area to achieve different encoding for each of the areas.
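The removal of high-frequency chroma components in the second area could, for instance, be approximated by low-pass filtering in a block transform domain. The sketch below assumes 8x8 blocks and keeps only a 3x3 corner of low-frequency DCT coefficients; the block size, the number of retained coefficients and the use of SciPy's DCT are illustrative assumptions.

```python
# Illustrative sketch: for each 8x8 block lying entirely in the second area,
# keep only the lowest-frequency DCT coefficients and discard the rest.
import numpy as np
from scipy.fft import dctn, idctn

def lowpass_chroma_blocks(chroma, second_area_mask, block=8, keep=3):
    out = chroma.astype(float).copy()
    h, w = out.shape
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            if not second_area_mask[y:y + block, x:x + block].all():
                continue
            coeffs = dctn(out[y:y + block, x:x + block], norm='ortho')
            kept = np.zeros_like(coeffs)
            kept[:keep, :keep] = coeffs[:keep, :keep]   # low-frequency corner
            out[y:y + block, x:x + block] = idctn(kept, norm='ortho')
    return np.clip(np.round(out), 0, 255).astype(np.uint8)

cb = np.random.randint(0, 256, (720, 1280), dtype=np.uint8)
second_mask = np.zeros(cb.shape, dtype=bool)
second_mask[:, 640:] = True
cb_lowpassed = lowpass_chroma_blocks(cb, second_mask)
```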
[0113] According to a second example, there is provided an apparatus comprising at least one processor; and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following:
receive an image presented by luma and chroma components;
determine at least two different areas in the image; and encode the chroma component of each of the at least two areas differently.
[0114] In some embodiments of the apparatus, said at least one memory including computer program code configured to, with the at least one processor, cause the apparatus to:
use the distribution of cone cells in the retina of the human eye to determine at least the first area and the second area in the image.
[0115] In some embodiments of the apparatus, said at least one memory including computer program code configured to, with the at least one processor, cause the apparatus to:
use an eye gaze of a user to determine at least the first area and the second area in the image.
[0116] In some embodiments of the apparatus, said at least one memory including computer program code configured to, with the at least one processor, cause the apparatus to:
use a different quantization parameter for the second area than the first area to achieve different encoding for each of the areas.
[0117] In some embodiments of the apparatus, said at least one memory including computer program code configured to, with the at least one processor, cause the apparatus to:
use a different bit-depth representation of chroma components for the second area than the first area to achieve different encoding for each of the areas.
[0118] In some embodiments of the apparatus, said at least one memory including computer program code configured to, with the at least one processor, cause the apparatus to:
remove at least part of high frequency components of the chroma components in the second area to achieve different encoding for each of the areas.
[0119] According to a third example, there is provided an apparatus comprising at least one processor; and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following:
determine at least a first area and a second area within an image area; receive a media presentation description comprising representations corresponding to areas of the image area, the representations comprising luma and chroma components and being characterized in the media presentation description by chroma properties;
select a first representation from said representations for the first area, and a second representation from said representations for the second area, the first and second representations having different chroma properties.
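By way of a non-limiting illustration of the third example, a client might select representations per tile of the image area as sketched below. The media presentation description is modelled as a plain dictionary, and the chroma property names ("chroma_bit_depth", "chroma_qp_offset") are assumptions made for this sketch rather than attributes of any particular MPD schema.

```python
# Illustrative client-side sketch: per tile, pick a representation whose
# chroma properties match whether the tile belongs to the first area.

def select_representations(mpd, first_area_tiles):
    """mpd: {tile_id: [{"id": ..., "chroma_bit_depth": ..., "chroma_qp_offset": ...}, ...]}
    Returns {tile_id: chosen representation}."""
    chosen = {}
    for tile_id, reps in mpd.items():
        if tile_id in first_area_tiles:
            # First area: richest chroma (highest bit depth, lowest QP offset).
            chosen[tile_id] = max(reps, key=lambda r: (r["chroma_bit_depth"], -r["chroma_qp_offset"]))
        else:
            # Second area: lightest chroma representation.
            chosen[tile_id] = min(reps, key=lambda r: (r["chroma_bit_depth"], -r["chroma_qp_offset"]))
    return chosen

example_mpd = {
    "tile_0": [{"id": "t0_hq", "chroma_bit_depth": 10, "chroma_qp_offset": 0},
               {"id": "t0_lq", "chroma_bit_depth": 8, "chroma_qp_offset": 6}],
    "tile_1": [{"id": "t1_hq", "chroma_bit_depth": 10, "chroma_qp_offset": 0},
               {"id": "t1_lq", "chroma_bit_depth": 8, "chroma_qp_offset": 6}],
}
print(select_representations(example_mpd, first_area_tiles={"tile_0"}))
```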
[0120] In some embodiments of the apparatus, said at least one memory including computer program code configured to, with the at least one processor, cause the apparatus to perform the determining based on one or more of the following:
the distribution of cone cells in the retina of the human eye;
an eye gaze of a user;
the amount of high frequency components of luma or chroma components in the image;
depth information of the image;
presence of one or more areas including alpha-numeric characters;
presence of human faces or human bodies or both.
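The criteria listed above could, purely as an illustration, be combined into a single importance map from which the first and second areas are derived. The weights, the threshold and the particular subset of cues used below are assumptions; any one criterion, or any other combination, could be used instead.

```python
# Illustrative sketch: fuse several cues into an importance map and
# threshold it to obtain the first (higher-fidelity) area.
import numpy as np

def determine_areas(gaze_mask, text_mask, face_mask, depth_map, threshold=0.5):
    """All inputs are HxW arrays: boolean masks for gaze/text/faces and a
    depth map normalized to 0..1 (nearer = larger). Returns a boolean mask
    where True marks the first area."""
    importance = (0.4 * gaze_mask.astype(float)
                  + 0.3 * face_mask.astype(float)
                  + 0.2 * text_mask.astype(float)
                  + 0.1 * depth_map)
    return importance >= threshold

h, w = 720, 1280
gaze = np.zeros((h, w), dtype=bool)
gaze[:, :640] = True
faces = np.zeros((h, w), dtype=bool)
text = np.zeros((h, w), dtype=bool)
depth = np.random.rand(h, w)
first_area = determine_areas(gaze, text, faces, depth)
```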
[0121] In some embodiments of the apparatus, said at least one memory including computer program code configured to, with the at least one processor, cause the apparatus to achieve the different representations by using a different dequantization parameter for the second area than the first area.
[0122] In some embodiments of the apparatus, said at least one memory including computer program code configured to, with the at least one processor, cause the apparatus to achieve the different representations by using a different bit-depth representation of chroma components in the second area than the first area.
[0123] According to a fourth example, there is provided an apparatus comprising: means for receiving an image presented by luma and chroma components; means for determining at least a first area and a second area in the image; and means for encoding the chroma component of each of the areas differently.
[0124] In some embodiments the apparatus comprises:
means for using the distribution of cone cells in the retina of the human eye to determine at least the first area and the second area in the image.
[0125] In some embodiments the apparatus comprises:
means for using an eye gaze of a user to determine at least the first area and the second area in the image.
[0126] In some embodiments the apparatus comprises:
means for using a different quantization parameter for the second area than the first area to achieve different encoding for each of the areas.
[0127] In some embodiments the apparatus comprises:
means for using a different bit-depth representation of chroma components for the second area than the first area to achieve different encoding for each of the areas.
[0128] In some embodiments the apparatus comprises:
means for removing at least part of high frequency components of the chroma components in the second area to achieve different encoding for each of the areas.
[0129] According to a fifth example, there is provided a computer readable storage medium having code stored thereon for use by an apparatus, which code, when executed by a processor, causes the apparatus to perform:
receive an image presented by luma and chroma components;
determine at least two different areas in the image; and encode the chroma component of each of the at least two areas differently.

Claims (18)

1. A method comprising:
receiving an image presented by luma and chroma components; determining at least a first area and a second area in the image; and encoding the chroma component of each of the areas differently.
2. The method according to claim 1, wherein the determining is based on one or more of the following:
the distribution of cone cells in the retina of the human eye;
an eye gaze of a user;
the amount of high frequency components of luma or chroma components in the image;
depth information of the image;
presence of one or more areas including alpha-numeric characters;
presence of human faces or human bodies or both.
3. The method according to claim 1 or 2, wherein the different encoding is achieved by using a different quantization parameter for the second area than the first area.
4. The method according to claim 1 or 2, wherein the different encoding is achieved by using a different bit-depth representation of chroma components in the second area than the first area.
5. The method according to claim 1 or 2, wherein the different encoding is achieved by removing at least part of high frequency components of the chroma components in the second area.
6. The method according to claim 1 or 2, wherein the different encoding is achieved by subsampling one or more of the chroma components in the second area.
7. A method comprising:
determining at least a first area and a second area within an image area;
receiving a media presentation description comprising representations corresponding to areas of the image area, the representations comprising luma and chroma components and being characterized in the media presentation description by chroma properties;
selecting a first representation from said representations for the first area, and a second representation from said representations for the second area, the first and second representations having different chroma properties.
8. The method according to claim 7, wherein the determining is based on one or more of the following:
the distribution of cone cells in the retina of the human eye;
an eye gaze of a user;
the amount of high frequency components of luma or chroma components in the image;
depth information of the image;
presence of one or more areas including alpha-numeric characters;
presence of human faces or human bodies or both.
9. The method according to claim 7 or 8, wherein the different representations are achieved by using a different dequantization parameter for the second area than the first area.
10. The method according to claim 7 or 8, wherein the different representations are achieved by using a different bit-depth representation of chroma components in the second area than the first area.
11. An apparatus comprising at least one processor; and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following:
determine at least a first area and a second area within an image area;
receive a media presentation description comprising representations corresponding to areas of the image area, the representations comprising luma and chroma components and being characterized in the media presentation description by chroma properties;
select a first representation from said representations for the first area, and a second representation from said representations for the second area, the first and second representations having different chroma properties.
12. The apparatus according to claim 11, said at least one memory including computer program code configured to, with the at least one processor, cause the apparatus to perform the determining based on one or more of the following:
the distribution of cone cells in the retina of the human eye;
an eye gaze of a user;
the amount of high frequency components of luma or chroma components in the image;
depth information of the image;
presence of one or more areas including alpha-numeric characters;
presence of human faces or human bodies or both.
13. The apparatus according to claim 11 or 12, said at least one memory including computer program code configured to, with the at least one processor, cause the apparatus to achieve the different representations by using a different dequantization parameter for the second area than the first area.
14. The apparatus according to claim 11 or 12, said at least one memory including computer program code configured to, with the at least one processor, cause the apparatus to achieve the different representations by using a different bit-depth representation of chroma components in the second area than the first area.
15. An apparatus comprising at least one processor; and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following:
receive an image presented by luma and chroma components;
determine at least two different areas in the image; and encode the chroma component of each of the at least two areas differently.
16. The apparatus according to claim 15, said at least one memory including computer program code configured to, with the at least one processor, cause the apparatus to perform the method of any of claims 2 to 6.
17. An apparatus comprising:
means for receiving an image presented by luma and chroma components; means for determining at least two different areas in the image; and means for encoding the chroma component of each of the at least two areas differently.
18. A computer readable storage medium having code stored thereon for use by an apparatus, which code, when executed by a processor, causes the apparatus to perform:
receive an image presented by luma and chroma components;
determine at least two different areas in the image; and encode the chroma component of each of the at least two areas differently.