WO2017093604A1

WO2017093604A1 - A method, an apparatus and a computer program product for encoding and decoding video

Info

Publication number: WO2017093604A1
Application number: PCT/FI2016/050835
Authority: WO
Inventors: Jani Lainema
Original assignee: Nokia Technologies Oy
Priority date: 2015-11-30
Filing date: 2016-11-29
Publication date: 2017-06-08

Abstract

There are disclosed various methods, apparatuses and computer program products for video encoding. In some embodiments the curved intra prediction method comprises receiving an indication on a first prediction direction; receiving an indication on at least one other prediction direction; determining an active prediction direction for at least one sample based on said first prediction direction, said at least one other prediction direction and a location of said at least one sample; and determining a predicted value for said at least one sample based on said active prediction direction and a set of reference sample values.

Description

A METHOD, AN APPARATUS AND A COMPUTER PROGRAM PRODUCT FOR ENCODING AND DECODING VIDEO

TECHNICAL FIELD

[0001] The present embodiments relate to encoding and decoding of digital video material. In particular, the present embodiments relate to image coding with a curved intra prediction.

BACKGROUND

[0002] This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.

[0003] Directional intra prediction generates sample prediction blocks based on decoded samples around the block by extrapolating border samples directionally inside the block. That approach is able to model certain kinds of structures in the block very well, but does not work well when predicting some common classes of textures. E.g. the directional sample prediction is able to model accurately straight lines and edges, but in the case of curved lines or circular objects the existing prediction methods are not able to model the structures properly leading to excessive use of prediction error coding or unnecessarily small block partitioning (increasing the amount of overhead needed to signal the sizes, shapes and prediction modes of prediction blocks).

[0004] The present embodiments improve the related technology with respect to coding with a curved intra prediction.

SUMMARY

[0005] Some embodiments provide a method for encoding and decoding video information.

[0006] Various aspects of examples of the invention are provided in the detailed description.

[0007] According to a first aspect, there is provided a method comprising receiving an indication on a first prediction direction; receiving an indication on at least one other prediction direction; determining an active prediction direction for at least one sample based on said first prediction direction, said at least one other prediction direction and a location of said at least one sample; and determining a predicted value for said at least one sample based on said active prediction direction and a set of reference sample values.

[0008] According to a second aspect, there is provided an apparatus comprising at least one processor, memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following: receive an indication on a first prediction direction; receive an indication on at least one other prediction direction; determine an active prediction direction for at least one sample based on said first prediction direction, said at least one other prediction direction and a location of said at least one sample; and determine a predicted value for said at least one sample based on said active prediction direction and a set of reference sample values.

[0009] According to a third aspect, there is provided a computer program product embodied on a non-transitory computer readable medium, comprising computer program code to, when executed on at least one processor, cause an apparatus or a system to receive an indication on a first prediction direction; receive an indication on at least one other prediction direction; determine an active prediction direction for at least one sample based on said first prediction direction, said at least one other prediction direction and a location of said at least one sample; and determine a predicted value for said at least one sample based on said active prediction direction and a set of reference sample values.

[0010] According to an embodiment of the first, the second or the third aspect, an indication on prediction error is received and prediction error coding is applied.

[0011] According to an embodiment of the first, the second or the third aspect, which embodiment can further be combined with the previous embodiment, the indication on the first prediction direction and the indication on the at least one other prediction direction are received for one of the following: a coding unit level, a prediction unit level, a transform unit level.

[0012] According to an embodiment of the first, the second or the third aspect, which embodiment can further be combined with any of the previous embodiments, the active prediction direction is determined for one of the following: each sample, samples on each row, samples on each column or samples in each sub-block inside a prediction block.
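The aspects above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the active prediction direction is determined once per row by linearly blending the two indicated directions, that a direction is expressed as a horizontal displacement in samples per row, and that only a reference row above the block is used. The function names, the blending rule and the direction unit are hypothetical and are not taken from the embodiments.

```python
def active_direction(first_dir, other_dir, row, height):
    """Blend the two indicated directions by row position (assumed linear rule)."""
    if height == 1:
        return first_dir
    w = row / (height - 1)
    return (1.0 - w) * first_dir + w * other_dir


def predict_block(top_ref, first_dir, other_dir, width, height):
    """Predict a width x height block from the reference row above it.

    Directions are horizontal displacements per row, in samples (hypothetical
    unit). top_ref[0] is assumed to sit directly above column 0.
    """
    pred = []
    offset = 0.0  # accumulated displacement; bends when the direction changes
    for y in range(height):
        offset += active_direction(first_dir, other_dir, y, height)
        row = []
        for x in range(width):
            # Clamp the projected position to the available reference samples.
            pos = min(max(x + offset, 0.0), len(top_ref) - 1.0)
            i = int(pos)
            frac = pos - i
            j = min(i + 1, len(top_ref) - 1)
            # Linear interpolation between the two nearest reference samples.
            row.append((1.0 - frac) * top_ref[i] + frac * top_ref[j])
        pred.append(row)
    return pred
```

When both directions are equal the sketch reduces to ordinary straight directional prediction; when they differ, the projection path through the block bends from the first direction toward the other one.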

[0013] According to an embodiment of the first, the second or the third aspect, which embodiment can further be combined with any of the previous embodiments, one or more of the prediction directions is indicated by predicting one or more of the first prediction direction or the at least one other prediction direction from the other of the prediction directions.

[0014] According to an embodiment of the first, the second or the third aspect, which embodiment can further be combined with any of the previous embodiments, the sample is from one of the following group: luma samples, chroma samples, both luma and chroma samples.

BRIEF DESCRIPTION OF THE DRAWINGS

[0015] For a more complete understanding of example embodiments of the present invention, reference is now made to the following descriptions taken in connection with the accompanying drawings in which:

[0016] Figure 1 shows a simplified block chart of an apparatus according to an embodiment;

[0017] Figure 2 shows a layout of an apparatus according to an embodiment;

[0018] Figure 3 shows a system according to an embodiment;

[0019] Figure 4 shows a block diagram of a video encoder according to related technology;

[0020] Figure 5 shows an example of a picture comprising two tiles;

[0021] Figure 6 shows a block diagram of a video decoder according to related technology;

[0022] Figures 7a, b show examples of directional intra prediction;

[0023] Figure 8 shows an example of block-wise operation; and

[0024] Figure 9 is a flowchart illustrating an encoding method according to an embodiment.

DETAILED DESCRIPTION OF SOME EXAMPLE EMBODIMENTS

[0025] The present embodiments relate to encoding and decoding of digital video material.

[0026] At first, an apparatus suitable for implementing the embodiments is described. In this regard reference is first made to Figures 1 and 2, where Figure 1 shows a block diagram of a video coding system according to an example embodiment as a schematic block diagram of an exemplary apparatus or electronic device 50, which may incorporate a codec according to an embodiment of the invention. Figure 2 shows a layout of an apparatus according to an example embodiment.

[0027] The electronic device 50 may for example be a mobile terminal or user equipment of a wireless communication system. However, it would be appreciated that embodiments of the invention may be implemented within any electronic device or apparatus which may require encoding and decoding or encoding or decoding video images.

[0028] The apparatus 50 may comprise a housing 30 for incorporating and protecting the device. The apparatus 50 further may comprise a display 32 in the form of a liquid crystal display. In other embodiments of the invention the display may be any suitable display technology suitable to display an image or video. The apparatus 50 may further comprise a keypad 34. In other embodiments of the invention any suitable data or user interface mechanism may be employed. For example the user interface may be implemented as a virtual keyboard or data entry system as part of a touch-sensitive display.

[0029] The apparatus may comprise a microphone 36 or any suitable audio input which may be a digital or analogue signal input. The apparatus 50 may further comprise an audio output device which in embodiments of the invention may be any one of: an earpiece 38, speaker, or an analogue audio or digital audio output connection. The apparatus 50 may also comprise a battery 40 (or in other embodiments of the invention the device may be powered by any suitable mobile energy device such as solar cell, fuel cell or clockwork generator). The apparatus may further comprise a camera 42 capable of recording or capturing images and/or video. The apparatus 50 may further comprise an infrared port for short range line of sight communication to other devices. In other embodiments the apparatus 50 may further comprise any suitable short range communication solution such as for example a Bluetooth wireless connection or a USB (Universal Serial Bus)/firewire wired connection.

[0030] The apparatus 50 may comprise a controller 56 or processor for controlling the apparatus 50. The controller 56 may be connected to memory 58 which in embodiments of the invention may store both data in the form of image and audio data and/or may also store instructions for implementation on the controller 56. The controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and decoding of audio and/or video data or assisting in coding and decoding carried out by the controller.

[0031] The apparatus 50 may further comprise a card reader 48 and a smart card 46, for example a UICC (Universal Integrated Circuit Card) and UICC reader for providing user information and being suitable for providing authentication information for authentication and authorization of the user at a network.

[0032] The apparatus 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals for example for communication with a cellular communications network, a wireless communications system or a wireless local area network. The apparatus 50 may further comprise an antenna 44 connected to the radio interface circuitry 52 for transmitting radio frequency signals generated at the radio interface circuitry 52 to other apparatus(es) and for receiving radio frequency signals from other apparatus(es).

[0033] The apparatus 50 may comprise a camera capable of recording or detecting individual frames which are then passed to the codec 54 or the controller for processing. The apparatus may receive the video image data for processing from another device prior to transmission and/or storage. The apparatus 50 may also receive either wirelessly or by a wired connection the image for coding/decoding.

[0034] With respect to Figure 3, an example of a system within which embodiments of the present invention can be utilized is shown. The system 10 comprises multiple communication devices which can communicate through one or more networks. The system 10 may comprise any combination of wired or wireless networks including, but not limited to a wireless cellular telephone network (such as a GSM (global systems for mobile communications), UMTS (universal mobile telecommunications system), CDMA (code division multiple access) network etc.), a wireless local area network (WLAN) such as defined by any of the IEEE 802.x standards, a Bluetooth personal area network, an Ethernet local area network, a token ring local area network, a wide area network, and the Internet.

[0035] The system 10 may include both wired and wireless communication devices and/or apparatus 50 suitable for implementing embodiments of the invention.

[0036] For example, the system shown in Figure 3 shows a mobile telephone network 11 and a representation of the internet 28. Connectivity to the internet 28 may include, but is not limited to, long range wireless connections, short range wireless connections, and various wired connections including, but not limited to, telephone lines, cable lines, power lines, and similar communication pathways.

[0037] The example communication devices shown in the system 10 may include, but are not limited to, an electronic device or apparatus 50, a combination of a personal digital assistant (PDA) and a mobile telephone 14, a PDA 16, an integrated messaging device (IMD) 18, a desktop computer 20, a notebook computer 22. The apparatus 50 may be stationary or mobile when carried by an individual who is moving. The apparatus 50 may also be located in a mode of transport including, but not limited to, a car, a truck, a taxi, a bus, a train, a boat, an airplane, a bicycle, a motorcycle or any similar suitable mode of transport.

[0038] The embodiments may also be implemented in a set-top box; i.e. a digital TV receiver, which may/may not have a display or wireless capabilities, in tablets or (laptop) personal computers (PC), which have hardware or software or combination of the encoder/decoder implementations, in various operating systems, and in chipsets, processors, DSPs (Digital Signal Processor) and/or embedded systems offering hardware/software based coding.

[0039] Some or further apparatus may send and receive calls and messages and communicate with service providers through a wireless connection 25 to a base station 24. The base station 24 may be connected to a network server 26 that allows communication between the mobile telephone network 11 and the internet 28. The system may include additional communication devices and communication devices of various types.

[0040] The communication devices may communicate using various transmission technologies including, but not limited to, code division multiple access (CDMA), global systems for mobile communications (GSM), universal mobile telecommunications system (UMTS), time divisional multiple access (TDMA), frequency division multiple access (FDMA), transmission control protocol-internet protocol (TCP-IP), short messaging service (SMS), multimedia messaging service (MMS), email, instant messaging service (IMS), Bluetooth, IEEE 802.11 and any similar wireless communication technology. A communications device involved in implementing various embodiments of the present invention may communicate using various media including, but not limited to, radio, infrared, laser, cable connections, and any suitable connection.

[0041] A video codec consists of an encoder that transforms the input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form. Typically the encoder discards some information in the original video sequence in order to represent the video in a more compact form (that is, at lower bitrate).

[0042] Typical hybrid video codecs, for example ITU-T H.263, H.264 and H.265, encode the video information in two phases. Firstly, pixel values in a certain picture area (or "block") are predicted for example by motion compensation means (finding and indicating an area in one of the previously coded video frames that corresponds closely to the block being coded) or by spatial means (using the pixel values around the block to be coded in a specified manner). Secondly, the prediction error, i.e. the difference between the predicted block of pixels and the original block of pixels, is coded. This is typically done by transforming the difference in pixel values using a specified transform (e.g. Discrete Cosine Transform (DCT) or a variant of it), quantizing the coefficients and entropy coding the quantized coefficients. By varying the fidelity of the quantization process, the encoder can control the balance between the accuracy of the pixel representation (picture quality) and size of the resulting coded video representation (file size or transmission bitrate). The encoding process is illustrated in Figure 4. Figure 4 illustrates an image to be encoded (I_n); a predicted representation of an image block (P'_n); a prediction error signal (D_n); a reconstructed prediction error signal (D'_n); a preliminary reconstructed image (I'_n); a final reconstructed image (R'_n); a transform (T) and inverse transform (T⁻¹); a quantization (Q) and inverse quantization (Q⁻¹); entropy encoding (E); a reference frame memory (RFM); inter prediction (P_inter); intra prediction (P_intra); mode selection (MS) and filtering (F).
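The two phases described above can be sketched for a single row of samples as follows. This is a minimal illustration, assuming a one-dimensional orthonormal DCT-II, a uniform quantizer with step size `step`, and no entropy coding; the function names are hypothetical and the sketch does not reproduce the transform or quantizer of any particular standard.

```python
import math


def dct(v):
    """Orthonormal 1-D DCT-II of a list of numbers."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


def encode(original, predicted, step):
    """Phase 2 of the encoder: transform and quantize the prediction error."""
    residual = [o - p for o, p in zip(original, predicted)]
    return [round(c / step) for c in dct(residual)]


def decode(levels, predicted, step):
    """Dequantize, inverse transform and add the prediction back."""
    residual = idct([level * step for level in levels])
    return [p + round(r) for p, r in zip(predicted, residual)]
```

A larger `step` generally produces smaller quantized levels (fewer bits after entropy coding) at the price of a larger reconstruction error, which is the quality versus bitrate balance mentioned above.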

[0043] In some video codecs, such as HEVC (High Efficiency Video Coding), video pictures are divided into coding units (CU) covering the area of the picture. A CU consists of one or more prediction units (PU) defining the prediction process for the samples within the CU and one or more transform units (TU) defining the prediction error coding process for the samples in the said CU. Typically, a CU consists of a square block of samples with a size selectable from a predefined set of possible CU sizes. A CU with the maximum allowed size is typically named as LCU (largest coding unit) or CTU (coding tree unit) and the video picture is divided into non-overlapping CTUs. A CTU can be further split into a combination of smaller CUs, e.g. by recursively splitting the CTU and resultant CUs. Each resulting CU typically has at least one PU and at least one TU associated with it. Each PU and TU can be further split into smaller PUs and TUs in order to increase granularity of the prediction and prediction error coding processes, respectively. Each PU has prediction information associated with it defining what kind of a prediction is to be applied for the pixels within that PU (e.g. motion vector information for inter predicted PUs and intra prediction directionality information for intra predicted PUs). Similarly each TU is associated with information describing the prediction error decoding process for the samples within the said TU (including e.g. DCT coefficient information). It is typically signaled at CU level whether prediction error coding is applied or not for each CU. In the case there is no prediction error residual associated with the CU, it can be considered there are no TUs for the said CU. The division of the image into CUs, and division of CUs into PUs and TUs is typically signaled in the bitstream allowing the decoder to reproduce the intended structure of these units.
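The recursive splitting of a CTU into smaller CUs can be sketched as a quadtree traversal. The sketch below is illustrative only: `should_split` is a hypothetical callback standing in for the encoder decision (or, in a decoder, the split flags parsed from the bitstream), and PU and TU partitioning is omitted.

```python
def split_ctu(x, y, size, min_size, should_split):
    """Return the leaf CUs of a CTU as (x, y, size) tuples.

    should_split(x, y, size) is a hypothetical callback deciding whether the
    square block at (x, y) is divided into four equally sized quadrants.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += split_ctu(x + dx, y + dy, half, min_size, should_split)
        return leaves
    return [(x, y, size)]
```

For example, a 64x64 CTU in which only the top-left block is split down to 16x16 yields four 16x16 CUs followed by three 32x32 CUs, and the leaves always tile the CTU without overlap.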

[0044] Inter prediction, which may also be referred to as temporal prediction, motion compensation, or motion-compensated prediction, reduces temporal redundancy. In inter prediction the sources of prediction are previously decoded pictures. Intra prediction utilizes the fact that adjacent pixels within the same picture are likely to be correlated. Intra prediction can be performed in spatial or transform domain, i.e., either sample values or transform coefficients can be predicted. Intra prediction is typically exploited in intra coding, where no inter prediction is applied.

[0045] One outcome of the coding procedure is a set of coding parameters, such as motion vectors and quantized transform coefficients. Many parameters can be entropy-coded more efficiently if they are predicted first from spatially or temporally neighboring parameters. For example, a motion vector may be predicted from spatially adjacent motion vectors and only the difference relative to the motion vector predictor may be coded. Prediction of coding parameters and intra prediction may be collectively referred to as in-picture prediction.

[0046] In HEVC, a picture can be partitioned in tiles, which are rectangular and contain an integer number of LCUs. In HEVC, the partitioning to tiles forms a regular grid, where heights and widths of tiles differ from each other by one LCU at the maximum. In HEVC, a slice is defined to be an integer number of coding tree units contained in one independent slice segment and all subsequent dependent slice segments (if any) that precede the next independent slice segment (if any) within the same access unit. In HEVC, a slice segment is defined to be an integer number of coding tree units ordered consecutively in the tile scan order and contained in a single NAL unit. The division of each picture into slice segments is a partitioning. In HEVC, an independent slice segment is defined to be a slice segment for which the values of the syntax elements of the slice segment header are not inferred from the values for a preceding slice segment, and a dependent slice segment is defined to be a slice segment for which the values of some syntax elements of the slice segment header are inferred from the values for the preceding independent slice segment in decoding order. In HEVC, a slice header is defined to be the slice segment header of the independent slice segment that is a current slice segment or is the independent slice segment that precedes a current dependent slice segment, and a slice segment header is defined to be a part of a coded slice segment containing the data elements pertaining to the first or all coding tree units represented in the slice segment. The CUs are scanned in the raster scan order of LCUs within tiles or within a picture, if tiles are not in use. Within an LCU, the CUs have a specific scan order. Figure 5 shows an example of a picture comprising two tiles partitioned into square coding units (solid lines) which have been further partitioned into rectangular prediction units (dashed lines).
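A grid in which tile widths (or heights) differ by at most one LCU can be obtained with integer division, as in the sketch below. The function name is hypothetical, and the rule is shown only as one way of producing such a grid along a single dimension; it is not presented as the normative derivation.

```python
def uniform_tile_sizes(pic_size_in_lcus, num_tiles):
    """Split a picture dimension (in LCUs) into num_tiles nearly equal parts."""
    return [((i + 1) * pic_size_in_lcus) // num_tiles
            - (i * pic_size_in_lcus) // num_tiles
            for i in range(num_tiles)]
```

For a picture that is 10 LCUs wide and divided into 3 tile columns, the sketch gives column widths of 3, 3 and 4 LCUs.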

[0047] The decoder reconstructs the output video by applying prediction means similar to the encoder to form a predicted representation of the pixel blocks (using the motion or spatial information created by the encoder and stored in the compressed representation) and prediction error decoding (inverse operation of the prediction error coding recovering the quantized prediction error signal in spatial pixel domain). After applying prediction and prediction error decoding means the decoder sums up the prediction and prediction error signals (pixel values) to form the output video frame. The decoder (and encoder) can also apply additional filtering means to improve the quality of the output video before passing it for display and/or storing it as prediction reference for the forthcoming frames in the video sequence. The decoding process is illustrated in Figure 6. Figure 6 illustrates a predicted representation of an image block (P'_n); a reconstructed prediction error signal (D'_n); a preliminary reconstructed image (I'_n); a final reconstructed image (R'_n); an inverse transform (T⁻¹); an inverse quantization (Q⁻¹); an entropy decoding (E⁻¹); a reference frame memory (RFM); a prediction (either inter or intra) (P); and filtering (F).

[0048] The elementary unit for the input to an H.264/AVC and H.265 encoder and the output of an H.264/AVC and H.265 decoder, respectively, is a picture. A picture may either be a frame or a field. A frame comprises a matrix of luma samples and corresponding chroma samples. A field is a set of alternate sample rows of a frame and may be used as encoder input, when the source signal is interlaced. In H.264/AVC, a macroblock is a 16x16 block of luma samples and the corresponding blocks of chroma samples. For example, in the 4:2:0 sampling pattern, a macroblock contains one 8x8 block of chroma samples per each chroma component. In H.264/AVC, a picture is partitioned to one or more slice groups, and a slice group contains one or more slices. In H.264/AVC, a slice consists of an integer number of macroblocks ordered consecutively in the raster scan within a particular slice group.

[0049] Instead, or in addition to approaches utilizing sample value prediction and transform coding for indicating the coded sample values, a color palette based coding can be used. Palette based coding refers to a family of approaches for which a palette, i.e. a set of colors and associated indexes, is defined and the value for each sample within a coding unit is expressed by indicating its index in the palette. Palette based coding can achieve good coding efficiency in coding units with a relatively small number of colors (such as image areas which are representing computer screen content, like text or simple graphics). In order to improve the coding efficiency of palette coding different kinds of palette index prediction approaches can be utilized, or the palette indexes can be run-length coded to be able to represent larger homogenous image areas efficiently. Also, in the case the CU contains sample values that are not recurring within the CU, escape coding can be utilized. Escape coded samples are transmitted without referring to any of the palette indexes. Instead their values are indicated individually for each escape coded sample.
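Palette coding with run-length coded indexes and escape coded samples can be sketched as follows. The sketch is a simplified illustration: the `ESCAPE` marker, the function names and the representation of runs as (index, count) pairs are assumptions made here, and palette index prediction is not modeled.

```python
ESCAPE = -1  # hypothetical marker for samples that are not in the palette


def palette_encode(samples, palette):
    """Map each sample to its palette index, or to ESCAPE plus its raw value."""
    index_of = {color: i for i, color in enumerate(palette)}
    indexes, escapes = [], []
    for s in samples:
        if s in index_of:
            indexes.append(index_of[s])
        else:
            indexes.append(ESCAPE)
            escapes.append(s)  # escape values are indicated individually
    return indexes, escapes


def run_length(indexes):
    """Collapse repeated palette indexes into (index, count) runs."""
    runs = []
    for idx in indexes:
        if runs and runs[-1][0] == idx and idx != ESCAPE:
            runs[-1][1] += 1
        else:
            runs.append([idx, 1])
    return [tuple(r) for r in runs]


def palette_decode(runs, escapes, palette):
    """Rebuild the samples from runs, escape values and the palette."""
    escape_values = iter(escapes)
    out = []
    for idx, count in runs:
        for _ in range(count):
            out.append(next(escape_values) if idx == ESCAPE else palette[idx])
    return out
```

With the palette [0, 255], the samples 0, 0, 0, 255, 255, 17, 0 are represented by four runs and a single escape value (17).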

[0050] In typical video codecs the motion information is indicated with motion vectors associated with each motion compensated image block. Each of these motion vectors represents the displacement between the image block in the picture to be coded (in the encoder side) or decoded (in the decoder side) and the prediction source block in one of the previously coded or decoded pictures. In order to represent motion vectors efficiently, they are typically coded differentially with respect to block specific predicted motion vectors. In typical video codecs the predicted motion vectors are created in a predefined way, for example by calculating the median of the encoded or decoded motion vectors of the adjacent blocks. Another way to create motion vector predictions is to generate a list of candidate predictions from adjacent blocks and/or co-located blocks in temporal reference pictures and to signal the chosen candidate as the motion vector predictor. In addition to predicting the motion vector values, the reference index of a previously coded/decoded picture can be predicted. The reference index is typically predicted from adjacent blocks and/or co-located blocks in a temporal reference picture. Moreover, typical high efficiency video codecs employ an additional motion information coding/decoding mechanism, often called merging/merge mode, where all the motion field information, which includes a motion vector and corresponding reference picture index for each available reference picture list, is predicted and used without any modification/correction. Similarly, predicting the motion field information is carried out using the motion field information of adjacent blocks and/or co-located blocks in temporal reference pictures, and the used motion field information is signaled among a motion field candidate list filled with motion field information of available adjacent/co-located blocks.
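Differential motion vector coding with a median predictor can be sketched as follows. The sketch assumes an odd number of neighboring motion vectors (three in the example) and takes the median separately for each component; the function names are hypothetical.

```python
def median_mv(neighbors):
    """Component-wise median of an odd number of neighboring motion vectors."""
    xs = sorted(mv[0] for mv in neighbors)
    ys = sorted(mv[1] for mv in neighbors)
    mid = len(neighbors) // 2
    return (xs[mid], ys[mid])


def encode_mv(mv, neighbors):
    """Return the motion vector difference that would be written to the bitstream."""
    px, py = median_mv(neighbors)
    return (mv[0] - px, mv[1] - py)


def decode_mv(mvd, neighbors):
    """Recover the motion vector from the difference and the same predictor."""
    px, py = median_mv(neighbors)
    return (mvd[0] + px, mvd[1] + py)
```

Because the encoder and the decoder derive the predictor from the same already coded neighbors, only the (typically small) difference needs to be transmitted.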

[0051] Typically video codecs support motion compensated prediction from one source image (uni-prediction) and two sources (bi-prediction). In the case of uni-prediction a single motion vector is applied whereas in the case of bi-prediction two motion vectors are signaled and the motion compensated predictions from two sources are averaged to create the final sample prediction. In the case of weighted prediction the relative weights of the two predictions can be adjusted, or a signaled offset can be added to the prediction signal.
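The averaging of two motion compensated predictions, and its weighted variant, can be sketched with integer arithmetic as follows. The parameter names and the fixed-point convention (weights that sum to `1 << shift`, with `shift` of at least 1) are assumptions made for this illustration.

```python
def bi_predict(pred0, pred1, w0=1, w1=1, shift=1, offset=0):
    """Combine two predictions sample by sample.

    With the defaults this is a rounded average, (a + b + 1) >> 1.
    For weighted prediction, w0 + w1 is assumed to equal 1 << shift,
    and offset is added after normalization.
    """
    rounding = 1 << (shift - 1)
    return [((w0 * a + w1 * b + rounding) >> shift) + offset
            for a, b in zip(pred0, pred1)]
```

For example, weights of 3 and 1 with a shift of 2 give the first prediction three quarters of the total weight.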

[0052] In addition to applying motion compensation for inter picture prediction, a similar approach can be applied to intra picture prediction. In this case the displacement vector indicates where from the same picture a block of samples can be copied to form a prediction of the block to be coded or decoded. Intra block copying methods of this kind can improve the coding efficiency substantially in the presence of repeating structures within the frame - such as text or other graphics.
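Intra block copying can be sketched as fetching a displaced block from the same picture. In the sketch below the picture is a list of rows, `bv` is a hypothetical (dx, dy) displacement vector, and only a picture boundary check is made; the requirement that the source area has already been reconstructed is not modeled.

```python
def intra_block_copy(picture, x, y, width, height, bv):
    """Form a prediction for the block at (x, y) by copying the block
    displaced by bv = (dx, dy) within the same picture."""
    sx, sy = x + bv[0], y + bv[1]
    if sx < 0 or sy < 0 or sx + width > len(picture[0]) or sy + height > len(picture):
        raise ValueError("displacement vector points outside the picture")
    return [row[sx:sx + width] for row in picture[sy:sy + height]]
```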

[0053] In typical video codecs the prediction residual after motion compensation or intra prediction is first transformed with a transform kernel (like DCT) and then coded. The reason for this is that often there still exists some correlation among the residual and transform can in many cases help reduce this correlation and provide more efficient coding.

[0054] Typical video encoders utilize Lagrangian cost functions to find optimal coding modes, e.g. the desired Macroblock mode and associated motion vectors. This kind of cost function uses a weighting factor λ to tie together the (exact or estimated) image distortion due to lossy coding methods and the (exact or estimated) amount of information that is required to represent the pixel values in an image area:

C = D + λR (Eq. 1)

Where C is the Lagrangian cost to be minimized, D is the image distortion (e.g. Mean Squared Error) with the mode and motion vectors considered, and R the number of bits needed to represent the required data to reconstruct the image block in the decoder (including the amount of data to represent the candidate motion vectors).
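Mode selection with the cost of Eq. 1 can be sketched as choosing the candidate with the smallest D + λR. The candidate tuples and the function name below are hypothetical; in practice D and R would be measured or estimated by the encoder for each mode.

```python
def select_mode(candidates, lam):
    """Pick the candidate minimizing the Lagrangian cost C = D + lam * R.

    candidates is an iterable of (name, distortion, rate_in_bits) tuples.
    """
    return min(candidates, key=lambda c: c[1] + lam * c[2])
```

A small λ favors modes with low distortion, while a large λ favors modes that cost few bits.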

[0055] Scalable video coding refers to coding structure where one bitstream can contain multiple representations of the content at different bitrates, resolutions or frame rates. In these cases the receiver can extract the desired representation depending on its characteristics (e.g. resolution that matches best the display device). Alternatively, a server or a network element can extract the portions of the bitstream to be transmitted to the receiver depending on e.g. the network characteristics or processing capabilities of the receiver. A scalable bitstream typically consists of a "base layer" providing the lowest quality video available and one or more enhancement layers that enhance the video quality when received and decoded together with the lower layers. In order to improve coding efficiency for the enhancement layers, the coded representation of that layer typically depends on the lower layers. E.g. the motion and mode information of the enhancement layer can be predicted from lower layers. Similarly the pixel data of the lower layers can be used to create prediction for the enhancement layer.

[0056] A scalable video codec for quality scalability (also known as Signal-to-Noise or SNR) and/or spatial scalability may be implemented as follows. For a base layer, a conventional non-scalable video encoder and decoder is used. The reconstructed/decoded pictures of the base layer are included in the reference picture buffer for an enhancement layer. In H.264/AVC, HEVC, and similar codecs using reference picture list(s) for inter prediction, the base layer decoded pictures may be inserted into a reference picture list(s) for coding/decoding of an enhancement layer picture similarly to the decoded reference pictures of the enhancement layer. Consequently, the encoder may choose a base-layer reference picture as inter prediction reference and indicate its use typically with a reference picture index in the coded bitstream. The decoder decodes from the bitstream, for example from a reference picture index, that a base-layer picture is used as inter prediction reference for the enhancement layer. When a decoded base-layer picture is used as prediction reference for an enhancement layer, it is referred to as an inter-layer reference picture.

[0057] In addition to quality scalability, the following scalability modes exist:

[0058] Spatial scalability: Base layer pictures are coded at a lower resolution than enhancement layer pictures.

[0059] Bit-depth scalability: Base layer pictures are coded at lower bit-depth (e.g. 8 bits) than enhancement layer pictures (e.g. 10 or 12 bits).

[0060] Chroma format scalability: Base layer pictures provide lower fidelity in chroma (e.g. coded in 4:2:0 chroma format) than enhancement layer pictures (e.g. 4:4:4 format).

[0061] In all of the above scalability cases, base layer information could be used to code the enhancement layer to minimize the additional bitrate overhead.

[0062] Scalability can be enabled in two basic ways: either by introducing new coding modes for performing prediction of pixel values or syntax from lower layers of the scalable representation, or by placing the lower layer pictures in the reference picture buffer (decoded picture buffer, DPB) of the higher layer. The first approach is more flexible and thus can provide better coding efficiency in most cases. However, the second approach, reference frame based scalability, can be implemented very efficiently with minimal changes to single layer codecs while still achieving the majority of the available coding efficiency gains. Essentially, a reference frame based scalability codec can be implemented by utilizing the same hardware or software implementation for all the layers, with only the DPB management handled by external means.

[0063] Directional intra prediction generates sample prediction blocks based on decoded samples around the block by extrapolating border samples directionally inside the block. That approach is able to model certain kinds of structures in the block well, but fails to predict some common classes of textures. E.g. the directional sample prediction is able to accurately model straight lines and edges, but in the case of curved lines or circular objects, the existing prediction methods fail to model the structures properly, leading to excessive use of prediction error coding or unnecessarily small block partitioning (increasing the amount of overhead needed to signal the sizes, shapes and prediction modes of prediction blocks).

[0064] Current video codecs, such as H.264/AVC and H.265/HEVC, use square prediction blocks which can be predicted using a prediction direction indicated in the coded file/bitstream. Some extensions have been proposed to this paradigm. A publication by C. Dai, O.D. Escoda, P. Yin, X. Li, and C. Gomila, "Geometry-adaptive block partitioning for Intra prediction in image and video coding," IEEE International Conference on Image Processing, Sep. 2007, describes a geometry adaptive block partitioning for intra prediction. In this case a traditional square intra prediction block is divided into two segments by a straight line and both segments are predicted using separate prediction modes. A publication by M.-K. Kang, C. Lee, J. Y. Lee and Y.-S. Ho, "Adaptive geometry-based intra prediction for depth coding", IEEE International Conference on Multimedia & Expo, July 2010, describes further development of the solution of the previous publication, where partitioning of a prediction block is done with a curve fitting algorithm and, as a result, the partitioning can also follow non-straight edges in the image. Also in this method, each pixel is classified as belonging to one of the two segments and each pixel is predicted using one of the two prediction modes signaled for the parent block. A publication by X. Cao, X. Peng, C. Lai, Y. Wang, Y. Lin, J. Xu, L. Liu, J. Zheng, Y. He, H. Yu, F. Wu, "CE6.b Report on Short Distance Intra Prediction Method", JCTVC-D299, Daegu, KR, Jan. 2011, describes an approach where a rectangular (non-square) partitioning can be used to split prediction blocks into smaller units. Each prediction block is assigned a single prediction mode in the traditional fashion.

[0065] The present embodiments are targeted to a solution where information on an initial prediction direction (or mode) and additional information on how the prediction direction (or mode) is modified as the decoding proceeds are signaled for a prediction block. As a result, the spatial prediction process can generate curved structures often present in both natural video and computer generated video without the need for computationally demanding sample analysis or bitrate-costly indication of block segmentations. The present embodiments also propose how some of the existing components in today's image and video codecs can be adjusted to optimize performance when a codec is aware of such changes in intra prediction directionality information.

[0066] According to an embodiment, an intended prediction (or a prediction sample block) is created by performing the following (a flowchart of this embodiment is illustrated in Figure 9):

i. receive (710) an indication on a first prediction direction;

ii. receive (720) an indication on at least one other prediction direction, for example a second prediction direction;

iii. determine (730) an active prediction direction for at least one sample based on said first prediction direction, said at least one other prediction direction, such as a second prediction direction, and a location of said at least one sample;

iv. determine (740) a predicted value for said at least one sample based on said active prediction direction and a set of reference sample values; and

v. optionally receive (750) an indication on a prediction error and apply (760) prediction error coding or decoding means.

[0067] In this embodiment, an initial prediction mode is indicated for each prediction block. In the case the initial prediction mode is a non-directional mode (e.g. a DC prediction or planar prediction mode), the block is predicted in a traditional, non-directional way without indicating further prediction information. In the case of a directional prediction (e.g. prediction from directly above the block), another parameter is indicated defining the target prediction direction at the bottom row (in the case of vertical prediction) or rightmost column (in the case of horizontal prediction) of the prediction block.
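As a non-limiting illustration, the conditional signaling described above may be sketched in Python as follows. The symbol representation is purely hypothetical: each block is assumed to start with either the string "dc" or "planar", or with an integer initial direction followed by an integer target direction. Entropy decoding and the actual syntax are outside the scope of the sketch.

```python
NON_DIRECTIONAL = {"dc", "planar"}  # modes that carry no direction parameters


def parse_blocks(symbols):
    """Group a flat sequence of decoded symbols into per-block mode records.

    Returns tuples (mode, initial_direction, target_direction); the two
    direction fields are None for non-directional modes.
    """
    it = iter(symbols)
    blocks = []
    for first in it:
        if first in NON_DIRECTIONAL:
            # Non-directional block: nothing further is indicated.
            blocks.append((first, None, None))
        else:
            # Directional block: a second parameter gives the target
            # direction at the bottom row or rightmost column.
            blocks.append(("directional", first, next(it)))
    return blocks


parsed = parse_blocks(["dc", 0, 32, "planar", -8, 8])
```

The point of the sketch is only that the second direction parameter is read when, and only when, the initial mode is directional.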

[0068] In directional intra prediction according to related technology (e.g. H.265/HEVC), the prediction samples are generated as shown in the equation (Eq 2) below. For simplicity, the equations of the examples assume vertical prediction with row-wise processing, but they are equally applicable to horizontal modes and column-wise processing.

p(x, y) = a_y*r(x+n_y) + (1-a_y)*r(x+n_y+1) (Eq 2)

where r(x) represents an array of reference samples used in the interpolation process, and n_y and a_y are displacement parameters that are constant for a line of samples. In the case of 1/32 sample accurate prediction, the displacement parameters may depend on the indicated prediction direction iDirIndicated as follows:

d_y = (y + 1) * iDirIndicated

a_y = d_y & 31

n_y = d_y >> 5

[0069] Here "&" indicates a binary AND operation and ">>" indicates a bit shift operation. As a result, n_y represents the full sample displacement and a_y represents the fractional sample displacement of the total displacement d_y of the prediction projection at the y-th line of the block (y=0 at the first line of the block). The traditional prediction process for a block of samples can then be formulated as the following pseudo code:

for each line (y: 0...blkSize-1):
{
    d_y = (y + 1) * iDirIndicated
    a_y = d_y & 31
    n_y = d_y >> 5
    for each sample on line (x: 0...blkSize-1):
    {
        p(x, y) = a_y*r(x+n_y) + (1-a_y)*r(x+n_y+1)
    }
}
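As a non-limiting illustration, the fixed-direction process above may be sketched in Python as follows. The function name predict_fixed is hypothetical. The sketch assumes non-negative directions in 1/32 sample units and assumes the conventional integer form of the interpolation, in which the fractional part a_y weights the farther reference sample and the result is normalized by 32 with rounding; Eq 2 states the weights only in normalized form, so this ordering and the rounding are assumptions.

```python
def predict_fixed(ref, blk_size, i_dir):
    """Vertical directional prediction with one direction for all rows.

    ref      -- reference row; ref[x] sits directly above column x
    blk_size -- width and height of the square prediction block
    i_dir    -- direction in 1/32 sample units (assumed >= 0 here)
    """
    assert i_dir >= 0, "negative directions need an offset reference array"
    pred = []
    for y in range(blk_size):
        d_y = (y + 1) * i_dir      # total displacement on this row
        a_y = d_y & 31             # fractional part, 0..31
        n_y = d_y >> 5             # full-sample part
        row = []
        for x in range(blk_size):
            i = x + n_y
            # Assumed weighting: linear interpolation with rounding.
            row.append(((32 - a_y) * ref[i] + a_y * ref[i + 1] + 16) >> 5)
        pred.append(row)
    return pred


reference = list(range(0, 160, 10))        # 0, 10, ..., 150
block = predict_fixed(reference, 4, 16)    # half a sample per row
```

With a direction of 16 (half a sample per row), the second row has a displacement of exactly one sample, so it simply copies the reference row shifted by one position.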

[0070] According to an embodiment, the d_y value can depend both on the initial direction iDirInit and the indicated second prediction direction iDirLast, for example as follows:

d_y = 0
for each line (y: 0...blkSize-1):
{
    c_y = ( (y + 1) * iDirLast + (blkSize - y - 1) * iDirInit + blkSize/2 ) / blkSize
    d_y = d_y + c_y
    a_y = d_y & 31
    n_y = d_y >> 5
    for each sample on line (x: 0...blkSize-1):
    {
        p(x, y) = a_y*r(x+n_y) + (1-a_y)*r(x+n_y+1)
    }
}
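As a non-limiting illustration, a variant in which the displacement is accumulated row by row from an interpolated local direction may be sketched in Python as follows. The names interpolate and predict_curved are hypothetical. The sketch assumes non-negative directions, a displacement of zero at the reference row, and the same rounded 1/32 interpolation weighting as is conventional in integer implementations; none of these details is fixed by the pseudo code itself.

```python
def interpolate(ref, x, d):
    """Read ref at position x + d/32 using rounded linear interpolation."""
    a, n = d & 31, d >> 5
    return ((32 - a) * ref[x + n] + a * ref[x + n + 1] + 16) >> 5


def predict_curved(ref, blk_size, i_dir_init, i_dir_last):
    """Row-wise prediction whose direction bends from init to last.

    Returns the predicted block and the displacement used on each row.
    """
    assert i_dir_init >= 0 and i_dir_last >= 0
    pred, displacements = [], []
    d_y = 0  # assumed: no displacement at the reference row
    for y in range(blk_size):
        # Local direction: linear interpolation between the two directions.
        c_y = ((y + 1) * i_dir_last + (blk_size - y - 1) * i_dir_init
               + blk_size // 2) // blk_size
        d_y += c_y
        displacements.append(d_y)
        pred.append([interpolate(ref, x, d_y) for x in range(blk_size)])
    return pred, displacements


reference = list(range(0, 160, 10))
curved, disp = predict_curved(reference, 4, 0, 32)
straight, disp_straight = predict_curved(reference, 4, 16, 16)
```

When both directions are equal, the accumulated displacements reduce to (y + 1) times that direction, i.e. to the fixed-direction case; when they differ, the displacement grows faster on each successive row, which produces the curved projection.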

[0071] In this example, the active prediction displacement d_y is updated on each line by adding a local "active" prediction direction c_y to the earlier value of the prediction displacement. The local active prediction direction in this implementation example is determined by linear interpolation between the initial prediction direction iDirInit and the final prediction direction iDirLast, but it can be implemented in different ways. As an alternative implementation, the prediction displacement d_y can also be considered as the active prediction direction and be determined directly from iDirInit and iDirLast without the accumulation process described above. In this case an implementation example can be expressed in the following pseudo code:

for each line (y: 0...blkSize-1):
{
    d_y = ( (y + 1) * iDirLast + (blkSize - y - 1) * iDirInit + blkSize/2 ) / blkSize
    a_y = d_y & 31
    n_y = d_y >> 5
    for each sample on line (x: 0...blkSize-1):
    {
        p(x, y) = a_y*r(x+n_y) + (1-a_y)*r(x+n_y+1)
    }
}
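As a non-limiting illustration, the difference between determining the displacement directly and accumulating it row by row may be sketched in Python as follows. The function names are hypothetical. Python's floor division is used for the division in the pseudo code; for negative values this rounds toward minus infinity, and the pseudo code does not state which rounding is intended.

```python
def interpolated_direction(y, blk_size, i_dir_init, i_dir_last):
    """Rounded linear interpolation between the two indicated directions."""
    return ((y + 1) * i_dir_last + (blk_size - y - 1) * i_dir_init
            + blk_size // 2) // blk_size


def displacements_direct(blk_size, i_dir_init, i_dir_last):
    """Displacement of each row taken directly from the interpolation."""
    return [interpolated_direction(y, blk_size, i_dir_init, i_dir_last)
            for y in range(blk_size)]


def displacements_accumulated(blk_size, i_dir_init, i_dir_last):
    """Displacement of each row as a running sum of local directions."""
    out, d_y = [], 0
    for y in range(blk_size):
        d_y += interpolated_direction(y, blk_size, i_dir_init, i_dir_last)
        out.append(d_y)
    return out


direct = displacements_direct(4, 0, 32)
accumulated = displacements_accumulated(4, 0, 32)
```

For the same pair of indicated values the two variants give different row displacements, so under this reading the indicated values act as total row displacements in the direct variant and as per-row increments in the accumulated variant.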

[0072] Figure 7 represents examples of directional intra prediction. Fig. 7a illustrates the prior art approach where each sample within a prediction block utilizes the same prediction direction in the prediction process. In Figure 7a, round pixels represent a block of prediction samples and black squares represent a line of reference samples. Further, Fig. 7a shows prediction projections for the first and second rows of samples with a fixed prediction direction.

[0073] Figure 7b illustrates the result of applying a directionality that changes depending on the sample row's distance from the reference row. Thus, Fig. 7b shows prediction projections according to the present embodiments, evaluated based on the vertical distance from the reference samples.

[0074] Figure 8 illustrates the same approach applied at the block level. Each 4x4 prediction block has a distinct prediction direction that has been calculated based on the location of the block. Blocks on the first row of blocks can use reference samples above the blocks, while the blocks below the first row of blocks may use samples within the blocks immediately above them as references for prediction. Each of the blocks has its individual prediction direction that has been determined based on the block's position relative to the other blocks (or relative to a defined origin).
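As a non-limiting illustration, the block-level approach may be sketched in Python as follows. The function name predict_blockwise is hypothetical. The sketch makes several assumptions that are not taken from the description above: the direction depends only on the block row, directions are non-negative, predicted samples (rather than reconstructed samples) of the row of blocks above serve as references, and references past the end of the available row are clamped to the last sample.

```python
def predict_blockwise(ref, master_size, sub_size, dir_init, dir_last):
    """Predict a master block as rows of sub-blocks.

    Every row of sub-blocks gets its own direction, interpolated from
    dir_init to dir_last, and is predicted from the bottom row of the
    row of sub-blocks above it (the first one uses ref).
    """
    assert dir_init >= 0 and dir_last >= 0
    block_rows = master_size // sub_size
    last = len(ref) - 1
    cur_ref = list(ref)
    pred, directions = [], []
    for by in range(block_rows):
        direction = ((by + 1) * dir_last + (block_rows - by - 1) * dir_init
                     + block_rows // 2) // block_rows
        directions.append(direction)
        for y in range(sub_size):
            d = (y + 1) * direction
            a, n = d & 31, d >> 5
            # Full-width rows are kept so that the bottom row can serve
            # as the reference row for the next row of sub-blocks.
            pred.append([((32 - a) * cur_ref[min(x + n, last)]
                          + a * cur_ref[min(x + n + 1, last)] + 16) >> 5
                         for x in range(len(ref))])
        cur_ref = pred[-1]
    return [row[:master_size] for row in pred], directions


reference = list(range(40))
diag, diag_dirs = predict_blockwise(reference, 16, 4, 32, 32)
bent, bent_dirs = predict_blockwise(reference, 16, 4, 0, 32)
```

With a constant direction of one full sample per row, chaining the sub-blocks reproduces a plain diagonal prediction over the whole master block, which is a convenient sanity check for the chaining of references.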

[0075] The present solution can be implemented according to different embodiments. For example:

- The indication of the first and second prediction directions can be done at a coding unit level, a prediction unit level or a transform unit level; or the indication of those can be done for a larger set of coding units, prediction units or transform units.

- Active prediction direction can be evaluated at different granularity. For example, it can be determined for each sample, samples on each row or samples on each column inside a prediction block or a set of prediction blocks. It can also be calculated at block level, e.g. each 4x4 block or NxN or NxM block inside a defined area can have its individual prediction direction that has been calculated based on block's position with respect to a defined origin or block's position with respect to a master block containing a plurality of such blocks.

- Said first prediction direction and said second prediction direction can be indicated explicitly or implicitly. When they are indicated explicitly, one or more of those can be predicted from other prediction directions or prediction modes. E.g. the second prediction direction can be predicted using the first prediction direction. When the first and the second prediction directions are indicated implicitly, different approaches can be used to derive the prediction directions. E.g. prediction directions (or modes) of the neighboring blocks can be used to set values for said first and second prediction directions. As an example, the prediction direction for the bottom right sub-block in the prediction block can be indicated, and prediction directions for all other sub-blocks in the prediction block can be determined based on the indicated prediction direction and one or more prediction directions of the neighboring prediction blocks.

- The curvature information can be derived from expected geometry of the image. E.g. in the case the source image is known or is indicated to have certain characteristics (for example geometric projections due to use of fish-eye lenses or panoramic reconstruction due to equirectangular or other projections) the curvature information or a predicted value for the curvature information can be calculated based on sample location within the picture.

- Instead of two indicated prediction directions there can be more of those. E.g. there can be one prediction direction defined or indicated for each of the four corners or each of the four edges of the blocks. In this case, the active prediction direction for a sample could be interpolated using the location of the predicted sample with respect to the four corners or four edges of the blocks.

- Sample prediction directions or sample prediction vectors can be determined for areas of different sizes or shapes or those can be determined for individual samples. E.g. the same sample prediction direction can be used for all the samples on the same row within a prediction block, the same sample prediction direction can be used for all samples on certain sub-blocks within a prediction unit or the same sample prediction direction can be used for all the samples within a prediction unit.

- Sample prediction vectors calculated based on the active sample prediction direction can refer either to reference samples outside the current prediction block or to already predicted or reconstructed samples within a prediction block, or to a combination of those.

- The numeric values of said first prediction direction or said second prediction direction can represent various things. E.g. those may represent offsets between locations of reference samples and predicted samples at some granularity (e.g. 1/32 or 1/64 sample accuracy) for a certain sample or sample group, or those may represent angular values in degrees at a certain granularity.

- Determination of the active prediction direction can be done in various ways. For example, it can be done assuming that said first prediction direction represents the active prediction direction on the first row (or reference row) of a prediction block and said second prediction direction represents the active prediction direction on the last row of the prediction block, and the active prediction direction for the intermediate rows is calculated by linearly interpolating between the first prediction direction and the second prediction direction.

- As an alternative example, the first prediction direction can be assumed to represent the prediction direction at the middle of the block and the second prediction direction can be assumed to represent a differential direction that is added for each row when moving further down from the middle of the block and subtracted for each row when moving further up from the middle of the block.

- Determination of the active prediction direction can also depend on different mathematical functions. For example, the dependency between the active prediction direction and sample location can be defined using mathematical functions such as sine, cosine or tangent or some combination or approximation of those. Additional information may be indicated to guide the calculation process for said active prediction direction. That information can describe, e.g. control points or weights that may be used in generation of numeric values for the active prediction direction.

- Active prediction direction can represent various things. For example, it can indicate an offset between a certain row of samples within a prediction block and a reference block; or it can represent an offset between two rows of samples (tangent of the prediction).

- Determination of the predicted value for a sample can also be done in various ways. E.g. the location of the predicted sample can be projected to the reference line and, in case the projected location falls in between reference samples, the output sample value can be calculated by a sub-sample interpolation process (e.g. as described in ITU-T Recommendation H.265).

- Prediction directions or prediction modes of the subsequent prediction blocks may be predicted from the prediction directions of previous blocks in various ways and that process can depend on the location of the blocks with respect to each other. E.g. the active prediction direction at the bottom of the block above the current block may be used as a predictor for said first prediction direction, and the active prediction direction at the right border of the block to the left of the current block can be used as another prediction candidate for said first prediction direction.
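As a non-limiting illustration of the option in which a prediction direction is given for each of the four corners of a block, the per-sample active prediction direction may be sketched in Python as follows. The function name and the choice of rounded bilinear interpolation in integer arithmetic are assumptions made for the example; the alternatives listed above leave the interpolation method open.

```python
def active_direction(x, y, blk_size, d_tl, d_tr, d_bl, d_br):
    """Bilinear interpolation of four corner directions at sample (x, y).

    d_tl, d_tr, d_bl, d_br are the directions at the top-left, top-right,
    bottom-left and bottom-right corner samples; blk_size must be >= 2.
    """
    m = blk_size - 1
    top = d_tl * (m - x) + d_tr * x        # along the top edge, scaled by m
    bottom = d_bl * (m - x) + d_br * x     # along the bottom edge, scaled by m
    den = m * m
    # Floor division with a half-denominator offset rounds to nearest.
    return (top * (m - y) + bottom * y + den // 2) // den


corners = dict(d_tl=-16, d_tr=16, d_bl=0, d_br=32)
field = [[active_direction(x, y, 4, **corners) for x in range(4)]
         for y in range(4)]
```

Each corner sample receives exactly its indicated direction, and a block whose four corner directions are equal degenerates to conventional single-direction prediction.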

[0076] The present embodiments provide advantages. For example, the present embodiments improve the accuracy of spatial intra prediction and thus improve picture quality and lower the bitrate required to achieve target quality levels.

[0077] The various embodiments of the invention can be implemented with the help of computer program code that resides in a memory and causes the relevant apparatuses to carry out the invention. For example, a device may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the device to carry out the features of an embodiment. Yet further, a network device like a server may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the network device to carry out the features of an embodiment.

[0078] An apparatus according to an embodiment comprises means for receiving an indication on a first prediction direction; means for receiving an indication on at least one other prediction direction; means for determining an active prediction direction for at least one sample based on said first prediction direction, said at least one other prediction direction and a location of said at least one sample; and means for determining a predicted value for said at least one sample based on said active prediction direction and a set of reference sample values. An example of the means is computer program code being run by a processor from a memory of the apparatus.

[0079] It is obvious that the present invention is not limited solely to the above-presented embodiments, but can be modified within the scope of the appended claims.

Claims

1. A method comprising

- receiving an indication on a first prediction direction;

- receiving an indication on at least one other prediction direction;

- determining an active prediction direction for at least one sample based on said first prediction direction, said at least one other prediction direction and a location of said at least one sample; and

- determining a predicted value for said at least one sample based on said active prediction direction and a set of reference sample values.

2. The method according to claim 1, further comprising

- receiving an indication on prediction error;

- applying prediction error coding.

3. The method according to claim 1 or 2, wherein the indication on the first prediction direction and the indication on the at least one other prediction direction are received for one of the following: a coding unit level, a prediction unit level, a transform unit level.

4. The method according to any of the claims 1 to 3, wherein the active prediction direction is determined for one of the following: each sample, samples on each row, samples on each column or samples in each sub-block inside a prediction block.

5. The method according to any of the preceding claims 1 to 4, wherein one or more of the prediction directions is indicated by predicting one or more of the first prediction direction or the at least one other prediction direction from the other of the prediction directions.

6. The method according to any of the preceding claims 1 to 5, wherein the sample is from one of the following group: luma samples, chroma samples, both luma and chroma samples.

7. An apparatus comprising at least one processor, memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following:

- receive an indication on a first prediction direction;

- receive an indication on at least one other prediction direction;

- determine an active prediction direction for at least one sample based on said first prediction direction, said at least one other prediction direction and a location of said at least one sample; and

- determine a predicted value for said at least one sample based on said active prediction direction and a set of reference sample values.

8. The apparatus according to claim 7, further comprising computer program product to cause the apparatus to

- receive an indication on prediction error;

- apply prediction error coding.

9. The apparatus according to claim 7 or 8, wherein the indication on the first prediction direction and the indication on the at least one other prediction direction are received for one of the following: a coding unit level, a prediction unit level, a transform unit level.

10. The apparatus according to any of the claims 7 to 9, wherein the active prediction direction is determined for one of the following: each sample, samples on each row, samples on each column or samples in each sub-block inside a prediction block.

11. The apparatus according to any of the preceding claims 7 to 10, wherein one or more of the prediction directions is indicated by predicting one or more of the first prediction direction or the at least one other prediction direction from the other of the prediction directions.

12. The apparatus according to any of the preceding claims 7 to 11, wherein the sample is from one of the following group: luma samples, chroma samples, both luma and chroma samples.

13. A computer program product embodied on a non-transitory computer readable medium, comprising computer program code to, when executed on at least one processor, cause an apparatus or a system to implement a method according to any of the claims 1 to 6.