WO2017178696A1

WO2017178696A1 - An apparatus and a computer program product for video encoding and decoding, and a method for the same

Info

Publication number: WO2017178696A1
Application number: PCT/FI2017/050248
Authority: WO
Inventors: Jani Lainema
Original assignee: Nokia Technologies Oy
Priority date: 2016-04-11
Filing date: 2017-04-07
Publication date: 2017-10-19
Also published as: FI20165308A

Abstract

The invention relates to a solution for encoding and decoding, wherein an intra prediction direction is determined for a first coding unit; an array of active reference samples is generated for said first coding unit by determining a distance measure between said first coding unit and an array of remote reference samples; by determining a location of at least one remote reference sample of said array of remote reference samples using said intra prediction direction and said distance measure; and by adding said at least one remote reference sample to said array of active reference samples. A block of intra prediction samples is generated using said intra prediction direction and said array of active reference samples.

Description

AN APPARATUS AND A COMPUTER PROGRAM PRODUCT FOR VIDEO ENCODING AND DECODING, AND A METHOD FOR THE SAME

Technical Field

The present solution generally relates to encoding and decoding of digital video material. In particular, the solution relates to a method, an apparatus and a computer program product for spatial intra prediction. Background

This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.

Video coding defines a set of direction intra prediction methods and how the reference samples for these modes are generated based on the decoded samples of neighboring coding units or transform units. These "local" reference samples are copied from the row immediately above the prediction block and from the column immediately left of the prediction. In addition to the copy operation, the reference samples may undergo some processing, such as low-pass filtering.

There is a need for an improved solution for spatial intra prediction Summary Now there has been invented an improved method and technical equipment implementing the method, by which the above problems are alleviated. Various aspects of the invention include a method, an apparatus and a computer readable medium comprising a computer program stored therein, which are characterized by what is stated in the independent claims. Various embodiments of the invention are disclosed in the dependent claims.

According to a first aspect, there is provided a method comprising determining an intra prediction direction for a first coding unit; generating an array of active reference samples for said first coding unit by determining a distance measure between said first coding unit and an array of remote reference samples, determining a location of at least one remote reference sample of said array of remote reference samples using said intra prediction direction and said distance measure, adding said at least one remote reference sample to said array of active reference samples; and generating a block of intra prediction samples using said intra prediction direction and said array of active reference samples.

According to an embodiment, the method further comprises generating the array of active reference samples based on local reference samples.

According to an embodiment, the method further comprises generating the array of active reference samples based on remote reference samples.

According to an embodiment, generating an array of active reference samples for said first coding unit based on remote reference samples comprises determining a vertical distance of at least one remote reference sample from a location of a local reference sample; determining a location of said at least one remote reference sample by using said intra prediction direction, said vertical distance and a location of said first coding unit; and including said at least one remote reference sample in said array of active reference samples.

According to an embodiment, the array of active reference samples is generated based on local reference samples and remote reference samples. According to an embodiment, the method further comprises averaging at least one local reference sample and at least one remote reference sample to generate a final reference sample to the array of active reference samples.

According to an embodiment, the method further comprises signaling in one of the following levels: a coding unit level, a prediction unit level, a transform unit level, whether local or remote reference samples are used to generate the active reference sample array.

According to an embodiment, determining the intra prediction direction for the first coding unit comprises testing more than one prediction directions and selecting the prediction direction that minimizes a rate-distortion measure. According to an embodiment, the method further comprises determining a location of at least one remote reference sample at fractional pixel accuracy and generating a value for said remote reference sample using an interpolation filtering operation. According to a second aspect, there is provided an apparatus comprising at least one processor, memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following determine an intra prediction direction for a first coding unit; generate an array of active reference samples for said first coding unit by determining a distance measure between said first coding unit and an array of remote reference samples, determining a location of at least one remote reference sample of said array of remote reference samples using said intra prediction direction and said distance measure, adding said at least one remote reference sample to said array of active reference samples; and to generate a block of intra prediction samples using said intra prediction direction and said array of active reference samples.

According to an embodiment, the computer program code is further configured to, with the processor, cause the apparatus to generate the array of active reference samples based on local reference samples.

According to an embodiment, the computer program code is further configured to, with the processor, cause the apparatus to generate the array of active reference samples based on remote reference samples. According to an embodiment, the computer program code is configured to, with the processor, cause the apparatus to generate an array of active reference samples for said first coding unit based on remote reference samples by determining a vertical distance of at least one remote reference sample from a location of a local reference sample; determining a location of said at least one remote reference sample by using said intra prediction direction, said vertical distance and a location of said first coding unit; including said at least one remote reference sample in said array of active reference samples;

According to an embodiment, the computer program code is further configured to, with the processor, cause the apparatus to generate the array of active reference samples based on local reference samples and remote reference samples.

According to an embodiment, the computer program code is further configured to, with the processor, cause the apparatus to average at least one local reference sample and at least one remote reference sample to generate a final reference sample to the array of active reference samples.

According to an embodiment, the computer program code is further configured to, with the processor, cause the apparatus to signal one of the following levels: a coding unit level, a prediction unit level, a transform unit level, whether local or remote reference samples are used to generate the active reference sample array.

According to an embodiment, the computer program code is further configured to, with the processor, cause the apparatus to determine the intra prediction direction for the first coding unit by testing more than one prediction directions and selecting the prediction direction that minimizes a rate-distortion measure.

According to an embodiment, the computer program code is further configured to, with the processor, cause the apparatus to determine a location of at least one remote reference sample at fractional pixel accuracy and generate a value for said remote reference sample using an interpolation filtering operation.

According to an embodiment, the apparatus further comprises an encoder.

According to an embodiment, the apparatus further comprises a decoder.

According to a third aspect, there is provided a computer program product embodied on a non-transitory computer readable medium, comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to: determine an intra prediction direction for a first coding unit; generate an array of active reference samples for said first coding unit by determining a distance measure between said first coding unit and an array of remote reference samples; determining a location of at least one remote reference sample of said array of remote reference samples using said intra prediction direction and said distance measure; adding said at least one remote reference sample to said array of active reference samples; and generate a block of intra prediction samples using said intra prediction direction and said array of active reference samples. Description of the Drawings

In the following, various embodiments of the invention will be described in more detail with reference to the appended drawings, in which Fig. 1 shows an apparatus according to an embodiment as a simplified block chart;

Fig. 2 shows a layout of an apparatus according to an embodiment;

Fig. 3 shows a system according to an embodiment;

Fig. 4 shows an encoder according to an embodiment;

Fig. 5 shows a decoder according to an embodiment;

Figs. 6a-6c show an example of related art for selecting local reference samples for spatial intra prediction;

Figs. 7a-7b are flowcharts illustrating a method according to an embodiment;

Fig. 8 shows a generation of an active reference array that is partially based on local reference samples; and

Fig. 9 shows a generation of an active reference array that is fully based on remote reference samples.

Description of Example Embodiments

The present embodiments relate to encoding and decoding of digital video material.

At first, an apparatus suitable for implementing the embodiments is described. In this regard reference is first made to Figures 1 and 2, where Figure 1 shows a block diagram of a video coding system according to an example embodiment as a schematic block diagram of an exemplary apparatus or electronic device 50, which may incorporate a codec according to an embodiment of the invention. Figure 2 shows a layout of an apparatus according to an embodiment.

The electronic device 50 may for example be a mobile terminal or a user equipment of a wireless communication system. However, it would be appreciated that embodiments of the invention may be implemented within any electronic device or apparatus which may require encoding and decoding or encoding or decoding video images.

The apparatus 50 may comprise a housing 30 for incorporating and protecting the device. The apparatus 50 may further comprise a display 32 in the form of a liquid crystal display. In other embodiments of the invention the display may be any suitable display technology suitable to display an image or video. The apparatus 50 may further comprise a keypad 34. In other embodiments of the invention any suitable data or user interface mechanism may be employed. For example the user interface may be implemented as a virtual keyboard or data entry system as part of a touch-sensitive display. The apparatus may comprise a microphone 36 or any suitable audio input which may be a digital or analogue signal input. The apparatus 50 may further comprise an audio output device which in embodiments of the invention may be any one of: an earpiece 38, speaker, or an analogue audio or digital audio output connection. The apparatus 50 may also comprise a battery 40 (or in other embodiments of the invention the device may be powered by any suitable mobile energy device such as solar cell, fuel cell or clockwork generator). The apparatus may further comprise a camera 42 capable of recording or capturing images and/or video. The apparatus 50 may further comprise an infrared port for short range line of sight communication to other devices. According to an embodiment, the apparatus 50 may further comprise any suitable short range communication solution such as for example a Bluetooth wireless connection or a USB (Universal Serial Bus)/firewire wired connection.

The apparatus 50 may comprise a controller 56 or processor for controlling the apparatus 50. The controller 56 may be connected to memory 58 which may store data in the form of image, video and/or audio data, and/or may also store instructions for implementation on the controller 56. The controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and decoding of image, video and/or audio data or assisting in coding and decoding carried out by the controller.

The apparatus 50 may further comprise a card reader 48 and a smart card 46, for example a UICC (Universal Integrated Circuit Card) and UICC reader for providing user information and being suitable for providing authentication information for authentication and authorization of the user at a network.

The apparatus 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals for example for communication with a cellular communications network, a wireless communications system or a wireless local area network. The apparatus 50 may further comprise an antenna 44 connected to the radio interface circuitry 52 for transmitting radio frequency signals generated at the radio interface circuitry 52 to other apparatus(es) and for receiving radio frequency signals from other apparatus(es).

The apparatus 50 may comprise a camera capable of recording or detecting individual frames which are then passed to the codec 54 or the controller for processing. The apparatus may receive the video and/or image data for processing from another device prior to transmission and/or storage. The apparatus 50 may also receive either wirelessly or by a wired connection the video and/or image for coding/decoding. With respect to Figure 3, an example of a system within which embodiments of the present invention can be utilized is shown. The system comprises multiple communication devices 540, 541 , 542, 550, 551 , 560, 561 , 562, 563 and servers 540, 541 , 542 which can communicate through one or more networks 510, 520. The system may comprise any combination of wired or wireless networks including, but not limited to a wireless cellular telephone network (such as a GSM (global systems for mobile communications), UMTS (universal mobile telecommunications system), CDMA (code division multiple access) network etc.), a wireless local area network (WLAN) such as defined by any of the IEEE 802.x standards, a Bluetooth personal area network, an Ethernet local area network, a token ring local area network, a wide area network, and the Internet.

The system may include both wired and wireless communication devices and/or apparatus suitable for implementing embodiments of the invention. For example, the system shown in Figure 3 shows a mobile telephone network 520 and a representation of the internet 510. Connectivity to the internet 510 may include, but is not limited to, long range wireless connections, short range wireless connections, and various wired connections including, but not limited to, telephone lines, cable lines, power lines, and similar communication pathways.

The example communication devices shown in the system may include, but are not limited to, an electronic device or apparatus 551 , 562, 563, a combination of a personal digital assistant (PDA) and a mobile telephone, a PDA 550, an integrated messaging device (IMD), a desktop computer 561 , a notebook computer 560. The apparatus may be stationary or mobile when carried by an individual who is moving. The apparatus may also be located in a mode of transport including, but not limited to, a car, a truck, a taxi, a bus, a train, a boat, an airplane, a bicycle, a motorcycle or any similar suitable mode of transport. The embodiments may also be implemented in a set-top box; i.e. a digital TV receiver, which may/may not have a display or wireless capabilities, in tablets or (laptop) personal computers (PC), which have hardware or software or combination of the encoder/decoder implementations, in various operating systems, and in chipsets, processors, DSPs (Digital Signal Processor) and/or embedded systems offering hardware/software based coding.

Some or further apparatus may send and receive calls and messages and communicate with service providers through a wireless connection to a base station 530, 531 . The base station 530, 531 may be connected to a network server that allows communication between the mobile telephone network 520 and the internet 510. The system may include additional communication devices and communication devices of various types. The communication devices may communicate using various transmission technologies including, but not limited to, code division multiple access (CDMA), global systems for mobile communications (GSM), universal mobile telecommunications system (UMTS), time divisional multiple access (TDMA), frequency division multiple access (FDMA), transmission control protocol-internet protocol (TCP-IP), short messaging service (SMS), multimedia messaging service (MMS), email, instant messaging service (IMS), Bluetooth, IEEE 802.1 1 and any similar wireless communication technology. A communications device involved in implementing various embodiments of the present invention may communicate using various media including, but not limited to, radio, infrared, laser, cable connections, and any suitable connection.

Video codec consists of an encoder that transforms the input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form. Typically encoder discards some information in the original video sequence in order to represent the video in a more compact form (that is, at lower bitrate).

Typical hybrid video codecs, for example ITU-T H.263, H.264 and H.265, encode the video information in two phases. Firstly pixel values in a certain picture area (or "block") are predicted for example by motion compensation means (finding and indicating an area in one of the previously coded video frames that corresponds closely to the block being coded) or by spatial means (using the pixel values around the block to be coded in a specified manner). Secondly the prediction error, i.e. the difference between the predicted block of pixels and the original block of pixels, is coded. This is typically done by transforming the difference in pixel values using a specified transform (e.g. Discreet Cosine Transform (DCT) or a variant of it), quantizing the coefficients and entropy coding the quantized coefficients. By varying the fidelity of the quantization process, encoder can control the balance between the accuracy of the pixel representation (picture quality) and size of the resulting coded video representation (file size or transmission bitrate). The encoding process is illustrated in Figure 4. Figure 4 illustrates an image to be encoded (In); a predicted representation of an image block (P'n); a prediction error signal (D_n); a reconstructed prediction error signal (D'_n); a preliminary reconstructed image (l'_n); a final reconstructed image (R'_n); a transform (T) and inverse transform (T^~1); a quantization (Q) and inverse quantization (Cr¹); entropy encoding (E); a reference frame memory (RFM); inter prediction (Pinter); intra prediction (Pintra); mode selection (MS) and filtering (F).

In some video codecs, such as HEVC (High Efficiency Video Coding), video pictures are divided into coding units (CU) covering the area of the picture. A CU consists of one or more prediction units (PU) defining the prediction process for the samples within the CU and one or more transform units (TU) defining the prediction error coding process for the samples in the said CU. Typically, a CU consists of a square block of samples with a size selectable from a predefined set of possible CU sizes. A CU with the maximum allowed size is typically named as LCU (largest coding unit) or CTU (coding tree unit) and the video picture is divided into non-overlapping CTUs. A CTU can be further split into a combination of smaller CUs, e.g. by recursively splitting the CTU and resultant CUs. Each resulting CU typically has at least one PU and at least one TU associated with it. Each PU and TU can be further split into smaller PUs and TUs in order to increase granularity of the prediction and prediction error coding processes, respectively. Each PU has prediction information associated with it defining what kind of a prediction is to be applied for the pixels within that PU (e.g. motion vector information for inter predicted PUs and intra prediction directionality information for intra predicted PUs). Similarly each TU is associated with information describing the prediction error decoding process for the samples within the said TU (including e.g. DCT coefficient information). It is typically signaled at CU level whether prediction error coding is applied or not for each CU. In the case there is no prediction error residual associated with the CU, it can be considered there are no TUs for the said CU. The division of the image into CUs, and division of CUs into PUs and TUs is typically signaled in the bitstream allowing the decoder to reproduce the intended structure of these units.

The decoder reconstructs the output video by applying prediction means similar to the encoder to form a predicted representation of the pixel blocks (using the motion or spatial information created by the encoder and stored in the compressed representation) and prediction error decoding (inverse operation of the prediction error coding recovering the quantized prediction error signal in spatial pixel domain). After applying prediction and prediction error decoding means the decoder sums up the prediction and prediction error signals (pixel values) to form the output video frame. The decoder (and encoder) can also apply additional filtering means to improve the quality of the output video before passing it for display and/or storing it as prediction reference for the forthcoming frames in the video sequence. The decoding process is illustrated in Figure 5. Figure 5 illustrates a predicted representation of an image block (P'n); a reconstructed prediction error signal (D'_n); a preliminary reconstructed image (I'n); a final reconstructed image (R'_n); an inverse transform (T^~1); an inverse quantization (Cr¹); an entropy decoding (E^~1); a reference frame memory (RFM); a prediction (either inter or intra) (P); and filtering (F). The elementary unit for the input to an H.264/AVC and H.265 encoder and the output of an H.264/AVC and H.265 decoder, respectively, is a picture. A picture may either be a frame or a field. A frame comprises a matrix of luma samples and corresponding chroma samples. A field is a set of alternate sample rows of a frame and may be used as encoder input, when the source signal is interlaced. In H.264/AVC and H.265, a macroblock is a 16x16 block of luma samples and the corresponding blocks of chroma samples. For example, in the 4:2:0 sampling pattern, a macroblock contains one 8x8 block of chroma samples per each chroma component. In H.264/AVC and H.265, a picture is partitioned to one or more slice groups, and a slice group contains one or more slices. In H.264/AVC and H.265, a slice consists of an integer number.

Instead, or in addition to approaches utilizing sample value prediction and transform coding for indicating the coded sample values, a color palette based coding can be used. Palette based coding refers to a family of approaches for which a palette, i.e. a set of colors and associated indexes, is defined and the value for each sample within a coding unit is expressed by indicating its index in the palette. Palette based coding can achieve good coding efficiency in coding units with a relatively small number of colors (such as image areas which are representing computer screen content, like text or simple graphics). In order to improve the coding efficiency of palette coding different kinds of palette index prediction approaches can be utilized, or the palette indexes can be run-length coded to be able to represent larger homogenous image areas efficiently. Also, in the case the CU contains sample values that are not recurring within the CU, escape coding can be utilized. Escape coded samples are transmitted without referring to any of the palette indexes. Instead their values are indicated individually for each escape coded sample.

In typical video codecs the motion information is indicated with motion vectors associated with each motion compensated image block. Each of these motion vectors represents the displacement of the image block in the picture to be coded (in the encoder side) or decoded (in the decoder side) and the prediction source block in one of the previously coded or decoded pictures. In order to represent motion vectors efficiently those are typically coded differentially with respect to block specific predicted motion vectors. In typical video codecs the predicted motion vectors are created in a predefined way, for example calculating the median of the encoded or decoded motion vectors of the adjacent blocks. Another way to create motion vector predictions is to generate a list of candidate predictions from adjacent blocks and/or co-located blocks in temporal reference pictures and signaling the chosen candidate as the motion vector predictor. In addition to predicting the motion vector values, the reference index of previously coded/decoded picture can be predicted. The reference index is typically predicted from adjacent blocks and/or or co-located blocks in temporal reference picture. Moreover, typical high efficiency video codecs employ an additional motion information coding/decoding mechanism, often called merging/merge mode, where all the motion field information, which includes motion vector and corresponding reference picture index for each available reference picture list, is predicted and used without any modification/correction. Similarly, predicting the motion field information is carried out using the motion field information of adjacent blocks and/or co-located blocks in temporal reference pictures and the used motion field information is signaled among a list of motion field candidate list filled with motion field information of available adjacent/co-located blocks.

Typically video codecs support motion compensated prediction from one source image (uni-prediction) and two sources (bi-prediction). In the case of uni-prediction a single motion vector is applied whereas in the case of bi-prediction two motion vectors are signaled and the motion compensated predictions from two sources are averaged to create the final sample prediction. In the case of weighted prediction the relative weights of the two predictions can be adjusted, or a signaled offset can be added to the prediction signal.

In addition to applying motion compensation for inter picture prediction, similar approach can be applied to intra picture prediction. In this case the displacement vector indicates where from the same picture a block of samples can be copied to form a prediction of the block to be coded or decoded. This kind of intra block copying methods can improve the coding efficiency substantially in presence of repeating structures within the frame - such as text or other graphics.

In typical video codecs the prediction residual after motion compensation or intra prediction is first transformed with a transform kernel (like DCT) and then coded. The reason for this is that often there still exists some correlation among the residual and transform can in many cases help reduce this correlation and provide more efficient coding.

Typical video encoders utilize Lagrangian cost functions to find optimal coding modes, e.g. the desired Macroblock mode and associated motion vectors. This kind of cost function uses a weighting factor λ to tie together the (exact or estimated) image distortion due to lossy coding methods and the (exact or estimated) amount of information that is required to represent the pixel values in an image area:

C = D + R (Eq. 1 )

Where C is the Lagrangian cost to be minimized, D is the image distortion (e.g. Mean Squared Error) with the mode and motion vectors considered, and R the number of bits needed to represent the required data to reconstruct the image block in the decoder (including the amount of data to represent the candidate motion vectors).

Scalable video coding refers to coding structure where one bitstream can contain multiple representations of the content at different bitrates, resolutions or frame rates. In these cases the receiver can extract the desired representation depending on its characteristics (e.g. resolution that matches best the display device). Alternatively, a server or a network element can extract the portions of the bitstream to be transmitted to the receiver depending on e.g. the network characteristics or processing capabilities of the receiver. A scalable bitstream typically consists of a "base layer" providing the lowest quality video available and one or more enhancement layers that enhance the video quality when received and decoded together with the lower layers. In order to improve coding efficiency for the enhancement layers, the coded representation of that layer typically depends on the lower layers. E.g. the motion and mode information of the enhancement layer can be predicted from lower layers. Similarly the pixel data of the lower layers can be used to create prediction for the enhancement layer. A scalable video codec for quality scalability (also known as Signal-to-Noise or SNR) and/or spatial scalability may be implemented as follows. For a base layer, a conventional non-scalable video encoder and decoder is used. The reconstructed/decoded pictures of the base layer are included in the reference picture buffer for an enhancement layer. In H.264/AVC, HEVC, and similar codecs using reference picture list(s) for inter prediction, the base layer decoded pictures may be inserted into a reference picture list(s) for coding/decoding of an enhancement layer picture similarly to the decoded reference pictures of the enhancement layer. Consequently, the encoder may choose a base-layer reference picture as inter prediction reference and indicate its use typically with a reference picture index in the coded bitstream. The decoder decodes from the bitstream, for example from a reference picture index, that a base-layer picture is used as inter prediction reference for the enhancement layer. When a decoded base-layer picture is used as prediction reference for an enhancement layer, it is referred to as an inter-layer reference picture. In addition to quality scalability following scalability modes exist:

- Spatial scalability: Base layer pictures are coded at a higher resolution than enhancement layer pictures.

- Bit-depth scalability: Base layer pictures are coded at lower bit-depth (e.g. 8 bits) than enhancement layer pictures (e.g. 10 or 12 bits).

- Chroma format scalability: Base layer pictures provide higher fidelity in chroma (e.g. coded in 4:4:4 chroma format) than enhancement layer pictures (e.g. 4:2:0 format). In all of the above scalability cases, base layer information could be used to code enhancement layer to minimize the additional bitrate overhead.

Scalability can be enabled in two basic ways. Either by introducing new coding modes for performing prediction of pixel values or syntax from lower layers of the scalable representation or by placing the lower layer pictures to the reference picture buffer (decoded picture buffer, DPB) of the higher layer. The first approach is more flexible and thus can provide better coding efficiency in most cases. However, the second, reference frame based scalability, approach can be implemented very efficiently with minimal changes to single layer codecs while still achieving majority of the coding efficiency gains available. Essentially a reference frame based scalability codec can be implemented by utilizing the same hardware or software implementation for all the layers, just taking care of the DPB management by external means.

In order to be able to utilize parallel processing, images can be split into independently codable and decodable image segments (slices or tiles). Slices typically refer to image segments constructed of certain number of basic coding units that are processed in default coding or decoding order, while tiles typically refer to image segments that have been defined as rectangular image regions that are processed at least to some extend as individual frames.

Due to the fixed processing order of coding units and the definition of directional intra prediction modes, the near diagonal intra prediction modes are not able to reproduce some structures typically present especially in fisheye and virtual reality (panoramic / 360 degree) video and image content. These structures include e.g. near diagonal edges with bottom-left to top-right alignment. This kind of structures are often present in wide-angle imagery as the fisheye lenses used in cameras and the warped image data in different 360 degree projections tend to create structures with such alignments. Figures 6a - 6c illustrate examples of method according to related art for selecting local reference samples for spatial intra prediction. Figure 6a illustrates an image containing a coding tree unit C, which may utilize the already coded or decoded area A in coding of C, but is not able to utilize the unavailable area U as it has not been processed yet. Figure 6b illustrates an example, where the coding tree unit C is split into four coding units Co, Ci , C2, C3, and a typical local reference sample selection (black dots in the Figure 6b) for coding unit Co is located in the already coded/decoded area A, Figure 6c illustrates an example where the coding tree unit C is again split into four coding units Co, Ci , C2, C3, and coding units Co, Ci and C2 have been encoded or decoded. Now, a typical reference sample selection for coding unit C3 contains reference samples (empty dots (circles) in Figure 6c) located outside the coded area A and are typically padded using the closest available reference samples.

The present embodiments relate to a method that builds a reference sample array for directional intra prediction selectively either by using regular local reference samples on the block boundary or by using remote reference samples that have been obtained from decoded samples or further way blocks; or by using a combination of the aforementioned options. Different implementation alternatives describe cases where padding of missing regular reference samples is generated by remote referencing; cases where the locations of the remote references are determined based on selected prediction direction; and cases where the locations of the remote references are signaled with respect to a defined anchor location.

A method according to an embodiment is illustrated next. This embodiment is also illustrated in a flowchart of Figures 7a and 7b. The method according to an embodiment comprises (Figure 7a) determining 405 an intra prediction direction for a first coding unit; generating 410 an array of active reference samples for said first coding unit; and generating 415 a block of intra prediction samples using said intra prediction direction and said array of active reference samples. The array of active reference samples may be generated 410 by (Figure 7b) determining 41 1 a distance measure between said first coding unit and an array of remote reference samples; determining 412 a location of said at least one remote reference sample of said array of remote reference samples using said intra prediction direction and said distance measure; and adding 413 said at least one remote reference sample to said array of active reference samples.

An apparatus according to an embodiment, for example an apparatus of Figure 1 , comprises means for implementing the method as show in a flowchart of Figure 7a and/or 7b. Such means comprises at least a processor, a memory, and a computer program code residing in the memory. Yet, further the apparatus according to an embodiment comprises an encoder. According to another embodiment, the apparatus comprises a decoder. A video or image encoder operating according to the embodiment is able to select the prediction direction from a predefined set of available more than one directions for example by testing more than one different prediction directions and choosing the one that minimizes a rate-distortion measure. When evaluating different prediction directions, the encoder can generate different reference sample arrays based on the prediction direction under evaluation.

The indicated intra prediction direction can be used to determine the location of the remote reference samples. For example, if the vertical distance between the local reference samples Rloc and the remote reference samples Rrem is s (as depicted in Figures 8 and 9) and the indicated intra prediction angle g is represented as 1/32 fractional sample displacement per sample line as in ITU-T Recommendation H.265, the horizontal offset d between the local and remote reference samples can be determined as: d = (g * S + 16) » 5 where >> denotes a bit-shift operation. Due to integer arithmetics, the resulting offset d in this example represents full pixel horizontal displacement between the local and remote reference lines. In some embodiments, fractional sample accuracy may be used to improve prediction accuracy. In those cases the offset may be represented by two numbers - integer displacement dint and fractional displacement dFrac. As an example, these variables can be calculated at 1/32 sample accuracy as: dint = (g * s) » 5

dFrac = (g * s) % 31 where % represents an integer module operation. Now the location of the remote reference samples [xR, yR] can be calculated with respect to the location of the corresponding local reference samples (xL, yL) in the case of full pixel accuracy offset as: xR = xL + d

yR = yL - s and in the case of fractional pixel accuracy offset can be calculated as follows: xR = xL + dint + dFrac/32

xR = yL - s

Given that the available reference samples at location (x, y) in the already processed image area are denoted as A(x, y), the active reference sample values Ract(n) to be used in the spatial intra prediction process can be copied from the identified location as:

Ract(n) = Rrem(x) = A(xL + d + n, yL + s), if usage of remote references was indicated Ract(n) = Rloc(x) = A(xL + n, yL), if usage of local references was indicated.

In an embodiment, the remote reference samples are located on the bottom sample row of a coding tree unit row earlier in coding order, while the local reference samples are located on the bottom row of an earlier coding unit, prediction unit or transform unit row. In this embodiment, the vertical distance between remote and local reference samples may equal to the vertical position of the block within a coding tree unit. In an embodiment, the unavailable local reference samples (e.g. samples that fall outside of the already coded or decoded picture area) are substituted by remote reference samples. The remote reference samples may be copied from the closest already coded or decoded locations in the direction of the prediction or the remote reference samples may be copied from a dedicated remote reference row that may be for example the bottom sample row of a coding tree unit row earlier in decoding order. This process is illustrated in Figure 8 showing generation of an active reference array that is partially based on local references samples Rloc immediately above the block and partially based on remote reference samples Rrem that substitute those samples in local reference that fall outside of the coded or decoded image area. Vertical distance of the remote reference samples is in this example s pixels and horizontal offset d is calculated using the indicated prediction direction.

Figure 9 illustrates a generation of an active reference array that is fully based on remote reference samples Rrem by copying reconstructed samples from a vertical distance or s pixels and horizontal offset d calculated using the indicated prediction direction.

According to an embodiment, remote reference and local references may be combined in different ways. For example, all samples or some of the samples in local reference array and remote reference array can be averaged to generate an active reference array for a block. Also different weights can be applied in order to perform weighted averaging of some or all the samples in those arrays. According to an embodiment, it may be signaled at a coding unit level, a prediction unit level or a transform unit level whether local or remote reference samples are used to generate the active reference sample array. According to an embodiment, the determination if at least one remote reference sample is to be used can be done in various ways. For example, it can be indicated in the bitstream for some or all of the coding units, prediction units or transform units. It can also be determined implicitly, for example, by identifying if some of the local reference samples fall outside of the picture, slice or tile boundaries; or outside of the already coded or decoded picture area. In the case where usage of the remote reference is indicated in the bitstream, the signaling may be omitted based on the location of the coding unit, prediction unit or transform unit. For example, if the block to be predicted belongs to any of the coding tree units on the first row of coding tree units, there may be no need to signal usage of remote references as the candidate remote reference samples may in some configurations fall outside of the image area.

According to an embodiment, the indicated intra prediction direction and the vertical distance to the remote reference sample row is used to calculate horizontal location of the remote reference sample row.

According to an embodiment, an initial remote reference sample location calculated based on the prediction direction is used as a predicted value for the final location of a remote reference sample and additional signaling is used to define the difference between the final and predicted locations.

According to an embodiment, the remote reference samples for the vertical local references on the left border of the coding unit are obtained from a horizontal remote reference line above the block. In this embodiment, the locations of the remote references are calculated by projecting the locations of the vertical local references to the horizontal remote reference line using active prediction direction.

According to an embodiment, the processing steps on both encoder and decoder side can be done in different order or interleaved with each other, and some or all the steps can be done parallel to each other. For example, the decoder can determine intra prediction direction for multiple coding, prediction or transform block first and proceeding with remaining steps after that.

In a method, according to an embodiment, it is possible to use different accuracy for representing intra prediction direction and various approaches for interpolating values for sub-sample locations if the method is applied at sub-sample accuracy.

According to an embodiment, the determination that vertical distance of at least one remote reference sample from the location of the local reference samples can be substituted by determining the horizontal distance of at least one remote reference sample from the location of the local reference samples, in which case the method can be applied to horizontal prediction direction.

According to an embodiment, remote reference samples can be processed prior to using those as active reference samples for a block. For example, those can be filtered with a predefined filter or additional information may be provided in the bitstream to define the sample processing for the reference array.

The various embodiments may provide advantages. For example, the embodiments improve accuracy of the spatial intra prediction and thus improves picture quality and lowers the bitrate required to achieve target quality levels.

The various embodiments of the invention can be implemented with the help of computer program code that resides in a memory and causes the relevant apparatuses to carry out the invention. For example, a device may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the device to carry out the features of an embodiment. Yet further, a network device like a server may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the network device to carry out the features of an embodiment.

If desired, the different functions discussed herein may be performed in a different order and/or concurrently with other. Furthermore, if desired, one or more of the above-described functions and embodiments may be optional or may be combined.

Although various aspects of the embodiments are set out in the independent claims, other aspects comprise other combinations of features from the described embodiments and/or the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims.

It is also noted herein that while the above describes example embodiments, these descriptions should not be viewed in a limiting sense. Rather, there are several variations and modifications, which may be made without departing from the scope of the present disclosure as, defined in the appended claims.

Claims

Claims:

1 . A method, comprising:

- determining an intra prediction direction for a first coding unit;

- generating an array of active reference samples for said first coding unit by o determining a distance measure between said first coding unit and an array of remote reference samples;

o determining a location of at least one remote reference sample of said array of remote reference samples using said intra prediction direction and said distance measure;

o adding said at least one remote reference sample to said array of active reference samples;

- generating a block of intra prediction samples using said intra prediction direction and said array of active reference samples.

2. The method according to claim 1 , further comprising generating the array of active reference samples based on local reference samples.

3. The method according to claim 1 , further comprising generating the array of active reference samples based on remote reference samples.

4. The method according to claim 3, wherein generating an array of active reference samples for said first coding unit based on remote reference samples comprises

o determining a vertical distance of at least one remote reference sample from a location of a local reference sample;

o determining a location of said at least one remote reference sample by using said intra prediction direction, said vertical distance and a location of said first coding unit;

o including said at least one remote reference sample in said array of active reference samples;

5. The method according to claim 1 , further comprising generating the array of active reference samples based on local reference samples and remote reference samples.

6. The method according to claim 1 , further comprising averaging at least one local reference sample and at least one remote reference sample to generate a final reference sample to the array of active reference samples.

7. The method according to claim 2 or 3, further comprising - signaling in one of the following levels: a coding unit level, a prediction unit level, a transform unit level, whether local or remote reference samples are used to generate the active reference sample array.

8. The method according to any of the claims 1 to 7, wherein determining the intra prediction direction for the first coding unit comprises testing more than one prediction directions and selecting the prediction direction that minimizes a rate-distortion measure.

9. The method according to claim 1 , further comprising determining a location of at least one remote reference sample at fractional pixel accuracy and generating a value for said remote reference sample using an interpolation filtering operation.

10. An apparatus comprising at least one processor, memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following:

- determine an intra prediction direction for a first coding unit;

- generate an array of active reference samples for said first coding unit by

o determining a distance measure between said first coding unit and an array of remote reference samples;

o adding said at least one remote reference sample to said array of active reference samples; and

- generate a block of intra prediction samples using said intra prediction direction and said array of active reference samples.

1 1 . The apparatus according to claim 10, wherein the computer program code is further configured to, with the processor, cause the apparatus to generate the array of active reference samples based on local reference samples.

12. The apparatus according to claim 10, wherein the computer program code is further configured to, with the processor, cause the apparatus to generate the array of active reference samples based on remote reference samples.

13. The apparatus according to claim 12, wherein the computer program code is configured to, with the processor, cause the apparatus to generate an array of active reference samples for said first coding unit based on remote reference samples by o determining a vertical distance of at least one remote reference sample from a location of a local reference sample;

14. The apparatus according to claim 10, wherein the computer program code is further configured to, with the processor, cause the apparatus to generate the array of active reference samples based on local reference samples and remote reference samples.

15. The method according to claim 10, wherein the computer program code is further configured to, with the processor, cause the apparatus to average at least one local reference sample and at least one remote reference sample to generate a final reference sample to the array of active reference samples.

16. The apparatus according to claim 1 1 or 12, wherein the computer program code is further configured to, with the processor, cause the apparatus to signal one of the following levels: a coding unit level, a prediction unit level, a transform unit level, whether local or remote reference samples are used to generate the active reference sample array.

17. The apparatus according to any of the claims 10 to 16, wherein the computer program code is further configured to, with the processor, cause the apparatus to determine the intra prediction direction for the first coding unit by testing more than one prediction directions and selecting the prediction direction that minimizes a rate- distortion measure.

18. The apparatus according to claim 10, wherein the computer program code is further configured to, with the processor, cause the apparatus to determine a location of at least one remote reference sample at fractional pixel accuracy and generate a value for said remote reference sample using an interpolation filtering operation.

19. The apparatus according to any of the claims 10 to 18, further comprising an encoder.

20. The apparatus according to any of the claims 10 to 18, further comprising a decoder.

21 . A computer program product embodied on a non-transitory computer readable medium, comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to:

- determine an intra prediction direction for a first coding unit;

- generate an array of active reference samples for said first coding unit by