WO2023066672A1 - Video coding using parallel units - Google Patents

Video coding using parallel units

Info

Publication number
WO2023066672A1
Authority
WO
WIPO (PCT)
Prior art keywords
samples
parallel unit
picture
block
location
Application number
PCT/EP2022/077796
Other languages
French (fr)
Inventor
Jani Lainema
Ramin GHAZNAVI YOUVALARI
Alireza Aminlou
Pekka Astola
Alireza ZARE
Miska Matias Hannuksela
Seungwook Hong
Limin Wang
Krit Panusopone
Original Assignee
Nokia Technologies Oy
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Publication of WO2023066672A1 publication Critical patent/WO2023066672A1/en


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/436 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/523 Motion estimation or motion compensation with sub-pixel accuracy
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/59 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques

Definitions

  • the examples and non-limiting embodiments relate generally to multimedia transport and data compression and, more particularly, to video coding using parallel units.
  • FIG. 1 shows schematically an electronic device employing embodiments of the examples described herein.
  • FIG. 2 shows schematically a user equipment suitable for employing embodiments of the examples described herein.
  • FIG. 3 further shows schematically electronic devices employing embodiments of the examples described herein connected using wireless and wired network connections.
  • FIG. 4 shows schematically a block chart of an encoder used for data compression on a general level.
  • FIG. 5 illustrates an encoding process, based on the examples described herein.
  • FIG. 6A illustrates a part of a decoding process, where the rightmost and bottom lines of each parallel unit are included in a selection of samples.
  • FIG. 6B illustrates a part of the decoding process using top and leftmost lines for the selection of samples.
  • FIG. 6C illustrates a part of the decoding process where rightmost and bottom lines are selected as samples.
  • FIG. 6D illustrates using samples on both sides of parallel unit borders.
  • FIG. 7 illustrates an upsampling operation that can be applied to samples in buffers prior to using them as reference samples in intra prediction or in other operations.
  • FIG. 8 shows an example of a process where samples in one buffer can be predicted using samples of another buffer.
  • FIG. 9A illustrates an example selection of granularity and location of reference block motion vectors.
  • FIG. 9B illustrates a selection of a subset of reference motion vector locations.
  • FIG. 9C illustrates a process of generation of reference motion vectors in the horizontal direction.
  • FIG. 10 is an example apparatus configured to implement video coding using parallel units, based on the embodiments described herein.
  • FIG. 11 is an example method to implement video coding using parallel units, based on the embodiments described herein.
  • FIG. 12 is an example method to implement video coding using parallel units, based on the embodiments described herein.
  • FIG. 1 shows an example block diagram of an apparatus 50.
  • the apparatus may be an Internet of Things (IoT) apparatus configured to perform various functions, such as for example, gathering information by one or more sensors, receiving or transmitting information, analyzing information gathered or received by the apparatus, or the like.
  • the apparatus may comprise a video coding system, which may incorporate a codec.
  • FIG. 2 shows a layout of an apparatus according to an example embodiment. The elements of FIG. 1 and FIG. 2 are explained next.
  • the electronic device 50 may for example be a mobile terminal or user equipment of a wireless communication system, a sensor device, a tag, or other lower power device.
  • embodiments of the examples described herein may be implemented within any electronic device or apparatus which may process data by neural networks.
  • the apparatus 50 may comprise a housing 30 for incorporating and protecting the device.
  • the apparatus 50 further may comprise a display 32 in the form of a liquid crystal display.
  • the display may be any suitable display technology suitable to display an image or video.
  • the apparatus 50 may further comprise a keypad 34.
  • any suitable data or user interface mechanism may be employed.
  • the user interface may be implemented as a virtual keyboard or data entry system as part of a touch-sensitive display.
  • the apparatus may comprise a microphone 36 or any suitable audio input which may be a digital or analog signal input.
  • the apparatus 50 may further comprise an audio output device which in embodiments of the examples described herein may be any one of: an earpiece 38, speaker, or an analog audio or digital audio output connection.
  • the apparatus 50 may also comprise a battery (or in other embodiments of the examples described herein the device may be powered by any suitable mobile energy device such as solar cell, fuel cell or clockwork generator).
  • the apparatus may further comprise a camera capable of recording or capturing images and/or video.
  • the apparatus 50 may further comprise an infrared port for short range line of sight communication to other devices.
  • the apparatus 50 may further comprise any suitable short range communication solution such as for example a Bluetooth wireless connection or a USB/firewire wired connection.
  • the apparatus 50 may comprise a controller 56, processor or processor circuitry for controlling the apparatus 50.
  • the controller 56 may be connected to memory 58 which in embodiments of the examples described herein may store both data in the form of image and audio data and/or may also store instructions for implementation on the controller 56.
  • the controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and/or decoding of audio and/or video data or assisting in coding and/or decoding carried out by the controller.
  • the apparatus 50 may further comprise a card reader 48 and a smart card 46, for example a UICC and UICC reader for providing user information and being suitable for providing authentication information for authentication and authorization of the user at a network.
  • the apparatus 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals for example for communication with a cellular communications network, a wireless communications system or a wireless local area network.
  • the apparatus 50 may further comprise an antenna 44 connected to the radio interface circuitry 52 for transmitting radio frequency signals generated at the radio interface circuitry 52 to other apparatus(es) and/or for receiving radio frequency signals from other apparatus(es).
  • the apparatus 50 may comprise a camera capable of recording or detecting individual frames which are then passed to the codec 54 or the controller for processing.
  • the apparatus may receive the video image data for processing from another device prior to transmission and/or storage.
  • the apparatus 50 may also receive either wirelessly or by a wired connection the image for coding/decoding.
  • the structural elements of apparatus 50 described above represent examples of means for performing a corresponding function.
  • the system 10 comprises multiple communication devices which can communicate through one or more networks.
  • the system 10 may comprise any combination of wired or wireless networks including, but not limited to a wireless cellular telephone network (such as a GSM, UMTS, CDMA, LTE, 4G, 5G network etc.), a wireless local area network (WLAN) such as defined by any of the IEEE 802.x standards, a Bluetooth personal area network, an Ethernet local area network, a token ring local area network, a wide area network, and the Internet.
  • the system 10 may include both wired and wireless communication devices and/or apparatus 50 suitable for implementing embodiments of the examples described herein.
  • the system shown in FIG. 3 includes a mobile telephone network 11 and a representation of the internet 28.
  • Connectivity to the internet 28 may include, but is not limited to, long range wireless connections, short range wireless connections, and various wired connections including, but not limited to, telephone lines, cable lines, power lines, and similar communication pathways.
  • the example communication devices shown in the system 10 may include, but are not limited to, an electronic device or apparatus 50, a combination of a personal digital assistant (PDA) and a mobile telephone 14, a PDA 16, an integrated messaging device (IMD) 18, a desktop computer 20, a notebook computer 22, and a head mounted display (HMD).
  • the apparatus 50 may be stationary or mobile when carried by an individual who is moving.
  • the apparatus 50 may also be located in a mode of transport including, but not limited to, a car, a truck, a taxi, a bus, a train, a boat, an airplane, a bicycle, a motorcycle or any similar suitable mode of transport.
  • the embodiments may also be implemented in a set-top box; i.e. a digital TV receiver, which may/may not have a display or wireless capabilities, in tablets or (laptop) personal computers (PC), which have hardware and/or software to process neural network data, in various operating systems, and in chipsets, processors, DSPs and/or embedded systems offering hardware/software based coding.
  • Some or further apparatus may send and receive calls and messages and communicate with service providers through a wireless connection 25 to a base station 24.
  • the base station 24 may be connected to a network server 26 that allows communication between the mobile telephone network 11 and the internet 28.
  • the system may include additional communication devices and communication devices of various types.
  • the communication devices may communicate using various transmission technologies including, but not limited to, code division multiple access (CDMA), global systems for mobile communications (GSM), universal mobile telecommunications system (UMTS), time divisional multiple access (TDMA), frequency division multiple access (FDMA), transmission control protocol-internet protocol (TCP-IP), short messaging service (SMS), multimedia messaging service (MMS), email, instant messaging service (IMS), Bluetooth, IEEE 802.11, 3GPP Narrowband IoT and any similar wireless communication technology.
  • a communications device involved in implementing various embodiments of the examples described herein may communicate using various media.
  • a channel may refer either to a physical channel or to a logical channel.
  • a physical channel may refer to a physical transmission medium such as a wire
  • a logical channel may refer to a logical connection over a multiplexed medium, capable of conveying several logical channels.
  • a channel may be used for conveying an information signal, for example a bitstream, from one or several senders (or transmitters) to one or several receivers.
  • the embodiments may also be implemented in so-called IoT devices.
  • the Internet of Things (IoT) may be defined, for example, as an interconnection of uniquely identifiable embedded computing devices within the existing Internet infrastructure. The convergence of various technologies has enabled, and may further enable, many fields of embedded systems, such as wireless sensor networks, control systems, home/building automation, etc. to be included in the Internet of Things (IoT).
  • IoT devices are provided with an IP address as a unique identifier.
  • IoT devices may be provided with a radio transmitter, such as a WLAN or Bluetooth transmitter or an RFID tag.
  • IoT devices may have access to an IP-based network via a wired network, such as an Ethernet-based network or a power-line connection (PLC).
  • An MPEG-2 transport stream (TS), specified in ISO/IEC 13818-1 or equivalently in ITU-T Recommendation H.222.0, is a format for carrying audio, video, and other media, as well as program metadata or other metadata, in a multiplexed stream.
  • a packet identifier (PID) is used to identify an elementary stream (a.k.a. packetized elementary stream) within the TS.
  • a logical channel within an MPEG-2 TS may be considered to correspond to a specific PID value.
  • Available media file format standards include ISO base media file format (ISO/IEC 14496-12, which may be abbreviated ISOBMFF) and file format for NAL unit structured video (ISO/IEC 14496-15), which derives from the ISOBMFF.
  • a video codec consists of an encoder that transforms the input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form.
  • a video encoder and/or a video decoder may also be separate from each other, i.e. need not form a codec. Typical encoders discard some information in the original video sequence in order to represent the video in a more compact form (that is, at lower bitrate).
  • Typical hybrid video encoders, for example many encoder implementations of ITU-T H.263 and H.264, encode the video information in two phases. Firstly, pixel values in a certain picture area (or “block”) are predicted, for example by motion compensation means (finding and indicating an area in one of the previously coded video frames that corresponds closely to the block being coded) or by spatial means (using the pixel values around the block to be coded in a specified manner). Secondly, the prediction error, i.e. the difference between the predicted block of pixels and the original block of pixels, is coded. This is typically done by transforming the difference in pixel values using a specified transform (e.g. the Discrete Cosine Transform (DCT)), quantizing the coefficients, and entropy coding the quantized coefficients.
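  • As an editorial illustration of the two-phase structure described above, the following Python sketch predicts a block, transform-codes the prediction error with a DCT, and quantizes the coefficients. It is a minimal model under stated assumptions: the block content, mean-value predictor, quantization step, and the use of scipy's generic DCT (rather than a codec-specific integer transform) are all illustrative.

```python
# Minimal sketch of two-phase hybrid coding: predict, then transform-code the
# prediction error. Block values and quantization step are illustrative.
import numpy as np
from scipy.fft import dctn, idctn

original = np.arange(64, dtype=np.float64).reshape(8, 8)   # toy 8x8 block
prediction = np.full((8, 8), original.mean())              # stand-in predictor

residual = original - prediction                           # phase 1 output
coeffs = dctn(residual, norm="ortho")                      # specified transform (DCT)
qstep = 4.0                                                # assumed quantizer step
quantized = np.round(coeffs / qstep)                       # coefficients to entropy-code

# Decoder side: dequantize, inverse-transform, add the prediction back.
reconstructed = prediction + idctn(quantized * qstep, norm="ortho")
print(np.abs(reconstructed - original).max())              # small quantization error
```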
  • the sources of prediction are previously decoded pictures (a.k.a. reference pictures, or reference frames).
  • In intra block copy (IBC), prediction is applied similarly to temporal prediction, but the reference picture is the current picture and only previously decoded samples can be referred to in the prediction process.
  • Inter-layer or inter-view prediction may be applied similarly to temporal prediction, but the reference picture is a decoded picture from another scalable layer or from another view, respectively.
  • in some cases, inter prediction may refer to temporal prediction only, while in other cases inter prediction may refer collectively to temporal prediction and any of intra block copy, inter-layer prediction, and inter-view prediction, provided that they are performed with the same or similar process as temporal prediction.
  • Inter prediction or temporal prediction may sometimes be referred to as motion compensation or motion-compensated prediction.
  • Inter prediction, which may also be referred to as temporal prediction, motion compensation, or motion-compensated prediction, reduces temporal redundancy.
  • in inter prediction, the sources of prediction are previously decoded pictures.
  • Intra prediction utilizes the fact that adjacent pixels within the same picture are likely to be correlated.
  • Intra prediction can be performed in the spatial or transform domain, i.e., either sample values or transform coefficients can be predicted. Intra prediction is typically exploited in intra coding, where no inter prediction is applied.
  • One outcome of the coding procedure is a set of coding parameters, such as motion vectors and quantized transform coefficients. Many parameters can be entropy-coded more efficiently if they are predicted first from spatially or temporally neighboring parameters. For example, a motion vector may be predicted from spatially adjacent motion vectors and only the difference relative to the motion vector predictor may be coded. Prediction of coding parameters and intra prediction may be collectively referred to as in-picture prediction.
  • FIG. 4 shows a block diagram of a general structure of a video encoder.
  • FIG. 4 presents an encoder for two layers, but it would be appreciated that the presented encoder could be similarly extended to encode more than two layers.
  • FIG. 4 illustrates a video encoder comprising a first encoder section 500 for a base layer and a second encoder section 502 for an enhancement layer.
  • Each of the first encoder section 500 and the second encoder section 502 may comprise similar elements for encoding incoming pictures.
  • the encoder sections 500, 502 may comprise a pixel predictor 302, 402, prediction error encoder 303, 403 and prediction error decoder 304, 404.
  • FIG. 4 also shows an embodiment of the pixel predictor 302, 402 as comprising an inter-predictor 306, 406 (Pinter), an intra-predictor 308, 408 (Pintra), a mode selector 310, 410, a filter 316, 416 (F), and a reference frame memory 318, 418 (RFM).
  • the pixel predictor 302 of the first encoder section 500 receives 300 base layer images (I0,n) of a video stream to be encoded at both the inter-predictor 306 (which determines the difference between the image and a motion compensated reference frame 318) and the intra-predictor 308 (which determines a prediction for an image block based only on the already processed parts of the current frame or picture).
  • the output of both the inter-predictor and the intra-predictor are passed to the mode selector 310.
  • the intra-predictor 308 may have more than one intra-prediction modes. Hence, each mode may perform the intra-prediction and provide the predicted signal to the mode selector 310.
  • the mode selector 310 also receives a copy of the base layer picture 300.
  • the pixel predictor 402 of the second encoder section 502 receives 400 enhancement layer images (I1,n) of a video stream to be encoded at both the inter-predictor 406 (which determines the difference between the image and a motion compensated reference frame 418) and the intra-predictor 408 (which determines a prediction for an image block based only on the already processed parts of the current frame or picture).
  • the output of both the inter-predictor and the intra-predictor are passed to the mode selector 410.
  • the intra-predictor 408 may have more than one intra-prediction modes. Hence, each mode may perform the intra-prediction and provide the predicted signal to the mode selector 410.
  • the mode selector 410 also receives a copy of the enhancement layer picture 400.
  • the output of the inter-predictor 306, 406 or the output of one of the optional intra-predictor modes or the output of a surface encoder within the mode selector is passed to the output of the mode selector 310, 410.
  • the output of the mode selector is passed to a first summing device 321, 421.
  • the first summing device may subtract the output of the pixel predictor 302, 402 from the base layer picture 300/enhancement layer picture 400 to produce a first prediction error signal 320, 420 (Dn) which is input to the prediction error encoder 303, 403.
  • the pixel predictor 302, 402 further receives from a preliminary reconstructor 339, 439 the combination of the prediction representation of the image block 312, 412 (P'n) and the output 338, 438 (D'n) of the prediction error decoder 304, 404.
  • the preliminary reconstructed image 314, 414 (I'n) may be passed to the intra-predictor 308, 408 and to the filter 316, 416.
  • the filter 316, 416 receiving the preliminary representation may filter the preliminary representation and output a final reconstructed image 340, 440 (R'n) which may be saved in a reference frame memory 318, 418.
  • the reference frame memory 318 may be connected to the inter-predictor 306 to be used as the reference image against which a future base layer picture 300 is compared in inter-prediction operations.
  • the reference frame memory 318 may also be connected to the inter-predictor 406 to be used as the reference image against which a future enhancement layer picture 400 is compared in inter-prediction operations.
  • the reference frame memory 418 may be connected to the inter-predictor 406 to be used as the reference image against which a future enhancement layer picture 400 is compared in inter-prediction operations.
  • Filtering parameters from the filter 316 of the first encoder section 500 may be provided to the second encoder section 502 subject to the base layer being selected and indicated to be the source for predicting the filtering parameters of the enhancement layer according to some embodiments.
  • the prediction error encoder 303, 403 comprises a transform unit 342, 442 (T) and a quantizer 344, 444 (Q).
  • the transform unit 342, 442 transforms the first prediction error signal 320, 420 to a transform domain.
  • the transform is, for example, the DCT transform.
  • the quantizer 344, 444 quantizes the transform domain signal, e.g. the DCT coefficients, to form quantized coefficients.
  • the prediction error decoder 304, 404 receives the output from the prediction error encoder 303, 403 and performs the opposite processes of the prediction error encoder 303, 403 to produce a decoded prediction error signal 338, 438 which, when combined with the prediction representation of the image block 312, 412 at the second summing device 339, 439, produces the preliminary reconstructed image 314, 414.
  • the prediction error decoder 304, 404 may be considered to comprise a dequantizer 346, 446 (Q-1), which dequantizes the quantized coefficient values, e.g. the DCT coefficients, to reconstruct the transform-domain signal, and an inverse transformation unit, which performs the inverse transformation back to the spatial domain.
  • the prediction error decoder may also comprise a block filter which may filter the reconstructed block(s) according to further decoded information and filter parameters.
  • the entropy encoder 330, 430 receives the output of the prediction error encoder 303, 403 and may perform a suitable entropy encoding/variable length encoding on the signal to provide error detection and correction capability.
  • the outputs of the entropy encoders 330, 430 may be inserted into a bitstream e.g. by a multiplexer 508 (M).
  • the examples described herein relate to coding and decoding of digital video material.
  • a video codec consists of an encoder that transforms the input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form.
  • the encoder discards some information in the original video sequence in order to represent the video in a more compact form (that is, at a lower bitrate).
  • Typical hybrid video codecs, for example ITU-T H.263 and H.264, encode the video information in two phases. Firstly, pixel values in a certain picture area (or "block") are predicted, for example by motion compensation means (finding and indicating an area in one of the previously coded video frames that corresponds closely to the block being coded) or by spatial means (using the pixel values around the block to be coded in a specified manner). Secondly, the prediction error, i.e. the difference between the predicted block of pixels and the original block of pixels, is coded. This is typically done by transforming the difference in pixel values using a specified transform (e.g. the Discrete Cosine Transform (DCT)), quantizing the coefficients, and entropy coding the quantized coefficients.
  • video pictures are divided into coding units (CU) covering the area of the picture.
  • a CU consists of one or more prediction units (PU) defining the prediction process for the samples within the CU and one or more transform units (TU) defining the prediction error coding process for the samples in the CU.
  • a CU consists of a square block of samples with a size selectable from a predefined set of possible CU sizes.
  • a CU with the maximum allowed size is typically named the LCU (largest coding unit) or CTU (coding tree unit) and the video picture is divided into non-overlapping CTUs.
  • a CTU can be further split into a combination of smaller CUs, e.g. by recursively splitting the CTU and the resulting CUs.
  • Each resulting CU typically has at least one PU and at least one TU associated with it.
  • Each PU and TU can be further split into smaller PUs and TUs in order to increase the granularity of the prediction and prediction error coding processes, respectively.
  • Each PU has prediction information associated with it defining what kind of a prediction is to be applied for the pixels within that PU (e.g. motion vector information for inter predicted PUs and intra prediction directionality information for intra predicted PUs).
  • each TU is associated with information describing the prediction error decoding process for the samples within the TU (including e.g. DCT coefficient information).
  • the decoder reconstructs the output video by applying prediction means similar to the encoder to form a predicted representation of the pixel blocks (using the motion or spatial information created by the encoder and stored in the compressed representation) and prediction error decoding, where the prediction error decoding includes an inverse operation of the prediction error coding, recovering the quantized prediction error signal in the spatial pixel domain.
  • prediction and prediction error decoding means After applying prediction and prediction error decoding means the decoder sums up the prediction and prediction error signals (pixel values) to form the output video frame.
  • the decoder (and encoder) can also apply additional filtering means to improve the quality of the output video before passing it for display and/or storing it as prediction reference for the forthcoming frames in the video sequence.
  • the decoding process is illustrated in FIG. 6A, FIG. 6B, FIG. 6C, and FIG. 6D.
  • a color palette based coding can be used.
  • Palette based coding refers to a family of approaches for which a palette, i.e. a set of colors and associated indexes, is defined and the value for each sample within a coding unit is expressed by indicating its index in the palette.
  • Palette based coding can achieve good coding efficiency in coding units with a relatively small number of colors (such as image areas which are representing computer screen content, like text or simple graphics).
  • in order to improve the coding efficiency of palette coding, different kinds of palette index prediction approaches can be utilized, or the palette indexes can be run-length coded to be able to represent larger homogenous image areas efficiently. Also, in the case the CU contains sample values that are not recurring within the CU, escape coding can be utilized. Escape coded samples are transmitted without referring to any of the palette indexes. Instead their values are indicated individually for each escape coded sample.
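  • A minimal sketch of the palette idea with escape coding follows: frequent sample values are mapped to palette indexes and rare values are escape-coded individually. The palette size, the toy sample values, and the ESCAPE marker are illustrative assumptions, not the patent's syntax.

```python
# Sketch of palette-based coding of a CU: frequent sample values get palette
# indexes; values outside the palette are escape-coded individually.
from collections import Counter

ESCAPE = -1
samples = [10, 10, 200, 10, 200, 200, 10, 57]              # toy CU samples
palette = [v for v, _ in Counter(samples).most_common(2)]  # assumed max 2 colors

coded = [(palette.index(s), None) if s in palette else (ESCAPE, s)
         for s in samples]                                 # (index, escape value) pairs
decoded = [palette[i] if i != ESCAPE else v for i, v in coded]
assert decoded == samples
```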
  • the motion information is indicated with motion vectors associated with each motion compensated image block.
  • Each of these motion vectors represents the displacement of the image block in the picture to be coded (in the encoder side) or decoded (in the decoder side) and the prediction source block in one of the previously coded or decoded pictures.
  • in order to represent motion vectors efficiently, those are typically coded differentially with respect to block-specific predicted motion vectors.
  • the predicted motion vectors are created in a predefined way, for example by calculating the median of the encoded or decoded motion vectors of the adjacent blocks.
  • Another way to create motion vector predictions is to generate a list of candidate predictions from adjacent blocks and/or co-located blocks in temporal reference pictures and signaling the chosen candidate as the motion vector predictor.
  • the reference index of the previously coded/decoded picture can be predicted.
  • the reference index is typically predicted from adjacent blocks and/or co-located blocks in the temporal reference picture.
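  • To illustrate the differential motion vector coding described above, here is a short Python sketch of a median predictor formed from three neighboring blocks; the left/above/above-right neighbor choice and the toy vectors are illustrative assumptions.

```python
# Sketch of differential motion vector coding with a component-wise median
# predictor over three neighboring blocks (neighbor set is illustrative).
def median_mv(neighbors):
    xs = sorted(mv[0] for mv in neighbors)
    ys = sorted(mv[1] for mv in neighbors)
    return (xs[len(xs) // 2], ys[len(ys) // 2])  # component-wise median

left, above, above_right = (4, 0), (6, -2), (5, 1)
predictor = median_mv([left, above, above_right])      # (5, 0)
mv = (7, -1)                                           # actual motion vector
mvd = (mv[0] - predictor[0], mv[1] - predictor[1])     # only this is coded
assert (predictor[0] + mvd[0], predictor[1] + mvd[1]) == mv
```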
  • typical high efficiency video codecs employ an additional motion information coding/decoding mechanism, often called merging/merge mode, where all the motion field information, which includes a motion vector and corresponding reference picture index for each available reference picture list, is predicted and used without any modification/correction.
  • predicting the motion field information is carried out using the motion field information of adjacent blocks and/or co-located blocks in temporal reference pictures and the used motion field information is signaled among a motion field candidate list filled with motion field information of available adjacent/co-located blocks.
  • video codecs support motion compensated prediction from one source image (uni-prediction) and two sources (bi-prediction).
  • uni-prediction a single motion vector is applied whereas in the case of bi-prediction two motion vectors are signaled and the motion compensated predictions from two sources are averaged to create the final sample prediction.
  • weighted prediction the relative weights of the two predictions can be adjusted, or a signaled offset can be added to the prediction signal.
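  • The following sketch contrasts plain bi-prediction (averaging) with weighted prediction as just described; the block contents, weights, and offset are illustrative values, not normative parameters.

```python
# Sketch of bi-prediction: two motion-compensated blocks are averaged, or
# combined with adjustable weights plus a signaled offset.
import numpy as np

p0 = np.array([[100., 102.], [104., 106.]])   # prediction from reference list 0
p1 = np.array([[110., 108.], [106., 104.]])   # prediction from reference list 1

bi_pred = (p0 + p1) / 2                       # plain bi-prediction (average)
w0, w1, offset = 0.75, 0.25, 2.0              # assumed weighted-prediction params
weighted_pred = w0 * p0 + w1 * p1 + offset
```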
  • the displacement vector indicates where from the same picture a block of samples can be copied to form a prediction of the block to be coded or decoded.
  • this kind of intra block copying method can improve the coding efficiency substantially in the presence of repeating structures within the frame, such as text or other graphics.
  • Typical video encoders utilize Lagrangian cost functions to find optimal coding modes, e.g. the desired macroblock mode and associated motion vectors.
  • This kind of cost function uses a weighting factor λ to tie together the (exact or estimated) image distortion due to lossy coding methods and the (exact or estimated) amount of information that is required to represent the pixel values in an image area: C = D + λR (Equation 1), where C is the Lagrangian cost to be minimized, D is the image distortion (e.g. mean squared error) with the mode and motion vectors considered, and R is the number of bits needed to represent the required data to reconstruct the image block in the decoder (including the amount of data to represent the candidate motion vectors).
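  • A minimal sketch of this Lagrangian mode decision follows: the mode minimizing C = D + λR wins. The candidate reconstructions, bit counts, and λ value are illustrative assumptions.

```python
# Sketch of Lagrangian mode selection, C = D + lambda * R (Equation 1).
import numpy as np

def lagrangian_cost(orig, recon, bits, lam):
    distortion = float(np.sum((orig - recon) ** 2))  # D, here SSE
    return distortion + lam * bits                   # C = D + lambda * R

orig = np.array([50., 52., 54., 56.])
candidates = {                                       # mode -> (reconstruction, bits)
    "intra": (np.array([50., 50., 54., 54.]), 12),
    "inter": (np.array([51., 52., 55., 56.]), 30),
}
lam = 0.8
best = min(candidates, key=lambda m: lagrangian_cost(orig, *candidates[m], lam))
print(best)  # "intra" wins here: slightly higher distortion, far fewer bits
```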
  • Scalable video coding refers to coding structure where one bitstream can contain multiple representations of the content at different bitrates, resolutions or frame rates.
  • the receiver can extract the desired representation depending on its characteristics (e.g. the resolution that matches best the display device).
  • a server or a network element can extract the portions of the bitstream to be transmitted to the receiver depending on e.g. the network characteristics or processing capabilities of the receiver.
  • a scalable bitstream typically consists of a "base layer" providing the lowest quality video available and one or more enhancement layers that enhance the video quality when received and decoded together with the lower layers.
  • the coded representation of that layer typically depends on the lower layers.
  • the motion and mode information of the enhancement layer can be predicted from lower layers.
  • the pixel data of the lower layers can be used to create a prediction for the enhancement layer.
  • a scalable video codec for quality scalability (also known as signal-to-noise or SNR scalability) and/or spatial scalability may be implemented as follows.
  • for a base layer, a conventional non-scalable video encoder and decoder is used.
  • the reconstructed/decoded pictures of the base layer are included in the reference picture buffer for an enhancement layer.
  • the base layer decoded pictures may be inserted into one or more reference picture lists for coding/decoding of an enhancement layer picture similarly to the decoded reference pictures of the enhancement layer.
  • the encoder may choose a base-layer reference picture as an inter-prediction reference and indicate its use typically with a reference picture index in the coded bitstream.
  • the decoder decodes from the bitstream, for example from a reference picture index, that a base-layer picture is used as the inter prediction reference for the enhancement layer.
  • a decoded base-layer picture is used as the prediction reference for an enhancement layer, it is referred to as an inter-layer reference picture.
  • base layer pictures are coded at a lower resolution than enhancement layer pictures.
  • Bit-depth scalability base layer pictures are coded at lower bit-depth (e.g. 8 bits) than enhancement layer pictures (e.g. 10 or 12 bits).
  • Chroma format scalability enhancement layer pictures provide higher fidelity in chroma (e.g. coded in 4:4:4 chroma format) than base layer pictures (e.g. 4:2:0 format).
  • base layer information could be used to code the enhancement layer to minimize the additional bitrate overhead.
  • Scalability can be enabled in two basic ways: either 1) by introducing new coding modes for performing prediction of pixel values or syntax from lower layers of the scalable representation, or 2) by placing the lower layer pictures to the reference picture buffer (decoded picture buffer, DPB) of the higher layer.
  • the first approach is more flexible and thus can provide better coding efficiency in most cases.
  • the second, reference frame based scalability, approach can be implemented very efficiently with minimal changes to single layer codecs while still achieving the majority of the coding efficiency gains available.
  • a reference frame based scalability codec can be implemented by utilizing the same hardware or software implementation for all the layers, just taking care of the DPB management by external means.
  • images can be split into independently codable and decodable image segments (slices or tiles).
  • Slices typically refer to image segments constructed of a certain number of basic coding units that are processed in default coding or decoding order, while tiles typically refer to image segments that have been defined as rectangular image regions that are processed at least to some extent as individual frames.
  • video is encoded in a YUV or YCbCr color space, as that is found to reflect some characteristics of the human visual system and allows using a lower quality representation for Cb and Cr channels, as human perception is less sensitive to the chrominance fidelity those channels represent.
  • both ITU-T H.265 [refer e.g. to ITU-T recommendation H.265: “High efficiency video coding”, https://www.itu.int/rec/T-REC-H.265] and H.266 [refer e.g. to ITU-T recommendation H.266: “Versatile video coding”, https://www.itu.int/rec/T-REC-H.266] support a tiling concept allowing a picture to be split into independently encodable and decodable units called tiles.
  • Intra picture sample prediction, motion vector prediction, other predictive processes as well as context updating processes for arithmetic coding are all disallowed to access information across a tile boundary and thus a decoder or encoder can process tiles simultaneously without a need to be aware of the processing of other tiles.
  • both H.265 and H.266 support a concept called wavefront parallel processing. This process allows multiple arithmetic coder instances to get launched for different lines of coding tree units and operate within the same picture simultaneously without interaction after the initial context models have been determined.
  • a concept of a “parallel unit” is introduced, where the “parallel units” can be independently encoded and decoded in parallel. Independent processing is guaranteed by disabling sample and coding parameter dependencies between parallel units.
  • the herein described approach encodes and decodes a specifically generated set of reference data to substitute traditional data for picture areas belonging to different parallel units. Further described herein is how such data can be provided efficiently for intra coded pictures in the form of horizontal and vertical lines of samples, and for inter coded pictures in the form of motion vector fields indicated at a desired granularity.
  • reference data can be further used to enhance the final image quality by adapting the post-processing means of a video codec when such data is available.
  • a parallel unit is defined here as a unit of video or a picture that can be reconstructed independently from a reconstruction of other parallel units within the same picture. That is, the reconstructed samples of one parallel unit before post-processing operations are independent from reconstructed samples from other parallel units within the same video frame.
  • the syntax elements or coding of syntax elements belonging to one parallel unit may or may not be independent from syntax elements of other parallel units depending on the level of parallelism targeted by an application.
  • the sample values and other information of a parallel unit are configured to be used outside of the parallel unit's own area and are made available in the form of reference lines coded prior to coding parallel units and decoded before decoding parallel units in a whole picture or other specified areas within a picture.
  • the other specified areas may be defined to be for example tiles, subpictures, groups of coding tree units or a single coding tree unit.
  • a parallel unit can be configured or selected to represent certain areas in the picture, such as tiles, slices or subpictures, or a certain number of coding tree units or coding units.
  • an encoder indicates parallel unit based coding in or along a bitstream, e.g. in a sequence parameter set.
  • a decoder concludes to use parallel unit based decoding by decoding an indication from or along a bitstream, e.g. from a sequence parameter set.
  • an encoder indicates parallel unit areas in or along a bitstream, e.g. in a sequence parameter set or a picture parameter set.
  • a decoder decodes parallel unit areas from an indication from or along a bitstream, e.g. from a sequence parameter set or a picture parameter set.
  • Such indication may for example be indicative of which picture partitioning unit, such as subpicture, slice, tile, coding tree unit, or coding unit, is a parallel unit of its own.
  • a parallel unit header may be defined, which may be a part of coded parallel unit(s) and contain the data elements pertaining to a single parallel unit or multiple parallel units.
  • the data elements in a parallel unit header may include, but not necessarily be limited to, a parallel unit identifier, parallel unit size, parallel unit type, and coding parameters such as one or more quantization parameters.
  • each coding tree unit in a picture forms a parallel unit of its own.
  • a parallel unit consists of an MxN array of coding tree units where M and N are indicated in a video bitstream.
  • the M and N values may be selected adaptively per picture or a group of pictures, for example for the purpose of load balancing in a heterogeneous processing architecture.
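  • To make the MxN-CTU partitioning concrete, here is a small sketch that enumerates parallel unit rectangles covering a picture; the picture dimensions, CTU size, and M, N values are illustrative, and edge units are simply clipped to the picture boundary.

```python
# Sketch of partitioning a picture into parallel units of M x N coding tree
# units each; geometry values are illustrative.
def parallel_units(pic_w, pic_h, ctu, m, n):
    """Yield (x, y, w, h) rectangles of M x N CTUs covering the picture."""
    step_x, step_y = m * ctu, n * ctu
    for y in range(0, pic_h, step_y):
        for x in range(0, pic_w, step_x):
            yield (x, y, min(step_x, pic_w - x), min(step_y, pic_h - y))

units = list(parallel_units(pic_w=256, pic_h=128, ctu=64, m=2, n=1))
# -> four 128x64 units, each independently codable and decodable
```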
  • reference sample lines can be made available for parallel units to substitute traditional reference samples on the coding tree unit, coding unit, prediction unit or transform unit boundaries. Selection of reference lines to code or decode prior to coding or decoding parallel units can be made in different ways.
  • a certain number of lines of samples on the bottom boundary and right boundary inside the parallel units can be selected.
  • Horizontal lines (including horizontal lines 501, 502, 503, 504, 505) can be collected to a horizontal reference line picture buffer Rhor 510 and vertical reference lines (including 511, 512, 513, 514, 515, 516, 517, 518) can be collected to a vertical reference line picture buffer Rver 520.
  • the number of lines on horizontal and vertical boundaries of parallel units selected for the reference line pictures can be the same or different.
  • one line of samples, two lines of samples or four lines of border samples can be selected from each parallel unit (e.g. ParU0 540 and ParU1 550) to be included in Rhor 510 and Rver 520.
  • FIG. 5 further shows a picture P 530 containing several parallel units including ParU0 540 and ParU1 550.
  • the size of a parallel unit is given by ParU size 560, where the size of a parallel unit may include the size of one or more reference lines (e.g. 501).
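  • The following sketch mimics the FIG. 5 collection step: the bottom and rightmost line(s) of each parallel unit are copied into the Rhor and Rver buffers. The picture contents, two-unit layout, and one-line-per-boundary choice are illustrative assumptions.

```python
# Sketch of collecting reference lines into horizontal (Rhor) and vertical
# (Rver) reference line buffers, one line per parallel unit boundary.
import numpy as np

picture = np.arange(128 * 256, dtype=np.float64).reshape(128, 256)
lines = 1                                                    # lines per boundary
r_hor, r_ver = [], []
for (x, y, w, h) in [(0, 0, 128, 128), (128, 0, 128, 128)]:  # two parallel units
    r_hor.append(picture[y + h - lines:y + h, x:x + w])      # bottom line(s)
    r_ver.append(picture[y:y + h, x + w - lines:x + w])      # rightmost line(s)

R_hor = np.concatenate(r_hor, axis=1)   # horizontal reference line buffer
R_ver = np.concatenate(r_ver, axis=0)   # vertical reference line buffer
```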
  • FIG. 6A, FIG. 6B, FIG. 6C, and FIG. 6D illustrate different selections of samples to the horizontal and vertical reference line buffers.
  • FIG. 6A shows a case where the rightmost and bottom lines of each parallel unit are included.
  • the rightmost vertical reference line buffers 640-1, 640-2, and 640-3 are selected for ParU0 610, ParU1 620, and ParUM 630, respectively
  • the bottom reference line buffers 650-1, 650-2, and 650-3 are selected for ParU0 610, ParU1 620, and ParUM 630, respectively.
  • FIG. 6B demonstrates using top and leftmost lines of each parallel unit.
  • the leftmost vertical reference line buffers 660-1, 660-2, and 660-3 are selected for ParU0 610, ParU1 620, and ParUM 630, respectively
  • the top reference line buffers 670-1, 670-2, and 670-3 are selected for ParU0 610, ParU1 620, and ParUM 630, respectively.
  • in FIG. 6C, rightmost and bottom lines are selected (e.g. rightmost vertical reference line buffer 640-1 and top horizontal reference line buffer 670-1 are selected for ParU0 610), and in addition, the picture boundary lines are selected to be included in the set of horizontal and vertical reference lines.
  • the horizontal picture boundary line 615 is selected to be included in the set of horizontal reference lines
  • the vertical picture boundary line 625 is selected to be included in the set of vertical reference lines.
  • FIG. 6D illustrates using samples on both sides of parallel unit borders.
  • the bottom reference line buffer 650-1 is selected for ParU0 610, opposite and adjacent to the selection of the top reference line buffer 670-3 selected for ParUM 630.
  • the rightmost vertical reference line buffer 640-1 is selected for ParU0 610, opposite and adjacent to the selection of the leftmost vertical reference line buffer 660-2 selected for ParU1 620.
  • the configuration of FIG. 6D can be in some cases advantageous as that would allow the parallel units (610, 620, 630) to access not only information outside the area of the parallel unit (e.g. ParU0 610 accesses and uses information regarding top horizontal reference line buffer 670-3 of ParUM), but also have pre-knowledge of sample values inside the parallel unit itself (e.g. ParU0 610 accesses and uses information regarding bottom horizontal reference line buffer 650-1 of ParU0).
  • traditional reference samples are used as a prediction reference whenever they are available within the current parallel unit being encoded.
  • dedicated reference samples are encoded similarly to what is described with other embodiments e.g. related to FIG. 6A, FIG. 6B, FIG. 6C, and FIG. 6D.
  • Dedicated reference samples may be coded and decoded selectively, only when traditional reference samples are unavailable. For example, with reference to FIG. 6B, if an encoder selects to use intra prediction of ParUo 610 from top-right directions, the reference lines of ParUi 620 are encoded as dedicated reference samples and used for prediction of ParUo 610.
  • an encoder chooses between a first mode and a second mode for prediction.
  • the first mode causes traditional reference samples to be used as the prediction reference.
  • the second mode causes dedicated reference samples to be encoded and used as the prediction reference.
  • the encoder uses a rate-distortion optimization function to select between the first mode and the second mode.
  • dedicated reference samples used in coding of a parallel unit are encoded and included within the coded parallel unit.
  • dedicated reference samples may succeed, in coding order, a mode that causes the usage of the dedicated reference samples as the prediction reference.
  • coded reference lines used in decoding of a coded parallel unit are decoded from the coded parallel unit.
  • Embodiments have been described above in relation to encoding of dedicated reference samples. Similar embodiments can be respectively realized related to encoding of dedicated reference motion vectors. For example, a dedicated reference motion vector may be encoded when an encoder chooses to use a mode using a motion vector candidate from a block that is unavailable, e.g. from a block that has not been encoded yet or is beyond the parallel unit boundaries. Likewise, similar embodiments can be respectively realized related to decoding of dedicated reference motion vectors.
  • Reference lines in the horizontal reference line buffer or the vertical reference line buffer, or both can be subsampled to a representation in a reduced resolution.
  • an upsampling operation can be applied to the samples in those buffers prior to using them as reference samples in intra prediction or in other operations.
  • FIG. 7 illustrates that process.
  • reference lines of the vertical reference line buffers of the picture 705, including reference lines in the vertical reference line buffer 740-1 of ParU0 710 and reference lines in the vertical reference line buffer 740-2 of ParU1 720, can be subsampled using operation 715 to a first representation 760.
  • a second operation 725 is applied to the first representation 760 to generate a reduced resolution representation 770 of vertical reference line buffers including vertical reference line buffer 740-1 and vertical reference line buffer 740-2.
  • operations 715 and 725 represent downsampling of the reference lines, such as reference lines in the vertical reference line buffer 740-1 of ParU0 710 and reference lines in the vertical reference line buffer 740-2 of ParU1 720.
  • Upsampling may be applied in the opposite direction, such that a first upsampling operation 735 is applied to the reduced resolution representation 770 to generate a first representation 760, and a second upsampling operation 745 is applied to the first representation 760 to generate reference lines or samples, such as reference lines or samples of vertical reference line buffer 740-1, or reference lines or samples of vertical reference line buffer 740-2.
  • Such upsampling is performed prior to using the reference lines as reference samples in intra prediction or in other operations.
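  • Here is a sketch of the two-stage resolution change of FIG. 7 applied to a single reference line: two downsampling stages (715, 725) each halve the resolution, and two upsampling stages (735, 745) restore it. Pair-averaging for downsampling and linear interpolation for upsampling are illustrative filter choices, not the patent's specified filters.

```python
# Sketch of two-stage subsampling/upsampling of a reference line.
import numpy as np

def down2(line):
    return (line[0::2] + line[1::2]) / 2          # average sample pairs

def up2(line):
    out = np.repeat(line, 2).astype(np.float64)   # nearest-neighbor base
    out[1:-1:2] = (line[:-1] + line[1:]) / 2      # linear interpolation inside
    return out

ref = np.array([10., 12., 16., 20., 22., 24., 28., 30.])
reduced = down2(down2(ref))                       # operations 715 then 725
restored = up2(up2(reduced))                      # operations 735 then 745
```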
  • Quality of the reference lines can be controlled by varying the quantization parameter or parameters used in coding those lines similarly to the way quality of the primary coded picture P is controlled if the same or a similar kind of residual coding is applied to both reference lines and the primary picture.
  • a quantization parameter or parameters for the horizontal and vertical reference lines can be the same or different. Coding of the quantization parameter or parameters of the reference lines can advantageously depend on the quantization parameter or parameters of the primary picture. For example, a quantization parameter for a reference line buffer can be calculated by adding an indicated value of a syntax element to a quantization parameter determined for the primary picture.
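  • A small sketch of the QP derivation just described follows: the reference line QP is the primary picture QP plus a signaled delta, clipped to a valid range. The clip range, signaled values, and the approximate QP-to-step mapping (step roughly doubling every 6 QP) are illustrative assumptions.

```python
# Sketch of deriving reference line QPs from the primary picture QP.
def reference_line_qp(primary_qp, signaled_delta, qp_min=0, qp_max=63):
    return max(qp_min, min(qp_max, primary_qp + signaled_delta))

def quantization_step(qp):
    return 2.0 ** ((qp - 4) / 6.0)    # common approximate QP-to-step mapping

qp_hor = reference_line_qp(primary_qp=32, signaled_delta=-2)  # finer horizontal lines
qp_ver = reference_line_qp(primary_qp=32, signaled_delta=+3)  # coarser vertical lines
```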
  • Reference lines and the primary picture may have the same or different chroma formats.
  • the primary picture may consist of samples in 4:2:0 chroma format (chroma having half vertical and half horizontal resolution), whereas the horizontal reference lines may be represented in 4:2:2 sampled chroma (chroma having full vertical and half horizontal resolution) and vertical reference lines may be represented in 4:4:0 sampled chroma (chroma having half vertical and full horizontal resolution). That kind of representation appears to be a natural selection in the case of a 4:2:0 primary picture when reference lines are selected according to the examples above, as the effect on chroma samples would result in those subsampling formats. However, that may not be ideal for handling and compressing the reference line buffers.
  • the vertical reference lines can be represented in a bitstream as horizontal lines (e.g. by rotating those lines 90 degrees counter-clockwise).
  • horizontal and vertical reference lines originally having 4:2:2 sampling and 4:4:0 sampling, respectively, can be represented both as 4:2:2 sampled arrays.
  • Different interpolation operations can then be applied either between predicted samples of Rver 820, or between already reconstructed samples in Rver 820 and predicted samples of Rver 820, to generate predictors for samples in Rver 820 or predictors for certain lines or blocks of samples in Rver 820. Coding of some or all of the reconstructed sample values of Rver 820 can then be limited to coding the difference between the reconstructed and predicted values, at least for selected lines or blocks of samples, leading to potential bitrate savings.
  • once reconstructed samples of Rhor 810 are available, those can be resampled (830) to predict or to recover corresponding sample values in the vertical reference line buffer Rver 820.
  • the reconstructed samples 830 may be determined using operation 815 applied to Rhor 810.
  • reconstructed sample 830-1 may be resampled to recover sample values 820-1
  • reconstructed sample 830-2 may be resampled to recover sample values 820-2
  • reconstructed sample 830-3 may be resampled to recover sample values 820-3
  • reconstructed sample 830-4 may be resampled to recover sample values 820-4
  • reconstructed sample 830-5 may be resampled to recover sample values 820-5 and/or sample values 820-6.
  • Vertical sample values 820-1, 820-2, 820-3, 820-4, 820-5, and 820-6 may represent any of vertical reference line buffers 811, 812, 813, 814, 815, 816, 817, and/or 818.
  • a current block within an inter slice is (de)coded with intra coding.
  • a reference line motion vector is derived for the current block.
  • Methods for deriving a reference line motion vector may comprise, but are not necessarily limited to, copying a motion vector from an inter-coded neighboring block on the left for the derivation of a horizontal line buffer, or from an inter-coded neighboring block above for the derivation of a vertical line buffer.
  • a predicted reference line is derived using conventional motion compensation with the reference line motion vector as input.
  • when a sample location referred to in the motion compensation process falls outside the parallel unit boundaries, the sample location is saturated to be within the parallel unit boundaries.
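  • A minimal sketch of this saturation (clipping) step follows; the coordinate convention and the unit geometry are illustrative assumptions.

```python
# Sketch of saturating a motion-compensated sample location to the parallel
# unit boundaries so motion compensation never reads outside the unit.
def saturate_location(x, y, pu_x, pu_y, pu_w, pu_h):
    sx = min(max(x, pu_x), pu_x + pu_w - 1)
    sy = min(max(y, pu_y), pu_y + pu_h - 1)
    return sx, sy

# A reference location two samples left of the unit is clamped to its edge.
assert saturate_location(-2, 10, 0, 0, 64, 64) == (0, 10)
```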
  • reference line buffers Rhor (e.g. 510) and Rver (e.g. 520) may be packed spatially inside a frame, for example on top of the main picture, to be encoded and decoded first.
  • the reference line buffers (e.g. Rhor 510 and/or Rver 520) can further be considered for example as pictures, slices, tiles or subpictures. Data comprising those reference lines may be temporally packed inside the main pictures as a separate picture and before the I-picture, to be encoded and decoded first.
  • Reference line data can be also marked as a base layer, base layers or generally layers that are required for coding and decoding the output picture P.
  • the parameters of the reference line buffers can be derived directly from the parameters of the primary picture P. In this case, there is no need to indicate such information in the video bitstream but the derived parameters, such as the sizes of the reference line buffers can be used instead.
  • the reference line buffers can be used also to guide different postprocessing operations.
  • a deblocking filter that is designed to smooth block boundaries in a typical video codec can take advantage of the sample values in reference line buffers and bias its decisions towards the sample values obtained from the reference line buffers.
  • sample adaptive offset or adaptive loop filters can be configured to make decisions based on sample values in the reference line buffers.
  • the samples in the reference line buffers can be blended into the reconstructed image as a postprocessing operation additional to typical postprocessing steps.
  • Such approaches are helpful especially at low bitrates as the sample values in the reference line buffers provide another estimate for the final sample values and the highly quantized sample values from the primary picture decoding can be refined to less noisy final values by combining the two estimates.
  • the combination of the values from the primary picture reconstruction and the values from the reference line buffers can be advantageously performed considering the quantization applied to those values. Based on the difference in quantization, the values with expected higher quality can be given more weight in the combination process and the values with expected lower quality can be given less weight.
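  • A hedged sketch of such a quantization-aware combination; the linear weight mapping and the clipping of the quantization parameter difference are assumptions made for illustration, not prescribed above:

    def blend_estimates(primary, ref_line, qp_primary, qp_ref):
        # Weight the sample estimate coded with finer quantization
        # (lower QP) more heavily when blending the two estimates.
        diff = max(-12, min(12, qp_ref - qp_primary))
        w_primary = 0.5 + diff / 24.0  # in [0.0, 1.0]
        return round(w_primary * primary + (1.0 - w_primary) * ref_line)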
  • Neighboring motion vector information can be provided to parallel units in the form of reference motion vector lines or reference motion vector fields. As illustrated in FIG. 5, FIG. 6A, FIG. 6B, FIG. 6C, and FIG. 6D for reference samples, similarly reference motion vectors can be collected and coded as reference lines containing the motion vector information to be used in predicting motion vectors inside independently decodable parallel units.
  • FIG. 9A illustrates an example selection of granularity and location of reference block motion vectors where the reference motion vectors are made available for square blocks with dimensions of one quarter of the dimensions of the parallel units.
  • reference motion vectors can be generated for example for blocks representing 4x4 samples, especially if that is the accuracy of the motion vector fields used in predicting and coding the primary picture.
  • a motion vector may be located in block 905 of ParU0 910, a motion vector may be located in block 915 of ParU1 920, and a motion vector may be located in block 925 of ParUM 930.
  • Reference motion vectors can be indicated to all or a selected subset of reference motion vector locations.
  • FIG. 9B illustrates one possible selection, where motion vectors (940, 950, 960, 970) are indicated for the block locations at the bottom-right corners of the parallel units. The rest of the reference motion vectors can be generated, for example, by interpolating between the indicated reference motion vectors.
  • FIG. 9C demonstrates that process for the horizontal direction.
  • motion vectors 940-1, 940-2, and 940-3 may be determined by interpolation using at least motion vector 940, motion vectors 950-1, 950-2, and 950-3 may be determined by interpolation using at least motion vector 950, motion vectors 960-1, 960-2, and 960-3 may be determined by interpolation using at least motion vector 960, and motion vectors 970-1, 970-2, and 970-3 may be determined by interpolation using at least motion vector 970.
  • Motion vectors for the vertical columns of reference locations can be generated similarly, interpolating in the vertical direction between the indicated reference motion vectors. In some configurations it may be required to have reference motion vectors not only for the parallel unit boundaries, but also inside the parallel units. In that case those reference motion vectors can also be interpolated from the horizontal and vertical reference motion vector buffers, or more motion vectors can be indicated to help the interpolation process.
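  • The interpolation of FIG. 9C could, for instance, look like the following sketch, where each motion vector is an (mvx, mvy) pair; linear interpolation is an assumed, not mandated, choice:

    def interpolate_border_mvs(mv_left, mv_right, n):
        # Generate n motion vectors along a parallel unit border by
        # interpolating between two indicated corner motion vectors,
        # e.g. to obtain intermediate vectors such as 940-1 .. 940-3.
        mvs = []
        for i in range(n):
            t = i / (n - 1) if n > 1 else 0.0
            mvs.append((round((1 - t) * mv_left[0] + t * mv_right[0]),
                        round((1 - t) * mv_left[1] + t * mv_right[1])))
        return mvs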
  • An alternative way to generate the reference motion vectors for the parallel unit boundaries is to use motion vectors from other pictures. That can be implemented, for example, by copying collocated motion vectors from selected reference pictures to the corner locations illustrated in FIG. 9B and interpolating between those, or by filling all the border reference motion vectors using such collocated motion vectors, or a combination of different approaches, where it can be for example indicated in a video bitstream which approach to use within a picture or within a certain picture area.
  • motion vectors on the parallel unit borders are encoded or decoded as reference motion vector lines determining a motion vector for locations immediately above a parallel unit and immediately left of a parallel unit.
  • Only reference line buffers or only reference motion vector buffers may be generated. For example, if a picture or slice is intra coded, only reference line buffers may be generated. For some pictures both reference line buffers and reference motion vector buffers may be generated. Such pictures could include for example high quality key pictures appearing in a coded representation at indicated intervals.
  • Existence of reference line buffers or reference motion vector buffers can also be made dependent on the temporal layer a picture belongs to. For example, reference line buffers may be indicated only for the first temporal layer or a certain range of temporal layers. The presence of reference line buffers and reference motion vector buffers can also be indicated in a video bitstream on a per picture, per tile, per subpicture or per slice basis, allowing a video encoder to make the decision when those provide adequate benefit for coding of parallel units.
  • a video or image decoder performs the following steps (1-4):
  • [00133] Determine at least one row of reference samples above the parallel unit; wherein the row of reference samples is not dependent from decoded samples of any parallel unit within the same picture.
  • [00134] Determine a location of at least one reference sample and: a. if the location of the reference sample belongs to the same parallel unit the block of samples belongs to, determine a value for the reference sample from the decoded samples of the parallel unit; or b. if the location of the reference sample is above the parallel unit the block of samples belongs to, determine a value for the reference sample from the row of reference samples coded independently from reconstructed samples of any parallel unit within the same picture.
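  • Branches a and b above can be summarized by the following sketch; the ParallelUnit container and the coordinate layout are assumptions made only to keep the example self-contained:

    from dataclasses import dataclass

    @dataclass
    class ParallelUnit:
        x0: int        # top-left sample column of the parallel unit
        y0: int        # top-left sample row of the parallel unit
        w: int
        h: int
        decoded: list  # decoded[y][x], relative to (x0, y0)

        def contains(self, x, y):
            return (self.x0 <= x < self.x0 + self.w
                    and self.y0 <= y < self.y0 + self.h)

    def reference_sample_value(x, y, pu, ref_row):
        # Branch a: location inside the parallel unit, use its decoded samples.
        if pu.contains(x, y):
            return pu.decoded[y - pu.y0][x - pu.x0]
        # Branch b: location above the parallel unit, use the row of reference
        # samples coded independently of all parallel units in the picture.
        return ref_row[x]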
  • the feature can be recognized by generating video/image files or streams conforming to the standard (and utilizing the technique) and determining that a product is able to decode the files or streams.
  • FIG. 10 is an example apparatus 1000, which may be implemented in hardware, configured to implement video coding using parallel units, based on the examples described herein.
  • the apparatus 1000 comprises a processor 1002, at least one non-transitory or transitory memory 1004 including computer program code 1005 (e.g. object-oriented code), wherein the at least one memory 1004 and the computer program code 1005 are configured to, with the at least one processor 1002, cause the apparatus to implement coding 1006 and/or decoding 1007 using parallel units, based on the examples described herein.
  • the apparatus 1000 optionally includes a display or I/O 1008 that may be used to display content during coding 1006 and/or decoding 1007, or to receive user input, for example from a keypad.
  • the apparatus 1000 includes one or more network (N/W) interfaces (I/F(s)) 1010.
  • the N/W I/F(s) 1010 may be wired and/or wireless and communicate over the Internet/other network(s) via any communication technique.
  • the N/W I/F(s) 1010 may comprise one or more transmitters and one or more receivers.
  • the N/W I/F(s) 1010 may comprise standard well-known components such as an amplifier, filter, frequency-converter, (de)modulator, and encoder/decoder circuitry(ies) and one or more antennas.
  • the processor 1002 is configured to implement the coding 1006 or decoding 1007 without use of memory 1004. Although coding 1006 and decoding 1007 are shown as separate items, they may be combined as one item, block, or unit to form a codec.
  • the memory 1004 may be implemented using any suitable data storage technology, such as semiconductor based memory devices, flash memory, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.
  • the memory 1004 may comprise a database for storing data.
  • the memory 1004 may be volatile or non-volatile.
  • Interface 1012 enables data communication between the various items of apparatus 1000, as shown in FIG. 10.
  • Interface 1012 may be one or more buses, such as address, data, or control buses, and may include any interconnection mechanism, such as a series of lines on a motherboard or integrated circuit, fiber optics or other optical communication equipment, and the like.
  • the apparatus 1000 need not comprise each of the features mentioned, or may comprise other features as well.
  • the apparatus 1000 may be an embodiment of any of the apparatuses shown in FIGS. 1 through 9C (inclusive), including any combination of those.
  • the apparatus 1000 may be an encoder or decoder.
  • FIG. 11 is an example method 1100 to implement video coding using parallel units, based on the examples described herein.
  • the method includes determining a block of samples in a picture.
  • the method includes determining a parallel unit the block of samples belongs to.
  • the method includes determining at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from decoded samples of any parallel unit within the picture.
  • the method includes determining a location of at least one reference sample.
  • the method includes in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, determining a value for the at least one reference sample from decoded samples of the parallel unit the block of samples belongs to; or in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, determining a value for the at least one reference sample from the row of reference samples coded independently from reconstructed samples of any parallel unit within the picture.
  • Method 1100 may be implemented with a decoder apparatus or a codec apparatus, such as apparatus 50 or apparatus 1000.
  • FIG. 12 is an example method 1200 to implement video coding using parallel units, based on the examples described herein.
  • the method includes encoding a block of samples in a picture.
  • the method includes encoding a parallel unit the block of samples belongs to.
  • the method includes encoding at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from encoded samples of any parallel unit within the picture.
  • the method includes determining a location of at least one reference sample.
  • the method includes wherein in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, a value for the at least one reference sample is determined from encoded samples of the parallel unit the block of samples belongs to; or wherein in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, a value for the at least one reference sample is determined from the row of reference samples coded independently from samples of any parallel unit within the picture.
  • Method 1200 may be implemented with an encoder apparatus or a codec apparatus, such as apparatus 50 or apparatus 1000.
  • references to a ‘computer’, ‘processor’, etc. should be understood to encompass not only computers having different architectures such as single/multi-processor architectures and sequential/parallel architectures but also specialized circuits such as field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), signal processing devices and other processing circuitry.
  • References to computer program, instructions, code etc. should be understood to encompass software for a programmable processor or firmware such as, for example, the programmable content of a hardware device such as instructions for a processor, or configuration settings for a fixed-function device, gate array or programmable logic device, etc.
  • circuitry may refer to any of the following: (a) hardware circuit implementations, such as implementations in analog and/or digital circuitry, and (b) combinations of circuits and software (and/or firmware), such as (as applicable): (i) a combination of processor(s) or (ii) portions of processor(s)/software including digital signal processor(s), software, and memory(ies) that work together to cause an apparatus to perform various functions, and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
  • circuitry would also cover an implementation of merely a processor (or multiple processors) or a portion of a processor and its (or their) accompanying software and/or firmware.
  • circuitry would also cover, for example and if applicable to the particular element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, or another network device. Circuitry or circuit may also be used to mean a function or a process used to execute a method.
  • Example 1 an example apparatus includes at least one processor; and at least one non-transitory memory including computer program code; wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to: determine a block of samples in a picture; determine a parallel unit the block of samples belongs to; determine at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from decoded samples of any parallel unit within the picture; determine a location of at least one reference sample; and in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, determine a value for the at least one reference sample from decoded samples of the parallel unit the block of samples belongs to; or in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, determine a value for the at least one reference sample from the row of reference samples coded independently from reconstructed samples of any parallel unit within the picture.
  • Example 2 the apparatus of example 1, wherein there are no sample and coding dependencies between parallel units.
  • Example 3 the apparatus of any one of examples 1 to 2, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: determine a plurality of reference lines comprising at least some sample values of the picture prior to decoding the parallel unit, wherein the reference lines are configured to be used outside of an area comprising the parallel unit.
  • Example 4 the apparatus of example 3, wherein the reference lines are selected from a left, right, top or bottom side of the parallel unit, from both sides of a boundary of the parallel unit, or from a picture boundary.
  • Example 5 the apparatus of any one of examples 3 to 4, wherein the plurality of reference lines form a horizontal reference line picture buffer and a vertical reference line picture buffer.
  • Example 6 the apparatus of example 5, wherein reference lines in the horizontal reference line picture buffer and/or the vertical reference line picture buffer are subsampled to a representation in a reduced resolution, the reduced resolution being used for prediction.
  • Example 7 the apparatus of any one of examples 5 to 6, wherein samples of the horizontal reference line picture buffer are used to predict samples of the vertical reference line picture buffer, or samples of the vertical reference line picture buffer are used to predict samples of the horizontal reference line picture buffer.
  • Example 8 the apparatus of any one of examples 1 to 7, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: decode parallel unit areas from an indication from or along a bitstream.
  • Example 9 the apparatus of any one of examples 1 to 8, wherein the reference samples and the picture have different chroma formats.
  • Example 10 the apparatus of any one of examples 1 to 9, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: determine a first motion vector field at a first sub-block location of the parallel unit; and determine a second motion vector field for a second sub-block location with interpolation using at least the first motion vector field of the first sub-block location.
  • Example 11 an example apparatus includes at least one processor; and at least one non-transitory memory including computer program code; wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to: encode a block of samples in a picture; encode a parallel unit the block of samples belongs to; encode at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from encoded samples of any parallel unit within the picture; determine a location of at least one reference sample; and wherein in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, a value for the at least one reference sample is determined from encoded samples of the parallel unit the block of samples belongs to; or wherein in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, a value for the at least one reference sample is determined from the row of reference samples coded independently from samples of any parallel unit within the picture.
  • Example 12 the apparatus of example 11, wherein there are no sample and coding dependencies between parallel units.
  • Example 13 the apparatus of any one of examples 11 to 12, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: encode a plurality of reference lines comprising at least some sample values of the picture prior to encoding the parallel unit, wherein the reference lines are configured to be used outside of an area comprising the parallel unit.
  • Example 14 the apparatus of example 13, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: select the reference lines from a left, right, top or bottom side of the parallel unit, from both sides of a boundary of the parallel unit, or from a picture boundary.
  • Example 15 the apparatus of any one of examples 13 to 14, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: encode the plurality of reference lines into a horizontal reference line picture buffer and a vertical reference line picture buffer.
  • Example 16 the apparatus of example 15, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: subsample reference lines in the horizontal reference line picture buffer and/or the vertical reference line picture buffer to a representation in a reduced resolution, the reduced resolution being used for prediction.
  • Example 17 the apparatus of any one of examples 15 to 16, wherein samples of the horizontal reference line picture buffer are used to predict samples of the vertical reference line picture buffer, or samples of the vertical reference line picture buffer are used to predict samples of the horizontal reference line picture buffer.
  • Example 18 the apparatus of any one of examples 11 to 17, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: indicate parallel unit areas in or along a bitstream.
  • Example 19 the apparatus of any one of examples 11 to 18, wherein the reference samples and the picture have different chroma formats.
  • Example 20 the apparatus of any one of examples 11 to 19, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: indicate a first motion vector field at a first sub-block location of the parallel unit; wherein a second motion vector field for a second sub-block location is determined with interpolation using at least the first motion vector field of the first sub-block location.
  • Example 21 an example method includes determining a block of samples in a picture; determining a parallel unit the block of samples belongs to; determining at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from decoded samples of any parallel unit within the picture; determining a location of at least one reference sample; and in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, determining a value for the at least one reference sample from decoded samples of the parallel unit the block of samples belongs to; or in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, determining a value for the at least one reference sample from the row of reference samples coded independently from reconstructed samples of any parallel unit within the picture.
  • Example 22 an example method includes encoding a block of samples in a picture; encoding a parallel unit the block of samples belongs to; encoding at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from encoded samples of any parallel unit within the picture; determining a location of at least one reference sample; and wherein in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, a value for the at least one reference sample is determined from encoded samples of the parallel unit the block of samples belongs to; or wherein in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, a value for the at least one reference sample is determined from the row of reference samples coded independently from samples of any parallel unit within the picture.
  • Example 23 an example non-transitory program storage device readable by a machine, tangibly embodying a program of instructions executable with the machine for performing operations is provided, the operations comprising: determining a block of samples in a picture; determining a parallel unit the block of samples belongs to; determining at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from decoded samples of any parallel unit within the picture; determining a location of at least one reference sample; and in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, determining a value for the at least one reference sample from decoded samples of the parallel unit the block of samples belongs to; or in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, determining a value for the at least one reference sample from the row of reference samples coded independently from reconstructed samples of any parallel unit within the picture.
  • Example 24 an example non-transitory program storage device readable by a machine, tangibly embodying a program of instructions executable with the machine for performing operations, the operations comprising: encoding a block of samples in a picture; encoding a parallel unit the block of samples belongs to; encoding at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from encoded samples of any parallel unit within the picture; determining a location of at least one reference sample; and wherein in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, a value for the at least one reference sample is determined from encoded samples of the parallel unit the block of samples belongs to; or wherein in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, a value for the at least one reference sample is determined from the row of reference samples coded independently from samples of any parallel unit within the picture.
  • Example 25 an example apparatus includes means for determining a block of samples in a picture; means for determining a parallel unit the block of samples belongs to; means for determining at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from decoded samples of any parallel unit within the picture; means for determining a location of at least one reference sample; and in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, determining a value for the at least one reference sample from decoded samples of the parallel unit the block of samples belongs to; or in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, determining a value for the at least one reference sample from the row of reference samples coded independently from reconstructed samples of any parallel unit within the picture.
  • Example 26 an example apparatus includes means for encoding a block of samples in a picture; means for encoding a parallel unit the block of samples belongs to; means for encoding at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from encoded samples of any parallel unit within the picture; means for determining a location of at least one reference sample; and wherein in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, a value for the at least one reference sample is determined from encoded samples of the parallel unit the block of samples belongs to; or wherein in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, a value for the at least one reference sample is determined from the row of reference samples coded independently from samples of any parallel unit within the picture.
  • where a reference number as used herein is of the form y-x, this means that the referred to item is an instantiation of (or type of) reference number y or, for example if reference number y alone does not exist, a common or similar entity or a common or similar concept.
  • 640-1 and 640-2 in FIG. 6A are instantiations (e.g. a first and a second instantiation) of a common or similar rightmost vertical reference line buffer concept.
  • lines represent couplings or operations and arrows represent directional couplings or operations or direction of data flow in the case of use for an apparatus or system
  • lines represent couplings or operations and arrows represent transitions or operations or direction of data flow in the case of use for a method or signaling diagram.
  • MPEG-2 H.222/H.262 as defined by the ITU, where MPEG is moving picture experts group
  • YCbCr color space where Y is the luma component, Cb is the blue-difference chroma component, and Cr is the red-difference chroma component

Abstract

An apparatus includes circuitry configured to: determine a block of samples in a picture; determine a parallel unit the block belongs to; determine at least one row of reference samples not belonging to the parallel unit, wherein the row is not dependent from decoded samples of any parallel unit within the picture; determine a location of at least one reference sample; and in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, determine a value for the reference sample from decoded samples of the parallel unit the block of samples belongs to; or in response to the location of the reference sample being at a location not belonging to the parallel unit, determine a value for the reference sample from the row of reference samples coded independently from reconstructed samples of any parallel unit within the picture.

Description

VIDEO CODING USING PARALLEL UNITS
TECHNICAL FIELD
[0001] The examples and non-limiting embodiments relate generally to multimedia transport and data compression and, more particularly, to video coding using parallel units.
BACKGROUND
[0002] It is known to perform data compression and decoding in a multimedia system.
BRIEF DESCRIPTION OF THE DRAWINGS
[0003] The foregoing aspects and other features are explained in the following description, taken in connection with the accompanying drawings, wherein:
[0004] FIG. 1 shows schematically an electronic device employing embodiments of the examples described herein.
[0005] FIG. 2 shows schematically a user equipment suitable for employing embodiments of the examples described herein.
[0006] FIG. 3 further shows schematically electronic devices employing embodiments of the examples described herein connected using wireless and wired network connections.
[0007] FIG. 4 shows schematically a block chart of an encoder used for data compression on a general level.
[0008] FIG. 5 illustrates an encoding process, based on the examples described herein.
[0009] FIG. 6A illustrates a part of a decoding process, where the rightmost and bottom lines of each parallel unit are included in a selection of samples.
[0010] FIG. 6B illustrates a part of the decoding process using top and leftmost lines for the selection of samples.
[0011] FIG. 6C illustrates a part of the decoding process where rightmost and bottom lines are selected as samples.
[0012] FIG. 6D illustrates using samples on both sides of parallel unit borders.
[0013] FIG. 7 illustrates an upsampling operation that can be applied to samples in buffers prior to using them as reference samples in intra prediction or in other operations.
[0014] FIG. 8 shows an example of a process where samples in one buffer can be predicted using samples of another buffer.
[0015] FIG. 9A illustrates an example selection of granularity and location of reference block motion vectors.
[0016] FIG. 9B illustrates a selection of a subset of reference motion vector locations.
[0017] FIG. 9C illustrates a process of generation of reference motion vectors in the horizontal direction.
[0018] FIG. 10 is an example apparatus configured to implement video coding using parallel units, based on the embodiments described herein.
[0019] FIG. 11 is an example method to implement video coding using parallel units, based on the embodiments described herein.
[0020] FIG. 12 is an example method to implement video coding using parallel units, based on the embodiments described herein.
DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
[0021] Described herein are methods for video coding using parallel units.
[0022] The following describes in detail a suitable apparatus and possible mechanisms for a video/image encoding process according to embodiments. In this regard reference is first made to FIG. 1 and FIG. 2, where FIG. 1 shows an example block diagram of an apparatus 50. The apparatus may be an Internet of Things (IoT) apparatus configured to perform various functions, such as for example, gathering information by one or more sensors, receiving or transmitting information, analyzing information gathered or received by the apparatus, or the like. The apparatus may comprise a video coding system, which may incorporate a codec. FIG. 2 shows a layout of an apparatus according to an example embodiment. The elements of FIG. 1 and FIG. 2 are explained next.
[0023] The electronic device 50 may for example be a mobile terminal or user equipment of a wireless communication system, a sensor device, a tag, or other lower power device. However, it would be appreciated that embodiments of the examples described herein may be implemented within any electronic device or apparatus which may process data by neural networks.
[0024] The apparatus 50 may comprise a housing 30 for incorporating and protecting the device. The apparatus 50 further may comprise a display 32 in the form of a liquid crystal display. In other embodiments of the examples described herein the display may be any suitable display technology suitable to display an image or video. The apparatus 50 may further comprise a keypad 34. In other embodiments of the examples described herein any suitable data or user interface mechanism may be employed. For example the user interface may be implemented as a virtual keyboard or data entry system as part of a touch-sensitive display.
[0025] The apparatus may comprise a microphone 36 or any suitable audio input which may be a digital or analog signal input. The apparatus 50 may further comprise an audio output device which in embodiments of the examples described herein may be any one of: an earpiece 38, speaker, or an analog audio or digital audio output connection. The apparatus 50 may also comprise a battery (or in other embodiments of the examples described herein the device may be powered by any suitable mobile energy device such as solar cell, fuel cell or clockwork generator). The apparatus may further comprise a camera capable of recording or capturing images and/or video. The apparatus 50 may further comprise an infrared port for short range line of sight communication to other devices. In other embodiments the apparatus 50 may further comprise any suitable short range communication solution such as for example a Bluetooth wireless connection or a USB/firewire wired connection.
[0026] The apparatus 50 may comprise a controller 56, processor or processor circuitry for controlling the apparatus 50. The controller 56 may be connected to memory 58 which in embodiments of the examples described herein may store both data in the form of image and audio data and/or may also store instructions for implementation on the controller 56. The controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and/or decoding of audio and/or video data or assisting in coding and/or decoding carried out by the controller.
[0027] The apparatus 50 may further comprise a card reader 48 and a smart card 46, for example a UICC and UICC reader for providing user information and being suitable for providing authentication information for authentication and authorization of the user at a network.
[0028] The apparatus 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals for example for communication with a cellular communications network, a wireless communications system or a wireless local area network. The apparatus 50 may further comprise an antenna 44 connected to the radio interface circuitry 52 for transmitting radio frequency signals generated at the radio interface circuitry 52 to other apparatus(es) and/or for receiving radio frequency signals from other apparatus(es).
[0029] The apparatus 50 may comprise a camera capable of recording or detecting individual frames which are then passed to the codec 54 or the controller for processing. The apparatus may receive the video image data for processing from another device prior to transmission and/or storage. The apparatus 50 may also receive either wirelessly or by a wired connection the image for coding/decoding. The structural elements of apparatus 50 described above represent examples of means for performing a corresponding function.
[0030] With respect to FIG. 3, an example of a system within which embodiments of the examples described herein can be utilized is shown. The system 10 comprises multiple communication devices which can communicate through one or more networks. The system 10 may comprise any combination of wired or wireless networks including, but not limited to a wireless cellular telephone network (such as a GSM, UMTS, CDMA, LTE, 4G, 5G network etc.), a wireless local area network (WLAN) such as defined by any of the IEEE 802.x standards, a Bluetooth personal area network, an Ethernet local area network, a token ring local area network, a wide area network, and the Internet.
[0031] The system 10 may include both wired and wireless communication devices and/or apparatus 50 suitable for implementing embodiments of the examples described herein.
[0032] For example, the system shown in FIG. 3 shows a mobile telephone network 11 and a representation of the internet 28. Connectivity to the internet 28 may include, but is not limited to, long range wireless connections, short range wireless connections, and various wired connections including, but not limited to, telephone lines, cable lines, power lines, and similar communication pathways.
[0033] The example communication devices shown in the system 10 may include, but are not limited to, an electronic device or apparatus 50, a combination of a personal digital assistant (PDA) and a mobile telephone 14, a PDA 16, an integrated messaging device (IMD) 18, a desktop computer 20, a notebook computer 22, and a head mounted display (HMD). The apparatus 50 may be stationary or mobile when carried by an individual who is moving. The apparatus 50 may also be located in a mode of transport including, but not limited to, a car, a truck, a taxi, a bus, a train, a boat, an airplane, a bicycle, a motorcycle or any similar suitable mode of transport.
[0034] The embodiments may also be implemented in a set-top box; i.e. a digital TV receiver, which may/may not have a display or wireless capabilities, in tablets or (laptop) personal computers (PC), which have hardware and/or software to process neural network data, in various operating systems, and in chipsets, processors, DSPs and/or embedded systems offering hardware/software based coding.
[0035] Some or further apparatus may send and receive calls and messages and communicate with service providers through a wireless connection 25 to a base station 24. The base station 24 may be connected to a network server 26 that allows communication between the mobile telephone network 11 and the internet 28. The system may include additional communication devices and communication devices of various types.
[0036] The communication devices may communicate using various transmission technologies including, but not limited to, code division multiple access (CDMA), global system for mobile communications (GSM), universal mobile telecommunications system (UMTS), time division multiple access (TDMA), frequency division multiple access (FDMA), transmission control protocol-internet protocol (TCP-IP), short messaging service (SMS), multimedia messaging service (MMS), email, instant messaging service (IMS), Bluetooth, IEEE 802.11, 3GPP Narrowband IoT and any similar wireless communication technology. A communications device involved in implementing various embodiments of the examples described herein may communicate using various media including, but not limited to, radio, infrared, laser, cable connections, and any suitable connection.
[0037] In telecommunications and data networks, a channel may refer either to a physical channel or to a logical channel. A physical channel may refer to a physical transmission medium such as a wire, whereas a logical channel may refer to a logical connection over a multiplexed medium, capable of conveying several logical channels. A channel may be used for conveying an information signal, for example a bitstream, from one or several senders (or transmitters) to one or several receivers.
[0038] The embodiments may also be implemented in so-called IoT devices. The Internet of Things (IoT) may be defined, for example, as an interconnection of uniquely identifiable embedded computing devices within the existing Internet infrastructure. The convergence of various technologies has enabled and may enable many fields of embedded systems, such as wireless sensor networks, control systems, home/building automation, etc. to be included in the Internet of Things (IoT). In order to utilize the Internet, IoT devices are provided with an IP address as a unique identifier. IoT devices may be provided with a radio transmitter, such as a WLAN or Bluetooth transmitter or an RFID tag. Alternatively, IoT devices may have access to an IP-based network via a wired network, such as an Ethernet-based network or a power-line connection (PLC).
[0039] An MPEG-2 transport stream (TS), specified in ISO/IEC 13818-1 or equivalently in ITU-T Recommendation H.222.0, is a format for carrying audio, video, and other media as well as program metadata or other metadata, in a multiplexed stream. A packet identifier (PID) is used to identify an elementary stream (a.k.a. packetized elementary stream) within the TS. Hence, a logical channel within an MPEG-2 TS may be considered to correspond to a specific PID value.
[0040] Available media file format standards include ISO base media file format (ISO/IEC 14496-12, which may be abbreviated ISOBMFF) and file format for NAL unit structured video (ISO/IEC 14496-15), which derives from the ISOBMFF.
[0041] A video codec consists of an encoder that transforms the input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form. A video encoder and/or a video decoder may also be separate from each other, i.e. need not form a codec. Typical encoders discard some information in the original video sequence in order to represent the video in a more compact form (that is, at lower bitrate).
[0042] Typical hybrid video encoders, for example many encoder implementations of ITU-T H.263 and H.264, encode the video information in two phases. Firstly pixel values in a certain picture area (or “block”) are predicted for example by motion compensation means (finding and indicating an area in one of the previously coded video frames that corresponds closely to the block being coded) or by spatial means (using the pixel values around the block to be coded in a specified manner). Secondly the prediction error, i.e. the difference between the predicted block of pixels and the original block of pixels, is coded. This is typically done by transforming the difference in pixel values using a specified transform (e.g. Discrete Cosine Transform (DCT) or a variant of it), quantizing the coefficients and entropy coding the quantized coefficients. By varying the fidelity of the quantization process, the encoder can control the balance between the accuracy of the pixel representation (picture quality) and size of the resulting coded video representation (file size or transmission bitrate).
[0043] In temporal prediction, the sources of prediction are previously decoded pictures (a.k.a. reference pictures, or reference frames). In intra block copy (IBC; a.k.a. intra-block-copy prediction and current picture referencing), prediction is applied similarly to temporal prediction but the reference picture is the current picture and only previously decoded samples can be referred to in the prediction process. Inter-layer or inter-view prediction may be applied similarly to temporal prediction, but the reference picture is a decoded picture from another scalable layer or from another view, respectively. In some cases, inter prediction may refer to temporal prediction only, while in other cases inter prediction may refer collectively to temporal prediction and any of intra block copy, inter-layer prediction, and inter-view prediction provided that they are performed with the same or similar process as temporal prediction. Inter prediction or temporal prediction may sometimes be referred to as motion compensation or motion-compensated prediction.
[0044] Inter prediction, which may also be referred to as temporal prediction, motion compensation, or motion-compensated prediction, reduces temporal redundancy. In inter prediction the sources of prediction are previously decoded pictures. Intra prediction utilizes the fact that adjacent pixels within the same picture are likely to be correlated. Intra prediction can be performed in the spatial or transform domain, i.e., either sample values or transform coefficients can be predicted. Intra prediction is typically exploited in intra coding, where no inter prediction is applied.
[0045] One outcome of the coding procedure is a set of coding parameters, such as motion vectors and quantized transform coefficients. Many parameters can be entropy-coded more efficiently if they are predicted first from spatially or temporally neighboring parameters. For example, a motion vector may be predicted from spatially adjacent motion vectors and only the difference relative to the motion vector predictor may be coded. Prediction of coding parameters and intra prediction may be collectively referred to as in-picture prediction.
[0046] FIG. 4 shows a block diagram of a general structure of a video encoder. FIG. 4 presents an encoder for two layers, but it would be appreciated that the presented encoder could be similarly extended to encode more than two layers. FIG. 4 illustrates a video encoder comprising a first encoder section 500 for a base layer and a second encoder section 502 for an enhancement layer. Each of the first encoder section 500 and the second encoder section 502 may comprise similar elements for encoding incoming pictures. The encoder sections 500, 502 may comprise a pixel predictor 302, 402, prediction error encoder 303, 403 and prediction error decoder 304, 404. FIG. 4 also shows an embodiment of the pixel predictor 302, 402 as comprising an inter-predictor 306, 406 (Pinter), an intra-predictor 308, 408 (Pintra), a mode selector 310, 410, a filter 316, 416 (F), and a reference frame memory 318, 418 (RFM). The pixel predictor 302 of the first encoder section 500 receives 300 base layer images (I0,n) of a video stream to be encoded at both the inter-predictor 306 (which determines the difference between the image and a motion compensated reference frame 318) and the intra-predictor 308 (which determines a prediction for an image block based only on the already processed parts of the current frame or picture). The outputs of both the inter-predictor and the intra-predictor are passed to the mode selector 310. The intra-predictor 308 may have more than one intra-prediction mode. Hence, each mode may perform the intra-prediction and provide the predicted signal to the mode selector 310. The mode selector 310 also receives a copy of the base layer picture 300. Correspondingly, the pixel predictor 402 of the second encoder section 502 receives 400 enhancement layer images (I1,n) of a video stream to be encoded at both the inter-predictor 406 (which determines the difference between the image and a motion compensated reference frame 418) and the intra-predictor 408 (which determines a prediction for an image block based only on the already processed parts of the current frame or picture). The outputs of both the inter-predictor and the intra-predictor are passed to the mode selector 410. The intra-predictor 408 may have more than one intra-prediction mode. Hence, each mode may perform the intra-prediction and provide the predicted signal to the mode selector 410. The mode selector 410 also receives a copy of the enhancement layer picture 400.
[0047] Depending on which encoding mode is selected to encode the current block, the output of the inter-predictor 306, 406 or the output of one of the optional intra-predictor modes or the output of a surface encoder within the mode selector is passed to the output of the mode selector 310, 410. The output of the mode selector is passed to a first summing device 321, 421. The first summing device may subtract the output of the pixel predictor 302, 402 from the base layer picture 300/enhancement layer picture 400 to produce a first prediction error signal 320, 420 (Dn) which is input to the prediction error encoder 303, 403.
[0048] The pixel predictor 302, 402 further receives from a preliminary reconstructor 339, 439 the combination of the prediction representation of the image block 312, 412 (P’n) and the output 338, 438 (D’n) of the prediction error decoder 304, 404. The preliminary reconstructed image 314, 414 (I’n) may be passed to the intra-predictor 308, 408 and to the filter 316, 416. The filter 316, 416 receiving the preliminary representation may filter the preliminary representation and output a final reconstructed image 340, 440 (R’n) which may be saved in a reference frame memory 318, 418. The reference frame memory 318 may be connected to the inter-predictor 306 to be used as the reference image against which a future base layer picture 300 is compared in inter-prediction operations. Subject to the base layer being selected and indicated to be the source for inter-layer sample prediction and/or inter-layer motion information prediction of the enhancement layer according to some embodiments, the reference frame memory 318 may also be connected to the inter-predictor 406 to be used as the reference image against which a future enhancement layer picture 400 is compared in inter-prediction operations. Moreover, the reference frame memory 418 may be connected to the inter-predictor 406 to be used as the reference image against which a future enhancement layer picture 400 is compared in inter-prediction operations.
[0049] Filtering parameters from the filter 316 of the first encoder section 500 may be provided to the second encoder section 502 subject to the base layer being selected and indicated to be the source for predicting the filtering parameters of the enhancement layer according to some embodiments.
[0050] The prediction error encoder 303, 403 comprises a transform unit 342, 442 (T) and a quantizer 344, 444 (Q). The transform unit 342, 442 transforms the first prediction error signal 320, 420 to a transform domain. The transform is, for example, the DCT transform. The quantizer 344, 444 quantizes the transform domain signal, e.g. the DCT coefficients, to form quantized coefficients.
[0051] The prediction error decoder 304, 404 receives the output from the prediction error encoder 303, 403 and performs the opposite processes of the prediction error encoder 303, 403 to produce a decoded prediction error signal 338, 438 which, when combined with the prediction representation of the image block 312, 412 at the second summing device 339, 439, produces the preliminary reconstructed image 314, 414. The prediction error decoder 304, 404 may be considered to comprise a dequantizer 346, 446 (Q⁻¹), which dequantizes the quantized coefficient values, e.g. DCT coefficients, to reconstruct the transform signal, and an inverse transformation unit 348, 448 (T⁻¹), which performs the inverse transformation to the reconstructed transform signal, wherein the output of the inverse transformation unit 348, 448 contains reconstructed block(s). The prediction error decoder may also comprise a block filter which may filter the reconstructed block(s) according to further decoded information and filter parameters.
[0052] The entropy encoder 330, 430 (E) receives the output of the prediction error encoder 303, 403 and may perform a suitable entropy encoding/variable length encoding on the signal to provide error detection and correction capability. The outputs of the entropy encoders 330, 430 may be inserted into a bitstream e.g. by a multiplexer 508 (M).
[0053] Accordingly, the examples described herein relate to coding and decoding of digital video material.
[0054] A video codec consists of an encoder that transforms the input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form. Typically, the encoder discards some information in the original video sequence in order to represent the video in a more compact form (that is, at a lower bitrate).
[0055] Typical hybrid video codecs, for example ITU-T H.263 and H.264, encode the video information in two phases. Firstly pixel values in a certain picture area (or "block") are predicted for example by motion compensation means (finding and indicating an area in one of the previously coded video frames that corresponds closely to the block being coded) or by spatial means (using the pixel values around the block to be coded in a specified manner). Secondly the prediction error, i.e. the difference between the predicted block of pixels and the original block of pixels, is coded. This is typically done by transforming the difference in pixel values using a specified transform (e.g. Discrete Cosine Transform (DCT) or a variant of it), quantizing the coefficients and entropy coding the quantized coefficients. By varying the fidelity of the quantization process, the encoder can control the balance between the accuracy of the pixel representation (picture quality) and size of the resulting coded video representation (file size or transmission bitrate). The encoding process is illustrated in FIG. 5.
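As a non-limiting illustration of the two phases, the following Python sketch subtracts a prediction from a block, then transforms and quantizes the residual; the identity stand-in for the transform and the single quantization step are simplifying assumptions made only to keep the sketch short.

    def transform(residual):
        # Stand-in for a DCT-like transform; identity keeps the sketch short.
        return residual

    def encode_block(block, prediction, qstep):
        # Phase 1 output (prediction) is subtracted from the original block;
        # phase 2 transforms and quantizes the prediction error. A larger
        # qstep lowers the bitrate at the cost of picture quality.
        residual = [[o - p for o, p in zip(orow, prow)]
                    for orow, prow in zip(block, prediction)]
        return [[round(c / qstep) for c in row] for row in transform(residual)]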
[0056] In some video codecs, such as H.265/HEVC, video pictures are divided into coding units (CU) covering the area of the picture. A CU consists of one or more prediction units (PU) defining the prediction process for the samples within the CU and one or more transform units (TU) defining the prediction error coding process for the samples in the CU. Typically, a CU consists of a square block of samples with a size selectable from a predefined set of possible CU sizes. A CU with the maximum allowed size is typically named as the LCU (largest coding unit) or CTU (coding tree unit) and the video picture is divided into non-overlapping CTUs. A CTU can be further split into a combination of smaller CUs, e.g. by recursively splitting the CTU and resultant CUs. Each resulting CU typically has at least one PU and at least one TU associated with it. Each PU and TU can be further split into smaller PUs and TUs in order to increase the granularity of the prediction and prediction error coding processes, respectively. Each PU has prediction information associated with it defining what kind of a prediction is to be applied for the pixels within that PU (e.g. motion vector information for inter predicted PUs and intra prediction directionality information for intra predicted PUs). Similarly, each TU is associated with information describing the prediction error decoding process for the samples within the TU (including e.g. DCT coefficient information). It is typically signaled at the CU level whether prediction error coding is applied or not for each CU. In the case there is no prediction error residual associated with the CU, it can be considered there are no TUs for the said CU. The division of the image into CUs, and division of CUs into PUs and TUs is typically signaled in the bitstream allowing the decoder to reproduce the intended structure of these units.
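A recursive CTU split of the kind described above can be sketched as follows; the decide_split callback stands in for split flags signaled in the bitstream, and the quadtree-only splitting of power-of-two sizes is an illustrative simplification.

    def split_ctu(x0, y0, size, min_cu, decide_split):
        # Recursively split a CTU into CUs; each returned tuple gives the
        # top-left position and the size of one resulting CU.
        if size > min_cu and decide_split(x0, y0, size):
            half = size // 2
            cus = []
            for dy in (0, half):
                for dx in (0, half):
                    cus.extend(split_ctu(x0 + dx, y0 + dy, half,
                                         min_cu, decide_split))
            return cus
        return [(x0, y0, size)]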
[0057] The decoder reconstructs the output video by applying prediction means similar to the encoder to form a predicted representation of the pixel blocks (using the motion or spatial information created by the encoder and stored in the compressed representation) and prediction error decoding, where the prediction error decoding includes an inverse operation of the prediction error coding, recovering the quantized prediction error signal in the spatial pixel domain. After applying prediction and prediction error decoding means the decoder sums up the prediction and prediction error signals (pixel values) to form the output video frame. The decoder (and encoder) can also apply additional filtering means to improve the quality of the output video before passing it for display and/or storing it as prediction reference for the forthcoming frames in the video sequence. The decoding process is illustrated in FIG. 6A, FIG. 6B, FIG. 6C, and FIG. 6D.
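The summing step on the decoder side can be sketched as follows; the 8-bit clipping range is an assumption, not stated above.

    def reconstruct_block(prediction, residual):
        # Sum prediction and decoded prediction error sample-wise,
        # then clip to the valid 8-bit sample range.
        return [[max(0, min(255, p + r)) for p, r in zip(prow, rrow)]
                for prow, rrow in zip(prediction, residual)]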
[0058] Instead of, or in addition to, approaches utilizing sample value prediction and transform coding for indicating the coded sample values, color palette based coding can be used. Palette based coding refers to a family of approaches for which a palette, i.e. a set of colors and associated indexes, is defined and the value for each sample within a coding unit is expressed by indicating its index in the palette. Palette based coding can achieve good coding efficiency in coding units with a relatively small number of colors (such as image areas representing computer screen content, like text or simple graphics). In order to improve the coding efficiency of palette coding, different kinds of palette index prediction approaches can be utilized, or the palette indexes can be run-length coded to be able to represent larger homogenous image areas efficiently. Also, in the case the CU contains sample values that are not recurring within the CU, escape coding can be utilized. Escape coded samples are transmitted without referring to any of the palette indexes. Instead, their values are indicated individually for each escape coded sample.
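The palette mechanism, including escape coding, can be illustrated with the following non-normative Python sketch; the palette, index map and escape values are assumed to have been parsed from the bitstream already, and the ESCAPE marker is an invented convention rather than any standard's syntax.

```python
ESCAPE = -1  # invented marker for escape-coded samples

def reconstruct_palette_cu(palette, index_map, escape_values):
    """Map palette indexes back to sample values; ESCAPE entries consume
    individually signaled escape values instead of referring to the palette."""
    escapes = iter(escape_values)
    return [[palette[i] if i != ESCAPE else next(escapes) for i in row]
            for row in index_map]

palette = [0, 128, 255]                  # a three-color palette
index_map = [[0, 0, 1], [1, ESCAPE, 2]]  # per-sample palette indexes
print(reconstruct_palette_cu(palette, index_map, [42]))
# [[0, 0, 128], [128, 42, 255]]
```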
[0059] In typical video codecs the motion information is indicated with motion vectors associated with each motion compensated image block. Each of these motion vectors represents the displacement of the image block in the picture to be coded (on the encoder side) or decoded (on the decoder side) and the prediction source block in one of the previously coded or decoded pictures. In order to represent motion vectors efficiently, those are typically coded differentially with respect to block specific predicted motion vectors. In typical video codecs the predicted motion vectors are created in a predefined way, for example by calculating the median of the encoded or decoded motion vectors of the adjacent blocks. Another way to create motion vector predictions is to generate a list of candidate predictions from adjacent blocks and/or co-located blocks in temporal reference pictures and to signal the chosen candidate as the motion vector predictor. In addition to predicting the motion vector values, the reference index of the previously coded/decoded picture can be predicted. The reference index is typically predicted from adjacent blocks and/or co-located blocks in the temporal reference picture. Moreover, typical high efficiency video codecs employ an additional motion information coding/decoding mechanism, often called merging/merge mode, where all the motion field information, which includes a motion vector and corresponding reference picture index for each available reference picture list, is predicted and used without any modification/correction. Similarly, predicting the motion field information is carried out using the motion field information of adjacent blocks and/or co-located blocks in temporal reference pictures, and the used motion field information is signaled among a motion field candidate list filled with motion field information of available adjacent/co-located blocks.
[0060] Typically, video codecs support motion compensated prediction from one source image (uni-prediction) or from two source images (bi-prediction). In the case of uni-prediction a single motion vector is applied, whereas in the case of bi-prediction two motion vectors are signaled and the motion compensated predictions from the two sources are averaged to create the final sample prediction. In the case of weighted prediction, the relative weights of the two predictions can be adjusted, or a signaled offset can be added to the prediction signal.
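A minimal, non-normative sketch of the prediction combination described above follows; the integer rounding and default parameters are illustrative choices, not any particular standard's weighted prediction formula.

```python
def bi_predict(pred0, pred1, w0=1, w1=1, offset=0, shift=1):
    """Combine two motion compensated prediction blocks: a plain average
    with the defaults, or a weighted combination plus offset otherwise."""
    return [[((w0 * a + w1 * b) >> shift) + offset
             for a, b in zip(row0, row1)]
            for row0, row1 in zip(pred0, pred1)]

p0 = [[100, 104], [96, 98]]    # prediction from the first source
p1 = [[110, 100], [100, 102]]  # prediction from the second source
print(bi_predict(p0, p1))      # [[105, 102], [98, 100]]
```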
[0061] In addition to applying motion compensation for inter picture prediction, a similar approach can be applied to intra picture prediction. In this case the displacement vector indicates from where within the same picture a block of samples can be copied to form a prediction of the block to be coded or decoded. Such intra block copying methods can improve the coding efficiency substantially in the presence of repeating structures within the frame, such as text or other graphics.
[0062] In typical video codecs the prediction residual after motion compensation or intra prediction is first transformed with a transform kernel (like DCT) and then coded. The reason for this is that some correlation often still exists within the residual, and the transform can in many cases help reduce this correlation and provide more efficient coding.
[0063] Typical video encoders utilize Lagrangian cost functions to find optimal coding modes, e.g. the desired macroblock mode and associated motion vectors. This kind of cost function uses a weighting factor λ to tie together the (exact or estimated) image distortion due to lossy coding methods and the (exact or estimated) amount of information that is required to represent the pixel values in an image area:

C = D + λR (Equation 1)

[0064] In Equation 1, C is the Lagrangian cost to be minimized, D is the image distortion (e.g. mean squared error) with the mode and motion vectors considered, and R is the number of bits needed to represent the required data to reconstruct the image block in the decoder (including the amount of data to represent the candidate motion vectors).
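For illustration only, the following Python sketch applies Equation 1 to choose among candidate coding modes; the candidate distortions, rates and the λ value are invented numbers.

```python
def lagrangian_cost(distortion, rate_bits, lmbda):
    """Equation 1: C = D + lambda * R."""
    return distortion + lmbda * rate_bits

# Hypothetical candidates as (distortion, rate in bits) pairs.
candidates = {
    "intra_dc": (1500.0, 40),
    "merge": (1800.0, 12),
    "inter_mvd": (1400.0, 65),
}
lmbda = 20.0
best = min(candidates, key=lambda m: lagrangian_cost(*candidates[m], lmbda))
print(best)  # 'merge': 1800 + 20*12 = 2040 beats 2300 and 2700
```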
[0065] Scalable video coding refers to a coding structure where one bitstream can contain multiple representations of the content at different bitrates, resolutions or frame rates. In these cases, the receiver can extract the desired representation depending on its characteristics (e.g. the resolution that best matches the display device). Alternatively, a server or a network element can extract the portions of the bitstream to be transmitted to the receiver depending on e.g. the network characteristics or processing capabilities of the receiver. A scalable bitstream typically consists of a "base layer" providing the lowest quality video available and one or more enhancement layers that enhance the video quality when received and decoded together with the lower layers. In order to improve coding efficiency for the enhancement layers, the coded representation of that layer typically depends on the lower layers. For example, the motion and mode information of the enhancement layer can be predicted from lower layers. Similarly, the pixel data of the lower layers can be used to create a prediction for the enhancement layer.
[0066] A scalable video codec for quality scalability (also known as signal-to-noise or SNR) and/or spatial scalability may be implemented as follows. For a base layer, a conventional non-scalable video encoder and decoder is used. The reconstructed/decoded pictures of the base layer are included in the reference picture buffer for an enhancement layer. In H.264/AVC, HEVC, and similar codecs using reference picture list(s) for inter prediction, the base layer decoded pictures may be inserted into one or more reference picture lists for coding/decoding of an enhancement layer picture similarly to the decoded reference pictures of the enhancement layer. Consequently, the encoder may choose a base-layer reference picture as an inter-prediction reference and indicate its use typically with a reference picture index in the coded bitstream. The decoder decodes from the bitstream, for example from a reference picture index, that a base-layer picture is used as the inter prediction reference for the enhancement layer. When a decoded base-layer picture is used as the prediction reference for an enhancement layer, it is referred to as an inter-layer reference picture.
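A minimal sketch of this reference-frame based mechanism is given below; the picture identifiers and list ordering are invented for illustration, as real codecs construct reference picture lists through their own normative processes.

```python
def build_reference_list(enh_layer_refs, base_layer_picture):
    """Append the (possibly upsampled) decoded base-layer picture to the
    enhancement layer's own decoded reference pictures."""
    return list(enh_layer_refs) + [base_layer_picture]

refs = build_reference_list(["EL_poc8", "EL_poc4"], "BL_poc12_upsampled")
ref_idx = 2           # hypothetically signaled in the coded bitstream
print(refs[ref_idx])  # 'BL_poc12_upsampled' -> inter-layer reference picture
```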
[0067] In addition to quality, the following scalability modes exist (1-3):
[0068] 1. Spatial scalability: base layer pictures are coded at a lower resolution than enhancement layer pictures.
[0069] 2. Bit-depth scalability: base layer pictures are coded at lower bit-depth (e.g. 8 bits) than enhancement layer pictures (e.g. 10 or 12 bits).
[0070] 3. Chroma format scalability: enhancement layer pictures provide higher fidelity in chroma (e.g. coded in 4:4:4 chroma format) than base layer pictures (e.g. 4:2:0 format).
[0071] In all of the above scalability cases, base layer information could be used to code the enhancement layer to minimize the additional bitrate overhead.
[0072] Scalability can be enabled in two basic ways: either 1) by introducing new coding modes for performing prediction of pixel values or syntax from lower layers of the scalable representation, or 2) by placing the lower layer pictures into the reference picture buffer (decoded picture buffer, DPB) of the higher layer. The first approach is more flexible and thus can provide better coding efficiency in most cases. However, the second approach, reference frame based scalability, can be implemented very efficiently with minimal changes to single layer codecs while still achieving the majority of the available coding efficiency gains. Essentially, a reference frame based scalability codec can be implemented by utilizing the same hardware or software implementation for all the layers, just taking care of the DPB management by external means.
[0073] In order to be able to utilize parallel processing, images can be split into independently codable and decodable image segments (slices or tiles). Slices typically refer to image segments constructed of a certain number of basic coding units that are processed in default coding or decoding order, while tiles typically refer to image segments that have been defined as rectangular image regions that are processed at least to some extent as individual frames.

[0074] Typically, video is encoded in a YUV or YCbCr color space as that is found to reflect some characteristics of the human visual system and allows using a lower quality representation for Cb and Cr channels as human perception is less sensitive to the chrominance fidelity those channels represent.
[0075] In order to efficiently encode or decode chunks of video data in parallel, the data needs to be divided into independent units. The smaller the independent units are, the friendlier the approach is for a parallel implementation, as more data can be processed simultaneously. However, dividing the data into small independent units makes it difficult for typical video coding tools to perform efficiently, as those are designed to take advantage of spatial and temporal correlations of data within and between pictures. As a result, a typical video encoder or decoder loses a significant amount of coding efficiency when configured to allow parallel computing based implementations.
[0076] There are different alternatives to make a video or image codec more amenable to parallel implementations. For example, ITU-T H.265 [refer e.g. to ITU-T recommendation H.265: “High efficiency video coding”, https://www.itu.int/rec/T-REC-H.265] and H.266 [refer e.g. to ITU-T recommendation H.266: “Versatile video coding”, http://www.itu.int/rec/T-REC-H.266] support a tiling concept allowing a picture to be split into independently encodable and decodable units called tiles. Intra picture sample prediction, motion vector prediction, other predictive processes as well as context updating processes for arithmetic coding are all disallowed to access information across a tile boundary and thus a decoder or encoder can process tiles simultaneously without a need to be aware of the processing of other tiles.
[0077] Similarly, both H.265 and H.266 support a concept called wavefront parallel processing. This process allows multiple arithmetic coder instances to be launched for different lines of coding tree units and to operate within the same picture simultaneously without interaction after the initial context models have been determined.
[0078] As described herein, a concept of a “parallel unit” is introduced, where the “parallel units” can be independently encoded and decoded in parallel. Independent processing is guaranteed by disabling sample and coding parameter dependencies between parallel units. In order to compensate for the missing reference data needed for various prediction processes, the herein described approach encodes and decodes a specifically generated set of reference data to substitute traditional data for picture areas belonging to different parallel units. Further described herein is how such data can be provided efficiently for intra coded pictures in the form of horizontal and vertical lines of samples, and for inter coded pictures in the form of motion vector fields indicated at a desired granularity. In addition, described herein is how such reference data can be further used to enhance the final image quality by adapting the post-processing means of a video codec when such data is available.
[0079] General concept
[0080] A parallel unit is defined here as a unit of video or a picture that can be reconstructed independently from a reconstruction of other parallel units within the same picture. That is, the reconstructed samples of one parallel unit before post-processing operations are independent from reconstructed samples from other parallel units within the same video frame. The syntax elements or coding of syntax elements belonging to one parallel unit may or may not be independent from syntax elements of other parallel units depending on the level of parallelism targeted by an application.
[0081] The obvious benefit of preventing dependencies between reconstructed sample values of different parallel units is that the processing of such units can be performed in parallel without the need to wait for access to samples outside of the parallel unit being processed. The drawback is that a straightforward implementation of such an approach, using for example the tiling or subpicture mechanisms of existing codecs, results in significant penalties in coding efficiency, as many tools in state-of-the-art video codecs rely on the availability of neighboring sample values to perform different predictive operations crucial to the coding efficiency of the codecs.
[0082] In an embodiment of the herein described approach, the sample values and other information of a parallel unit are configured to be used outside of its own area and are made available in the form of reference lines coded prior to coding parallel units and decoded before decoding parallel units in a whole picture or other specified areas within a picture. The other specified areas may be defined to be for example tiles, subpictures, groups of coding tree units or a single coding tree unit.
[0083] A parallel unit can be configured or selected to represent certain areas in the picture, such as tiles, slices or subpictures, or a certain number of coding tree units or coding units. In an embodiment, an encoder indicates parallel unit based coding in or along a bitstream, e.g. in a sequence parameter set. In an embodiment, a decoder concludes to use parallel unit based decoding by decoding an indication from or along a bitstream, e.g. from a sequence parameter set. In an embodiment, an encoder indicates parallel unit areas in or along a bitstream, e.g. in a sequence parameter set or a picture parameter set. In an embodiment, a decoder decodes parallel unit areas from an indication from or along a bitstream, e.g. from a sequence parameter set or a picture parameter set. Such indication may for example be indicative of which picture partitioning unit, such as subpicture, slice, tile, coding tree unit, or coding unit, is a parallel unit of its own.
[0084] In an embodiment, a parallel unit header may be defined, which may be a part of coded parallel unit(s) and contain the data elements pertaining to a single parallel unit or multiple parallel units. The data elements in a parallel unit header may include, but not necessarily be limited to, a parallel unit identifier, parallel unit size, parallel unit type, and coding parameters such as one or more quantization parameters. In an embodiment each coding tree unit in a picture forms a parallel unit of its own. In another embodiment a parallel unit consists of an MxN array of coding tree units where M and N are indicated in a video bitstream. In another embodiment, the M and N values may be selected adaptively per picture or a group of pictures, for example for the purpose of load balancing in a heterogeneous processing architecture.
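For illustration, the data elements listed above could be modeled as follows; the field names, types and values are hypothetical, since the actual syntax of a parallel unit header would be defined by the codec specification.

```python
from dataclasses import dataclass

@dataclass
class ParallelUnitHeader:
    """Hypothetical container for parallel unit header data elements."""
    unit_id: int       # parallel unit identifier
    width_ctus: int    # M: unit width in coding tree units
    height_ctus: int   # N: unit height in coding tree units
    unit_type: int     # e.g. intra or inter parallel unit
    qp: int            # quantization parameter for this unit

header = ParallelUnitHeader(unit_id=0, width_ctus=2, height_ctus=2,
                            unit_type=0, qp=32)
print(header)
```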
[0085] Selection of reference lines to code/decode
[0086] For an intra picture or an intra slice certain reference sample lines can be made available for parallel units to substitute traditional reference samples on the coding tree unit, coding unit, prediction unit or transform unit boundaries. Selection of reference lines to code or decode prior to coding or decoding parallel units can be made in different ways.
[0087] For example, as illustrated in FIG. 5 a certain number of lines of samples on the bottom boundary and right boundary inside the parallel units can be selected. Horizontal lines (including horizontal lines 501, 502, 503, 504, 505) can be collected to a horizontal reference line picture buffer Rhor 510 and vertical reference lines (including 511, 512, 513, 514, 515, 516, 517, 518) can be collected to a vertical reference line picture buffer Rver 520. The number of lines on horizontal and vertical boundaries of parallel units selected for the reference line pictures can be the same or different. For example, one line of samples, two lines of samples or four lines of border samples can be selected from each parallel unit (e.g. ParUo 540 and ParUi 550) to be included in Rhor 510 and Rver 520.
[0088] FIG. 5 further shows a picture P 530 containing several parallel units including ParUo 540 and ParUi 550. The size of a parallel unit is given by ParU size 560, where the size of a parallel unit may include the size of one or more reference lines (e.g. 501).
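The collection of boundary lines into the reference line buffers can be sketched as follows (non-normative; a regular grid of equally sized parallel units, a 2-D list of luma samples, and bottom/right selection as in FIG. 5 are assumed).

```python
def collect_reference_lines(picture, unit_h, unit_w, n_lines=1):
    """Copy the bottom row(s) of each parallel unit into a horizontal
    buffer and the rightmost column(s) into a vertical buffer."""
    r_hor, r_ver = [], []
    for top in range(0, len(picture), unit_h):
        for left in range(0, len(picture[0]), unit_w):
            bottom, right = top + unit_h, left + unit_w
            for y in range(bottom - n_lines, bottom):   # bottom line(s)
                r_hor.append(picture[y][left:right])
            for x in range(right - n_lines, right):     # rightmost column(s)
                r_ver.append([picture[y][x] for y in range(top, bottom)])
    return r_hor, r_ver

pic = [[10 * y + x for x in range(4)] for y in range(4)]  # toy 4x4 picture
r_hor, r_ver = collect_reference_lines(pic, unit_h=2, unit_w=2)
print(r_hor)  # bottom lines of the four 2x2 parallel units
print(r_ver)  # rightmost columns of the four 2x2 parallel units
```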
[0089] FIG. 6A, FIG. 6B, FIG. 6C, and FIG. 6D illustrate different selections of samples to the horizontal and vertical reference line buffers.
[0090] FIG. 6A shows a case where the rightmost and bottom lines of each parallel unit are included. Thus, as shown in FIG. 6A, the rightmost vertical reference line buffers 640-1, 640-2, and 640-3 are selected for ParUo 610, ParUi 620, and ParUM 630, respectively, and the bottom reference line buffers 650-1, 650-2, and 650-3 are selected for ParUo 610, ParUi 620, and ParUM 630 respectively.
[0091] FIG. 6B demonstrates using top and leftmost lines of each parallel unit. Thus, as shown in FIG. 6B, the leftmost vertical reference line buffers 660-1, 660-2, and 660-3 are selected for ParUo 610, ParUi 620, and ParUM 630, respectively, and the top reference line buffers 670-1, 670-2, and 670-3 are selected for ParUo 610, ParUi 620, and ParUM 630, respectively.
[0092] In the example of FIG. 6C, rightmost and bottom lines are selected (e.g. rightmost vertical reference line buffer 640-1 and bottom horizontal reference line buffer 650-1 are selected for ParUo 610), and in addition, the picture boundary lines are selected to be included in the set of horizontal and vertical reference lines. Thus, the horizontal picture boundary line 615 is selected to be included in the set of horizontal reference lines, and the vertical picture boundary line 625 is selected to be included in the set of vertical reference lines.
[0093] FIG. 6D illustrates using samples on both sides of parallel unit borders. For example, as shown in FIG. 6D, the bottom reference line buffer 650-1 is selected for ParUo 610 opposite and adjacent to the selection of the top reference line buffer 670-3 selected for ParUM 630. Similarly, as shown in FIG. 6D, the rightmost vertical reference line buffer 640-1 is selected for ParUo 610 opposite and adjacent to the selection of the leftmost vertical reference line buffer 660-2 selected for ParUi 620.
[0094] The configuration of FIG. 6D can in some cases be advantageous, as it would allow the parallel units (610, 620, 630) to access not only information outside the area of the parallel unit (e.g. ParUo 610 accesses and uses information regarding top horizontal reference line buffer 670-3 of ParUM), but also to have pre-knowledge of sample values inside the parallel unit itself (e.g. ParUo 610 accesses and uses information regarding bottom horizontal reference line buffer 650-1 of ParUo).
[0095] Conditional reference samples
[0096] In an embodiment, traditional reference samples are used as a prediction reference whenever they are available within the current parallel unit being encoded. When encoding of a parallel unit causes prediction from reference samples that are unavailable, e.g. from sample locations that have not been encoded or decoded yet or that are beyond the parallel unit boundaries, dedicated reference samples are encoded similarly to what is described with other embodiments e.g. related to FIG. 6A, FIG. 6B, FIG. 6C, and FIG. 6D. Dedicated reference samples may be coded and decoded selectively, only when traditional reference samples are unavailable. For example, with reference to FIG. 6B, if an encoder selects to use intra prediction of ParUo 610 from top-right directions, the reference lines of ParUi 620 are encoded as dedicated reference samples and used for prediction of ParUo 610.
[0097] In an embodiment, an encoder chooses between a first mode and a second mode for prediction. The first mode causes traditional reference samples to be used as the prediction reference. The second mode causes dedicated reference samples to be encoded and used as the prediction reference. The encoder uses a rate-distortion optimization function to select between the first mode and the second mode.
[0098] In an embodiment, dedicated reference samples used in coding of a parallel unit are encoded and included within the coded parallel unit. For example, dedicated reference samples may succeed, in coding order, a mode that causes the usage of the dedicated reference samples as the prediction reference.
[0099] In an embodiment, coded reference lines used in decoding of a coded parallel unit are decoded from the coded parallel unit.
[00100] Embodiments have been described above in relation to encoding of dedicated reference samples. Similar embodiments can be respectively realized related to decoding of dedicated reference samples.
[00101] Embodiments have been described above in relation to encoding of dedicated reference samples. Similar embodiments can be respectively realized related to encoding of dedicated reference motion vectors. For example, a dedicated reference motion vector may be encoded when an encoder chooses to use a mode using a motion vector candidate from a block that is unavailable, e.g. from a block that has not been encoded yet or is beyond the parallel unit boundaries. Likewise, similar embodiments can be respectively realized related to decoding of dedicated reference motion vectors.
[00102] Subsampled reference lines
[00103] Reference lines in the horizontal reference line buffer or the vertical reference line buffer, or both, can be subsampled to a representation in a reduced resolution. In this case, an upsampling operation can be applied to the samples in those buffers prior to using them as reference samples in intra prediction or in other operations. FIG. 7 illustrates that process.

[00104] For example, as shown in FIG. 7, reference lines of the vertical reference line buffers of the picture 705, including reference lines in the vertical reference line buffer 740-1 of ParUo 710 and reference lines in the vertical reference line buffer 740-2 of ParUi 720, can be subsampled using operation 715 to a first representation 760. A second operation 725 is applied to the first representation 760 to generate a reduced resolution representation 770 of vertical reference line buffers including vertical reference line buffer 740-1 and vertical reference line buffer 740-2. Thus operations 715 and 725 represent downsampling of the reference lines, such as reference lines in the vertical reference line buffer 740-1 of ParUo 710 and reference lines in the vertical reference line buffer 740-2 of ParUi 720.
[00105] Upsampling may be applied in the opposite direction, such that a first upsampling operation 735 is applied to the reduced resolution representation 770 to generate a first representation 760, and a second upsampling operation 745 is applied to the first representation 760 to generate reference lines or samples, such as reference lines or samples of vertical reference line buffer 740-1, or reference lines or samples of vertical reference line buffer 740-2. Such upsampling is performed prior to using the reference lines as reference samples in intra prediction or in other operations.
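The two downsampling stages (operations 715 and 725) and the mirroring upsampling stages (operations 735 and 745) can be sketched as below; the pair-averaging and linear-interpolation filters are stand-ins, as the actual resampling filters are a codec design choice.

```python
def downsample_line(line):
    """One 2:1 downsampling stage: average each pair of samples."""
    return [(line[i] + line[i + 1] + 1) // 2
            for i in range(0, len(line) - 1, 2)]

def upsample_line(line):
    """One 1:2 upsampling stage: insert linearly interpolated samples."""
    out = []
    for i, s in enumerate(line):
        nxt = line[i + 1] if i + 1 < len(line) else s
        out.extend([s, (s + nxt + 1) // 2])
    return out

line = [100, 102, 110, 120, 130, 128, 126, 124]   # one reference line
reduced = downsample_line(downsample_line(line))  # two stages, as in FIG. 7
restored = upsample_line(upsample_line(reduced))  # applied before prediction
print(reduced, restored)
```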
[00106] Quality control for the reference lines
[00107] Quality of the reference lines can be controlled by varying the quantization parameter or parameters used in coding those lines similarly to the way quality of the primary coded picture P is controlled if the same or a similar kind of residual coding is applied to both reference lines and the primary picture. A quantization parameter or parameters for the horizontal and vertical reference lines can be the same or different. Coding of the quantization parameter or parameters of the reference lines can advantageously depend on the quantization parameter or parameters of the primary picture. For example, a quantization parameter for a reference line buffer can be calculated by adding an indicated value of a syntax element to a quantization parameter determined for the primary picture.
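As a simple illustration of that derivation (the clipping range used here is an assumption, not taken from any specification):

```python
def reference_line_qp(primary_qp, qp_delta, qp_min=0, qp_max=63):
    """Derive a reference line QP by adding a signaled delta to the
    quantization parameter determined for the primary picture."""
    return max(qp_min, min(qp_max, primary_qp + qp_delta))

# A negative delta codes the reference lines at a higher quality.
print(reference_line_qp(32, -4))  # 28
```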
[00108] Chroma formats for reference lines
[00109] Reference lines and the primary picture may have the same or different chroma formats. For example, the primary picture may consist of samples in 4:2:0 chroma format (chroma having half vertical and half horizontal resolution), whereas the horizontal reference lines may be represented in 4:2:2 sampled chroma (chroma having full vertical and half horizontal resolution) and vertical reference lines may be represented in 4:4:0 sampled chroma (chroma having half vertical and full horizontal resolution). That kind of representation is a natural selection in the case of a 4:2:0 primary picture, as selecting reference lines according to the examples above results in exactly those chroma subsampling formats. However, that may not be ideal for handling and compressing the reference line buffers. Advantageously, the vertical reference lines can be represented in a bitstream as horizontal lines (e.g. by rotating those lines 90 degrees counter-clockwise). With this representation, horizontal and vertical reference lines originally having 4:2:2 sampling and 4:4:0 sampling, respectively, can both be represented as 4:2:2 sampled arrays.
[00110] Predicting a reference line buffer from the other

[00111] With reference to FIG. 8, in order to code the reference line buffers Rhor 810 and Rver 820 efficiently, samples in one buffer can be predicted using samples of the other buffer. An example of such a process is provided in FIG. 8. In the example shown by FIG. 8, horizontal reference line buffer Rhor 810 is coded or decoded first (e.g. by decoding horizontal reference lines 801, 802, 803, 804, and/or 805, and/or horizontal reference lines of ParUo 840 and/or ParUi 850). Once reconstructed samples of Rhor 810 are available, those can be resampled to a reconstructed representation 830 to predict or to recover corresponding sample values in the vertical reference line buffer Rver 820. Different interpolation operations can then be applied either between predicted samples of Rver 820, or between already reconstructed samples in Rver 820 and predicted samples of Rver 820, to generate predictors for samples in Rver 820 or predictors for certain lines or blocks of samples in Rver 820. Coding of some or all of the reconstructed sample values of Rver 820 can then be limited to coding the difference between the reconstructed and predicted values, at least for selected lines or blocks of samples, leading to potential bitrate savings.
[00112] Thus, as shown in FIG. 8, once reconstructed samples of Rhor 810 are available, those can be resampled to 830 to predict or to recover corresponding sample values in the vertical reference line buffer Rver 820. The reconstructed samples 830 may be determined using operation 815 applied to Rhor 810. As an example, reconstructed sample 830-1 may be resampled to recover sample values 820-1, reconstructed sample 830-2 may be resampled to recover sample values 820-2, reconstructed sample 830-3 may be resampled to recover sample values 820-3, reconstructed sample 830-4 may be resampled to recover sample values 820-4, and reconstructed sample 830-5 may be resampled to recover sample values 820-5 and/or sample values 820-6. Vertical sample values 820-1, 820-2, 820-3, 820-4, 820-5, and 820-6 may represent any of vertical reference line buffers 811, 812, 813, 814, 815, 816, 817, and/or 818.
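The prediction of Rver samples from already reconstructed Rhor samples can be sketched as follows; the constant-column predictor and the indexing are deliberate simplifications of the resampling and interpolation operations described above.

```python
def predict_vertical_line(r_hor_line, column, length):
    """Seed a vertical line predictor from the horizontal line sample
    at the column where the two buffers correspond (illustrative)."""
    return [r_hor_line[column]] * length

def residual(actual_line, predicted_line):
    """Only this difference would then need to be coded for the line."""
    return [a - p for a, p in zip(actual_line, predicted_line)]

r_hor_line = [100, 102, 104, 106]   # reconstructed horizontal reference line
actual = [103, 104, 105, 106]       # vertical reference line to be coded
pred = predict_vertical_line(r_hor_line, column=2, length=4)
print(residual(actual, pred))       # [-1, 0, 1, 2]: small, cheap to code
```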
[00113] Inter prediction of a reference line
[00114] In an embodiment, a current block within an inter slice is (de)coded with intra coding. A reference line motion vector is derived for the current block. Methods for deriving a reference line motion vector may comprise, but are not necessarily limited to, copying a motion vector from an inter-coded neighboring block on the left for the derivation of a horizontal line buffer or from an inter-coded neighboring block above for the derivation of a vertical line buffer. A predicted reference line is derived using conventional motion compensation with the reference line motion vector as input.
[00115] In an embodiment, it is pre-defined, e.g. in a coding standard, or an encoder indicates in or along a bitstream, or a decoder decodes from or along a bitstream that decoding of a parallel unit does not depend on any other parallel units than collocated parallel units in reference pictures. In an embodiment, when inter prediction of a reference line causes a reference to a sample location outside a current parallel unit being (de)coded, the sample location is saturated to be within the parallel unit boundaries.
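The saturation of an out-of-bounds reference location can be expressed as a simple clamp, sketched below with an invented coordinate convention.

```python
def clamp_to_parallel_unit(x, y, unit_left, unit_top, unit_w, unit_h):
    """Saturate a motion compensated sample location so that it stays
    within the current parallel unit's boundaries."""
    cx = min(max(x, unit_left), unit_left + unit_w - 1)
    cy = min(max(y, unit_top), unit_top + unit_h - 1)
    return cx, cy

# A fetch at (70, -3) for a 64x64 parallel unit anchored at (0, 0):
print(clamp_to_parallel_unit(70, -3, 0, 0, 64, 64))  # (63, 0)
```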
[00116] Representation as layers

[00117] When encoding or decoding the reference line buffers Rhor (e.g. 510) and Rver (e.g. 520), those reference line buffers may be packed spatially inside a frame, for example on top of the main picture, to be encoded and decoded first. The reference line buffers (e.g. Rhor 510 and/or Rver 520) can further be considered for example as pictures, slices, tiles or subpictures. Data comprising those reference lines may be temporally packed inside the main pictures as a separate picture and before the I-picture, to be encoded and decoded first. Reference line data can also be marked as a base layer, base layers or generally layers that are required for coding and decoding the output picture P. As at least some of the parameters of the reference line buffers typically depend on the parameters of the primary picture P, the parameters of the reference line buffers can be derived directly from the parameters of the primary picture P. In this case, there is no need to indicate such information in the video bitstream; instead the derived parameters, such as the sizes of the reference line buffers, can be used.
[00118] Using indicated reference samples in post-processing
[00119] In addition to acting as a source for intra prediction, the reference line buffers can also be used to guide different postprocessing operations. For example, a deblocking filter that is designed to smooth block boundaries in a typical video codec can take advantage of the sample values in reference line buffers and bias its decisions towards the sample values obtained from the reference line buffers. Similarly, sample adaptive offset or adaptive loop filters can be configured to make decisions based on sample values in the reference line buffers. Alternatively, the samples in the reference line buffers can be blended into the reconstructed image as a postprocessing operation additional to typical postprocessing steps. Such approaches are helpful especially at low bitrates, as the sample values in the reference line buffers provide another estimate for the final sample values, and the highly quantized sample values from the primary picture decoding can be refined to less noisy final values by combining the two estimates. The combination of the values from the primary picture reconstruction and values from the reference line buffers can advantageously be performed considering the quantization applied to those values. Based on the difference of the quantization, the values with expected higher quality can be given more weight in the combination process and the values with expected lower quality can be given less weight.
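A non-normative sketch of such a quantization-aware blend follows; the mapping from quantization parameter difference to blending weight is invented purely for illustration.

```python
def blend_with_reference_line(recon, ref_line, qp_recon, qp_ref):
    """Blend primary reconstruction samples with reference line samples,
    weighting the more finely quantized (lower QP) estimate more heavily."""
    w_ref = min(max(0.5 + 0.02 * (qp_recon - qp_ref), 0.0), 1.0)
    return [round((1.0 - w_ref) * r + w_ref * s)
            for r, s in zip(recon, ref_line)]

# Reference lines coded at QP 26, primary picture at QP 38: trust the lines more.
print(blend_with_reference_line([120, 118, 116], [124, 123, 122], 38, 26))
```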
[00120] Reference motion vector fields for inter coded pictures
[00121] Neighboring motion vector information can be provided to parallel units in the form of reference motion vector lines or reference motion vector fields. As illustrated in FIG. 5, FIG. 6A, FIG. 6B, FIG. 6C, and FIG. 6D for reference samples, similarly reference motion vectors can be collected and coded as reference lines containing the motion vector information to be used in predicting motion vectors inside independently decodable parallel units.
[00122] As motion vector fields tend to have different statistical properties compared to those of sample values, different approaches are beneficial for coding reference motion vector lines or fields. Also, the selection of locations for the indicated or interpolated reference motion vectors can differ from the locations of the indicated or interpolated reference samples. FIG. 9A illustrates an example selection of granularity and location of reference block motion vectors, where the reference motion vectors are made available for square blocks with dimensions of one quarter of the dimensions of the parallel units. In practice different selections can be made and reference motion vectors can be generated for example for blocks representing 4x4 samples, especially if that is the accuracy of the motion vector fields used in predicting and coding the primary picture.
[00123] Thus, for example in FIG. 9A, a motion vector may be located in block 905 of ParUo 910, a motion vector may be located in block 915 of ParUi 920, and a motion vector may be located in block 925 of ParUM 930.
[00124] Reference motion vectors can be indicated for all or a selected subset of reference motion vector locations. FIG. 9B illustrates one possible selection, where motion vectors (940, 950, 960, 970) are indicated for the block locations at the bottom-right corners of the parallel units. The rest of the reference motion vectors can be generated, for example, by interpolating between the indicated reference motion vectors. FIG. 9C demonstrates that process for the horizontal direction. For example, motion vectors 940-1, 940-2, and 940-3 may be determined by interpolation using at least motion vector 940, motion vectors 950-1, 950-2, and 950-3 may be determined by interpolation using at least motion vector 950, motion vectors 960-1, 960-2, and 960-3 may be determined by interpolation using at least motion vector 960, and motion vectors 970-1, 970-2, and 970-3 may be determined by interpolation using at least motion vector 970.
[00125] Motion vectors for the vertical columns of reference locations can be generated similarly by interpolating in the vertical direction between the indicated reference motion vectors. In some configurations it may be required to have reference motion vectors not only on the parallel unit boundaries, but also inside the parallel units. In that case those reference motion vectors can also be interpolated from the horizontal and vertical reference motion vector buffers, or more motion vectors can be indicated to help the interpolation process.
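The interpolation of FIG. 9C can be sketched as follows for one boundary (non-normative; the rounding and the number of interpolated positions are illustrative choices).

```python
def interpolate_mvs(mv_left, mv_right, count):
    """Linearly interpolate `count` motion vectors between two indicated
    corner vectors, given as (mvx, mvy) pairs."""
    out = []
    for i in range(1, count + 1):
        t = i / (count + 1)
        out.append((round(mv_left[0] + t * (mv_right[0] - mv_left[0])),
                    round(mv_left[1] + t * (mv_right[1] - mv_left[1]))))
    return out

# Fill three boundary positions between two indicated corner vectors.
print(interpolate_mvs((0, 0), (8, -4), 3))  # [(2, -1), (4, -2), (6, -3)]
```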
[00126] An alternative way to generate the reference motion vectors for the parallel unit boundaries is to use motion vectors from other pictures. That can be implemented, for example, by copying collocated motion vectors from selected reference pictures to the corner locations illustrated in FIG. 9B and interpolating between those, or by filling all the border reference motion vectors using such collocated motion vectors, or by a combination of different approaches where it can for example be indicated in a video bitstream which approach to use within a picture or within a certain picture area.
[00127] In an embodiment motion vectors on the parallel unit borders are encoded or decoded as reference motion vector lines determining a motion vector for locations immediately above a parallel unit and immediately left of a parallel unit.
[00128] Combination of reference lines and reference motion vector fields
[00129] For some pictures only reference line buffers or only reference motion vector buffers may be generated. For example, if a picture or slice is intra coded, only reference line buffers may be generated. For some pictures both reference line buffers and reference motion vector buffers may be generated. Such pictures could include for example high quality key pictures appearing in a coded representation at indicated intervals. Existence of reference line buffers or reference motion vector buffers can also be made dependent on the temporal layer a picture belongs to. For example, reference line buffers may be indicated only for the first temporal layer or a certain range of temporal layers. The presence of reference line buffers and reference motion vector buffers can also be indicated in a video bitstream on a per picture, per tile, per subpicture or per slice basis, allowing a video encoder to make the decision when those provide adequate benefit for coding of parallel units.
[00130] In an embodiment, a video or image decoder performs the following steps (1-4):
[00131] 1. Determine a block of samples in a picture.
[00132] 2. Determine a parallel unit the block of samples belongs to.
[00133] 3. Determine at least one row of reference samples above the parallel unit; wherein the row of reference samples is not dependent from decoded samples of any parallel unit within the same picture.
[00134] 4. Determine a location of at least one reference sample and: a. if the location of the reference sample belongs to the same parallel unit the block of samples belongs to, determine a value for the reference sample from the decoded samples of the parallel unit; or b. if the location of the reference sample is above the parallel unit the block of samples belongs to, determine a value for the reference sample from the row of reference samples coded independently from reconstructed samples of any parallel unit within the same picture. A sketch of this selection logic is given below.
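In the following non-normative Python sketch, the coordinate convention and the restriction to the single row immediately above the parallel unit are simplifying assumptions.

```python
def reference_sample(ref_x, ref_y, unit, decoded, ref_row):
    """Step 4: pick a reference sample value from the parallel unit's own
    decoded samples (4a) or from the independently coded row above (4b)."""
    left, top, w, h = unit
    if left <= ref_x < left + w and top <= ref_y < top + h:
        return decoded[ref_y][ref_x]   # 4a: inside the same parallel unit
    if ref_y == top - 1:
        return ref_row[ref_x]          # 4b: independently coded row above
    raise ValueError("location not covered by this sketch")

unit = (0, 1, 4, 2)                    # (left, top, width, height) in samples
decoded = [[0, 0, 0, 0], [10, 11, 12, 13], [20, 21, 22, 23]]
ref_row = [90, 91, 92, 93]             # row coded independently of all units
print(reference_sample(2, 1, unit, decoded, ref_row))  # 12 (case 4a)
print(reference_sample(2, 0, unit, decoded, ref_row))  # 92 (case 4b)
```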
[00135] The examples described herein may be targeted for the forthcoming H.267 video coding standard which is expected to be widely deployed in future video/imaging services, applications and products.
[00136] As a part of an open standard, the feature can be recognized by generating video/image files or streams conforming to the standard (and utilizing the technique) and determining that a product is able to decode those files or streams.
[00137] FIG. 10 is an example apparatus 1000, which may be implemented in hardware, configured to implement video coding using parallel units, based on the examples described herein. The apparatus 1000 comprises a processor 1002, at least one non-transitory or transitory memory 1004 including computer program code 1005 (e.g. object-oriented code), wherein the at least one memory 1004 and the computer program code 1005 are configured to, with the at least one processor 1002, cause the apparatus to implement coding 1006 and/or decoding 1007 using parallel units, based on the examples described herein. The apparatus 1000 optionally includes a display or I/O 1008 that may be used to display content during coding 1006 and/or decoding 1007, or to receive user input, for example from a keypad. The apparatus 1000 includes one or more network (N/W) interfaces (I/F(s)) 1010. The N/W I/F(s) 1010 may be wired and/or wireless and communicate over the Internet/other network(s) via any communication technique. The N/W I/F(s) 1010 may comprise one or more transmitters and one or more receivers. The N/W I/F(s) 1010 may comprise standard well-known components such as an amplifier, filter, frequency-converter, (de)modulator, and encoder/decoder circuitry(ies) and one or more antennas. In some examples, the processor 1002 is configured to implement the coding 1006 or decoding 1007 without use of memory 1004. Although coding 1006 and decoding 1007 are shown as separate items, they may be combined as one item, block, or unit to form a codec.
[00138] The memory 1004 may be implemented using any suitable data storage technology, such as semiconductor based memory devices, flash memory, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The memory 1004 may comprise a database for storing data. The memory 1004 may be volatile or non-volatile. Interface 1012 enables data communication between the various items of apparatus 1000, as shown in FIG. 10. Interface 1012 may be one or more buses, such as address, data, or control buses, and may include any interconnection mechanism, such as a series of lines on a motherboard or integrated circuit, fiber optics or other optical communication equipment, and the like. The apparatus 1000 need not comprise each of the features mentioned, or may comprise other features as well. The apparatus 1000 may be an embodiment of any of the apparatuses shown in FIGS. 1 through 9C (inclusive), including any combination of those. The apparatus 1000 may be an encoder or decoder.
[00139] FIG. 11 is an example method 1100 to implement video coding using parallel units, based on the examples described herein. At 1110, the method includes determining a block of samples in a picture. At 1120, the method includes determining a parallel unit the block of samples belongs to. At 1130, the method includes determining at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from decoded samples of any parallel unit within the picture. At 1140, the method includes determining a location of at least one reference sample. At 1150, the method includes in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, determining a value for the at least one reference sample from decoded samples of the parallel unit the block of samples belongs to; or in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, determining a value for the at least one reference sample from the row of reference samples coded independently from reconstructed samples of any parallel unit within the picture. Method 1100 may be implemented with a decoder apparatus or a codec apparatus, such as apparatus 50 or apparatus 1000.
[00140] FIG. 12 is an example method 1200 to implement video coding using parallel units, based on the examples described herein. At 1210, the method includes encoding a block of samples in a picture. At 1220, the method includes encoding a parallel unit the block of samples belongs to. At 1230, the method includes encoding at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from encoded samples of any parallel unit within the picture. At 1240, the method includes determining a location of at least one reference sample. At 1250, the method includes wherein in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, a value for the at least one reference sample is determined from encoded samples of the parallel unit the block of samples belongs to; or wherein in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, a value for the at least one reference sample is determined from the row of reference samples coded independently from samples of any parallel unit within the picture. Method 1200 may be implemented with an encoder apparatus or a codec apparatus, such as apparatus 50 or apparatus 1000.
[00141] References to a ‘computer’, ‘processor’, etc. should be understood to encompass not only computers having different architectures such as single/multi-processor architectures and sequential /parallel architectures but also specialized circuits such as field-programmable gate arrays (FPGAs), application specific circuits (ASICs), signal processing devices and other processing circuitry. References to computer program, instructions, code etc. should be understood to encompass software for a programmable processor or firmware such as, for example, the programmable content of a hardware device such as instructions for a processor, or configuration settings for a fixed-function device, gate array or programmable logic device, etc.
[00142] As used herein, the term ‘circuitry’, ‘circuit’ and variants may refer to any of the following: (a) hardware circuit implementations, such as implementations in analog and/or digital circuitry, and (b) combinations of circuits and software (and/or firmware), such as (as applicable): (i) a combination of processor(s) or (ii) portions of processor(s)/software including digital signal processor(s), software, and memory(ies) that work together to cause an apparatus to perform various functions, and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present. As a further example, as used herein, the term ‘circuitry’ would also cover an implementation of merely a processor (or multiple processors) or a portion of a processor and its (or their) accompanying software and/or firmware. The term ‘circuitry’ would also cover, for example and if applicable to the particular element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, or another network device. Circuitry or circuit may also be used to mean a function or a process used to execute a method.
[00143] The following examples (1-26) are provided among the herein described embodiments.
[00144] Example 1: an example apparatus includes at least one processor; and at least one non-transitory memory including computer program code; wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to: determine a block of samples in a picture; determine a parallel unit the block of samples belongs to; determine at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from decoded samples of any parallel unit within the picture; determine a location of at least one reference sample; and in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, determine a value for the at least one reference sample from decoded samples of the parallel unit the block of samples belongs to; or in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, determine a value for the at least one reference sample from the row of reference samples coded independently from reconstructed samples of any parallel unit within the picture.

[00145] Example 2: the apparatus of example 1, wherein there are no sample and coding dependencies between parallel units.
[00146] Example 3: the apparatus of any one of examples 1 to 2, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: determine a plurality of reference lines comprising at least some sample values of the picture prior to decoding the parallel unit, wherein the reference lines are configured to be used outside of an area comprising the parallel unit.
[00147] Example 4: the apparatus of example 3, wherein the reference lines are selected from a left, right, top or bottom side of the parallel unit, from both sides of a boundary of the parallel unit, or from a picture boundary.
[00148] Example 5: the apparatus of any one of examples 3 to 4, wherein the plurality of reference lines form a horizontal reference line picture buffer and a vertical reference line picture buffer.
[00149] Example 6: the apparatus of example 5, wherein reference lines in the horizontal reference line picture buffer and/or the vertical reference line picture buffer are subsampled to a representation in a reduced resolution, the reduced resolution being used for prediction.
[00150] Example 7: the apparatus of any one of examples 5 to 6, wherein samples of the horizontal reference line picture buffer are used to predict samples of the vertical reference line picture buffer, or samples of the vertical reference line picture buffer are used to predict samples of the horizontal reference line picture buffer.
[00151] Example 8: the apparatus of any one of examples 1 to 7, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: decode parallel unit areas from an indication from or along a bitstream.
[00152] Example 9: the apparatus of any one of examples 1 to 8, wherein the reference samples and the picture have different chroma formats.
[00153] Example 10: the apparatus of any one of examples 1 to 9, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: determine a first motion vector field at a first sub-block location of the parallel unit; and determine a second motion vector field for a second sub-block location with interpolation using at least the first motion vector field of the first sub-block location.
[00154] Example 11: an example apparatus includes at least one processor; and at least one non-transitory memory including computer program code; wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to: encode a block of samples in a picture; encode a parallel unit the block of samples belongs to; encode at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from encoded samples of any parallel unit within the picture; determine a location of at least one reference sample; and wherein in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, a value for the at least one reference sample is determined from encoded samples of the parallel unit the block of samples belongs to; or wherein in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, a value for the at least one reference sample is determined from the row of reference samples coded independently from samples of any parallel unit within the picture.
[00155] Example 12: the apparatus of example 11, wherein there are no sample and coding dependencies between parallel units.
[00156] Example 13: the apparatus of any one of examples 11 to 12, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: encode a plurality of reference lines comprising at least some sample values of the picture prior to encoding the parallel unit, wherein the reference lines are configured to be used outside of an area comprising the parallel unit.
[00157] Example 14: the apparatus of example 13, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: select the reference lines from a left, right, top or bottom side of the parallel unit, from both sides of a boundary of the parallel unit, or from a picture boundary.
[00158] Example 15: the apparatus of any one of examples 13 to 14, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: encode the plurality of reference lines into a horizontal reference line picture buffer and a vertical reference line picture buffer.
[00159] Example 16: the apparatus of example 15, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: subsample reference lines in the horizontal reference line picture buffer and/or the vertical reference line picture buffer to a representation in a reduced resolution, the reduced resolution being used for prediction.
[00160] Example 17: the apparatus of any one of examples 15 to 16, wherein samples of the horizontal reference line picture buffer are used to predict samples of the vertical reference line picture buffer, or samples of the vertical reference line picture buffer are used to predict samples of the horizontal reference line picture buffer.
[00161] Example 18: the apparatus of any one of examples 11 to 17, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: indicate parallel unit areas in or along a bitstream.
[00162] Example 19: the apparatus of any one of examples 11 to 18, wherein the reference samples and the picture have different chroma formats.
[00163] Example 20: the apparatus of any one of examples 11 to 19, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: indicate a first motion vector field at a first sub-block location of the parallel unit; wherein a second motion vector field for a second sub-block location is determined with interpolation using at least the first motion vector field of the first sub-block location.

[00164] Example 21: an example method includes determining a block of samples in a picture; determining a parallel unit the block of samples belongs to; determining at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from decoded samples of any parallel unit within the picture; determining a location of at least one reference sample; and in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, determining a value for the at least one reference sample from decoded samples of the parallel unit the block of samples belongs to; or in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, determining a value for the at least one reference sample from the row of reference samples coded independently from reconstructed samples of any parallel unit within the picture.
[00165] Example 22: an example method includes encoding a block of samples in a picture; encoding a parallel unit the block of samples belongs to; encoding at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from encoded samples of any parallel unit within the picture; determining a location of at least one reference sample; and wherein in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, a value for the at least one reference sample is determined from encoded samples of the parallel unit the block of samples belongs to; or wherein in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, a value for the at least one reference sample is determined from the row of reference samples coded independently from samples of any parallel unit within the picture.
[00166] Example 23: an example non-transitory program storage device readable by a machine, tangibly embodying a program of instructions executable with the machine for performing operations, is provided, the operations comprising: determining a block of samples in a picture; determining a parallel unit the block of samples belongs to; determining at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from decoded samples of any parallel unit within the picture; determining a location of at least one reference sample; and in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, determining a value for the at least one reference sample from decoded samples of the parallel unit the block of samples belongs to; or in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, determining a value for the at least one reference sample from the row of reference samples coded independently from reconstructed samples of any parallel unit within the picture.
[00167] Example 24: an example non-transitory program storage device readable by a machine, tangibly embodying a program of instructions executable with the machine for performing operations, the operations comprising: encoding a block of samples in a picture; encoding a parallel unit the block of samples belongs to; encoding at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from encoded samples of any parallel unit within the picture; determining a location of at least one reference sample; and wherein in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, a value for the at least one reference sample is determined from encoded samples of the parallel unit the block of samples belongs to; or wherein in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, a value for the at least one reference sample is determined from the row of reference samples coded independently from samples of any parallel unit within the picture.
[00168] Example 25: an example apparatus includes means for determining a block of samples in a picture; means for determining a parallel unit the block of samples belongs to; means for determining at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from decoded samples of any parallel unit within the picture; means for determining a location of at least one reference sample; and in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, determining a value for the at least one reference sample from decoded samples of the parallel unit the block of samples belongs to; or in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, determining a value for the at least one reference sample from the row of reference samples coded independently from reconstructed samples of any parallel unit within the picture.
[00169] Example 26: an example apparatus includes means for encoding a block of samples in a picture; means for encoding a parallel unit the block of samples belongs to; means for encoding at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from encoded samples of any parallel unit within the picture; means for determining a location of at least one reference sample; and wherein in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, a value for the at least one reference sample is determined from encoded samples of the parallel unit the block of samples belongs to; or wherein in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, a value for the at least one reference sample is determined from the row of reference samples coded independently from samples of any parallel unit within the picture.
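As a hedged illustration of the motion vector field interpolation recited in example 20 (and in claims 10 and 20 below), the sketch that follows linearly interpolates a motion vector between two signalled sub-block anchors. The function name, the one-dimensional anchor layout, and the linear weighting are assumptions made for exposition; the examples do not mandate any particular interpolation filter.

    # Illustrative sketch only. A codec could use any interpolation; linear
    # weighting along one axis is assumed here purely for exposition.

    def interpolate_mv(mv_a, mv_b, pos_a, pos_b, pos):
        """Interpolate a motion vector at sub-block position `pos` from the
        motion vectors `mv_a` and `mv_b` signalled at positions `pos_a` and
        `pos_b` (each mv is an (mvx, mvy) pair)."""
        t = (pos - pos_a) / (pos_b - pos_a)  # 0 at mv_a, 1 at mv_b
        return (mv_a[0] + t * (mv_b[0] - mv_a[0]),
                mv_a[1] + t * (mv_b[1] - mv_a[1]))

    # Motion vectors signalled at sub-block columns 0 and 4; derive column 1.
    print(interpolate_mv((8, -2), (16, 6), 0, 4, 1))  # -> (10.0, 0.0)

Signalling motion only at sparse anchor locations and interpolating the remainder reduces the motion information that must be coded per parallel unit.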
[00170] It should be understood that the foregoing description is only illustrative. Various alternatives and modifications may be devised by those skilled in the art. For example, features recited in the various dependent claims could be combined with each other in any suitable combination(s). In addition, features from different embodiments described above could be selectively combined into a new embodiment. Accordingly, the description is intended to embrace all such alternatives, modifications and variances which fall within the scope of the appended claims.
[00171] When a reference number as used herein is of the form y-x, this means that the referred-to item is an instantiation of (or type of) reference number y or, for example if reference number y alone does not exist, a common or similar entity or a common or similar concept. For example, 640-1 and 640-2 in FIG. 6A are instantiations (e.g. a first and a second instantiation) of a common or similar rightmost vertical reference line buffer concept.
[00172] In the figures, when used for an apparatus or system, lines represent couplings or operations and arrows represent directional couplings, operations, or the direction of data flow; when used for a method or signaling diagram, lines represent couplings or operations and arrows represent transitions, operations, or the direction of data flow.
[00173] The following acronyms and abbreviations that may be found in the specification and/or the drawing figures are defined as follows (acronyms may be appended together or with other characters or numbers, e.g. by using a dash/hyphen (“-”) or using parentheses (“()”)):
3GPP 3rd generation partnership project
4G fourth generation of broadband cellular network technology
5G fifth generation cellular network technology
802.x family of IEEE standards dealing with local area networks and metropolitan area networks
ASIC application specific integrated circuit
AVC advanced video coding
CDMA code-division multiple access
CTU coding tree unit
CU coding unit
DCT discrete cosine transform
DPB decoded picture buffer
DSP digital signal processor
FDMA frequency division multiple access
FPGA field programmable gate array
GSM global system for mobile communications
H.222.0 MPEG-2 systems, standard for the generic coding of moving pictures and associated audio information
H.2xx family of video coding standards in the domain of the ITU-T
HEVC high efficiency video coding
HMD head mounted display
IBC intra block copy
IEC International Electrotechnical Commission
IEEE Institute of Electrical and Electronics Engineers
I/F interface
IMD integrated messaging device
IMS instant messaging service
I/O input/output
IoT internet of things
IP internet protocol
ISO International Organization for Standardization
ISOBMFF ISO base media file format
ITU International Telecommunication Union
ITU-T ITU Telecommunication Standardization Sector
LCU largest coding unit
LTE long-term evolution
MMS multimedia messaging service
MPEG-2 H.222/H.262 as defined by the ITU, where MPEG is moving picture experts group
NAL network abstraction layer
N/W network
P picture
ParU parallel unit
PC personal computer
PDA personal digital assistant
PID packet identifier
PLC power line communication
PU prediction unit
R row(s)
RFID radio frequency identification
RFM reference frame memory
SMS short messaging service
SNR signal-to-noise ratio
TCP-IP transmission control protocol-internet protocol
TDMA time division multiple access
TS transport stream
TU transform unit
TV television
UICC universal integrated circuit card
UMTS universal mobile telecommunications system
USB universal serial bus
WLAN wireless local area network
YCbCr color space where Y is the luma component, Cb is the blue-difference chroma component, and Cr is the red-difference chroma component
YUV color space where Y is the luma component and U and V are the chrominance (color) components

Claims

CLAIMS
What is claimed is:
1. An apparatus comprising: at least one processor; and at least one non-transitory memory including computer program code; wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to: determine a block of samples in a picture; determine a parallel unit the block of samples belongs to; determine at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from decoded samples of any parallel unit within the picture; determine a location of at least one reference sample; and in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, determine a value for the at least one reference sample from decoded samples of the parallel unit the block of samples belongs to; or in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, determine a value for the at least one reference sample from the row of reference samples coded independently from reconstructed samples of any parallel unit within the picture.
2. The apparatus of claim 1, wherein there are no sample and coding dependencies between parallel units.
3. The apparatus of any one of claims 1 to 2, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: determine a plurality of reference lines comprising at least some sample values of the picture prior to decoding the parallel unit, wherein the reference lines are configured to be used outside of an area comprising the parallel unit.
4. The apparatus of claim 3, wherein the reference lines are selected from a left, right, top or bottom side of the parallel unit, from both sides of a boundary of the parallel unit, or from a picture boundary.
5. The apparatus of any one of claims 3 to 4, wherein the plurality of reference lines form a horizontal reference line picture buffer and a vertical reference line picture buffer.
6. The apparatus of claim 5, wherein reference lines in the horizontal reference line picture buffer and/or the vertical reference line picture buffer are subsampled to a representation in a reduced resolution, the reduced resolution being used for prediction.
7. The apparatus of any one of claims 5 to 6, wherein samples of the horizontal reference line picture buffer are used to predict samples of the vertical reference line picture buffer, or samples of the vertical reference line picture buffer are used to predict samples of the horizontal reference line picture buffer.
8. The apparatus of any one of claims 1 to 7, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: decode parallel unit areas from an indication from or along a bitstream.
9. The apparatus of any one of claims 1 to 8, wherein the reference samples and the picture have different chroma formats.
10. The apparatus of any one of claims 1 to 9, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: determine a first motion vector field at a first sub-block location of the parallel unit; and determine a second motion vector field for a second sub-block location with interpolation using at least the first motion vector field of the first sub-block location.
11. An apparatus comprising: at least one processor; and at least one non-transitory memory including computer program code; wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to: encode a block of samples in a picture; encode a parallel unit the block of samples belongs to; encode at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from encoded samples of any parallel unit within the picture; determine a location of at least one reference sample; and wherein in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, a value for the at least one reference sample is determined from encoded samples of the parallel unit the block of samples belongs to; or wherein in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, a value for the at least one reference sample is determined from the row of reference samples coded independently from samples of any parallel unit within the picture.
12. The apparatus of claim 11, wherein there are no sample and coding dependencies between parallel units.
13. The apparatus of any one of claims 11 to 12, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: encode a plurality of reference lines comprising at least some sample values of the picture prior to encoding the parallel unit, wherein the reference lines are configured to be used outside of an area comprising the parallel unit.
14. The apparatus of claim 13, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: select the reference lines from a left, right, top or bottom side of the parallel unit, from both sides of a boundary of the parallel unit, or from a picture boundary.
15. The apparatus of any one of claims 13 to 14, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: encode the plurality of reference lines into a horizontal reference line picture buffer and a vertical reference line picture buffer.
16. The apparatus of claim 15, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: subsample reference lines in the horizontal reference line picture buffer and/or the vertical reference line picture buffer to a representation in a reduced resolution, the reduced resolution being used for prediction.
17. The apparatus of any one of claims 15 to 16, wherein samples of the horizontal reference line picture buffer are used to predict samples of the vertical reference line picture buffer, or samples of the vertical reference line picture buffer are used to predict samples of the horizontal reference line picture buffer.
18. The apparatus of any one of claims 11 to 17, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: indicate parallel unit areas in or along a bitstream.
19. The apparatus of any one of claims 11 to 18, wherein the reference samples and the picture have different chroma formats.
20. The apparatus of any one of claims 11 to 19, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: indicate a first motion vector field at a first sub-block location of the parallel unit; wherein a second motion vector field for a second sub-block location is determined with interpolation using at least the first motion vector field of the first sub-block location.
21. A method comprising: determining a block of samples in a picture; determining a parallel unit the block of samples belongs to; determining at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from decoded samples of any parallel unit within the picture; determining a location of at least one reference sample; and in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, determining a value for the at least one reference sample from decoded samples of the parallel unit the block of samples belongs to; or in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, determining a value for the at least one reference sample from the row of reference samples coded independently from reconstructed samples of any parallel unit within the picture.
22. A method comprising: encoding a block of samples in a picture; encoding a parallel unit the block of samples belongs to; encoding at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from encoded samples of any parallel unit within the picture; determining a location of at least one reference sample; and wherein in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, a value for the at least one reference sample is determined from encoded samples of the parallel unit the block of samples belongs to; or wherein in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, a value for the at least one reference sample is determined from the row of reference samples coded independently from samples of any parallel unit within the picture.
23. A non-transitory program storage device readable by a machine, tangibly embodying a program of instructions executable with the machine for performing operations, the operations comprising: determining a block of samples in a picture; determining a parallel unit the block of samples belongs to; determining at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from decoded samples of any parallel unit within the picture; determining a location of at least one reference sample; and in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, determining a value for the at least one reference sample from decoded samples of the parallel unit the block of samples belongs to; or in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, determining a value for the at least one reference sample from the row of reference samples coded independently from reconstructed samples of any parallel unit within the picture.
24. A non-transitory program storage device readable by a machine, tangibly embodying a program of instructions executable with the machine for performing operations, the operations comprising: encoding a block of samples in a picture; encoding a parallel unit the block of samples belongs to; encoding at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from encoded samples of any parallel unit within the picture; determining a location of at least one reference sample; and wherein in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, a value for the at least one reference sample is determined from encoded samples of the parallel unit the block of samples belongs to; or wherein in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, a value for the at least one reference sample is determined from the row of reference samples coded independently from samples of any parallel unit within the picture.
25. An apparatus comprising: means for determining a block of samples in a picture; means for determining a parallel unit the block of samples belongs to; means for determining at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from decoded samples of any parallel unit within the picture; means for determining a location of at least one reference sample; and in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, determining a value for the at least one reference sample from decoded samples of the parallel unit the block of samples belongs to; or in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, determining a value for the at least one reference sample from the row of reference samples coded independently from reconstructed samples of any parallel unit within the picture.
26. An apparatus comprising: means for encoding a block of samples in a picture; means for encoding a parallel unit the block of samples belongs to; means for encoding at least one row of reference samples not belonging to the parallel unit, wherein the row of reference samples not belonging to the parallel unit is not dependent from encoded samples of any parallel unit within the picture; means for determining a location of at least one reference sample; and wherein in response to the location of the at least one reference sample belonging to the parallel unit the block of samples belongs to, a value for the at least one reference sample is determined from encoded samples of the parallel unit the block of samples belongs to; or wherein in response to the location of the at least one reference sample being at a location not belonging to the parallel unit, a value for the at least one reference sample is determined from the row of reference samples coded independently from samples of any parallel unit within the picture.
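For orientation only, and not as part of the claims, a minimal sketch of the reduced-resolution reference line representation recited in claims 5-6 and 15-16 follows. The 2:1 averaging, the function name, and the list-based buffer layout are assumptions made for illustration; the claims leave the subsampling method open.

    # Illustration only; one possible reduced-resolution representation of a
    # reference line, obtained by averaging pairs of neighbouring samples.

    def subsample_reference_line(line, factor=2):
        """Reduce a reference line buffer row to a lower resolution by
        averaging `factor` neighbouring samples."""
        return [sum(line[i:i + factor]) // factor
                for i in range(0, len(line) - factor + 1, factor)]

    # Example: a horizontal reference line buffer row subsampled 2:1.
    row = [100, 102, 98, 96, 110, 114, 108, 104]
    print(subsample_reference_line(row))  # -> [101, 97, 112, 106]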
PCT/EP2022/077796 2021-10-21 2022-10-06 Video coding using parallel units WO2023066672A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163270224P 2021-10-21 2021-10-21
US63/270,224 2021-10-21

Publications (1)

Publication Number Publication Date
WO2023066672A1 true WO2023066672A1 (en) 2023-04-27

Family

ID=84357959

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2022/077796 WO2023066672A1 (en) 2021-10-21 2022-10-06 Video coding using parallel units

Country Status (1)

Country Link
WO (1) WO2023066672A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090097557A1 (en) * 2007-10-10 2009-04-16 Masashi Takahashi Image Encoding Apparatus, Image Encoding Method, Image Decoding Apparatus, and Image Decoding Method
US20180139464A1 (en) * 2016-11-17 2018-05-17 Mediatek Inc. Decoding system for tile-based videos
US20210281855A1 (en) * 2018-11-26 2021-09-09 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Inter-prediction concept using tile-independency constraints
US20200213605A1 (en) * 2019-01-02 2020-07-02 Tencent America LLC Adaptive picture resolution rescaling for inter-prediction and display

Similar Documents

Publication Publication Date Title
EP3633990B1 (en) An apparatus and method for using a neural network in video coding
CN108293136B (en) Method, apparatus and computer-readable storage medium for encoding 360-degree panoramic video
KR102431537B1 (en) Encoders, decoders and corresponding methods using IBC dedicated buffers and default value refreshing for luma and chroma components
EP3120548B1 (en) Decoding of video using a long-term palette
US10368097B2 (en) Apparatus, a method and a computer program product for coding and decoding chroma components of texture pictures for sample prediction of depth pictures
CN108632628B (en) Method for deriving reference prediction mode values
US11831867B2 (en) Apparatus, a method and a computer program for video coding and decoding
CN114982246B (en) Adaptive rounding of loop filters
US20140092977A1 (en) Apparatus, a Method and a Computer Program for Video Coding and Decoding
US20150312568A1 (en) Method and technical equipment for video encoding and decoding
EP3777195A1 (en) An apparatus, a method and a computer program for running a neural network
US11496746B2 (en) Machine learning based rate-distortion optimizer for video compression
EP3348061A1 (en) An apparatus, a method and a computer program for video coding and decoding
WO2022238967A1 (en) Method, apparatus and computer program product for providing finetuned neural network
WO2017093604A1 (en) A method, an apparatus and a computer program product for encoding and decoding video
WO2018229327A1 (en) A method and an apparatus and a computer program product for video encoding and decoding
WO2022224113A1 (en) Method, apparatus and computer program product for providing finetuned neural network filter
WO2023066672A1 (en) Video coding using parallel units
US20240121387A1 (en) Apparatus and method for blending extra output pixels of a filter and decoder-side selection of filtering modes
US20230031886A1 (en) Adaptive up-sampling filter for luma and chroma with reference picture resampling (rpr)
US20240064311A1 (en) A method, an apparatus and a computer program product for encoding and decoding
WO2024003441A1 (en) An apparatus, a method and a computer program for video coding and decoding
WO2023237809A1 (en) A method, an apparatus and a computer program product for video encoding and video decoding
WO2024079381A1 (en) An apparatus, a method and a computer program for video coding and decoding
WO2023187250A1 (en) An apparatus, a method and a computer program for video coding and decoding

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22802523

Country of ref document: EP

Kind code of ref document: A1