WO2016137369A1

WO2016137369A1 - Encoding and decoding of pictures in a video

Info

Publication number: WO2016137369A1
Application number: PCT/SE2015/050211
Authority: WO
Inventors: Jonatan Samuelsson; Martin Pettersson; Per Wennersten
Original assignee: Telefonaktiebolaget Lm Ericsson (Publ)
Priority date: 2015-02-25
Filing date: 2015-02-25
Publication date: 2016-09-01

Abstract

There are provided mechanisms for decoding a bitstream comprising a coded picture of a video sequence. The coded picture consists of a first intra coded block of samples and at least a second intra coded block of samples. The second intra coded block of samples succeeds the first intra coded block of samples in a bitstream order. The method comprises reconstructing the second intra coded block of samples before reconstructing the first intra coded block of samples. There are provided mechanisms for encoding a picture of a video sequence. The picture comprises a first block of samples and at least one of a second block of samples and a third block of samples. The second block of samples is the right spatially neighboring block of samples to the first block of samples. The third block of samples is the bottom spatially neighboring block of samples to the first block of samples. The method comprises predicting at least one of the second block of samples and the third block of samples with intra prediction. The method comprises predicting, with intra prediction, the first block of samples from at least one of the second block of samples and the third block of samples that is predicted with intra prediction.

Description

Encoding and decoding of pictures of a video using intra coding

TECHNICAL FIELD

Embodiments herein relate to the field of video coding, such as High Efficiency Video Coding (HEVC) or the like. In particular, embodiments herein relate to a method and a decoder for decoding a bitstream comprising a coded picture of a video sequence as well as a method and an encoder for encoding a picture of a video sequence. Corresponding computer programs therefor are also disclosed. BACKGROUND

State-of-the-art video coding standards are based on block-based linear transforms, such as a Discrete Cosine Transform (DCT). H.264/AVC and its predecessors define a macroblock as a basic processing unit that specifies the decoding process, typically consisting of 16x16 samples. A macroblock can be further divided into transform blocks, and into prediction blocks. Depending on a standard, the transform blocks and prediction blocks may have a fixed size or can be changed on a per-macroblock basis in order to adapt to local video characteristics.

The successor of H.264/AVC, H.265/HEVC (HEVC in short), replaces the 16x16 sample macroblocks with so-called coding tree units (CTUs) that can use the following block structures: 64x64, 32x32, 16x16 or 8x8 samples, where a larger block size usually implies increased coding efficiency. Larger block sizes are particularly beneficial for high-resolution video content. All CTUs in a picture are of the same size. In HEVC it is also possible to better sub-partition the picture into variable sized structures in order to adapt to different complexity and memory requirements. When encoding a sequence of pictures constituting a video with HEVC, each picture 5 is first split into CTUs. A CTU 14 consists of three blocks, one luma and two chroma, and the associated syntax elements. These luma and chroma blocks are called coding tree blocks (CTB). A CTB has the same size as a CTU, but may be further split into smaller blocks - the so called coding blocks (CBs), using a tree structure and quadtree-like signaling. A size of a CB can vary from 8x8 pixels up to the size of a CTB. A luma CB, two chroma CBs and the associated syntax form a coding unit 15 (CU).

Compressing a CU 15 is performed in two steps. In a first step, pixel values in the CU 15 are predicted from previously coded pixel values either in the same picture or in previous pictures. In a second step, a difference between the predicted pixel values and the actual values, the so-called residual, is calculated and transformed with e.g. a DCT.

Prediction can be performed for an entire CU 15 at once or on smaller parts separately. This is done by defining Prediction Units (PUs) 16, which may be the same size as the CU 15 for a given set of pixels, or further split hierarchically into smaller PUs. Each PU 16 defines separately how it will predict its pixel values from previously coded pixel values.

In a similar fashion, the transforming of the prediction error is done in Transform Units (TUs) 17, which may be the same size as CUs or split hierarchically into smaller sizes. The prediction error is transformed separately for each TU 17. A PU 16 size can vary from 4x4 to 64x64 pixels for its luma component, whereas a TU 17 size can vary from 4x4 to 32x32 pixels. Different PU 16 and TU 17 partitions as well as CU 15 and CTU 14 partitions are illustrated in Fig. 1. Prediction units have their pixel values predicted either based on the values of neighboring pixels in the same picture (intra prediction), or based on pixel values from one or more previous pictures (inter prediction). A picture that is only allowed to use intra-prediction for its blocks is called an intra picture (l-picture). The first picture in a sequence must be an intra picture. Another example of when intra pictures are used is for so-called key frames which provide random access points to the video stream. An inter picture may contain a mixture of intra-prediction blocks and inter-prediction blocks. An inter picture may be a predictive picture (P-picture) that uses one picture for prediction, and a bi-directional picture (B-picture) that uses two pictures for prediction.

Prior to encoding, a picture 5 may be split up into several tiles, each consisting of MxN CTUs, where M and N are integers. When encoding, the tiles are processed in the raster scan order (read horizontally from left to right until the whole line is processed and then move to the line below and repeat the same process) and the CTUs inside each tile are processed in the raster scan order. The CUs in a CTU 14 as well as PUs and TUs within a CU are processed in Z-scan order. This process is illustrated in Fig. 2. The same raster scan order and Z-scan order are applied when decoding a bitstream.

When decoding a CU 15 in a video bitstream, the syntax elements for the CU 15 are first parsed from the bitstream. The syntax elements are then used to reconstruct the corresponding decoded block of samples in the decoded picture. SUMMARY

In current video coding standards an intra block is typically predicted/reconstructed by using its top and/or left spatially neighboring blocks as a reference since only these are available when predicting/reconstructing the current block due to the order in which the blocks are scanned. This means that, even if both top and left spatially neighboring blocks are used when predicting/reconstructing the current block, only half of the available spatially neighboring blocks is used. Having less spatially neighboring blocks used in prediction means having a worse quality of prediction. Worse quality of prediction means larger difference between the original block of pixels and the predicted block of pixels. Taking into account that this difference is further transformed and quantized prior to packing it in a bitstream, and the larger difference means more information to send, it is clear that worse prediction results in a higher bitrate.

Thus, in order to reduce the bitrate, i.e. to achieve higher compression efficiency, it is of utter importance that the intra blocks are predicted as accurately as possible.

This and other objectives are met by embodiments as disclosed herein.

A first aspect of the embodiments defines a method, performed by a decoder, for decoding a bitstream comprising a coded picture of a video sequence. The coded picture consists of a first intra coded block of samples and at least a second intra coded block of samples. The second intra coded block of samples succeeds the first intra coded block of samples in a bitstream order. The method comprises reconstructing the second intra coded block of samples before reconstructing the first intra coded block of samples. A second aspect of the embodiments defines a decoder for decoding a bitstream comprising a coded picture of a video sequence. The coded picture consists of a first intra coded block of samples and at least a second intra coded block of samples. The second intra coded block of samples succeeds the first intra coded block of samples in a bitstream order. The decoder comprises processing means operative to reconstruct the second intra coded block of samples before reconstructing the first intra coded block of samples.

A third aspect of the embodiments defines a computer program for decoding a bitstream comprising a coded picture of a video sequence. The coded picture consists of a first intra coded block of samples and at least a second intra coded block of samples. The second intra coded block of samples succeeds the first intra coded block of samples in a bitstream order. The computer program comprises code means which, when run on a computer, causes the computer to reconstruct the second intra coded block of samples before reconstructing the first intra coded block of samples. A fourth aspect of the embodiments defines a computer program product comprising computer readable means and a computer program, according to the third aspect, stored on the computer readable means.

A fifth aspect of the embodiments defines a method, performed by an encoder, for encoding a picture of a video sequence. The picture comprises a first block of samples and at least one of a second block of samples and a third block of samples. The second block of samples is the right spatially neighboring block of samples to the first block of samples. The third block of samples is the bottom spatially neighboring block of samples to the first block of samples. The method comprises predicting at least one of the second block of samples and the third block of samples with intra prediction. The method comprises predicting, with intra prediction, the first block of samples from at least one of the second block of samples and the third block of samples that is predicted with intra prediction.

A sixth aspect of the embodiments defines an encoder for encoding a picture of a video sequence. The picture comprises a first block of samples and at least one of a second block of samples and a third block of samples. The second block of samples is the right spatially neighboring block of samples to the first block of samples. The third block of samples is the bottom spatially neighboring block of samples to the first block of samples. The encoder comprises processing means operative to predict at least one of the second block of samples and the third block of samples with intra prediction. The encoder comprises processing means operative to predict, with intra prediction, the first block of samples from at least one of the second block of samples and the third block of samples that is predicted with intra prediction.

A seventh aspect of the embodiments defines a computer program for encoding a picture of a video sequence. The picture comprises a first block of samples and at least one of a second block of samples and a third block of samples. The second block of samples is the right spatially neighboring block of samples to the first block of samples. The third block of samples is the bottom spatially neighboring block of samples to the first block of samples. The computer program comprises code means which, when run on a computer, causes the computer to predict at least one of the second block of samples and the third block of samples with intra prediction. The computer program comprises code means which, when run on a computer, causes the computer to predict, with intra prediction, the first block of samples from at least one of the second block of samples and the third block of samples that is predicted with intra prediction. An eighth aspect of the embodiments defines a computer program product comprising computer readable means and a computer program, according to the seventh aspect, stored on the computer readable means.

Advantageously, at least some of the embodiments provide higher compression efficiency.

It is to be noted that any feature of the first, second, third, fourth, fifth, sixth, seventh and eighth aspects may be applied to any other aspect, whenever appropriate. Likewise, any advantage of the first aspect may equally apply to the second, third, fourth, fifth, sixth, seventh and eighth aspect respectively, and vice versa. Other objectives, features and advantages of the enclosed embodiments will be apparent from the following detailed disclosure, from the attached dependent claims and from the drawings.

Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to "a/an/the element, apparatus, component, means, step, etc." are to be interpreted openly as referring to at least one instance of the element, apparatus, component, means, step, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated. BRIEF DESCRIPTION OF THE DRAWINGS

The invention, together with further objects and advantages thereof, may best be understood by making reference to the following description taken together with the accompanying drawings, in which: Fig. 1 illustrates different picture partitions for coding, prediction and transform used in HEVC.

Fig. 2 illustrates the order in which different picture partitions in HEVC are processed according to the raster scan order and the Z-scan order. Fig. 3 illustrates directional intra prediction modes defined in HEVC (Fig. 3(A)), with a more detailed illustration of directional mode 29 (Fig. 3(B)).

Figs. 4 (A) and (B) illustrate how intra prediction is performed by using spatially neighboring blocks as reference, as used in HEVC. Figure 4 (C) illustrates how intra prediction could be performed in some embodiments of the present invention.

Figs. 5 and 6 illustrate a flowchart of a method of decoding a bitstream comprising a coded picture of a video sequence, according to embodiments of the present invention.

Fig. 7 (A) illustrates the pixels from the neighboring blocks that are used for prediction in HEVC, whereas Fig. 7 (B) shows the pixels from the spatially neighboring blocks that are used for improved intra prediction according to some of the embodiments of the present invention. Fig. 8 illustrates an intra prediction mode that uses samples from the right and bottom spatially neighboring blocks together with the samples from the top and left spatially neighboring blocks according to the embodiments of the present invention.

Fig. 9 illustrates an example of a block of pixels for which it could be useful to use more than one row of pixels from the spatially neighboring blocks when prediction from all spatially neighboring blocks is allowed.

Figs. 10-12 illustrate flowcharts of a method of encoding a picture of a video sequence, according to embodiments of the present invention.

Figs. 13 and 15 depict a schematic block diagram illustrating functional units of a decoder for decoding a bitstream of a coded picture of a video sequence according to embodiments of the present invention.

Fig. 14 is a schematic block diagram illustrating a computer comprising a computer program product with a computer program for decoding a bitstream of a coded picture of a video sequence according to embodiments of the present invention.

Figs. 16 and 18 depict a schematic block diagram illustrating functional units of an encoder for encoding a picture of a video sequence according to embodiments of the present invention. Fig. 17 is a schematic block diagram illustrating a computer comprising a computer program product with a computer program for encoding a picture of a video sequence, according to embodiments of the present invention. DETAILED DESCRIPTION

The accompanying drawings, which are incorporated herein and form part of the specification, illustrate various embodiments of the present invention and, together with the description, further serve to explain the principles of the invention and to enable a person skilled in the art to make and use the invention. Throughout the drawings, the same reference numbers are used for similar or corresponding elements.

Throughout the description, the terms "video" and "video sequence", "intra predicted block" and "intra block", "inter predicted block" and "inter block" ", "block of samples" and "block", "pixel" and "sample" are interchangeably used.

Even though the description of the invention is based on the HEVC codec, it is to be understood by a person skilled in the art that the invention could be applied to any other state-of-the-art and a future block-based video coding standard. The present embodiments generally relate to a method and a decoder for decoding a bitstream comprising a coded picture of a video sequence as well as a method and an encoder for encoding a picture of a video sequence.

Modern video coding standards use the so-called hybrid approach that combines interVintra-picture prediction and 2D transform coding. As already said, intra prediction refers to prediction of the blocks in a picture based only on the information in that picture. A picture whose all blocks are predicted with intra prediction is called an intra picture (or l-picture). For all other pictures, inter-picture prediction is used, in which prediction information from other pictures is exploited. A picture where at least one block is predicted with inter prediction is called an inter picture. This means that an inter picture may have blocks that are intra predicted.

After all the blocks in a picture are predicted and after additional loop filtering, the picture is stored in the decoded picture buffer so that they can be used for the prediction of other pictures. Thus a decoder loop is used in the encoder and is synchronized with the true decoder to achieve the best performance and avoid mismatch with the decoder.

HEVC defines 3 types of intra prediction: DC, planar and angular. The DC intra prediction mode uses for prediction an average value of reference samples. This mode is particularly useful for flat surfaces. The planar mode uses average values of two linear predictions using four corner reference samples: it is essentially interpolating values over the block, assuming that all values to the right of the block are the same as the pixel one row above the block and one column to the right of the block. The values below the block are assumed to be equal to the pixel in the row below the block and the column to the left of the block. The planar mode helps in reducing the discontinuities along the block boundaries. HEVC supports all the block sizes, unlike in H.264/MPEG-4 AVC that supports plane prediction only for block sizes of 16x16.

Intra angular prediction defines 33 prediction directions, unlike H.264/MPEG-4 AVC where only 8 directions are allowed. As can be seen in Fig. 3 (A), the angles corresponding to these directions are chosen to cover near-horizontal and near-vertical angles more densely than near-diagonal angles, which follows from the statistics on the directions that prevail when using this type of prediction, as well as how effective these directions are. With intra angular prediction, each block is predicted directionally from the reconstructed spatially neighboring samples. For a NxN block, up to 4N+1 neighboring samples are used. Fig. 3(B) shows an example of directional mode 29. Unlike H.264/MPEG-4 AVC, that uses different intra angular prediction methods depending on the block size (4x4, 8x8 and 16x16), the intra angular prediction in HEVC is consistent regardless of a block size.

Inter prediction takes advantage of temporal redundancy between neighboring pictures, thus typically achieving higher compression ratios. The sample values of an inter predicted block are obtained from the corresponding block from its reference picture that is identified by the so-called reference picture index, where the corresponding block is obtained by a block matching algorithm. The result of the block matching is a motion vector, which points to the position of the matching block in the reference picture. A motion vector may not have an integer value: both H.264/MPEG-4 AVC and HEVC support motion vectors with units of one quarter of the distance between luma samples. For non-integer motion vectors the fractional sample interpolation is used to generate the prediction samples for non-integer sampling positions, where an eight-tap filter is used for the half-sample positions and a seven-tap filter for the quarter-sample positions. The difference between the block to be inter predicted and the matching block is called a prediction error. Prediction error is further transform coded and the transform coefficients are quantized before being transmitted to a decoder together with motion vector information to a decoder.

The fact that an intra block is predicted only from its spatially neighboring blocks that precede it in the Z-scan order can be exploited in order to improve its prediction, as illustrated in Fig. 4 (A). The blocks 0-15 are CUs consisting of blocks of samples. In this example these blocks of samples are to be intra predicted in a standard way by using HEVC. This means that the blocks are scanned in the Z-scan order, the block number representing an order in which a block is being predicted. In this example block 3 is predicted with an intra prediction mode using the border pixels from block 1 as a reference. Thus block 2 is not used as a reference for intra prediction of block 3 in this particular case. Block 3, on the other hand, is not used as a reference for intra prediction of block 2 as it succeeds block 2 in the Z- scan order and is therefore not available at the time when block 2 is being intra predicted. As a result, block 2 is not used as a reference for block 3, and block 3 is not used as a reference of block 2 either. In such situations, it may be beneficial if block 2 uses block 3 as a reference for intra prediction (in addition to block 0 possibly) as this may give a more accurate prediction for block 2.

Fig. 4(B) illustrates a similar example, based on HEVC, where the blocks to be predicted are of different sizes. In this example block 9 is predicted using angular intra prediction mode using the border pixels from blocks 3 and 4 as a reference. Thus blocks 6 and 8 are not used as a reference for block 9 in this particular case. Block 6, on the other hand, does not use block 9 as a reference as block 9 succeeds block 6 in the Z-scan order and is therefore not available at the time when block 6 is being intra predicted. Therefore, block 6 is not used as a reference for block 9, and block 9 is not used for prediction of block 6. In this situation, it may be beneficial if block 6 uses block 9 as a reference for intra prediction (in addition to blocks 0 and 5 possibly) as this may give a more accurate prediction for block 6. In this particular example block 8 may also need to be reevaluated if the pixel values of block 6 changed after using block 9 for intra prediction.

Note that the blocks in Figs. 4(A) and (B) may be a part of an intra picture, where all the blocks are intra predicted, or may be a part of an inter picture where some of the blocks are intra predicted whereas the rest of the blocks use inter prediction.

In the example from Fig. 4(A), having block 3 used as reference for block 2 means that block 3 has to be available for prediction when block 2 is being predicted. In the same way, in the example from Fig. 4(B), having block 9 used as reference for block 6 means that block 9 has to be available for prediction when block 6 is being predicted. This implies that blocks 3 and 9 respectively already have to be encoded and consequently reconstructed in the decoding loop at the encoder so that they are available for prediction of blocks 2 and 6 respectively. This also implies that one has to depart from a standard way of decoding where all the blocks are reconstructed in the same order as their syntax elements are parsed. Therefore, both encoding and decoding processes need to be modified to enable using more spatially neighboring blocks for intra prediction. In what follows we will first describe the decoding process, and then the encoding process will be explained.

According to one aspect, a method performed by a decoder 100, for decoding a bitstream 4 comprising a coded picture 7 of a video sequence 6 is disclosed. The coded picture 7 consists of a first intra coded block of samples 1 and at least a second intra coded block of samples 2. The second intra coded block of samples 2 succeeds the first intra coded block of samples 1 in a bitstream 4 order. The bitstream order is to be understood as the raster scan order or a Z-scan order. The method comprises the step S2 of reconstructing the second intra coded block of samples 2 before reconstructing the first intra coded block of samples 1.

The second intra coded block of samples 2 may be used for prediction of the first intra coded block of samples 1. The second intra coded block of samples 2 and the first intra coded block of samples 1 may be spatially neighboring blocks of samples, where by spatially neighboring blocks it is meant blocks that have a common border. The second intra coded block of samples 2 can, for example, be located to the right or below the first intra coded block of samples 1. Referring to the examples in Figs. 4 (A) and (B), the first intra coded block of samples 1 may correspond to blocks 2 and 6 respectively, whereas the second intra coded block of samples 2 may correspond to blocks 3 and 9 respectively.

The method may further optionally comprise step S1 , performed before step S2, of parsing the bitstream 4 to obtain syntax information related to coding of the video sequence 6. The syntax information may include one or more of: picture size, block size, prediction mode, reference picture selection for each block, motion vectors and transform coefficients.

In one embodiment, depicted in Fig. 6, the decoder 100 first parses the bitstream 4 to obtain the syntax information from which it derives the prediction mode used for each coded block of pixels (step S1). The decoder 100 then determines a reconstruction order for the intra blocks that ensures that the dependencies between the intra blocks are allowed (step S12). This order may be different from normal raster scan order and Z-scan and may depend on the intra directions of the blocks.

To illustrate how the blocks are reconstructed in a different order, consider the example in Fig. 4(C). The arrows illustrate the prediction dependencies. For example, block 3 uses blocks 0 and 1 as references for intra prediction. Block 6 uses blocks 0, 5 and 9 as references, which is different from the regular intra prediction in e.g. HEVC where only blocks 0 and 5 are allowed to be used as references. Similarly, block 8 uses blocks 6 and 9 as references which again is different than the regular intra prediction. Block 9 uses blocks 3 and 4 for prediction, but not blocks 6 and 8 as blocks 6 and 8 use block 9 as a reference. Having circular dependencies, when two blocks are referencing each other, is not allowed. Given these prediction dependencies, a possible reconstruction order that ensures that the dependencies are allowed may be: 0-1-2-3-4-5-9-6-8-7. This way block 9 is available when blocks 6 and 8 are being reconstructed and block 8 is available when block 7 is being reconstructed. The decoder 100 finally reconstructs the intra blocks in the determined reconstruction order (step S13). This way, when reconstructing a current intra block, the decoder 100 may already have reconstructed spatially neighboring intra blocks that would normally be reconstructed at a later time. The current block then has more neighboring pixels to use for prediction, which leads to improved prediction accuracy of the prediction and increased compression efficiency.

In another embodiment, the decoding is constrained to take place within a coding tree unit (CTU). This means that the improved intra prediction (using more neighboring pixels) can only be done inside CTUs, thus forbidding the reconstruction across the CTU borders. Having this constraint also puts limits on the computational complexity in a sense that memory access is not increased in a typical implementation since a decoder would typically anyway hold at least an entire CTU in memory at the same time.

The following steps are performed by the decoder 100 in this case. First, all the syntax elements in a CTU are parsed to obtain information related to coding of the video sequence 6. The syntax information includes one or more of: picture size, block size, prediction mode, reference picture selection for each block, motion vectors and transform coefficients. Parsing the syntax elements may be done in the bitstream order. If block 1 to be reconstructed is intra predicted, then the prediction modes of the blocks below 3 and/or to the right 2 are determined from the parsed syntax information. The determined intra prediction modes give information about the directions used in intra prediction of the blocks 1-3. That is, from this information one can deduce whether blocks 2 and/or 3 use block 1 as a reference as well as whether block 1 uses blocks 2 and/or 3 as a reference.

If block 1 uses the blocks below 3 and/or to the right 2 as a reference and the blocks below 3 and/or to the right 2 do not use block 1 for prediction, then the blocks below 3 and/or to the right 2 are reconstructed. Block 1 is subsequently reconstructed by using the reconstructed blocks below 3 and/or to the right 2 as a reference.

If block 1 does not use the blocks below 3 and/or to the right 2 as a reference or the blocks below 3 and/or to the right 2 use the block 1 for prediction then block 1 is reconstructed, and after that the blocks below 3 and/or to the right 2 are reconstructed by using their respective references based on their prediction modes.

In one embodiment, only parts of a neighboring block is available due to that it has been split into several sub-blocks out of which only a subset have been encoded with an intra direction that makes them available for prediction. This can be handled by interpolating or extrapolating values for those pixels that are not available for prediction and then performing intra prediction using the interpolated or extrapolated values.

The embodiments described above can be exploited, in the simplest case, by changing the intra prediction methods that the samples from the blocks located below and/or to the left of the current intra block can also be used, where available. Changing intra prediction modes requires modifications both on the encoder and the decoder side as the encoder and the decoder have to be synchronized in order to avoid prediction mismatch.

These new intra prediction modes are referred to as the improved intra prediction modes. Fig. 7 (A) illustrates the pixels from the neighboring blocks that are used for prediction in HEVC, whereas Fig. 7 (B) shows the bordering pixels from the spatially neighboring blocks that may be used for improved intra prediction according to some of the embodiments of the present invention. Improved intra prediction modes may be obtained by modifying the existing intra prediction modes. For example, the DC intra prediction mode that simply predicts that the values in the block are equal to the average of the neighboring values can be extended in a straight-forward way by allowing for more neighboring pixels to be averaged for prediction. In the HEVC planar intra prediction mode it is assumed that all values to the right of the block are the same as the pixel one row above the block and one column to the right of the block. Similarly, the values below the block are assumed to be equal to the pixel in the row below the block and the column to the left of the block. This intra prediction mode can therefore be easily be extended by using the proper values to the right of or below the block where available instead of the assumed values.

In addition to extending the existing intra prediction modes of HEVC, new intra modes that would benefit from using pixels from right and/or bottom blocks could be thought of. For instance, two different directions could be used for the angular mode, one direction as in HEVC (see Fig. 8) and one direction going in one of the opposite directions compared to the possible angular directions in Fig. 8. The pixel at the position where the two directions meet may be interpolated from the values of the bordering pixels from where the directions start and/or end. The interpolation could be made by using weights based on the distance to each pixel used for the interpolation or using some other way of calculating the weights. The improved intra prediction modes may be combined with the existing intra prediction modes, or they may simply replace some of the existing intra prediction modes.

The improved intra prediction modes may use more rows/columns of pixels from the spatially neighboring blocks for prediction, rather than only the border row/column of pixels. This could for instance give better prediction for blocks that contain curved surfaces as the one illustrated in Fig. 9.

As already said, using more spatially neighboring blocks for prediction of a current block requires changes in the encoding process as well. According to one aspect of the embodiments, a method performed by an encoder 200, for encoding a picture 5 of a video sequence 6 is disclosed. The picture 5 comprises a first block of samples 11 and at least one of a second block of samples 12 and a third block of samples 13. The second block of samples 12 is the right spatially neighboring block of samples to the first block of samples 11 and the third block of samples 13 is the bottom spatially neighboring block of samples to the first block of samples 11. The flowchart of the method is depicted in Fig. 10. In step S4, at least one of the second block of samples 12 and the third block of samples is predicted with intra prediction. In step S5, the first block of samples 11 is predicted with intra prediction from at least one of the second block of samples 12 and the third block of samples 13 that is predicted with intra prediction. This way the prediction of the block of samples is improved by taking more spatially neighboring inter predicted blocks of samples into account.

In one embodiment, depicted in Fig. 11 , the encoding is performed as a two pass procedure. In the first pass (step S6), a preliminary intra prediction mode 14 is chosen for the first block of samples 11 among the existing intra prediction modes, wherein the existing intra prediction modes perform prediction based on the top and/or left spatially neighboring blocks of samples. Thus the preliminary prediction mode 14 corresponds to the mode that would be used for the block of samples 11 if it was normally encoded, i.e. encoded with a standard encoder.

In the second pass, two prediction errors are calculated for the block of samples 11 (step S7). The first prediction error is the error corresponding to choosing the preliminary prediction mode 14 for the first block of samples 11. The prediction error is a function of the block of samples 11 and the predicted block of samples; for example the prediction error can be calculated as a mean squared error between the block of samples 11 and the reconstructed block of samples. The second prediction error corresponds to an error if an improved intra prediction 15 was used for the first block of samples 11. The improved intra prediction mode 15 may predict from the top and/or left spatially neighboring blocks of samples in combination with the bottom and/or the right spatially neighboring blocks of samples that do not use the first block of samples 11 as reference for intra prediction.

The two prediction errors are compared and, if the prediction error corresponding to the improved intra prediction mode 15 is smaller than the one corresponding to the preliminary prediction mode 14, the first block of samples 11 is predicted with the improved prediction mode 15 (step S8). That means that in the second pass it turned out that it is more beneficial to predict the block of samples 11 with improved intra prediction 15 than with the preliminary intra prediction 14 as there are spatially neighboring blocks that can be used to improve the prediction of the block of samples 11. If the prediction error corresponding to the preliminary prediction mode 14 is smaller than or equal to the one for the improved intra prediction mode 15, the first block of samples 11 is predicted the same way as with a normal encoding - with the preliminary prediction mode 14 (step S9).

In another embodiment, depicted in Fig. 12, the encoding is performed by calculating (step S10), in a first pass, estimates of prediction errors for all blocks of samples, given they are predicted with intra prediction with different combinations of available spatially neighboring blocks of samples. The prediction error is a function of the block of samples and the predicted block of samples, as in the previous embodiment. In the second pass, the intra prediction mode for the block of samples that is predicted first in the scan order is chosen among different combinations of intra prediction modes for that block and the neighboring blocks such that its prediction error is minimized. The intra prediction mode for the second block of samples in the scan order is chosen among different combinations of intra prediction modes for that block and the spatially neighboring blocks excluding the first block, given that the first block of samples is predicted with its chosen prediction mode. The second pass goes through all the blocks of samples and essentially repeats the same procedure: the intra prediction mode for a block of samples is chosen among different combinations of intra prediction modes for that block and the spatially neighboring blocks that precede that block in the scan order, given that the spatially neighboring blocks that precede that block are predicted in their respective chosen intra prediction modes (step S11). According to one embodiment it is not allowed to change a CU size after the first pass. According to another embodiment, splitting up a CU into smaller parts is allowed after the first pass. In fact it could even be beneficial as each split CU could use its own prediction mode.

Fig. 13 is a schematic block diagram of a decoder 100 for decoding a bitstream 4 comprising a coded picture 7 of a video sequence 6, according to an embodiment (see also Fig. 5). The coded picture 7 consists of a first intra coded block of samples 1 and at least a second intra coded block of samples 2. The second intra coded block of samples 2 succeeds the first intra coded block of samples 1 in a bitstream 4 order. The decoder 100 comprises a reconstructing module 180, configured to reconstruct the second intra coded block of samples 2 before reconstructing the first intra coded block of samples 1. The decoder 100 further optionally comprises a parsing module 170 configured to parse the bitstream 4 to obtain syntax information related to coding of the video sequence 6.

The decoder 100 may be an HEVC or H.264/AVC decoder, or any other state of the art decoder that combines interVintra-picture prediction and block based coding.

The parsing module 170 may be a part of a regular HEVC decoder that parses the bitstream 4 in order to obtain the information related to the coded video sequence 6 such as: picture size, sizes of blocks of samples, prediction modes for the blocks of samples, reference picture selection for each block of samples, motion vectors for inter coded blocks of samples and transform coefficients. The reconstructing module 180 may utilize the parsed syntax information from a parsing module 170 to reconstruct the pictures of the video sequence 6. For example, the reconstructing module 180 may obtain information on the intra prediction modes used for all the blocks of samples and can use this information to reconstruct the blocks of samples appropriately. In particular, the reconstructing module 180 is configured to reconstruct the second intra coded block of samples 2 before reconstructing the first intra coded block of samples 1 even though the second intra coded block of samples 2 succeeds the first intra coded block of samples 1 in a bitstream 4 order if the second intra coded block of samples 2 is used for prediction of the first intra coded block of samples 1.

The reconstructing module 180 may use the sample values of the second intra coded block of samples 2 to reconstruct the first intra coded block of samples 1. The reconstructing module 180 may use the improved intra prediction modes that use the samples from the top and/or left spatially neighboring blocks of samples in combination with the bottom and/or the right spatially neighboring blocks of samples. The improved intra prediction modes may be obtained by extending the existing intra prediction modes in e.g. HEVC. The reconstructing module 180 may also use both existing and improved intra prediction modes in order to reconstruct the second intra coded block of samples 2 and the first intra coded block of samples 1. The decoder 100 can be implemented in hardware, in software or a combination of hardware and software. The decoder 100 can be implemented in user equipment, such as a mobile telephone, tablet, desktop, netbook, multimedia player, video streaming server, set-top box or computer. The decoder 100 may also be implemented in a network device in the form of or connected to a network node, such as radio base station, in a communication network or system.

Although the respective units disclosed in conjunction with Fig. 13 have been disclosed as physically separate units in the device, where all may be special purpose circuits, such as ASICs (Application Specific Integrated Circuits). Alternative embodiments of the device are possible where some or all of the units are implemented as computer program modules running on a general purpose processor. Such an embodiment is disclosed in Fig. 14.

Fig. 14 schematically illustrates an embodiment of a computer 160 having a processing unit 110 such as a DSP (Digital Signal Processor) or CPU (Central Processing Unit). The processing unit 110 can be a single unit or a plurality of units for performing different steps of the method described herein. The computer also comprises an input/output (I/O) unit 120 for receiving a bitstream. The I/O unit 120 has been illustrated as a single unit in Fig. 14 but can likewise be in the form of a separate input unit and a separate output unit. Furthermore, the computer 160 comprises at least one computer program product 130 in the form of a non-volatile memory, for instance an EEPROM (Electrically Erasable Programmable Read-Only Memory), a flash memory or a disk drive. The computer program product 130 comprises a computer program 140, which comprises code means which, when run on the computer 160, such as by the processing unit 110, causes the computer 160 to perform the steps of the method described in the foregoing in connection with Fig. 5.

According to a further aspect 100 for decoding a bitstream 4 comprising a coded picture 7 of a video sequence 6 is provided as illustrated in Fig. 15. The processing means are exemplified by a CPU (Central Processing Unit) 110. The processing means is operative to perform the steps of the method described in the foregoing in connection with Fig. 5. That implies that the processing means 110 are operative to reconstruct the second intra coded block of samples 2 before reconstructing the first intra coded block of samples 1. The processing means 110 may be further operative to parse the bitstream 4 to obtain syntax information related to coding of the video sequence 6. Fig. 16 is a schematic block diagram of an encoder 200 for encoding a picture 5 of a video sequence 6. The picture 5 comprises a first block of samples 11 and at least one of a second block of samples 12 and a third block of samples 13. The second block of samples 12 is the right spatially neighboring block of samples to the first block of samples 11 and the third block of samples 13 is the bottom spatially neighboring block of samples to the first block of samples 11. The encoder 200 comprises a predictor 270, configured to predict at least one of the second block of samples 12 and the third block of samples 13 with intra prediction. The encoder 200 further comprises a predictor 280, configured to predict, with intra prediction, the first block of samples 11 from at least one of the second block of samples 12 and the third block of samples 13 that is predicted with intra prediction. The encoder 200 may be an HEVC or H.264/AVC encoder, or any other state of the art encoder that combines interVintra-picture prediction and block based coding.

The predictor 270 may use the sample values of the spatially neighboring blocks to blocks 12 and 13 to predict at least one of the blocks 12 and/or 13 with intra prediction. The predictor 270 may use the improved intra prediction modes that use the samples from the top and/or left spatially neighboring blocks of samples in combination with the bottom and/or the right spatially neighboring blocks of samples. The improved intra prediction modes may be obtained by extending the existing intra prediction modes in e.g. HEVC. The predictor 280 may also use both existing and improved intra prediction modes in order to find the mode that best predicts the block of samples 11.

The predictor 280 may use the sample values from at least one of the blocks 12 and 13 that are predicted with intra prediction to predict the block of samples 11. The predictor 280 may use the improved intra prediction modes that use the samples from the top and/or left spatially neighboring blocks of samples in combination with the bottom and/or the right spatially neighboring blocks of samples. The improved intra prediction modes may be obtained by extending the existing intra prediction modes in e.g. HEVC. The predictor 280 may also use both existing and improved intra prediction modes in order to find the mode that best predicts the block of samples 11. The encoder 200 can be implemented in hardware, in software or a combination of hardware and software. The decoder 200 can be implemented in user equipment, such as a mobile telephone, tablet, desktop, netbook, multimedia player, video streaming server, set-top box or computer. The encoder 200 may also be implemented in a network device in the form of or connected to a network node, such as radio base station, in a communication network or system.

Although the respective units disclosed in conjunction with Fig. 16 have been disclosed as physically separate units in the device, where all may be special purpose circuits, such as ASICs (Application Specific Integrated Circuits). Alternative embodiments of the device are possible where some or all of the units are implemented as computer program modules running on a general purpose processor. Such an embodiment is disclosed in Fig. 17.

Fig. 17 schematically illustrates an embodiment of a computer 260 having a processing unit 210 such as a DSP (Digital Signal Processor) or CPU (Central Processing Unit). The processing unit 210 can be a single unit or a plurality of units for performing different steps of the method described herein. The computer also comprises an input/output (I/O) unit 220 for receiving a video sequence. The I/O unit 220 has been illustrated as a single unit in Fig. 17 but can likewise be in the form of a separate input unit and a separate output unit. Furthermore, the computer 260 comprises at least one computer program product 230 in the form of a non-volatile memory, for instance an EEPROM (Electrically Erasable Programmable Read-Only Memory), a flash memory or a disk drive. The computer program product 230 comprises a computer program 240, which comprises code means which, when run on the computer 260, such as by the processing unit 210, causes the computer 260 to perform the steps of the method described in the foregoing in connection with Fig. 10.

According to a further aspect an encoder 200 for encoding a picture 5 of a video sequence 6 is provided as illustrated in Fig. 18. The picture 5 comprises a first block of samples 11 and at least one of a second block of samples 12 and a third block of samples 13.The processing means are exemplified by a CPU (Central Processing Unit) 210. The processing means is operative to perform the steps of the method described in the foregoing in connection with Fig. 10. That implies that the processing means 210 are operative to predict at least one of the second block of samples 12 and the third block of samples 13 with intra prediction. That further implies that the processing means 210 are operative to predict, with intra prediction, the first block of samples 11 from at least one of the second block of samples 12 and the third block of samples 13 that is predicted with intra prediction.

The embodiments described above are to be understood as a few illustrative examples of the present invention. It will be understood by those skilled in the art that various modifications, combinations and changes may be made to the embodiments without departing from the scope of the present invention. In particular, different part solutions in the different embodiments can be combined in other configurations, where technically possible.

Claims

1. A method, performed by a decoder (100), for decoding a bitstream (4) comprising a coded picture (7) of a video sequence (6), wherein the coded picture (7) consists of a first intra coded block of samples (1) and at least a second intra coded block of samples (2), wherein the second intra coded block of samples (2) succeeds the first intra coded block of samples (1) in a bitstream (4) order, the method comprising:

reconstructing (S2) the second intra coded block of samples (2) before reconstructing the first intra coded block of samples (1).

2. The method according to claim 1 , wherein the second intra coded block of samples (2) is used for prediction of the first intra coded block of samples (1).

3. The method according to claims 1-2 wherein the second intra coded block of samples (2) and the first intra coded block of samples (1) are spatially neighboring blocks of samples and wherein the second intra coded block of samples (2) is located to the right or below the first intra coded block of samples (1).

4. The method according to claim 1 , wherein the coded picture (7) further consists of a third intra coded block of samples (3), wherein the third intra coded block of samples (3) succeeds the second intra coded block of samples (2) in a bitstream (4) order, wherein the second (2) and third (3) intra coded blocks of samples are used for prediction of the first intra coded block of samples (1), wherein the second (2) and third (3) intra coded blocks of samples located to the right and below the first intra coded block of samples (1) respectively are spatially neighboring blocks with the first intra coded block of samples (1), the method comprising:

reconstructing (S3) the second (2) and third (3) intra coded blocks of samples before reconstructing the first intra coded block of samples (1).

5. The method according to any one of the previous claims, the method comprising reconstructing the intra coded blocks of samples in an order that ensures that intra prediction dependencies between intra coded blocks of samples are allowed.

6. The method according to any of the previous claims, wherein the coded picture (7) is split into at least one part, wherein all the intra coded blocks of samples from a part of the coded picture (7) are reconstructed in an order that ensures that intra prediction dependencies between them are allowed.

7. The method according to claim 6, wherein the part of the coded picture (7) is a coding tree unit (CTU).

8. The method according to any one of the previous claims, wherein the method comprises:

parsing (S1) the bitstream (4) to obtain syntax information related to coding of the video sequence (6).

9. The method according to any one of the previous claims, wherein the method comprises:

parsing the syntax elements for the second (2) and/or third (3) intra coded block of samples before parsing the syntax elements for the first intra coded block of samples (1).

10. The method according to claim 8 or 9, wherein the syntax information includes one or more of: picture size, block size, intra prediction mode and transform coefficients.

11. The method according to any one of the previous claims, wherein the decoding of the bitstream (4) is based on HEVC or H.264/AVC.

12. The method according to claim 11 , wherein the intra coded blocks of samples are

reconstructed based on the top and/or left spatially neighboring intra coded blocks of samples in combination with the bottom and/or the right spatially neighboring intra coded blocks of samples.

13. A method, performed by an encoder (200), for encoding a picture (5) of a video sequence (6), wherein the picture (5) comprises a first block of samples (11) and at least one of a second block of samples (12) and a third block of samples (13), wherein the second block of samples (12) is the right spatially neighboring block of samples to the first block of samples (11 ) and wherein the third block of samples (13) is the bottom spatially neighboring block of samples to the first block of samples (11), the method comprising:

predicting (S4) at least one of the second block of samples (12) and the third block of samples (13) with intra prediction;

predicting (S5), with intra prediction, the first block of samples (11 ) from at least one of the second block of samples (12) and the third block of samples (13) that is predicted with intra prediction.

14. The method according to claim 13, the method comprising:

choosing (S6) a preliminary intra prediction mode (14) for the first block of samples (11) in a first pass, among the existing intra prediction modes, wherein the existing intra prediction modes may predict from the top and/or left spatially neighboring blocks of samples;

calculating (S7), in a second pass, a prediction error for the first block of samples (11) predicted with the preliminary prediction mode (14) and a prediction error for the first block of samples predicted with an improved intra prediction mode (15), wherein the prediction error is a function of the block of samples and the predicted block of samples, and wherein the improved intra prediction mode (15) may predict from the top and/or left spatially neighboring blocks of samples in combination with the bottom and/or the right spatially neighboring blocks of samples that do not use the first block of samples (11) as reference for intra prediction;

if the calculated prediction error for the first block of samples (11) with the improved intra prediction mode (15) is smaller than the calculated prediction error for the first block of samples with the preliminary prediction mode (14):

predicting (S8) the first block of samples (11) with the improved intra prediction mode (15); if the calculated prediction error for the first block of samples (11) with the improved intra prediction mode (15) is larger than or equal to the calculated prediction error for the first block of samples (11) with the preliminary prediction mode (14):

predicting (S9) the first block of samples (11) with the preliminary intra prediction mode (14).

15. The method according to claim 13, the method comprising:

calculating (S10), in a first pass, estimates of prediction errors for all blocks of samples, given they are intra predicted with different prediction modes determined by different combinations of spatially neighboring blocks of samples, wherein the prediction error is a function of the block of samples and the predicted block of samples;

choosing (S11 ), in a second pass, an intra prediction mode for the block of samples based on the chosen intra prediction modes of its spatially neighboring blocks that precede the block of samples in a scan order and the calculated estimates of prediction errors.

16. The method according to any of claims 13-15, wherein the first block of samples (11) is encoded with an intra prediction mode that predicts from at least two spatially neighboring blocks of samples, wherein the intra prediction mode uses at least two different directions to reconstruct the pixels for the first block of samples (11).

17. The method according to any of claims 13-15, wherein the first block of samples (11) is encoded with an intra prediction mode that predicts from at least two spatially neighboring blocks of samples, wherein the intra prediction mode uses at least two rows of pixels in at least one of the spatially neighboring block of samples to provide non-linear reconstructions of the pixels in the first block of samples (11).

18. The method according to any one of claims 13-17, wherein the encoding of the picture of the video sequence is based on HEVC or H.264/AVC.

19. A decoder (100) for decoding a bitstream (4) comprising a coded picture (7) of a video sequence (6), wherein the coded picture (7) consists of a first intra coded block of samples (1) and at least a second intra coded block of samples (2), wherein the second intra coded block of samples (2) succeeds the first intra coded block of samples (1) in a bitstream (4) order, the decoder (100) comprising processing means (110) operative to:

reconstruct the second intra coded block of samples (2) before reconstructing the first intra coded block of samples (1).

20. The decoder (100) according to claim 19, wherein the processing means (110) comprise a processor (150) and a memory (130) wherein said memory (130) is containing instructions executable by said processor (150).

21. The decoder (100) according to any of claims 19-20, wherein the processing means (110) is further operative to:

parse the bitstream (4) to obtain syntax information related to coding of the video sequence (6).

22. An encoder (200), for encoding a picture (5) of a video sequence (6), wherein the picture (5) comprises a first block of samples (11) and at least one of a second block of samples (12) and a third block of samples (13), wherein the second block of samples (12) is the right spatially neighboring block of samples to the first block of samples (11 ) and wherein the third block of samples (13) is the bottom spatially neighboring block of samples to the first block of samples (11), the encoder (200) comprising processing means (210) operative to::

predict at least one of the second block of samples (12) and the third block of samples (13) with intra prediction;

predict, with intra prediction, the first block of samples (11 ) from at least one of the second block of samples (12) and the third block of samples (13) that is predicted with intra prediction.

5 23. The encoder (200) according to claim 22, wherein the processing means (210) comprise a processor (250) and a memory (230) wherein said memory (230) is containing instructions executable by said processor (250).

24. The encoder (200) according to any of claims 22-23, wherein the processing means (210) is

10 further operative to:

choose a preliminary intra prediction mode (14) for the first block of samples (11 ) in a first pass, among the existing intra prediction modes, wherein the existing intra prediction modes may predict from the top and/or left spatially neighboring blocks of samples;

calculate, in a second pass, a prediction error for the first block of samples (11) predicted

15 with the preliminary prediction mode (14) and a prediction error for the first block of samples predicted with an improved intra prediction mode (15), wherein the prediction error is a function of the block of samples and the predicted block of samples, and wherein the improved intra prediction mode (15) may predict from the top and/or left spatially neighboring blocks of samples in combination with the bottom and/or the right spatially neighboring blocks of samples that do not use the first block of samples (11 )

20 as reference for intra prediction;

predict the first block of samples (11) with the improved intra prediction mode (15);

25 if the calculated prediction error for the first block of samples (11) with the improved intra prediction mode (15) is larger than or equal to the calculated prediction error for the first block of samples (11) with the preliminary prediction mode (14):

predict the first block of samples (11) with the preliminary intra prediction mode (14).

30 25. The encoder (200) according to any of claims 22-24, wherein the processing means (210) is further operative to:

calculate, in a first pass, estimates of prediction errors for all blocks of samples, given they are intra predicted with different prediction modes determined by different combinations of spatially neighboring blocks of samples, wherein the prediction error is a function of the block of samples and the predicted block of samples;

choose, in a second pass, an intra prediction mode for the block of samples based on the chosen intra prediction modes of its spatially neighboring blocks that precede the block of samples in a scan order and the calculated estimates of prediction errors.

26. A computer program (140) for decoding a bitstream (4) comprising a coded picture (7) of a video sequence (6), wherein the coded picture (7) consists of a first intra coded block of samples (1) and at least a second intra coded block of samples (2), wherein the second intra coded block of samples (2) succeeds the first intra coded block of samples (1) in a bitstream (4) order, the computer program (140) comprising code means which, when run on a computer (160), causes the computer (160) to:

27. The computer program (140) according to claim 26, further causing the computer (160) to parse the bitstream (4) to obtain syntax information related to coding of the video sequence (6).

28. A computer program (240) for encoding a picture (5) of a video sequence (6), wherein the picture (5) comprises a first block of samples (11) and at least one of a second block of samples (12) and a third block of samples (13), wherein the second block of samples (12) is the right spatially neighboring block of samples to the first block of samples (11 ) and wherein the third block of samples (13) is the bottom spatially neighboring block of samples to the first block of samples (11), the computer program (240) comprising code means which, when run on a computer (260), causes the computer (260) to:

predict, with intra prediction, the first block of samples (11) from at least one of the second block of samples (12) and the third block of samples (13) that is predicted with intra prediction.

29. A computer program product (300) comprising computer readable means (310) and a computer program (140) according to claims 26-27 stored on the computer readable means (310).

30. A computer program product (400) comprising computer readable means (410) and a computer program (240) according to claims 28 stored on the computer readable means (410).