WO2016133440A1 - Methods, encoder and decoder for coding of video sequences - Google Patents

Methods, encoder and decoder for coding of video sequences Download PDF

Info

Publication number
WO2016133440A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
block
frames
decoder
bitstream
Prior art date
Application number
PCT/SE2015/050194
Other languages
French (fr)
Inventor
Jonatan Samuelsson
Per Wennersten
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget Lm Ericsson (Publ) filed Critical Telefonaktiebolaget Lm Ericsson (Publ)
Priority to PCT/SE2015/050194 priority Critical patent/WO2016133440A1/en
Publication of WO2016133440A1 publication Critical patent/WO2016133440A1/en

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/105Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock

Definitions

  • Embodiments herein relate to the field of video coding, such as High Efficiency Video Coding (HEVC) or the like.
  • HEVC High Efficiency Video Coding
  • embodiments herein relate to a method and an encoder for encoding a first block of a first frame of a video sequence into a bitstream including an encoded representation of the video sequence as well as a method and a decoder for decoding a bitstream including an encoded representation of a first block of a first frame of a video sequence.
  • the video sequence may for example have been captured by a video camera.
  • a purpose of compressing the video sequence is to reduce a size, e.g. in bits, of the video sequence. In this manner, the coded video sequence will require smaller memory when stored and/or less bandwidth when transmitted from e.g. the video camera.
  • a so called encoder is often used to perform compression, or encoding, of the video sequence.
  • the video camera may comprise the encoder.
  • the coded video sequence may be transmitted from the video camera to a display device, such as a television set (TV) or the like.
  • TV television set
  • the TV may comprise a so called decoder.
  • the decoder is used to decode the received coded video sequence.
  • the encoder may be comprised in a radio base station of a cellular communication system and the decoder may be comprised in a wireless device, such as a cellular phone or the like, and vice versa.
  • JCT-VC Joint Collaborative Team - Video Coding
  • MPEG Moving Pictures Expert Group
  • ITU-T International Telecommunication Union's Telecommunication Standardization Sector
  • NAL Network Abstraction Layer
  • When encoding a single frame of a video sequence with HEVC, the pixels are arranged into so called Treeblocks, typically 64x64 pixels in size.
  • A number of Treeblocks of a frame is shown in Figure 1. Each Treeblock is then hierarchically split into Coding Units (CUs), ranging in size from 64x64 to 8x8 pixels.
  • CUs Coding Units
  • Compressing a CU is performed in two steps.
  • pixel values in the CU are predicted from previously coded pixel values either in the same frame or in previous frames.
  • a difference between the predicted pixel values and the actual values is calculated and a transform coefficient is determined to compensate for the difference.
  • Prediction can be performed for an entire CU or on smaller parts thereof.
  • Therefore, Prediction Units (PUs) are defined.
  • the PUs may have the same size as the CU for a given set of pixels, or the CU may be further hierarchically split into PUs that are smaller than the CU.
  • Each PU defines separately how it will predict its pixel values from previously coded pixel values.
  • transform coefficients are determined per Transform Unit (TU).
  • the TUs may be the same size as a CU or the CU may be hierarchically split into TUs that are smaller than the CU.
  • a prediction error, i.e. the difference mentioned above, is transformed separately for each TU.
  • a prediction unit will either predict pixel values based on values of neighboring pixels in the same frame, or the prediction unit will predict pixel values based on an area(s) in one or more previous frames.
  • Predicting from nearby pixels is called Intra-prediction, and is used for the entire first frame, since there is no preceding frame. Predicting from one or more previous frames is called Inter-prediction. See Figure 2, where 'I' denotes Intra-predicted frames and P denotes Inter-predicted frames.
  • the first frame uses only intra prediction and is called an I-frame, or intra-frame, whereas the following frames predict from previous frames and are called P-frames, or predicted frames. Note that P-frames may use intra-prediction as well.
  • Figure 3 illustrates another example of prediction using intra/inter-prediction from various frames.
  • the frames are numbered according to output order as indicated by the lower line of numbers.
  • the letters above the numbers indicate the type of frame: I-frame, B-frame and P-frame.
  • Frame 2 is encoded as a P-frame by predicting from Frame 0 and possibly using some intra prediction as well.
  • Frame 1 is encoded, where each block has the possibility of predicting from Frame 0, Frame 2, both, or using intra prediction.
  • Frame 1 is said to use bi-directional prediction, and is thus called a B-frame.
  • the prediction is essentially the weighted sum of two predictions, one in either direction.
  • each pair of a B- and P-frame, such as frames 2 and 1, constitutes a Group of Pictures (GOP).
  • a GOP is a useful unit because it consists of a group of pictures that are encoded out-of-order with respect to display order.
  • GOP-structure is normally chosen while considering the content of a video sequence to be encoded: with some video sequences best compression may be achieved by large GOPs, and some other video sequences by smaller GOPs.
  • Even though GOPs provide a means for obtaining efficient coding of video sequences, improvements to coding efficiency may still be required in some situations.
  • An object may be to improve efficiency of video coding of the above mentioned kinds.
  • the object is achieved by a method, performed by a decoder, for decoding a bitstream including an encoded representation of a first block of a first frame of a video sequence.
  • the video sequence comprises a second frame, wherein the first frame precedes the second frame in a bitstream order of the bitstream.
  • the decoder predicts the first block based on an area including at least a portion of a second block of the second frame, wherein the second block is predicted independently of the first block.
  • the object is achieved by a decoder configured to decode a bitstream including an encoded representation of a first block of a first frame of a video sequence, wherein the video sequence comprises a second frame, wherein the first frame precedes the second frame in a bitstream order of the bitstream.
  • the decoder is configured to predict the first block based on an area including at least a portion of a second block of the second frame, wherein the second block is predicted independently of the first block.
  • the object is achieved by a method, performed by an encoder, for encoding a first block of a first frame of a video sequence into a bitstream including an encoded representation of the video sequence.
  • the video sequence comprises a second frame, wherein the first frame precedes the second frame in a bitstream order of the bitstream.
  • the encoder predicts the first block based on an area including at least a portion of a second block of the second frame, wherein the second block is predicted independently of the first block.
  • an encoder configured to encode a first block of a first frame of a video sequence into a bitstream including an encoded representation of the video sequence, wherein the video sequence comprises a second frame, wherein the first frame precedes the second frame in a bitstream order of the bitstream.
  • the encoder is configured to predict the first block based on an area including at least a portion of a second block of the second frame, wherein the second block is predicted independently of the first block.
  • the decoder predicts the first block based on the area.
  • the second frame comprises the area, i.e. the area may be part of the second frame that is subsequent to the first frame in the bitstream order.
  • the area includes at least a portion of a second block of the second frame. This may mean that the area may be built up by the second block and possibly also further blocks. Since the second block, and optionally also any further blocks, is predicted independently of the first block, the area may be predicted, i.e. reconstructed, without a need for first predicting the first block. The need for such prediction would cause a severe circular reference that may prevent advantageous use of prediction of the first block based on the area.
  • the first block of the first frame is allowed to be predicted from the area of the second frame, wherein the second frame would normally, due to the bitstream order, be decoded after the first frame.
  • the first and second frames form a group of frames, aka group of pictures (GOP).
  • GOP group of pictures
  • the embodiments herein allow the first block to use a different coding structure than that defined by the GOP. This means that any block of the GOP, but not the first block, that predicts only from a frame coded before the GOP can be reconstructed separately from the remainder of the GOP. Next, any such block may be used for prediction of other blocks, e.g. the first block, of the GOP.
  • this particular example allows for bypassing the coding structure defined by the GOP on a per-block basis. Therefore, greater flexibility may be achieved as compared to changing the GOP structure, in which the coding structure is defined on a per-frame basis.
  • the embodiments herein may allow the first block to be predicted from an area that is temporally closer to the first frame. This may be understood from Figure 3, in which the first block of the first frame 'Frame 2' is predicted based on the area of the second frame 'Frame 1'. Generally, a closer temporal location provides higher compression efficiency. Thus, advantageously, the embodiments herein may provide a higher compression efficiency.
  • Figure 1 is a block diagram illustrating blocks and units of a frame
  • Figure 2 is an overview illustrating a sequence of frames
  • Figure 3 is another overview illustrating another sequence of frames
  • Figure 4 is a schematic overview of an exemplifying system in which embodiments herein may be implemented
  • Figure 5 is a schematic, combined signaling scheme and flowchart illustrating embodiments of the methods when performed in the system according to Figure 4
  • Figure 6 is a flowchart illustrating embodiments of the method in the encoder
  • Figure 7 is a block diagram illustrating embodiments of the encoder.
  • Figure 8 is a flowchart illustrating embodiments of the method in the decoder.
  • Figure 9 is a block diagram illustrating embodiments of the decoder.
  • Figure 4 depicts an exemplifying system 100 in which embodiments herein may be implemented.
  • the system 100 includes a network 101, such as a wired or wireless network.
  • Exemplifying networks include cable television network, internet access networks, fiberoptic communication networks, telephone networks, cellular radio communication networks, any Third Generation Partnership Project (3GPP) network, Wi-Fi networks, etc.
  • 3GPP Third Generation Partnership Project
  • the system 100 further comprises an encoder 110, comprised in a source device 111, and a decoder 120, comprised in a target device 121.
  • the source and/or target device 111, 121 may be embodied in the form of various platforms, such as television set-top-boxes, video players/recorders, video cameras, Blu-ray players, Digital Versatile Disc (DVD)-players, media centers, media players, user equipments and the like.
  • the term "user equipment” may refer to a mobile phone, a cellular phone, a Personal Digital Assistant (PDA) equipped with radio communication capabilities, a smartphone, a laptop or personal computer (PC) equipped with an internal or external mobile broadband modem, a tablet PC with radio communication capabilities, a portable electronic radio communication device or the like.
  • PDA Personal Digital Assistant
  • PC personal computer
  • the encoder 110, and/or the source device 111, may send 131, over the network 101, a bitstream to the decoder 120, and/or the target device 121.
  • the bitstream may be video data, e.g. in the form of one or more NAL units.
  • the video data may thus for example represent pictures of a video sequence.
  • the bitstream comprises a Coded Video Sequence (CVS) that is HEVC compliant.
  • CVS Coded Video Sequence
  • the bitstream may thus be an encoded representation of a video sequence to be transferred from the source device 111 to the target device 121.
  • the bitstream may include encoded units, such as the NAL units.
  • reconstruction may include prediction and optionally deblocking, filtering and the like. This means for example that reconstruction, as opposed to prediction, also involves decoding/encoding of the prediction error mentioned in the background section.
  • parts of the P-frame, which is normally decoded first, could be allowed to predict from blocks in the B-frame, provided these blocks in turn predict their pixel values from the preceding P- or I-frame, i.e. a frame that precedes the entire GOP.
  • This may be advantageous since the B-frame is closer temporally to the P-frame than a frame that the P-frame normally could predict from.
  • a frame that is located temporally closer tends to be a better match, i.e. better basis for prediction.
  • Figure 5 illustrates exemplifying embodiments when implemented in the system 100 of Figure 4.
  • the encoder 110 performs a method for encoding a first block of a first frame of a video sequence into a bitstream including an encoded representation of the video sequence.
  • the encoding may be based on HEVC, H.264 or the like.
  • the encoder 110 may be an HEVC encoder.
  • the decoder 120 performs a method for decoding a bitstream including an encoded representation of a first block of a first frame of a video sequence.
  • the decoding of the encoded representation may be based on HEVC, H.264 or the like.
  • the decoder 120 may be an HEVC decoder.
  • the video sequence is described in more detail in order to better explain the embodiments herein.
  • the video sequence comprises the first frame and a second frame.
  • the first frame precedes the second frame in a bitstream order of the bitstream.
  • the bitstream order of the first and second frame may be defined by that at least one unit, such as a NAL unit, of the bitstream, including a first part of the first frame, precedes any unit of the bitstream, including a second part of the second frame.
  • the at least one unit may carry syntax information.
  • the syntax information may include one or more of frame size, block size, prediction mode, reference picture selection for each block, transform coefficients and the like.
  • the syntax information may be used to determine a syntax of a group of frames, e.g. including the first and second frames.
  • the following detailed description discloses a plurality of embodiments.
  • the plurality of embodiments includes, but is not limited to, the following example embodiments.
  • a first example embodiment relates to when the first and second frames form a group of frames.
  • the first and second frames may form a group of frames, wherein at least one previous frame precedes all frames of the group of frames.
  • the group of frames may also be referred to as a Group of Pictures (GOP).
  • GOP is a way of organizing pictures in a particular coding order.
  • the decoder 120 first determines a coding order of blocks in the group of frames. The determined coding order may be different from the coding order for the frames as defined by the GOP. Then, the decoder 120 reconstructs blocks of the group of frames that only reference pictures before the group of frames. This may mean that the decoder 120 predicts blocks that are predictable using only portions of frames temporally located before the group of frames. Subsequently, the first block will be reconstructed by the decoder 120.
  • a second example embodiment also relates to when the first and second frames form a group of frames.
  • the decoder 120 performs two passes according to the coding order defined by the group of frames. In a first pass, the decoder 120 reconstructs blocks of the group of frames that only refer to, or reference, pictures before the group of frames. In a second pass, further blocks, e.g. including the first block, will be reconstructed by the decoder 120.
  • a third example embodiment relates to when the first and second frames do not form a group of frames.
  • the bitstream order will be relied upon for determination of coding order.
  • One or more of the following actions may be performed in any suitable order.
  • Actions A010 to A040 relate to how the encoder 110 encodes the video sequence to be decoded by the decoder 120 in actions A070 to A120.
  • the encoder 110 may predict a second block based on at least one previous frame preceding the first frame in the bitstream order. This means that the second block is inter-coded from the at least one previous frame. This action may be performed in the first pass of the second example embodiment.
  • the prediction of the second block may be performed before the prediction in action A040.
  • an area used in action A040 may have been reconstructed such that it can be used for prediction of the first block. This may mean that the area may have been both predicted and a prediction error therefor may have been compensated. Reference is made to the background section for explanation of prediction error.
  • the encoder 110 may predict the second block based on a further area of the second frame. This means that the second block is intra-coded from the further area of the second frame.
  • the further area may preferably be predicted independently from the first block.
  • the prediction in action A020 may be performed before the prediction in action A040.
  • the area used in action A040 may have been reconstructed, i.e. not only predicted but e.g. also transformed using transformation coefficients (which is briefly described in the background section), such that it can be used for prediction of the first block.
  • the encoder 110 may predict further blocks of the first and second frames based on one or more of the at least one previous frame or one or more additional areas of the first and second frames.
  • the one or more additional areas may be predicted independently of the first block.
  • this action may be part of a first pass according to the second example embodiment.
  • a blank block may be the first block that is predicted in action A040; thus, after action A040, the first block is no longer blank, or non-reconstructed.
  • the encoder 110 predicts the first block based on an area, wherein the area is referred to as "the area used in action A040" above.
  • the area includes at least a portion of a second block of the second frame.
  • the second block is predicted independently of the first block. This action may be performed as part of the second pass according to the second example embodiment.
  • the bitstream may indicate that the area is usable for prediction of the first block. This may mean that the bitstream indicates that the first block refers to, or references, the area.
  • the predicting A040 of the first block may be performed after the predicting A030 of further blocks.
  • the group of frames may be associated with two coding structures under the constraint that a respective coding structure may be applied to only a respective portion of each frame of the group.
  • a first coding structure of the group of frames may apply to only a first portion of each frame of the group and a second coding structure may apply to only a second portion of each frame of the group.
  • the first coding structure and the second coding structure may be exclusively associated to the first and second portions, respectively. Thus, only one coding structure at the time may apply to any given portion of a frame.
  • the encoder 110 may send at least parts of the bitstream.
  • Action A060
  • the decoder 120 may receive at least parts of the bitstream.
  • the decoder 120 may parse the encoded representation.
  • the syntax information thus obtained is related to coding of the bitstream.
  • the syntax information may include one or more of: frame size, block size, prediction mode, reference picture selection for each block, transform coefficients and the like.
  • the decoder 120 may parse the encoded representation by determining a prediction order for prediction of blocks of the first frame based on the syntax information. Since reconstruction includes prediction, the prediction order may also be referred to as a reconstruction order.
  • the syntax information may comprise the reference picture selection for each block.
  • the decoder 120 may compare the bitstream order to the reference picture selection to deduce a suitable prediction order, i.e. the reconstruction order of blocks of the first frame needs to be restricted such that a reference picture selection is reconstructed before a current block is predicted.
  • Another way of achieving this is to determine the reconstruction order to be the same as the bitstream order and then, when a current block cannot be decoded due to a reference to a non-reconstructed area, the current block is left blank. Subsequently, e.g. after a last block of the first frame has been reconstructed, or even not reconstructed in case of reference to a non-reconstructed area, the decoder 120 attempts to reconstruct that or those block(s) that were left blank. This manner of making a first and second pass for prediction of the blocks of the first frame has also been briefly described above for the case that the first and second frames are included in the group of frames.
  • the decoder 120 may predict the second block based on at least one previous frame preceding the first frame in the bitstream order. Accordingly, the second block is inter-coded from the at least one previous frame.
  • the prediction in action A090 may be performed before the prediction in action A120.
  • the decoder 120 may predict the second block based on a further area of the second frame. Accordingly, the second block may be intra-coded from the further area of the second frame.
  • the prediction in action A100 may be performed before the prediction in action A120.
  • the decoder 120 may predict further blocks of the first and second frames based on one or more of the at least one previous frame or one or more additional areas of the first and second frames.
  • the additional areas may be predicted independently of the first block.
  • this action may be part of the first pass according to the coding order given by the group of frames.
  • the decoder 120 predicts the first block based on an area.
  • the area includes at least a portion of a second block of the second frame.
  • the second block is predicted independently of the first block.
  • the prediction in action A120 may be performed after the prediction in action A110.
  • this action may be part of the second pass according to the coding order given by the group of frames.
  • IPPP coding refers to the use of intra and predicted frames in a coding structure similar to Figure 2; it is not an abbreviation. In one embodiment, the values of these parameters may be inferred by the prediction used for each block, rather than specified on a frame basis.
  • IPPP coding will use different QPs than IBP coding.
  • IBP coding is not an abbreviation; instead it refers to a coding technique known in the art. In particular, a B-frame will have higher QP.
  • the decoder 120 may then decode that block using a typical QP for a P-frame in IPPP coding.
  • P-frames tend to have significantly lower QP (higher quality) than B-frames, since they will be used more for prediction. If a block in a P-frame instead predicts from a previous frame in an IPPP- manner, this block may be given a higher QP (lower quality) since it is unlikely to be used for prediction by the rest of the GOP. Similarly, a block in a B-frame that will now be available for prediction by P-frames could implicitly be given lower QP (higher quality) than the rest of the B-frame.
  • Blocks in the B- frames are already allowed to predict solely from frames preceding the GOP and these blocks may then be used for prediction by other frames in the same GOP.
  • the cost comes when for example coding Frame 2 in Figure 3.
  • Frame 0 or Frame 1 may be used for prediction for each inter-block.
  • An approach may be to treat both as reference pictures, similar to how multiple reference pictures are handled.
  • Frame 0 is a regular reference frame in that it is entirely reconstructed, whereas only parts of Frame 1 are available for prediction.
  • this is used in order to infer whether Frame 1 is a reasonable candidate reference for a given block, or even to infer, without signaling, which of the frames should be used for each block in Frame 2.
  • Frame 2 would signal as if it were predicting from Frame 0, but if Frame 1 contained data for the relevant area, that data would be used for prediction instead, which may involve halving the motion vector; an illustrative sketch of such rescaling is given after this list.
  • the signaling may involve a number representing an index into a list of possible reference pictures.
  • all in-loop filtering such as HEVC's deblocking and Sample Adaptive Offset (SAO) is performed after all the blocks in a frame have been decoded. This means that the new reference pictures allowed due to our scheme will be unfiltered, whereas the normal reference pictures will have been filtered normally.
  • SAO Sample Adaptive Offset
  • the encoder 110 performs a method for encoding a first block of a first frame of a video sequence into a bitstream including an encoded representation of the video sequence.
  • the video sequence comprises a second frame, wherein the first frame precedes the second frame in a bitstream order of the bitstream.
  • the encoder 110 may predict the second block based on at least one previous frame preceding the first frame in the bitstream order.
  • the predicting A010 of the second block based on the at least one previous frame may be performed before the predicting A040 of the first block.
  • the encoder 110 may predict the second block based on a further area of the second frame.
  • the predicting A020 of the second block based on the further area may be performed before the predicting A040 of the first block.
  • the encoder 110 may predict further blocks of the first and second frames based on one or more of the at least one previous frame or one or more additional areas of the first and second frames.
  • Action A040
  • the encoder 110 predicts the first block based on an area including at least a portion of a second block of the second frame, wherein the second block is predicted independently of the first block.
  • the predicting A040 of the first block may be performed after the predicting A030 of further blocks.
  • the first and second frames may form a group of frames, wherein the at least one previous frame precedes all frames of the group of frames.
  • a first coding structure of the group of frames may apply to only a first portion of each frame of the group and a second coding structure may apply to only a second portion of each frame of the group.
  • the first coding structure and the second coding structure may be exclusively associated to the first and second portions, respectively.
  • the bitstream order of the first and second frame may be defined by that at least one unit of the bitstream, including a first part of the first frame, precedes any unit of the bitstream, including a second part of the second frame.
  • the at least one unit may carry syntax information.
  • the syntax information may include one or more of frame size, block size, prediction mode, reference picture selection for each block, and transform coefficients.
  • the encoding may be based on HEVC or H.264.
  • the encoder 110 is thus configured to encode a first block of a first frame of a video sequence into a bitstream including an encoded representation of the video sequence.
  • the video sequence comprises a second frame, wherein the first frame precedes the second frame in a bitstream order of the bitstream.
  • the first and second frames may form a group of frames, wherein the at least one previous frame precedes all frames of the group of frames.
  • a first coding structure of the group of frames may apply to only a first portion of each frame of the group and a second coding structure may apply to only a second portion of each frame of the group.
  • the first coding structure and the second coding structure may be exclusively associated to the first and second portions, respectively.
  • the encoding may be based on HEVC or H.264.
  • the bitstream order of the first and second frame may be defined by that at least one unit of the bitstream, including a first part of the first frame, precedes any unit of the bitstream, including a second part of the second frame.
  • the at least one unit may carry syntax information.
  • the syntax information may include one or more of frame size, block size, prediction mode, reference picture selection for each block, and transform coefficients.
  • the encoder 110 may comprise a processing module 701, such as a means, one or more hardware modules and/or one or more software modules for performing the methods described herein.
  • the encoder 110 may further comprise a memory 702.
  • the memory may comprise, such as contain or store, a computer program 703.
  • the processing module 701 comprises, e.g. 'is embodied in the form of' or 'realized by', a processing circuit 704 as an exemplifying hardware module.
  • the memory 702 may comprise the computer program 703, comprising computer readable code units executable by the processing circuit 704, whereby the encoder 110 is operative to perform the methods of Figure 5 and/or Figure 6.
  • the computer readable code units may cause the encoder 110 to perform the method according to Figure 5 and/or 6 when the computer readable code units are executed by the encoder 110.
  • Figure 7 further illustrates a carrier 705, or program carrier, which comprises the computer program 703 as described directly above.
  • the processing module 701 comprises an Input/Output unit 706, which may be exemplified by a receiving module and/or a sending module as described below when applicable.
  • the processing module 701 may comprise one or more of a predicting module 710, and a sending module 720 as exemplifying hardware modules.
  • one or more of the aforementioned exemplifying hardware modules may be implemented as one or more software modules.
  • the encoder 110, the processing module 701 and/or the predicting module 710 is operative to, such as configured to, predict the first block based on an area including at least a portion of a second block of the second frame, wherein the second block is predicted independently of the first block.
  • the encoder 110, the processing module 701 and/or the predicting module 710 may be operative to, such as configured to, predict the second block based on at least one previous frame preceding the first frame in the bitstream order.
  • the encoder 110, the processing module 701 and/or the predicting module 710 may be operative to, such as configured to, predict the second block based on a further area of the second frame.
  • the encoder 110, the processing module 701 and/or the predicting module 710 may be operative to, such as configured to, predict the second block based on the at least one previous frame and/or to predict the second block based on the further area, before the prediction of the first block.
  • the encoder 110, the processing module 701 and/or the predicting module 710 may be operative to, such as configured to, predict further blocks of the first and second frames based on one or more of the at least one previous frame or one or more additional areas of the first and second frames.
  • the encoder 110, the processing module 701 and/or the predicting module 710 may be operative to, such as configured to, predict the first block after prediction of further blocks.
  • In Figure 8, a schematic flowchart of exemplifying methods in the decoder 120 is shown. Again, the same reference numerals as above have been used to denote the same or similar features; in particular, the same reference numerals have been used to denote the same or similar actions. Accordingly, the decoder 120 performs a method for decoding a bitstream including an encoded representation of a first block of a first frame of a video sequence.
  • the video sequence comprises a second frame, wherein the first frame precedes the second frame in a bitstream order of the bitstream.
  • One or more of the following actions may be performed in any suitable order.
  • the decoder 120 may parse the encoded representation to obtain syntax information related to coding of the bitstream.
  • the syntax information may include one or more of: frame size, block size, prediction mode, reference picture selection for each block, and transform coefficients.
  • the decoder 120 may parse the encoded representation by determining a prediction order for prediction of blocks of the first frame based on the syntax information.
  • the decoder 120 may predict the second block based on at least one previous frame preceding the first frame in the bitstream order.
  • the predicting A090 of the second block based on the at least one previous frame may be performed before the predicting in action A120.
  • the decoder 120 may predict the second block based on a further area of the second frame.
  • the predicting A100 of the second block based on the further area may be performed before the predicting in action A120.
  • Action A110
  • the decoder 120 may predict further blocks of the first and second frames based on one or more of the at least one previous frame or one or more additional areas of the first and second frames.
  • the decoder 120 predicts the first block based on an area including at least a portion of a second block of the second frame, wherein the second block is predicted independently of the first block.
  • the predicting A120 of the first block may be performed after the predicting A110 of further blocks.
  • the first and second frames may form a group of frames, wherein the at least one previous frame precedes all frames of the group of frames.
  • a first coding structure of the group of frames may apply to only a first portion of each frame of the group and a second coding structure may apply to only a second portion of each frame of the group.
  • the first coding structure and the second coding structure are exclusively associated to the first and second portions, respectively.
  • the bitstream order of the first and second frame may be defined by that at least one unit of the bitstream, including a first part of the first frame, precedes any unit of the bitstream, including a second part of the second frame.
  • the at least one unit may carry part of the syntax information.
  • the decoding of the encoded representation may be based on HEVC or H.264.
  • the decoder 120 is thus configured to decode a bitstream including an encoded representation of a first block of a first frame of a video sequence.
  • the video sequence comprises a second frame, wherein the first frame precedes the second frame in a bitstream order of the bitstream.
  • the first and second frames may form a group of frames, wherein the at least one previous frame may precede all frames of the group of frames.
  • a first coding structure of the group of frames may apply to only a first portion of each frame of the group and a second coding structure may apply to only a second portion of each frame of the group.
  • the first coding structure and the second coding structure may be exclusively associated to the first and second portions, respectively.
  • the decoding of the encoded representation may be based on HEVC or H.264.
  • the bitstream order of the first and second frame may be defined by that at least one unit of the bitstream, including a first part of the first frame, precedes any unit of the bitstream, including a second part of the second frame.
  • the at least one unit carries part of the syntax information.
  • the decoder 120 may comprise a processing module 901, such as a means, one or more hardware modules and/or one or more software modules for performing the methods described herein.
  • the decoder 120 may further comprise a memory 902.
  • the memory may comprise, such as contain or store, a computer program 903.
  • the processing module 901 comprises, e.g. 'is embodied in the form of' or 'realized by', a processing circuit 904 as an exemplifying hardware module.
  • the memory 902 may comprise the computer program 903, comprising computer readable code units executable by the processing circuit 904, whereby the decoder 120 is operative to perform the methods of Figure 5 and/or Figure 8.
  • the computer readable code units may cause the decoder 120 to perform the method according to Figure 5 and/or 8 when the computer readable code units are executed by the decoder 120.
  • Figure 9 further illustrates a carrier 905, or program carrier, which comprises the computer program 903 as described directly above.
  • the processing module 901 comprises an Input/Output unit 906, which may be exemplified by a receiving module and/or a sending module as described below when applicable.
  • the processing module 901 may comprise one or more of a predicting module 910, a parsing module 920, a determining module 930, and a receiving module 940 as exemplifying hardware modules.
  • one or more of the aforementioned exemplifying hardware modules may be implemented as one or more software modules.
  • the decoder 120, the processing module 901 and/or the predicting module 910 is operative to, such as configured to, predict the first block based on an area including at least a portion of a second block of the second frame, wherein the second block is predicted independently of the first block.
  • the decoder 120, the processing module 901 and/or the predicting module 910 may be operative to, such as configured to, predict the second block based on at least one previous frame preceding the first frame in the bitstream order.
  • the decoder 120, the processing module 901 and/or the predicting module 910 may be operative to, such as configured to, predict the second block based on a further area of the second frame.
  • the decoder 120, the processing module 901 and/or the predicting module 910 may be operative to, such as configured to, predict the second block based on the at least one previous frame and/or to predict the second block based on the further area, before performing prediction of the first block.
  • the decoder 120, the processing module 901 and/or the predicting module 910 may be operative to, such as configured to, predict further blocks of the first and second frames based on one or more of the at least one previous frame or one or more additional areas of the first and second frames.
  • the decoder 120, the processing module 901 and/or the predicting module 910 may be operative to, such as configured to, predict the first block after the prediction of further blocks.
  • the decoder 120, the processing module 901 and/or the parsing module 920 may be operative to, such as configured to, parse the encoded representation to obtain syntax information related to coding of the bitstream.
  • the syntax information may include one or more of: frame size, block size, prediction mode, reference picture selection for each block, and transform coefficients.
  • the decoder 120, the processing module 901 and/or the parsing module 920 may be operative to, such as configured to, parse the encoded representation by determining a prediction order for prediction of blocks of the first frame based on the syntax information.
  • node may refer to one or more physical entities, such as devices, apparatuses, computers, servers or the like. This may mean that embodiments herein may be implemented in one physical entity. Alternatively, the embodiments herein may be implemented in a plurality of physical entities, such as an arrangement comprising said one or more physical entities, i.e. the embodiments may be implemented in a distributed manner.
  • unit may refer to one or more functional units, each of which may be implemented as one or more hardware modules and/or one or more software modules in a node.
  • program carrier may refer to one of an electronic signal, an optical signal, a radio signal, and a computer readable medium.
  • the program carrier may exclude transitory, propagating signals, such as the electronic, optical and/or radio signal.
  • the carrier may be a non-transitory carrier, such as a non-transitory computer readable medium.
  • processing module may include one or more hardware modules, one or more software modules or a combination thereof. Any such module, be it a hardware, software or a combined hardware-software module, may be a determining means, estimating means, capturing means, associating means, comparing means, identification means, selecting means, receiving means, sending means or the like as disclosed herein.
  • the expression “means” may be a module
  • the term "software module” may refer to a software application, a Dynamic Link Library (DLL), a software component, a software object, an object according to Component Object Model (COM), a software component, a software function, a software engine, an executable binary software file or the like.
  • DLL Dynamic Link Library
  • COM Component Object Model
  • processing circuit may refer to a processing unit, a processor, an Application Specific integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or the like.
  • the processing circuit or the like may comprise one or more processor kernels.
  • the expression “configured to” may mean that a processing circuit is configured to, or adapted to, by means of software configuration and/or hardware configuration, perform one or more of the actions described herein.
  • action may refer to an action, a step, an operation, a response, a reaction, an activity or the like.
  • memory may refer to a hard disk, a magnetic storage medium, a portable computer diskette or disc, flash memory, random access memory (RAM) or the like. Furthermore, the term “memory” may refer to an internal register memory of a processor or the like.
  • the term "computer readable medium” may be a Universal Serial Bus (USB) memory, a DVD-disc, a Blu-ray disc, a software module that is received as a stream of data, a Flash memory, a hard drive, a memory card, such as a MemoryStick, a Multimedia Card (MMC), Secure Digital (SD) card, etc.
  • USB Universal Serial Bus
  • MMC Multimedia Card
  • SD Secure Digital
  • computer readable code units may be text of a computer program, parts of or an entire binary file representing a computer program in a compiled format or anything there between.
  • radio resource may refer to a certain coding of a signal and/or a time frame and/or a frequency range in which the signal is transmitted.
  • a resource may refer to one or more Physical Resource Blocks (PRB) which are used when transmitting the signal.
  • PRB Physical Resource Blocks
  • a PRB may be in the form of Orthogonal Frequency Division Multiplexing (OFDM) PHY resource blocks (PRB).
  • OFDM Orthogonal Frequency Division Multiplexing
  • PRB Physical resource block
  • Physical resource block is known from 3GPP terminology relating to e.g. Long Term Evolution Systems.
  • number and/or value may be any kind of digit, such as binary, real, imaginary or rational number or the like. Moreover, “number” and/or “value” may be one or more characters, such as a letter or a string of letters. “Number” and/or “value” may also be represented by a bit string.
  • a set of may refer to one or more of something.
  • a set of devices may refer to one or more devices
  • a set of parameters may refer to one or more parameters or the like according to the embodiments herein.
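As a concrete illustration of the reference retargeting mentioned above (a block of Frame 2 signalled as predicting from Frame 0, but actually predicted from the temporally closer Frame 1 when that area is available), a motion vector can be rescaled in proportion to the temporal distance between the frames. The following Python sketch is illustrative only; the picture-order-count based scaling and the function name are assumptions, showing one common way to perform such rescaling rather than anything taken verbatim from the disclosure above.

```python
def rescale_motion_vector(mv, poc_current, poc_signalled_ref, poc_actual_ref):
    """Rescale a motion vector when prediction is retargeted to a closer frame.

    mv    -- (mv_x, mv_y) signalled relative to the frame at poc_signalled_ref
    poc_* -- picture order counts (display order) of the frames involved
    Returns the vector scaled by the ratio of temporal distances.
    """
    signalled_distance = poc_current - poc_signalled_ref
    actual_distance = poc_current - poc_actual_ref
    scale = actual_distance / signalled_distance
    return (mv[0] * scale, mv[1] * scale)

# Frame 2 signals prediction from Frame 0, but the relevant area of Frame 1
# is available, so the motion vector is halved (distance 1 instead of 2).
print(rescale_motion_vector((8, -4), poc_current=2, poc_signalled_ref=0, poc_actual_ref=1))
# -> (4.0, -2.0)
```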

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method and a decoder (120) for decoding a bitstream including an encoded representation of a first block of a first frame of a video sequence as well as a method and an encoder (110) for encoding the first block of the first frame of the video sequence into a bitstream including the encoded representation of the video sequence are disclosed. The video sequence comprises a second frame, and the first frame precedes the second frame in a bitstream order of the bitstream. The decoder (120) predicts (A120) the first block based on an area including at least a portion of a second block of the second frame, wherein the second block is predicted independently of the first block. The encoder (110) predicts (A040) the first block based on the area. Corresponding computer programs and carriers therefor are also disclosed.

Description

METHODS, ENCODER AND DECODER FOR CODING OF VIDEO SEQUENCES
TECHNICAL FIELD
Embodiments herein relate to the field of video coding, such as High Efficiency Video Coding (HEVC) or the like. In particular, embodiments herein relate to a method and an encoder for encoding a first block of a first frame of a video sequence into a bitstream including an encoded representation of the video sequence as well as a method and a decoder for decoding a bitstream including an encoded representation of a first block of a first frame of a video sequence. Corresponding computer programs and carriers therefor are also disclosed.
BACKGROUND
In the field of video coding, it is often desired to compress a video sequence into a coded video sequence. The video sequence may for example have been captured by a video camera. A purpose of compressing the video sequence is to reduce a size, e.g. in bits, of the video sequence. In this manner, the coded video sequence will require smaller memory when stored and/or less bandwidth when transmitted from e.g. the video camera. A so called encoder is often used to perform compression, or encoding, of the video sequence. Hence, the video camera may comprise the encoder. The coded video sequence may be transmitted from the video camera to a display device, such as a television set (TV) or the like. In order for the TV to be able to decompress, or decode, the coded video sequence, it may comprise a so called decoder. This means that the decoder is used to decode the received coded video sequence. In other scenarios, the encoder may be comprised in a radio base station of a cellular communication system and the decoder may be comprised in a wireless device, such as a cellular phone or the like, and vice versa.
A known video coding technology is called High Efficiency Video Coding (HEVC), which is a new video coding standard, recently developed by Joint Collaborative Team - Video Coding (JCT-VC). JCT-VC is a collaborative project between Moving Pictures Expert Group (MPEG) and International Telecommunication Union's Telecommunication Standardization Sector (ITU-T). A coded picture of an HEVC bitstream is included in an access unit, which comprises a set of Network Abstraction Layer (NAL) units. NAL units are thus a format of packages which form the bitstream.
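In the Annex B byte stream format used for HEVC and H.264, consecutive NAL units are separated by start codes (0x000001, optionally preceded by a zero byte). The following Python sketch, whose function and variable names are illustrative assumptions rather than anything defined above, shows the principle of splitting such a stream into NAL unit payloads; real parsers also handle emulation-prevention bytes and malformed input.

```python
def split_annex_b(stream: bytes):
    """Split an Annex B byte stream into raw NAL unit payloads (simplified).

    Trailing zero bytes of a four-byte start code are left attached to the
    preceding unit in this simplified version.
    """
    starts = []
    i = 0
    while i + 3 <= len(stream):
        if stream[i:i + 3] == b"\x00\x00\x01":
            starts.append(i + 3)          # payload begins after the start code
            i += 3
        else:
            i += 1
    nal_units = []
    for n, begin in enumerate(starts):
        end = starts[n + 1] - 3 if n + 1 < len(starts) else len(stream)
        nal_units.append(stream[begin:end])
    return nal_units

units = split_annex_b(b"\x00\x00\x01\x40\x01" + b"\x00\x00\x01\x42\x01")
# -> [b'\x40\x01', b'\x42\x01']
```

For an HEVC NAL unit, the first two payload bytes form the NAL unit header, so the NAL unit type can for example be read as (payload[0] >> 1) & 0x3F.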
When encoding a single frame of a video sequence with HEVC, the pixels are arranged into so called Treeblocks, typically 64x64 pixels in size. A number of Treeblocks of a frame is shown in Figure 1. Each Treeblock is then hierarchically split into Coding Units (CUs), ranging in size from 64x64 to 8x8 pixels.
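The hierarchical splitting of a Treeblock into CUs can be sketched as a simple quadtree recursion. In the minimal Python sketch below, the decision function is a placeholder standing in for the encoder's rate-distortion choice and is purely an assumption for illustration.

```python
def split_into_cus(x, y, size, should_split, min_size=8):
    """Recursively split a Treeblock into Coding Units (CUs).

    (x, y) -- top-left corner of the current block in the frame
    size   -- current block size (64 for a full Treeblock)
    should_split(x, y, size) -- placeholder for the encoder's decision
    Returns a list of (x, y, size) tuples, one per leaf CU.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        cus = []
        for dy in (0, half):
            for dx in (0, half):
                cus += split_into_cus(x + dx, y + dy, half, should_split, min_size)
        return cus
    return [(x, y, size)]

# Example: split whenever the block is larger than 32x32
cus = split_into_cus(0, 0, 64, lambda x, y, s: s > 32)
# -> four 32x32 CUs covering the 64x64 Treeblock
```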
Compressing a CU is performed in two steps. In a first step, pixel values in the CU are predicted from previously coded pixel values either in the same frame or in previous frames. In a second step, a difference between the predicted pixel values and the actual values is calculated and a transform coefficient is determined to compensate for the difference.
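A toy illustration of these two steps for a one-dimensional row of pixel values follows; the flat quantization step is an assumption standing in for the transform, quantization and entropy coding a real encoder would apply to the residual (a per-TU transform is sketched further below).

```python
def encode_block(original, prediction, qp_step=2):
    """Toy illustration of the two compression steps for one block.

    Step 1: the prediction is formed elsewhere (intra or inter).
    Step 2: the residual (difference) is quantized into levels.
    """
    residual = [o - p for o, p in zip(original, prediction)]
    return [round(r / qp_step) for r in residual]          # quantization

def decode_block(prediction, levels, qp_step=2):
    """Reconstruction = prediction + dequantized residual."""
    return [p + lvl * qp_step for p, lvl in zip(prediction, levels)]

original   = [100, 102, 105, 110]
prediction = [ 98, 100, 104, 108]
levels = encode_block(original, prediction)
recon  = decode_block(prediction, levels)
# -> [100, 102, 104, 110]; the 1-pixel error on the third sample is the quantization loss
```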
Prediction can be performed for an entire CU or on smaller parts thereof.
Therefore, Prediction Units (PUs) are defined. The PUs may have the same size as the CU for a given set of pixels, or the CU may be further hierarchically split into PUs that are smaller than the CU. Each PU defines separately how it will predict its pixel values from previously coded pixel values.
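As a sketch, the PU shapes available for a CU can be listed as follows. The list is simplified: HEVC also defines asymmetric inter partitions, and some shapes are only permitted for the smallest CU size, restrictions that are ignored here.

```python
def pu_partitions(cu_size, mode):
    """Possible Prediction Unit (PU) shapes for a CU of cu_size x cu_size.

    Simplified: asymmetric inter partitions (e.g. 2NxnU) and size
    restrictions are omitted for brevity.
    """
    n = cu_size // 2
    if mode == "intra":
        return [(cu_size, cu_size), (n, n)]        # 2Nx2N or NxN
    return [(cu_size, cu_size),                    # 2Nx2N
            (cu_size, n), (n, cu_size),            # 2NxN, Nx2N
            (n, n)]                                # NxN

print(pu_partitions(16, "inter"))
# -> [(16, 16), (16, 8), (8, 16), (8, 8)]
```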
In a similar fashion, transform coefficients are determined per Transform Unit (TU). The TUs may be the same size as a CU or the CU may be hierarchically split into TUs that are smaller than the CU. A prediction error, i.e. the difference mentioned above, is transformed separately for each TU.
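The per-TU transform can be illustrated with a floating-point 2-D DCT applied to each TU of a CU-sized residual. HEVC actually specifies integer approximations of such transforms, so the sketch below, with illustrative helper names, only shows the principle; it assumes the TU size evenly divides the CU size.

```python
import math

def dct_2d(block):
    """Naive orthonormal 2-D DCT-II of a square block (illustrative only)."""
    n = len(block)
    def c(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for i in range(n):
                for j in range(n):
                    s += (block[i][j]
                          * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * j + 1) * v * math.pi / (2 * n)))
            out[u][v] = c(u) * c(v) * s
    return out

def transform_residual_per_tu(residual, tu_size):
    """Split a CU-sized residual into square TUs and transform each one."""
    n = len(residual)
    coeffs = {}
    for ty in range(0, n, tu_size):
        for tx in range(0, n, tu_size):
            tu = [row[tx:tx + tu_size] for row in residual[ty:ty + tu_size]]
            coeffs[(tx, ty)] = dct_2d(tu)
    return coeffs

residual = [[1, 2, 3, 4]] * 4
coeffs = transform_residual_per_tu(residual, tu_size=2)   # four 2x2 TUs
```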
Now consider prediction units in more detail. A prediction unit will either predict pixel values based on values of neighboring pixels in the same frame, or the prediction unit will predict pixel values based on an area(s) in one or more previous frames.
Predicting from nearby pixels is called Intra-prediction, and is used for the entire first frame, since there is no preceding frame. Predicting from one or more previous frames is called Inter-prediction. See Figure 2, where 'I' denotes Intra-predicted frames and P denotes Inter-predicted frames. In Figure 2, the first frame uses only intra prediction and is called an I-frame, or intra-frame, whereas the following frames predict from previous frames and are called P-frames, or predicted frames. Note that P-frames may use intra-prediction as well.
Figure 3 illustrates another example of prediction using intra/inter-prediction from various frames. The frames are numbered according to output order as indicated by the lower line of numbers. The letters above the numbers indicate the type of frame: I-frame, B-frame and P-frame.
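Whether a given block ends up intra- or inter-predicted is an encoder decision. A minimal Python sketch of such a decision, using only the sum of absolute differences (SAD), follows; a real encoder would also weigh the bit cost of signalling each mode, and the candidate predictions are assumed to be produced elsewhere.

```python
def sad(block, prediction):
    """Sum of absolute differences between a block and a candidate prediction."""
    return sum(abs(b - p) for b, p in zip(block, prediction))

def choose_prediction(block, intra_candidate, inter_candidate):
    """Pick the candidate with the smaller SAD (rate cost ignored here)."""
    if inter_candidate is None or sad(block, intra_candidate) <= sad(block, inter_candidate):
        return "intra", intra_candidate
    return "inter", inter_candidate

# In an I-frame there is no preceding frame, so there is no inter candidate
mode, pred = choose_prediction([10, 12, 14], [9, 12, 15], None)   # -> "intra"
```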
In Figure 3, after Frame 0, Frame 2 is encoded as a P-frame by predicting from Frame 0 and possibly using some intra prediction as well. Afterwards, Frame 1 is encoded, where each block has the possibility of predicting from Frame 0, Frame 2, both, or using intra prediction. Frame 1 is said to use bi-directional prediction, and is thus called a B-frame. When predicting from two different frames, the prediction is essentially the weighted sum of two predictions, one in either direction. In Figure 3, each pair of a B- and P-frame, such as frames 2 and 1, constitutes a Group of Pictures (GOP). A GOP is a useful unit because it consists of a group of pictures that are encoded out-of-order with respect to display order.
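The weighted sum used for bi-directional prediction can be sketched as follows; equal weights are assumed, corresponding to the default case when no explicit weighted prediction is applied.

```python
def bi_predict(pred_from_past, pred_from_future, w0=0.5, w1=0.5):
    """Bi-directional prediction as a weighted sum of two predictions,
    one from each direction (equal weights by default)."""
    return [w0 * a + w1 * b for a, b in zip(pred_from_past, pred_from_future)]

# Block of Frame 1 predicted from Frame 0 and from Frame 2
b_block = bi_predict([100, 104, 108], [102, 106, 110])   # -> [101.0, 105.0, 109.0]
```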
It is also possible to use larger GOPs; a size of 8 frames in a GOP is typical. This is accomplished by adding more layers of hierarchical B-pictures in between P-pictures. GOP-structure is normally chosen while considering the content of a video sequence to be encoded: with some video sequences best compression may be achieved by large GOPs, and some other video sequences by smaller GOPs.
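One common coding order for such a hierarchical GOP codes the trailing P-picture first and then the B-pictures layer by layer. The sketch below generates that order for a configurable GOP size; it is one typical choice, not an order mandated by the text above.

```python
def hierarchical_coding_order(gop_size):
    """One common coding order for a GOP with hierarchical B-pictures:
    code the trailing P-picture first, then recursively code the middle
    B-picture of each remaining interval (frame 0 belongs to the previous GOP)."""
    order = [gop_size]                      # the P-picture that closes the GOP

    def visit(lo, hi):
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        order.append(mid)                   # B-picture predicted from lo and hi
        visit(lo, mid)
        visit(mid, hi)

    visit(0, gop_size)
    return order

print(hierarchical_coding_order(8))   # -> [8, 4, 2, 1, 3, 6, 5, 7]
```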
Even though GOPs provide a means for obtaining efficient coding of video sequences, improvements to coding efficiency may still be required in some situations.
SUMMARY
An object may be to improve efficiency of video coding of the above mentioned kinds. According to an aspect, the object is achieved by a method, performed by a decoder, for decoding a bitstream including an encoded representation of a first block of a first frame of a video sequence. The video sequence comprises a second frame, wherein the first frame precedes the second frame in a bitstream order of the bitstream. The decoder predicts the first block based on an area including at least a portion of a second block of the second frame, wherein the second block is predicted independently of the first block.
According to another aspect, the object is achieved by a decoder configured to decode a bitstream including an encoded representation of a first block of a first frame of a video sequence, wherein the video sequence comprises a second frame, wherein the first frame precedes the second frame in a bitstream order of the bitstream. The decoder is configured to predict the first block based on an area including at least a portion of a second block of the second frame, wherein the second block is predicted independently of the first block.
According to a further aspect, the object is achieved by a method, performed by an encoder, for encoding a first block of a first frame of a video sequence into a bitstream including an encoded representation of the video sequence. The video sequence comprises a second frame, wherein the first frame precedes the second frame in a bitstream order of the bitstream. The encoder predicts the first block based on an area including at least a portion of a second block of the second frame, wherein the second block is predicted independently of the first block. According to yet another aspect, the object is achieved by an encoder configured to encode a first block of a first frame of a video sequence into a bitstream including an encoded representation of the video sequence, wherein the video sequence comprises a second frame, wherein the first frame precedes the second frame in a bitstream order of the bitstream. The encoder is configured to predict the first block based on an area including at least a portion of a second block of the second frame, wherein the second block is predicted independently of the first block.
According to some embodiments, the decoder predicts the first block based on the area. As mentioned, the second frame comprises the area, i.e. the area may be part of the second frame that is subsequent to the first frame in the bitstream order. The area includes at least a portion of a second block of the second frame. This may mean that the area is built up by the second block and possibly also further blocks. Since the second block, and optionally also any further blocks, is predicted independently of the first block, the area may be predicted, i.e. reconstructed, without a need for first predicting the first block. Such a need would otherwise create a circular reference that would prevent advantageous use of prediction of the first block based on the area. In this manner, the first block of the first frame is allowed to be predicted from the area of the second frame, even though the second frame would normally, due to the bitstream order, be decoded after the first frame. In a particular example, the first and second frames form a group of frames, also known as a group of pictures (GOP). For a GOP, a certain coding structure, or coding order, is defined. In this example, the embodiments herein allow the first block to use a different coding structure than that defined by the GOP. This means that any block of the GOP, other than the first block, that predicts only from a frame coded before the GOP can be reconstructed separately from the remainder of the GOP. Any such block may then be used for prediction of other blocks of the GOP, e.g. the first block. In this manner, this particular example allows the certain coding structure defined by the GOP to be bypassed on a per-block basis. Therefore, greater flexibility may be achieved as compared to changing the GOP structure, which defines the coding structure on a per-frame basis.
In another example, the embodiments herein may allow the first block to be predicted from an area that is temporally closer to the first frame. This may be understood from Figure 3, in which the first block of the first frame 'Frame 2' is predicted based on the area of the second frame 'Frame 1'. Generally, a temporally closer location provides higher compression efficiency. Thus, advantageously, the embodiments herein may provide higher compression efficiency.
BRIEF DESCRIPTION OF THE DRAWINGS
The various aspects of embodiments disclosed herein, including particular features and advantages thereof, will be readily understood from the following detailed description and the accompanying drawings, in which:
Figure 1 is a block diagram illustrating blocks and units of a frame,
Figure 2 is an overview illustrating a sequence of frames,
Figure 3 is another overview illustrating another sequence of frames,
Figure 4 is a schematic overview of an exemplifying system in which embodiments herein may be implemented,
Figure 5 is a schematic, combined signaling scheme and flowchart illustrating embodiments of the methods when performed in the system according to Figure 4,
Figure 6 is a flowchart illustrating embodiments of the method in the encoder,
Figure 7 is a block diagram illustrating embodiments of the encoder,
Figure 8 is a flowchart illustrating embodiments of the method in the decoder, and
Figure 9 is a block diagram illustrating embodiments of the decoder.
DETAILED DESCRIPTION
Throughout the following description similar reference numerals have been used to denote similar features, such as actions, steps, nodes, elements, units, modules, circuits, parts, items or the like, when applicable. In the Figures, features that appear in some embodiments are indicated by dashed lines.
Figure 4 depicts an exemplifying system 100 in which embodiments herein may be implemented.
The system 100 includes a network 101, such as a wired or wireless network.
Exemplifying networks include cable television networks, internet access networks, fiberoptic communication networks, telephone networks, cellular radio communication networks, any Third Generation Partnership Project (3GPP) network, Wi-Fi networks, etc.
In this example, the system 100 further comprises an encoder 110, comprised in a source device 111, and a decoder 120, comprised in a target device 121.
The source and/or target device 111, 121 may be embodied in the form of various platforms, such as television set-top-boxes, video players/recorders, video cameras, Blu-ray players, Digital Versatile Disc (DVD) players, media centers, media players, user equipments and the like. As used herein, the term "user equipment" may refer to a mobile phone, a cellular phone, a Personal Digital Assistant (PDA) equipped with radio communication capabilities, a smartphone, a laptop or personal computer (PC) equipped with an internal or external mobile broadband modem, a tablet PC with radio communication capabilities, a portable electronic radio communication device or the like.
As an example, the encoder 110, and/or the source device 111, may send 131, over the network 101, a bitstream to the decoder 120, and/or the target device 121. The bitstream may be video data, e.g. in the form of one or more NAL units. The video data may thus for example represent pictures of a video sequence. In case of HEVC, the bitstream comprises a Coded Video Sequence (CVS) that is HEVC compliant.
The bitstream may thus be an encoded representation of a video sequence to be transferred from the source device 111 to the target device 121. Hence, more generally, the bitstream may include encoded units, such as the NAL units.
Before proceeding with the description of the embodiments herein, an insight made by the present inventors will be explained here. Consider Figure 3 in the background section. If the syntax of an entire GOP is decoded before any reconstruction starts, any blocks predicted solely from the pixels in the previous I- or P-frame, i.e. a frame preceding the entire GOP, can be reconstructed independently of the rest of the GOP. These blocks may thus be used for prediction by other pictures, aka frames, in the GOP, even pictures which would normally - according to the coding structure of the GOP - be reconstructed earlier than the frames containing the blocks that can be reconstructed independently of the rest of the GOP.
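As a minimal sketch of this insight, and not an HEVC-conformant implementation, the check below decides whether a block can be reconstructed ahead of the GOP's normal order; the field names (refs, gop_frames) are assumptions made for the example.

```python
def reconstructable_before_gop(block, gop_frames):
    """True if the block only predicts from frames coded before the GOP.

    Such a block can be reconstructed independently of the rest of the GOP and
    then serve as a prediction source for other blocks in the GOP. Intra blocks
    are excluded here for simplicity, since their neighbouring reference samples
    may themselves depend on blocks inside the GOP.
    """
    return bool(block.refs) and all(ref not in gop_frames for ref in block.refs)
```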
It shall here be said that reconstruction may include prediction and optionally deblocking, filtering and the like. This means for example that reconstruction, as opposed to prediction, also involves decoding/encoding of the prediction error mentioned in the background section.
In particular, for example, parts of the P-frame, which is normally decoded first, could be allowed to predict from blocks in the B-frame, provided these blocks in turn predict their pixel values from the preceding P- or I-frame, i.e. a frame that precedes the entire GOP. This may be advantageous since the B-frame is temporally closer to the P-frame than a frame that the P-frame normally could predict from. A frame that is located temporally closer tends to be a better match, i.e. a better basis for prediction.
Figure 5 illustrates exemplifying embodiments when implemented in the system 100 of Figure 4.
The encoder 110 performs a method for encoding a first block of a first frame of a video sequence into a bitstream including an encoded representation of the video sequence. The encoding may be based on HEVC, H.264 or the like. Thus, the encoder 110 may be an HEVC encoder.
The decoder 120 performs a method for decoding a bitstream including an encoded representation of a first block of a first frame of a video sequence. The decoding of the encoded representation may be based on HEVC, H.264 or the like. Thus, the decoder 120 may be an HEVC decoder.
The video sequence is described in more detail in order to better explain the embodiments herein. The video sequence comprises the first frame and a second frame. The first frame precedes the second frame in a bitstream order of the bitstream.
The bitstream order of the first and second frame may be defined by that at least one unit, such as NAL unit, of the bitstream, including a first part of the first frame, precedes any unit of the bitstream, including a second part of the second frame. The at least one unit may carry syntax information. The syntax information may include one or more of frame size, block size, prediction mode, reference picture selection for each block, transform coefficients and the like. As an example, the syntax information may be used to determine a syntax of a group of frames, e.g. including the first and second frames.
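A small check corresponding to this definition of bitstream order is sketched below; the unit representation (a flat list with a frame_id per unit) is an assumption made only for illustration.

```python
def precedes_in_bitstream(units, first_frame_id, second_frame_id):
    """True if at least one unit carrying a part of the first frame precedes
    every unit carrying a part of the second frame."""
    first_pos = [i for i, u in enumerate(units) if u.frame_id == first_frame_id]
    second_pos = [i for i, u in enumerate(units) if u.frame_id == second_frame_id]
    return bool(first_pos) and (not second_pos or min(first_pos) < min(second_pos))
```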
The following detailed description discloses a plurality of embodiments. The plurality of embodiments includes, but is not limited to, the following example
embodiments, which are described with reference to Figure 5.
A first example embodiment relates to when the first and second frames form a group of frames. Hence, the first and second frames may form a group of frames, wherein at least one previous frame precedes all frames of the group of frames. The group of frames may also be referred to as a Group of Pictures (GOP). According to HEVC, GOP is a way of organizing pictures in a particular coding order. In the first example embodiment, now described briefly for the decoder 120, the decoder 120 first determines a coding order of blocks in the group of frames. The determined coding order may be different from the coding order for the frames as defined by the GOP. Then, the decoder 120 reconstructs blocks of the group of frames that only reference pictures before the group of frames. This may mean that the decoder 120 predicts blocks that are predictable using only portions of frames temporally located before the group of frames. Subsequently, the first block will be reconstructed by the decoder 120.
A second example embodiment also relates to when the first and second frames form a group of frames. In the second example embodiment, now described briefly for the decoder 120, the decoder 120 performs two passes according to the coding order defined by the group of frames. In a first pass, the decoder 120 reconstructs blocks of the group of frames that only refer to, or reference, pictures before the group of frames. In a second pass, further blocks, e.g. including the first block, will be reconstructed by the decoder 120.
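A rough control-flow sketch of this two-pass approach is given below; the helpers (parse_gop_syntax, reconstruct_block) and the block records are hypothetical stand-ins for the corresponding steps of a real decoder, not an HEVC-conformant implementation.

```python
def decode_gop_two_pass(bitstream, gop_frames, prior_pictures,
                        parse_gop_syntax, reconstruct_block):
    """Two-pass reconstruction of a group of frames (control flow only)."""
    # Parse the syntax of the whole group of frames before reconstruction starts.
    blocks = parse_gop_syntax(bitstream, gop_frames)
    done = set()

    # Pass 1: blocks that only reference pictures coded before the group of frames.
    for blk in blocks:
        if blk.refs and all(ref in prior_pictures for ref in blk.refs):
            reconstruct_block(blk)
            done.add(blk.id)

    # Pass 2: remaining blocks, e.g. the first block, which may now predict
    # from areas reconstructed in pass 1.
    for blk in blocks:
        if blk.id not in done:
            reconstruct_block(blk)
```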
A third example embodiment relates to when the first and second frames do not form a group of frames. Thus, there is no inherent coding order defined between the first and second frames as is the case when the first and second frames do form a group of frames. In the third example embodiment, the bitstream order will be relied upon for determination of coding order. One or more of the following actions may be performed in any suitable order.
Actions A010 to A040 relate to how the encoder 110 encodes the video sequence to be decoded by the decoder 120 in actions A070 to A120.
Action A010
In some examples, the encoder 110 may predict a second block based on at least one previous frame preceding the first frame in the bitstream order. This means that the second block is inter-coded from the at least one previous frame. This action may be performed in the first pass of the second example embodiment.
The prediction of the second block may be performed before the prediction in action A040. In this manner, an area used in action A040 may have been reconstructed such that it can be used for prediction of the first block. This may mean that the area may have been both predicted and a prediction error therefor may have been compensated. Reference is made to the background section for explanation of prediction error.
Action A020
In some examples, the encoder 110 may predict the second block based on a further area of the second frame. This means that the second block is intra-coded from the further area of the second frame. The further area may preferably be predicted independently from the first block.
The prediction in action A020 may be performed before the prediction in action A040. In this manner, the area used in action A040 may have been reconstructed, i.e. not only predicted but e.g. also transformed using transformation coefficients (which is briefly described in the background section), such that it can be used for prediction of the first block.
Action A030
The encoder 110 may predict further blocks of the first and second frames based on one or more of the at least one previous frame or one or more additional areas of the first and second frames. The one or more additional areas may be predicted independently from the first block. When the encoder 110 predicts the further blocks based on the one or more of the at least one previous frame, this action may be part of a first pass according to the second example embodiment.
This means that prediction of blocks may be performed according to the coding order given by the GOP structure, while those blocks that are predicted from a subsequent frame are left blank. As an example, a blank block may be the first block that is predicted in action A040; thus, after action A040, the first block is no longer blank, i.e. no longer non-reconstructed.
Action A040
The encoder 110 predicts the first block based on an area, wherein the area is referred to as "the area used in action A040" above. The area includes at least a portion of a second block of the second frame. The second block is predicted independently of the first block. This action may be performed as part of the second pass according to the second example embodiment.
The bitstream may indicate that the area is usable for prediction of the first block. This may mean that the bitstream indicates that the first block refers to, or references, the area.
The predicting A040 of the first block may be performed after the predicting A030 of further blocks.
In one example, the group of frames may be associated with two coding structures under the bi-constraint that a respective coding structure may be applied to only a respective portion of each frame of the group. Expressed differently, a first coding structure of the group of frames may apply to only a first portion of each frame of the group and a second coding structure may apply to only a second portion of each frame of the group. In more detail, it may be that e.g. upper portions of frames use the first coding structure and lower portions of frames use the second coding structure. The first coding structure and the second coding structure may be exclusively associated to the first and second portions, respectively. Thus, only one coding structure at a time may apply to any given portion of a frame.
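Purely as a toy illustration of such a per-portion split, the assignment could look as follows; the half-frame threshold and the structure names are invented for this example and are not part of the described embodiments.

```python
def coding_structure_for_block(block_row, frame_height_in_blocks):
    """Assign the first coding structure to the upper portion of every frame in
    the group and the second coding structure to the lower portion."""
    if block_row < frame_height_in_blocks // 2:
        return "first_coding_structure"
    return "second_coding_structure"
```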
Action A050
The encoder 110 may send at least parts of the bitstream.
Action A060
Subsequent to action A050, the decoder 120 may receive at least parts of the bitstream.
Action A070
In the second example embodiment, in order to obtain syntax information before prediction in action A120, the decoder 120 may parse the encoded representation. The syntax information thus obtained is related to coding of the bitstream.
The syntax information may include one or more of: frame size, block size, prediction mode, reference picture selection for each block, transform coefficients and the like.
Action A080
After action A070, the decoder 120 may parse the encoded representation by determining a prediction order for prediction of blocks of the first frame based on the syntax information. Since reconstruction includes prediction, the prediction order may also be referred to as a reconstruction order.
As mentioned above, the syntax information may comprise the reference picture selection for each block. Then, the decoder 120 may compare the bitstream order to the reference picture selection to deduce a suitable prediction order, i.e. the reconstruction order of blocks of the first frame needs to be restricted such that the selected reference area is reconstructed before a current block is predicted.
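One possible way of turning the per-block reference picture selection into such a reconstruction order is a topological sort over block dependencies, as sketched below; the dependency map is a hypothetical structure, and a real decoder would typically track this at picture or slice level as well.

```python
from collections import deque

def reconstruction_order(block_ids, deps):
    """Kahn-style ordering: deps[b] is the set of blocks that must be
    reconstructed before block b (empty if b only references areas that are
    already available, e.g. pictures preceding the group of frames)."""
    remaining = {b: set(deps.get(b, ())) for b in block_ids}
    ready = deque(b for b, d in remaining.items() if not d)
    order = []
    while ready:
        b = ready.popleft()
        order.append(b)
        for other, d in remaining.items():
            if b in d:
                d.discard(b)
                if not d:
                    ready.append(other)
    return order
```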
Another way of achieving this is to determine the reconstruction order to be the same as the bitstream order and then, when a current block cannot be decoded due to a reference to a non-reconstructed area, the current block is left blank. Subsequently, e.g. after a last block of the first frame has been reconstructed, or even not reconstructed in case of reference to a non-reconstructed area, the decoder 120 attempts to reconstruct that or those block(s) that were left blank. This manner of making a first and second pass for prediction of the blocks of the first frame has also been briefly described above for the case that the first and second frames are included in the group of frames.
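A sketch of this blank-and-retry alternative is shown below; the helpers (is_reconstructed, reconstruct_block) and the per-block reference list are hypothetical stand-ins for the decoder's internal state.

```python
def reconstruct_frame_with_retry(blocks_in_bitstream_order,
                                 is_reconstructed, reconstruct_block):
    """First sweep follows the bitstream order; blocks whose references are not
    yet reconstructed are left blank and handled in a second sweep."""
    blank = []
    for blk in blocks_in_bitstream_order:
        if all(is_reconstructed(ref) for ref in blk.refs):
            reconstruct_block(blk)
        else:
            blank.append(blk)
    for blk in blank:
        reconstruct_block(blk)
```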
Action A090
Similarly to action A010 for the encoder 110, the decoder 120 may predict the second block based on at least one previous frame preceding the first frame in the bitstream order. Accordingly, the second block is inter-coded from the at least one previous frame.
The prediction in action A090 may be performed before the prediction in action A120.
Action A100
Similarly to action A020 for the encoder 110, the decoder 120 may predict the second block based on a further area of the second frame. Accordingly, the second block may be intra-coded from the further area of the second frame.
The prediction in action A100 may be performed before the prediction in action A120.
Action A110
Similarly to action A030 for the encoder 110, the decoder 120 may predict further blocks of the first and second frames based on one or more of the at least one previous frame or one or more additional areas of the first and second frames. The additional areas may be predicted independently of the first block.
For the second example embodiment, this action may be part of the first pass according to the coding order given by the group of frames.
Action A120
The decoder 120 predicts the first block based on an area. The area includes at least a portion of a second block of the second frame. The second block is predicted independently of the first block.
The prediction in action A120 may be performed after the prediction in action A110. For the second example embodiment, this action may be part of the second pass according to the coding order given by the group of frames.
In connection with the various embodiments above, it is common to use different coding parameters, such as quantization parameters (QP), for IPPP coding compared to hierarchical coding schemes. IPPP coding refers to coding with an intra frame followed by predicted frames, similar to Figure 2; it is not an abbreviation. In one embodiment, the values of these parameters may be inferred from the prediction used for each block, rather than specified on a frame basis. Typically, IPPP coding will use different QPs than IBP coding. Again, IBP coding is not an abbreviation; instead it refers to a coding technique known in the art. In particular, a B-frame will have a higher QP. If a block in the B-frame is coded with P-coding instead, and is available for prediction by a subsequent picture, it makes sense not to use this higher QP value. The decoder 120 may then decode that block using a typical QP for a P-frame in IPPP coding.
Elaborating further, for example, in hierarchical prediction, P-frames tend to have significantly lower QP (higher quality) than B-frames, since they will be used more for prediction. If a block in a P-frame instead predicts from a previous frame in an IPPP-manner, this block may be given a higher QP (lower quality) since it is unlikely to be used for prediction by the rest of the GOP. Similarly, a block in a B-frame that will now be available for prediction by P-frames could implicitly be given a lower QP (higher quality) than the rest of the B-frame.
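An illustrative per-block QP rule along these lines is sketched below; the numeric offsets are invented for the example and are not part of the described embodiments.

```python
def infer_block_qp(frame_type, predicts_from_previous_frame_only, frame_qp):
    """Adjust the frame-level QP per block depending on how the block predicts.

    'Previous frame' means the frame immediately preceding in display order,
    i.e. IPPP-style prediction. For a P-frame block that previous frame lies
    inside the GOP, so the block is unlikely to be referenced by the rest of
    the GOP; for a B-frame block it is the frame preceding the GOP, so the
    block becomes available as a reference for the P-frame.
    """
    if frame_type == "P" and predicts_from_previous_frame_only:
        return frame_qp + 2   # lower quality is acceptable here
    if frame_type == "B" and predicts_from_previous_frame_only:
        return frame_qp - 2   # higher quality, since it may be referenced
    return frame_qp
```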
With the embodiments herein, the added signaling cost is fairly low. Blocks in the B-frames are already allowed to predict solely from frames preceding the GOP, and these blocks may then be used for prediction by other frames in the same GOP. The cost, then, comes when for example coding Frame 2 in Figure 3. Here, it may be signaled whether Frame 0 or Frame 1 may be used for prediction for each inter-block. An approach may be to treat both as reference pictures, similar to how multiple reference pictures are handled. Notably, Frame 0 is a regular reference frame in that it is entirely reconstructed, whereas only parts of Frame 1 are available for prediction. In an alternate embodiment, this is used in order to infer whether Frame 1 is a reasonable candidate reference for a given block, or even to infer, without signaling, which of the frames should be used for each block in Frame 2. In the embodiment with no signaling, Frame 2 would be signaled as if it were predicting from Frame 0, but if Frame 1 contained data for the relevant area, that data would be used for prediction instead, which may involve halving the motion vector. As an example, the signaling may involve a number representing an index into a list of possible reference pictures.
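A sketch of the no-signaling variant is shown below, for the Figure 3 case where Frame 1 lies temporally halfway between Frame 0 and Frame 2; the helper area_available and the block fields are hypothetical.

```python
def resolve_reference(block, frame0, frame1, area_available):
    """The block is coded as if predicting from Frame 0; if the co-located area
    of Frame 1 is already reconstructed, Frame 1 is used instead, with the
    motion vector halved because Frame 1 is temporally halfway to Frame 0."""
    mv_x, mv_y = block.motion_vector
    if area_available(frame1, block.position, (mv_x // 2, mv_y // 2)):
        return frame1, (mv_x // 2, mv_y // 2)
    return frame0, (mv_x, mv_y)
```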
In some embodiments, all in-loop filtering, such as HEVC's deblocking and Sample Adaptive Offset (SAO), is performed after all the blocks in a frame have been decoded. This means that the new reference pictures allowed due to our scheme will be unfiltered, whereas the normal reference pictures will have been filtered normally.
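The filtering note above could, as a rough sketch with hypothetical helper names, look as follows: prediction within the proposed scheme reads from the unfiltered reconstruction, while the normally filtered picture is what is stored as a regular reference.

```python
def finish_frame(unfiltered_frame, deblock, sample_adaptive_offset):
    """Run in-loop filters only after all blocks of the frame are decoded."""
    filtered = sample_adaptive_offset(deblock(unfiltered_frame))
    return {
        "new_reference_in_this_scheme": unfiltered_frame,  # unfiltered samples
        "normal_reference_picture": filtered,              # filtered as usual
    }
```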
In Figure 6, a schematic flowchart of exemplifying methods in the encoder 110 is shown. Again, the same reference numerals as above have been used to denote the same or similar features, in particular the same reference numerals have been used to denote the same or similar actions. Accordingly, the encoder 110 performs a method for encoding a first block of a first frame of a video sequence into a bitstream including an encoded representation of the video sequence.
As mentioned, the video sequence comprises a second frame, wherein the first frame precedes the second frame in a bitstream order of the bitstream.
One or more of the following actions may be performed in any suitable order.
Action A010
The encoder 110 may predict the second block based on at least one previous frame preceding the first frame in the bitstream order.
The predicting A010 of the second block based on the at least one previous frame may be performed before the predicting A040 of the first block.
Action A020
The encoder 110 may predict the second block based on a further area of the second frame.
The predicting A020 of the second block based on the further area may be performed before the predicting A040 of the first block.
Action A030
The encoder 110 may predict further blocks of the first and second frames based on one or more of the at least one previous frame or one or more additional areas of the first and second frames.
Action A040
The encoder 110 predicts the first block based on an area including at least a portion of a second block of the second frame, wherein the second block is predicted independently of the first block.
The predicting A040 of the first block may be performed after the predicting A030 of further blocks.
The first and second frames may form a group of frames, wherein the at least one previous frame precedes all frames of the group of frames.
A first coding structure of the group of frames may apply to only a first portion of each frame of the group and a second coding structure may apply to only a second portion of each frame of the group.
The first coding structure and the second coding structure may be exclusively associated to the first and second portions, respectively.
The bitstream order of the first and second frame may be defined by that at least one unit of the bitstream, including a first part of the first frame, precedes any unit of the bitstream, including a second part of the second frame.
The at least one unit may carry syntax information.
The syntax information may include one or more of frame size, block size, prediction mode, reference picture selection for each block, and transform coefficients.
The encoding may be based on HEVC or H.264.
With reference to Figure 7, a schematic block diagram of embodiments of the encoder 110 of Figure 4 is shown. The encoder 110 is thus configured to encode a first block of a first frame of a video sequence into a bitstream including an encoded representation of the video sequence.
As mentioned, the video sequence comprises a second frame, wherein the first frame precedes the second frame in a bitstream order of the bitstream.
The first and second frames may form a group of frames, wherein the at least one previous frame precedes all frames of the group of frames.
A first coding structure of the group of frames may apply to only a first portion of each frame of the group and a second coding structure may apply to only a second portion of each frame of the group. The first coding structure and the second coding structure may be exclusively associated to the first and second portions, respectively.
The encoding may be based on HEVC or H.264.
The bitstream order of the first and second frame may be defined by that at least one unit of the bitstream, including a first part of the first frame, precedes any unit of the bitstream, including a second part of the second frame.
The at least one unit may carry syntax information.
The syntax information may include one or more of frame size, block size, prediction mode, reference picture selection for each block, and transform coefficients.
The encoder 110 may comprise a processing module 701, such as a means, one or more hardware modules and/or one or more software modules for performing the methods described herein.
The encoder 1 10 may further comprise a memory 702. The memory may comprise, such as contain or store, a computer program 703.
According to some embodiments herein, the processing module 701 comprises, e.g. 'is embodied in the form of' or 'realized by', a processing circuit 704 as an exemplifying hardware module. In these embodiments, the memory 702 may comprise the computer program 703, comprising computer readable code units executable by the processing circuit 704, whereby the encoder 110 is operative to perform the methods of Figure 5 and/or Figure 6.
In some other embodiments, the computer readable code units may cause the encoder 110 to perform the method according to Figure 5 and/or 6 when the computer readable code units are executed by the encoder 110.
Figure 7 further illustrates a carrier 705, or program carrier, which comprises the computer program 703 as described directly above.
In some embodiments, the processing module 701 comprises an Input/Output unit 706, which may be exemplified by a receiving module and/or a sending module as described below when applicable. In further embodiments, the processing module 701 may comprise one or more of a predicting module 710, and a sending module 720 as exemplifying hardware modules. In other examples, one or more of the aforementioned exemplifying hardware modules may be implemented as one or more software modules.
Therefore, according to the various embodiments described above, the encoder 110, the processing module 701 and/or the predicting module 710 is operative to, such as configured to, predict the first block based on an area including at least a portion of a second block of the second frame, wherein the second block is predicted independently of the first block.
The encoder 110, the processing module 701 and/or the predicting module 710 may be operative to, such as configured to, predict the second block based on at least one previous frame preceding the first frame in the bitstream order.
The encoder 110, the processing module 701 and/or the predicting module 710 may be operative to, such as configured to, predict the second block based on a further area of the second frame.
The encoder 110, the processing module 701 and/or the predicting module 710 may be operative to, such as configured to, predict the second block based on the at least one previous frame and/or to predict the second block based on the further area, before the prediction of the first block.
The encoder 110, the processing module 701 and/or the predicting module 710 may be operative to, such as configured to, predict further blocks of the first and second frames based on one or more of the at least one previous frame or one or more additional areas of the first and second frames.
The encoder 110, the processing module 701 and/or the predicting module 710 may be operative to, such as configured to, predict the first block after prediction of further blocks.
In Figure 8, a schematic flowchart of exemplifying methods in the decoder 120 is shown. Again, the same reference numerals as above have been used to denote the same or similar features, in particular the same reference numerals have been used to denote the same or similar actions. Accordingly, the decoder 120 performs a method for decoding a bitstream including an encoded representation of a first block of a first frame of a video sequence.
As mentioned, the video sequence comprises a second frame, wherein the first frame precedes the second frame in a bitstream order of the bitstream. One or more of the following actions may be performed in any suitable order.
Action A070
The decoder 120 may parse the encoded representation to obtain syntax information related to coding of the bitstream.
The syntax information may include one or more of: frame size, block size, prediction mode, reference picture selection for each block, and transform coefficients.
Action A080
The decoder 120 may parse the encoded representation by determining a prediction order for prediction of blocks of the first frame based on the syntax information.
Action A090
The decoder 120 may predict the second block based on at least one previous frame preceding the first frame in the bitstream order.
The predicting A090 of the second block based on the at least one previous frame may be performed before the predicting in action A120.
Action A100
The decoder 120 may predict the second block based on a further area of the second frame.
The predicting A100 of the second block based on the further area may be performed before the predicting in action A120.
Action A110
The decoder 120 may predict further blocks of the first and second frames based on one or more of the at least one previous frame or one or more additional areas of the first and second frames.
Action A120
The decoder 120 predicts the first block based on an area including at least a portion of a second block of the second frame, wherein the second block is predicted independently of the first block.
The predicting A120 of the first block may be performed after the predicting A110 of further blocks.
The first and second frames may form a group of frames, wherein the at least one previous frame precedes all frames of the group of frames. A first coding structure of the group of frames may apply to only a first portion of each frame of the group and a second coding structure may apply to only a second portion of each frame of the group.
The first coding structure and the second coding structure are exclusively associated to the first and second portions, respectively.
The bitstream order of the first and second frame may be defined by that at least one unit of the bitstream, including a first part of the first frame, precedes any unit of the bitstream, including a second part of the second frame.
The at least one unit may carry part of the syntax information.
The decoding of the encoded representation may be based on HEVC or H.264.
With reference to Figure 9, a schematic block diagram of embodiments of the decoder 120 of Figure 4 is shown. The decoder 120 is thus configured to decode a bitstream including an encoded representation of a first block of a first frame of a video sequence.
As mentioned, the video sequence comprises a second frame, wherein the first frame precedes the second frame in a bitstream order of the bitstream.
The first and second frames may form a group of frames, wherein the at least one previous frame may precede all frames of the group of frames. A first coding structure of the group of frames may apply to only a first portion of each frame of the group and a second coding structure may apply to only a second portion of each frame of the group.
The first coding structure and the second coding structure may be exclusively associated to the first and second portions, respectively.
The decoding of the encoded representation may be based on HEVC or H.264.
The bitstream order of the first and second frame may be defined by that at least one unit of the bitstream, including a first part of the first frame, precedes any unit of the bitstream, including a second part of the second frame.
The at least one unit carries part of the syntax information.
The decoder 120 may comprise a processing module 901, such as a means, one or more hardware modules and/or one or more software modules for performing the methods described herein.
The decoder 120 may further comprise a memory 902. The memory may comprise, such as contain or store, a computer program 903.
According to some embodiments herein, the processing module 901 comprises, e.g. 'is embodied in the form of' or 'realized by', a processing circuit 904 as an exemplifying hardware module. In these embodiments, the memory 902 may comprise the computer program 903, comprising computer readable code units executable by the processing circuit 904, whereby the decoder 120 is operative to perform the methods of Figure 5 and/or Figure 8. In some other embodiments, the computer readable code units may cause the decoder 120 to perform the method according to Figure 5 and/or 8 when the computer readable code units are executed by the decoder 120.
Figure 9 further illustrates a carrier 905, or program carrier, which comprises the computer program 903 as described directly above.
In some embodiments, the processing module 901 comprises an Input/Output unit 906, which may be exemplified by a receiving module and/or a sending module as described below when applicable. In further embodiments, the processing module 901 may comprise one or more of a predicting module 910, a parsing module 920, a determining module 930, and a receiving module 940 as exemplifying hardware modules. In other examples, one or more of the aforementioned exemplifying hardware modules may be implemented as one or more software modules.
Therefore, according to the various embodiments described above, the decoder 120, the processing module 901 and/or the predicting module 910 is operative to, such as configured to, predict the first block based on an area including at least a portion of a second block of the second frame, wherein the second block is predicted independently of the first block.
The decoder 120, the processing module 901 and/or the predicting module 910 may be operative to, such as configured to, predict the second block based on at least one previous frame preceding the first frame in the bitstream order.
The decoder 120, the processing module 901 and/or the predicting module 910 may be operative to, such as configured to, predict the second block based on a further area of the second frame.
The decoder 120, the processing module 901 and/or the predicting module 910 may be operative to, such as configured to, predict the second block based on the at least one previous frame and/or to predict the second block based on the further area, before performing prediction of the first block.
The decoder 120, the processing module 901 and/or the predicting module 910 may be operative to, such as configured to, predict further blocks of the first and second frames based on one or more of the at least one previous frame or one or more additional areas of the first and second frames.
The decoder 120, the processing module 901 and/or the predicting module 910 may be operative to, such as configured to, predict the first block after the prediction of further blocks.
The decoder 120, the processing module 901 and/or the parsing module 920 may be operative to, such as configured to, parse the encoded representation to obtain syntax information related to coding of the bitstream.
The syntax information may include one or more of: frame size, block size, prediction mode, reference picture selection for each block, and transform coefficients.
The decoder 120, the processing module 901 and/or the parsing module 920 may be operative to, such as configured to, parse the encoded representation by determining a prediction order for prediction of blocks of the first frame based on the syntax information.
As used herein, the term "node", or "network node", may refer to one or more physical entities, such as devices, apparatuses, computers, servers or the like. This may mean that embodiments herein may be implemented in one physical entity. Alternatively, the embodiments herein may be implemented in a plurality of physical entities, such as an arrangement comprising said one or more physical entities, i.e. the embodiments may be implemented in a distributed manner.
As used herein, the term "unit" may refer to one or more functional units, each of which may be implemented as one or more hardware modules and/or one or more software modules in a node.
As used herein, the term "program carrier" may refer to one of an electronic signal, an optical signal, a radio signal, and a computer readable medium. In some examples, the program carrier may exclude transitory, propagating signals, such as the electronic, optical and/or radio signal. Thus, in these examples, the carrier may be a non-transitory carrier, such as a non-transitory computer readable medium.
As used herein, the term "processing module" may include one or more hardware modules, one or more software modules or a combination thereof. Any such module, be it a hardware, software or a combined hardware-software module, may be a determining means, estimating means, capturing means, associating means, comparing means, identification means, selecting means, receiving means, sending means or the like as disclosed herein. As an example, the expression "means" may be a module
corresponding to the modules listed above in conjunction with the Figures. As used herein, the term "software module" may refer to a software application, a Dynamic Link Library (DLL), a software component, a software object, an object according to Component Object Model (COM), a software component, a software function, a software engine, an executable binary software file or the like.
As used herein, the term "processing circuit" may refer to a processing unit, a processor, an Application Specific integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or the like. The processing circuit or the like may comprise one or more processor kernels.
As used herein, the expression "configured to" may mean that a processing circuit is configured to, or adapted to, by means of software configuration and/or hardware configuration, perform one or more of the actions described herein.
As used herein, the term "action" may refer to an action, a step, an operation, a response, a reaction, an activity or the like.
As used herein, the term "memory" may refer to a hard disk, a magnetic storage medium, a portable computer diskette or disc, flash memory, random access memory (RAM) or the like. Furthermore, the term "memory" may refer to an internal register memory of a processor or the like.
As used herein, the term "computer readable medium" may be a Universal Serial Bus (USB) memory, a DVD-disc, a Blu-ray disc, a software module that is received as a stream of data, a Flash memory, a hard drive, a memory card, such as a MemoryStick, a Multimedia Card (MMC), Secure Digital (SD) card, etc.
As used herein, the term "computer readable code units" may be text of a computer program, parts of or an entire binary file representing a computer program in a compiled format or anything there between.
As used herein, the term "radio resource" may refer to a certain coding of a signal and/or a time frame and/or a frequency range in which the signal is transmitted. In some examples, a resource may refer to one or more Physical Resource Blocks (PRB) which is used when transmitting the signal. In more detail, a PRB may be in the form of Orthogonal Frequency Division Multiplexing (OFDM) PHY resource blocks (PRB). The term "physical resource block" is known from 3GPP terminology relating to e.g. Long Term Evolution Systems.
As used herein, the terms "number" and/or "value" may be any kind of digit, such as binary, real, imaginary or rational number or the like. Moreover, "number" and/or "value" may be one or more characters, such as a letter or a string of letters. "Number" and/or "value" may also be represented by a bit string.
As used herein, the term "set of" may refer to one or more of something. E.g. a set of devices may refer to one or more devices, a set of parameters may refer to one or more parameters or the like according to the embodiments herein.
As used herein, the expression "in some embodiments" has been used to indicate that the features of the embodiment described may be combined with any other embodiment disclosed herein. Even though embodiments of the various aspects have been described, many different alterations, modifications and the like thereof will become apparent for those skilled in the art. The described embodiments are therefore not intended to limit the scope of the present disclosure.

Claims

1. A method, performed by a decoder (120), for decoding a bitstream including an encoded representation of a first block of a first frame of a video sequence, wherein the video sequence comprises a second frame, wherein the first frame precedes the second frame in a bitstream order of the bitstream, wherein the method comprises: predicting (A120) the first block based on an area including at least a portion of a second block of the second frame, wherein the second block is predicted independently of the first block.
2. The method according to claim 1, wherein the method comprises:
predicting (A090) the second block based on at least one previous frame preceding the first frame in the bitstream order.
3. The method according to claim 1, wherein the method comprises:
predicting (A100) the second block based on a further area of the second frame.
4. The method according to claim 2 or 3, wherein the predicting (A090) of the second block based on the at least one previous frame and/or the predicting (A100) of the second block based on the further area is performed before the predicting (A120) of the first block.
5. The method according to claim 1 or 2, wherein the method comprises:
predicting (A110) further blocks of the first and second frames based on one or more of the at least one previous frame or one or more additional areas of the first and second frames.
6. The method according to the preceding claim, wherein the predicting (A120) of the first block is performed after the predicting (A110) of further blocks.
7. The method according to any one of the preceding claims, wherein the method comprises: parsing (A070) the encoded representation to obtain syntax information related to coding of the bitstream.
8. The method according to the preceding claim, wherein the syntax information
includes one or more of:
frame size, block size, prediction mode, reference picture selection for each block, and transform coefficients.
9. The method according to the preceding claim, wherein the parsing of the encoded representation comprises:
determining (A080) a prediction order for prediction of blocks of the first frame based on the syntax information.
10. The method according to any one of the preceding claims, wherein the first and second frames form a group of frames, wherein the at least one previous frame precedes all frames of the group of frames.
11. The method according to the preceding claim, wherein a first coding structure of the group of frames applies to only a first portion of each frame of the group and a second coding structure applies to only a second portion of each frame of the group.
12. The method according to the preceding claim, wherein the first coding structure and the second coding structure are exclusively associated to the first and second portions, respectively.
13. The method according to any one of the preceding claims, wherein the decoding of the encoded representation is based on HEVC or H.264.
14. The method according to any one of the preceding claims, wherein the bitstream order of the first and second frame is defined by that at least one unit of the bitstream, including a first part of the first frame, precedes any unit of the bitstream, including a second part of the second frame.
15. The method according to the preceding claim, when dependent on any one of claims 7-9, wherein the at least one unit carries part of the syntax information.
16. A method, performed by an encoder (110), for encoding a first block of a first frame of a video sequence into a bitstream including an encoded representation of the video sequence, wherein the video sequence comprises a second frame, wherein the first frame precedes the second frame in a bitstream order of the bitstream, wherein the method comprises:
predicting (A040) the first block based on an area including at least a portion of a second block of the second frame, wherein the second block is predicted independently of the first block.
17. The method according to the preceding claim, wherein the method comprises:
predicting (A010) the second block based on at least one previous frame preceding the first frame in the bitstream order.
18. The method according to claim 16, wherein the method comprises:
predicting (A020) the second block based on a further area of the second frame.
19. The method according to claim 17 or 18, wherein the predicting (A010) of the second block based on the at least one previous frame and/or the predicting (A020) of the second block based on the further area is performed before the predicting (A040) of the first block.
20. The method according to any one of claims 16-19, wherein the method comprises:
predicting (A030) further blocks of the first and second frames based on one or more of the at least one previous frame or one or more additional areas of the first and second frames.
21. The method according to the preceding claim, wherein the predicting (A040) of the first block is performed after the predicting (A030) of further blocks.
22. The method according to any one of the preceding claims, wherein the first and
second frames form a group of frames, wherein the at least one previous frame precedes all frames of the group of frames.
23. The method according to the preceding claim, wherein a first coding structure of the group of frames applies to only a first portion of each frame of the group and a second coding structure applies to only a second portion of each frame of the group.
24. The method according to the preceding claim, wherein the first coding structure and the second coding structure are exclusively associated to the first and second portions, respectively.
25. The method according to any one of claims 16-24, wherein the encoding is based on HEVC or H.264.
26. The method according to any one of claims 16-25, wherein the bitstream order of the first and second frame is defined by that at least one unit of the bitstream, including a first part of the first frame, precedes any unit of the bitstream, including a second part of the second frame.
27. The method according to the preceding claim, wherein the at least one unit carries syntax information.
28. The method according to the preceding claim, wherein the syntax information
includes one or more of frame size, block size, prediction mode, reference picture selection for each block, and transform coefficients.
29. A decoder (120) configured to decode a bitstream including an encoded
representation of a first block of a first frame of a video sequence, wherein the video sequence comprises a second frame, wherein the first frame precedes the second frame in a bitstream order of the bitstream, wherein the decoder (120) is configured to:
predict the first block based on an area including at least a portion of a second block of the second frame, wherein the second block is predicted
independently of the first block.
30. The decoder (120) according to claim 29, wherein the decoder (120) is configured to predict the second block based on at least one previous frame preceding the first frame in the bitstream order.
31. The decoder (120) according to claim 29, wherein the decoder (120) is configured to predict the second block based on a further area of the second frame.
32. The decoder (120) according to claim 30 or 31, wherein the decoder (120) is
configured to predict the second block based on the at least one previous frame and/or to predict the second block based on the further area, before performing prediction of the first block.
33. The decoder (120) according to claim 29 or 30, wherein the decoder (120) is
configured to predict further blocks of the first and second frames based on one or more of the at least one previous frame or one or more additional areas of the first and second frames.
34. The decoder (120) according to the preceding claim, wherein the decoder (120) is configured to predict the first block after the prediction of further blocks.
35. The decoder (120) according to any one of claims 29-34, wherein the decoder (120) is configured to parse the encoded representation to obtain syntax information related to coding of the bitstream.
36. The decoder (120) according to the preceding claim, wherein the syntax information includes one or more of:
frame size, block size, prediction mode, reference picture selection for each block, and transform coefficients.
37. The decoder (120) according to the preceding claim, wherein the decoder (120) is configured to parse the encoded representation by determining a prediction order for prediction of blocks of the first frame based on the syntax information.
38. The decoder (120) according to any one of claims 29-37, wherein the first and second frames form a group of frames, wherein the at least one previous frame precedes all frames of the group of frames.
39. The decoder (120) according to the preceding claim, wherein a first coding structure of the group of frames applies to only a first portion of each frame of the group and a second coding structure applies to only a second portion of each frame of the group.
40. The decoder (120) according to the preceding claim, wherein the first coding
structure and the second coding structure are exclusively associated to the first and second portions, respectively.
41. The decoder (120) according to any one of claims 29-40, wherein the decoding of the encoded representation is based on HEVC or H.264.
42. The decoder (120) according to any one of claims 29-41, wherein the bitstream order of the first and second frame is defined by that at least one unit of the bitstream, including a first part of the first frame, precedes any unit of the bitstream, including a second part of the second frame.
43. The decoder (120) according to the preceding claim, when dependent on any one of claims 35-37, wherein the at least one unit carries part of the syntax information.
44. An encoder (110) configured to encode a first block of a first frame of a video sequence into a bitstream including an encoded representation of the video sequence, wherein the video sequence comprises a second frame, wherein the first frame precedes the second frame in a bitstream order of the bitstream, wherein the encoder (110) is configured to:
predict the first block based on an area including at least a portion of a second block of the second frame, wherein the second block is predicted
independently of the first block.
45. The encoder (110) according to the preceding claim, wherein the encoder (110) is configured to predict the second block based on at least one previous frame preceding the first frame in the bitstream order.
46. The encoder (110) according to claim 44, wherein the encoder (110) is configured to predict the second block based on a further area of the second frame.
47. The encoder (110) according to claim 45 or 46, wherein the encoder (110) is configured to predict the second block based on the at least one previous frame and/or to predict the second block based on the further area, before the prediction of the first block.
48. The encoder (110) according to claim 44 or 45, wherein the encoder (110) is
configured to:
predict further blocks of the first and second frames based on one or more of the at least one previous frame or one or more additional areas of the first and second frames.
49. The encoder (110) according to the preceding claim, wherein the encoder (110) is configured to predict the first block after prediction of further blocks.
50. The encoder (110) according to any one of claims 44-49, wherein the first and
second frames form a group of frames, wherein the at least one previous frame precedes all frames of the group of frames.
51. The encoder (110) according to the preceding claim, wherein a first coding structure of the group of frames applies to only a first portion of each frame of the group and a second coding structure applies to only a second portion of each frame of the group.
52. The encoder (110) according to the preceding claim, wherein the first coding structure and the second coding structure are exclusively associated to the first and second portions, respectively.
53. The encoder (110) according to any one of claims 44-52, wherein the encoding is based on HEVC or H.264.
54. The encoder (110) according to any one of claims 44-53, wherein the bitstream order of the first and second frame is defined by that at least one unit of the bitstream, including a first part of the first frame, precedes any unit of the bitstream, including a second part of the second frame.
55. The encoder (110) according to the preceding claim, wherein the at least one unit carries syntax information.
56. The encoder (110) according to the preceding claim, wherein the syntax information includes one or more of frame size, block size, prediction mode, reference picture selection for each block, and transform coefficients.
57. A computer program (903), comprising computer readable code units which, when executed on a decoder (120), cause the decoder (120) to perform the method according to any one of claims 1-15.
58. A carrier (905) comprising the computer program according to the preceding claim, wherein the carrier (905) is one of an electronic signal, an optical signal, a radio signal and a computer readable medium.
59. A computer program (703), comprising computer readable code units which, when executed on an encoder (110), cause the encoder (110) to perform the method according to any one of claims 16-28.
60. A carrier (705) comprising the computer program according to the preceding claim, wherein the carrier (705) is one of an electronic signal, an optical signal, a radio signal and a computer readable medium.