WO2021115657A1 - Video encoding and video decoding - Google Patents

Video encoding and video decoding

Info

Publication number
WO2021115657A1
Authority
WO
WIPO (PCT)
Prior art keywords
prediction
block
accordance
picture
intra
Prior art date
Application number
PCT/EP2020/077607
Other languages
French (fr)
Inventor
Saverio BLASI
Andre Seixas DIAS
Gosala KULUPANA
Original Assignee
British Broadcasting Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by British Broadcasting Corporation filed Critical British Broadcasting Corporation
Priority to CN202080086358.4A priority Critical patent/CN114868385A/en
Priority to EP20780738.9A priority patent/EP4074031A1/en
Publication of WO2021115657A1 publication Critical patent/WO2021115657A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/593 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques

Abstract

Video decoding is implemented using a combined inter-picture merge and intra-picture prediction facility, operable to predict samples of a block of video sample data. The combined inter-picture merge and intra-picture prediction facility is operable in one of a plurality of modes wherein, in at least one of said modes, decoding involves generating a prediction of samples of a block of video sample data by the application of a blending mask, the blending mask governing partitioning of the block into two parts, wherein the blending mask is applied to a first prediction generated by an inter-picture merge prediction process and to a second prediction generated by an intra-picture prediction process.

Description

Video Encoding and Video Decoding
FIELD
This disclosure relates to video encoding and video decoding.
BACKGROUND
Video compression provides opportunities to reduce payload on a transmission channel. It also provides opportunities for increasingly efficient storage of encoded video data. Known video coding standards enable transmission (or storage) of bitstream data defining a video, such that a receiver (or retriever) of the bitstream is able to decode the bitstream in such a way as to construct a decoded video which is substantially faithful to the original video from which the encoded bitstream was derived.
Early video coding standards were devised with a view to reproduction of video on equipment where relatively low or medium quality reconstruction is acceptable. This includes hand-held devices or personal computing devices. To a large extent, the acceptability of particular levels of quality is as much driven by user demand as by the capability of the playback equipment.
As receiver equipment improves in quality and capability, so does user demand for higher quality reproduction of original video. The technical objective thus emerges to enable reproduction of video on a player, to a higher quality than hitherto implemented. Another technical objective which emerges from this is to enable transmission of encoded video data on a communication channel, so as to allow higher quality at a receiver despite a data capacity constraint of the communication channel.
DESCRIPTION OF DRAWINGS
Figure 1 is a schematic representation of a communications network in accordance with an embodiment;
Figure 2 is a schematic representation of an emitter of the communications network of figure 1;
Figure 3 is a diagram illustrating an encoder implemented on the emitter of figure 2;
Figure 4 is a flow diagram of a prediction process performed at a prediction module of the encoder of figure 3;
Figure 5 is a schematic representation of a receiver of the communications network of figure 1;
Figure 6 is a diagram illustrating a decoder implemented on the receiver of figure 5; and
Figure 7 is a flow diagram of a prediction process performed at a prediction module of the decoder of figure 6.
DESCRIPTION OF EMBODIMENTS
Aspects of the present disclosure may correspond with the subject matter of the appended claims.
In general terms, in certain embodiments disclosed herein, decoding of a merge-predicted block is carried out using a combination of inter-prediction and intra-prediction. The combination of inter-prediction and intra-prediction may be performed in one of a plurality of modes. In some of said modes, said combination may be operable to generate a prediction of samples of a block of video sample data by the application of a blending mask. In some of said modes, said combination may be operable to generate a prediction of samples following a partitioning of the block by means of a geometrical partitioning scheme. The mode or combination of inter-prediction and intra-prediction to be used may be determined by a decoder by way of signalling on a received bitstream, or by way of inference. The inference may be established by a decoder by way of a determination carried out on the basis of characteristics of the block, of surrounding blocks, or of other criteria.

Embodiments disclosed herein relate to a method of performing prediction in a video codec by means of more efficiently exploiting redundancies in the video signal. As will be appreciated by the reader, a video presentation generally comprises a plurality of frames, for sequential display on playback equipment. Various strategies are used to reduce the amount of data required to describe each frame in turn on a bitstream transmitted on a communications channel from an emitter to a receiver. As will be understood, the emitter will comprise an encoder for encoding the frame data into a bitstream, and the receiver will comprise a decoder for generating frame data on the basis of information borne in the bitstream.
In embodiments of the present disclosure, each frame of the video presentation is partitioned into blocks. At the encoder, the content of a block is predicted based on previously compressed content. Such content may be extracted from the same frame as the current block being predicted, as in the case of intra-prediction, or such content may be extracted from previously encoded frames, as in the case of inter-prediction. This block prediction is subtracted from the actual block, resulting in a set of residual differences (residuals). In an embodiment, the residual data can be encoded using a transformation into the frequency domain. However, as will be recognised by the reader, the transform of data from the time domain to the frequency domain may be specific to certain implementations, and is not essential to the performance of disclosed embodiments.
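The subtraction of the prediction from the actual block, and the corresponding addition at the decoder, can be sketched as follows. This is a minimal illustrative example; the function names are hypothetical and not drawn from the specification.

```python
def residuals(block, prediction):
    """Subtract the prediction from the actual block, sample by sample."""
    return [[b - p for b, p in zip(brow, prow)]
            for brow, prow in zip(block, prediction)]

def reconstruct(prediction, residual):
    """Decoder side: add decoded residuals back onto the prediction."""
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]
```

If the residuals survive coding losslessly, adding them back to the same prediction reproduces the original block exactly.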
Merge prediction may be employed to compute the inter-prediction for a given set of samples, in which the content of a partition can be predicted on the basis of the motion information pertaining to another neighbouring block or neighbouring block partition. In certain circumstances, merge inter-prediction may not provide a good prediction for a specific partition. Thus, embodiments described herein provide an encoding process, and a corresponding decoding process, in which other types of prediction, such as intra-prediction, may be employed on blocks, with a prospect of leading to more accurate results. An approach to video encoding and decoding allows blocks to be predicted using a combination of intra-prediction and inter-prediction.
In some embodiments, this combination may happen by applying uniform weighting to the inter-prediction and intra-prediction samples to produce a combined prediction. In some embodiments, such uniform weighting may be derived from characteristics of neighbouring blocks.
In other embodiments, the combined intra-prediction and inter-prediction block is further partitioned into two partitions. Two predictions are computed, one for each partition in the block, respectively corresponding to an inter-prediction (for instance computed by means of merge prediction) and an intra-prediction (computed by means of a mode that is either inferred, or signalled in the bitstream). The partitioning of the block into two partitions may be extracted from a set of possible partitioning schemes. The combined prediction is formed by using one prediction in one of the two partitions and the other prediction in the other of the two partitions. In other embodiments, the partitioning scheme may depend on an angle defining the directionality of the edge dividing the two partitions, and on an offset or distance which depends on the distance between the edge dividing the two partitions and the centre of the block. A smoothing process may be employed whereby samples next to the border between the two partitions are predicted using a weighted combination of the two partitions.
In other embodiments, the combination of intra-prediction and inter-prediction is performed by means of a blending mask formed of a set of weights. Two predictions are computed, an inter-prediction (for instance computed by means of merge prediction) and an intra-prediction (computed by means of a mode that is either inferred, or signalled in the bitstream). The combined prediction is formed by applying the blending mask to the two predictions. In other embodiments, the blending mask may be formed by means of a geometrical partitioning process based on an angle and an offset. The angle and offset, together with other characteristics of the block (for instance the width and height of the block, or the bitdepth of the signal) may determine the weights in the blending mask. A variety of different angles and offsets may be considered, resulting in a variety of possible blending masks. The blending mask to use on a current block may be signalled in the bitstream or may be inferred.
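The application of a blending mask to the two predictions can be sketched as below. This is an illustrative fixed-point formulation, assuming integer weights on the intra-prediction with the complementary weight on the inter-prediction; the weight range and rounding are assumptions, not taken from the specification.

```python
def blend(inter_pred, intra_pred, mask, shift=3):
    """Combine two predictions sample-by-sample with a weight mask.

    mask holds integer weights in [0, 2**shift] applied to the intra
    prediction; the complementary weight (2**shift - w) is applied to
    the inter prediction, with rounding before the right shift.
    """
    scale = 1 << shift
    rnd = scale >> 1
    h, w = len(mask), len(mask[0])
    return [[(mask[y][x] * intra_pred[y][x]
              + (scale - mask[y][x]) * inter_pred[y][x]
              + rnd) >> shift
             for x in range(w)] for y in range(h)]
```

A weight equal to the full scale selects the intra-prediction sample outright, a weight of zero selects the inter-prediction sample, and intermediate weights blend the two.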
Embodiments may accommodate a case wherein some of the weights in the blending mask associated with the inter-prediction are 0 and some of the weights associated with the intra-prediction are 1; in such a case, at least part of at least one of the two partitions will be predicted completely using intra-prediction. Similarly, embodiments may accommodate a case wherein some of the weights in the blending mask associated with the inter-prediction are 1 and some of the weights associated with the intra-prediction are 0; in such a case, at least part of at least one of the two partitions will be predicted completely using inter-prediction.
By way of background, work on the establishment of a video compression standard under the heading “VVC” has been commissioned by JVET, formed by MPEG and VCEG. An approach to prediction currently adopted in the draft VVC specifications will now be described.
A combined inter-prediction merge and intra-prediction is considered. A list of possible merge candidates is computed. This list is formed by means of a process whereby some of these candidates may use motion information extracted from previously encoded blocks, or where some merge candidates may use motion information that is obtained by combining or manipulating the motion information of previously encoded blocks. A merge candidate can be used to form a merge inter-predicted prediction. When using combined inter-prediction merge and intra-prediction, an intra-predicted block is also computed. The computation of the intra-predicted block follows from the determination of an intra-prediction mode to be used. This mode may be a fixed mode, for instance the Planar mode, or may be inferred based on information extracted from neighbouring blocks, or may be determined based on signalling in the bitstream.
The two predictions are then combined together. Each sample in the combined prediction is computed using the two collocated samples in the two predictions, whereby such samples are then weighted to form the combined prediction sample. A fixed weight is used to perform the combination. The weight may be inferred based on information extracted from neighbouring blocks, such as for instance whether the neighbouring blocks are intra-predicted or not.
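The fixed-weight combination described above can be sketched as follows. The weight-selection rule and the quarter-step weighting are illustrative assumptions about one plausible scheme, not a definitive statement of the draft specification.

```python
def ciip_weight(left_is_intra, above_is_intra):
    """Pick a fixed weight from neighbour coding modes (illustrative rule).

    The more neighbours are intra-predicted, the more weight the
    intra-prediction receives. The returned weight is out of 4.
    """
    n_intra = int(left_is_intra) + int(above_is_intra)
    return {0: 1, 1: 2, 2: 3}[n_intra]

def ciip_sample(inter_s, intra_s, w):
    """Blend one inter and one intra sample with weight w (of 4), rounded."""
    return ((4 - w) * inter_s + w * intra_s + 2) >> 2
```

Because the same weight applies to every sample of the block, no per-sample mask is needed in this mode.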
Signalling may happen by means of a mechanism whereby a bit is signalled to determine whether the current block should employ the combined inter-prediction merge and intra-prediction facility.
The reader will appreciate that embodiments disclosed herein are not limited in their application to the above approach - different techniques for partitioning, for assembling a list of partition candidates, for signalling, and for encoding data for signalling, may be employed and the present disclosure is not mandated on any one implementation.
Embodiments disclosed herein make use of a combined inter-picture merge and intra-picture prediction facility operable to predict samples of a block of video sample data, the combined inter-picture merge and intra-picture prediction facility being operable in one of a plurality of modes.
In some embodiments, each mode corresponds to the use of a blending mask to compute the combination of intra and inter-prediction. In some embodiments, the blending mask is extracted from a list of blending masks, whereby the possible blending masks are derived based on a number of different angles and offsets (or distances). For each combination of a given angle and offset, a particular blending mask can be derived whereby the weights in the blending mask will produce a blended prediction in accordance with that angle and offset. In some embodiments, a geometrical partitioning index is assigned to each blending mask, where a computation can be performed to derive the angle and offset of the blending mask depending on the corresponding geometrical partitioning index.
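One way to derive a blending mask from an angle and an offset is sketched below: each weight follows from the signed distance of the sample to the dividing edge, with a short ramp around the edge. This is purely illustrative; production codecs derive equivalent weights with integer look-up tables rather than floating-point trigonometry.

```python
import math

def geometric_mask(width, height, angle_deg, offset, shift=3):
    """Derive a blending mask from an angle and an offset from block centre.

    Samples far on one side of the dividing edge get the full weight
    (2**shift), samples far on the other side get zero, with a narrow
    transition band in between.
    """
    scale = 1 << shift
    nx = math.cos(math.radians(angle_deg))
    ny = math.sin(math.radians(angle_deg))
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # signed distance from the sample to the dividing edge
            d = (x - cx) * nx + (y - cy) * ny - offset
            # ramp the weight through the transition band, clamped
            w = min(max(round(scale / 2 + d), 0), scale)
            row.append(int(w))
        mask.append(row)
    return mask
```

With angle 0 and offset 0 the mask splits the block down a vertical edge through its centre, weights increasing monotonically from one side to the other.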
In some embodiments, each mode corresponds to the use of a partitioning process whereby each block is split into two partitions based on an angle and an offset to compute the combination of intra- and inter-prediction. The resulting combined intra- and inter-prediction is then treated in the same way as similar partitioning methods such as triangular partitions or geometrical partitions, which means it may be smoothed at the edge forming the boundary between the two partitions of the block.
In some embodiments, determination of the combined inter-picture merge and intra-picture prediction mode to use on a given block may follow from the establishment of a look-up table. The look-up table may contain a number of possible modes where the formation of the look-up table may depend on characteristics of the current block and/or of neighbouring blocks. The look-up table may be used to enable signalling by an encoder and, correspondingly, interpretation of information signalled on a bitstream received by a decoder, to identify which of the possible modes has been employed in an encoding. In one embodiment, the look-up table may be formed depending on information extracted from neighbouring blocks. In one embodiment, the look-up table may be formed depending on whether the block on the top-left of the current block is intra-predicted or not. In one embodiment, the look-up table may be formed depending on whether the block on the bottom-right of the current block is intra-predicted or not. Other indicators may be used to trigger formation of the look-up table, depending on the specific implementation.
In one embodiment, the look-up table may be formed depending on information extracted from the current block. In one embodiment, the look-up table may be formed depending on the width and height of the current block, or whether the block on the top-left of the current block is intra-predicted or not.
In other embodiments, other neighbouring blocks may be used to designate whether the look-up table is formed. In other embodiments, the look-up table may be formed depending on employment of intra-prediction in a combination of neighbouring blocks.
In one embodiment, the candidates in the look-up table may include a mode whose angle corresponds to having an edge at the border between the two partitions that connects the top-right corner of the block with the bottom-left corner of the block. In one embodiment, the look-up table may include modes whose angles are selected to be close to the angle that corresponds to having an edge at the border between the two partitions that connects the top-right corner of the block with the bottom-left corner of the block. In one embodiment, these candidates may be included in the look-up table if both blocks on the top-left of the current block and on the bottom-right of the current block are intra-predicted.
In one embodiment, the candidates in the look-up table may include a mode whose angle corresponds to having an edge at the border between the two partitions that connects the top-left corner of the block with the bottom-right corner of the block. In one embodiment, the look-up table may include modes whose angles are selected to be close to the angle that corresponds to having an edge at the border between the two partitions that connects the top-left corner of the block with the bottom-right corner of the block. In one embodiment, these candidates may be included in the look-up table if exactly one of the block on the top-left of the current block and the block on the bottom-right of the current block is intra-predicted.
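The neighbour-dependent table formation described in the preceding paragraphs can be sketched as follows. The specific angles, the padding rule, and the table size of 8 are assumptions chosen to match the power-of-two sizing mentioned below, not values from the specification.

```python
def build_mode_lut(top_left_intra, bottom_right_intra):
    """Build an illustrative look-up table of partitioning angles (degrees).

    45 degrees stands for an edge from the top-right corner to the
    bottom-left corner; 135 degrees for top-left to bottom-right.
    """
    if top_left_intra and bottom_right_intra:
        base = 45          # both diagonal neighbours intra-predicted
    elif top_left_intra != bottom_right_intra:
        base = 135         # exactly one diagonal neighbour intra-predicted
    else:
        base = 90          # neither neighbour intra-predicted
    # pad with nearby angles up to a power-of-two table size
    lut = [base, base - 10, base + 10, base - 20,
           base + 20, base - 30, base + 30, base + 40]
    return lut[:8]
```

Because encoder and decoder derive the table from the same neighbour information, the table itself never needs to be transmitted; only an index into it does.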
In some embodiments, the look-up tables contain a number of items set to a power of 2. In some embodiments, the look-up tables contain 8 candidates.
Following the determination of a look-up table, the encoder may select one mode in the look-up table, and correspondingly form a combined inter-picture merge and intra-picture prediction. In some embodiments, the selected mode may be signalled in the bitstream using a combined inter-picture merge and intra-picture prediction mode index. In one embodiment, the index is signalled using a fixed number of bits. In one embodiment, in case the look-up table contains 8 modes, the index is signalled using 3 bits.
The reader will appreciate that if the index is signalled using 3 bits, this does not compel the definition of 8 modes, it simply means that the table has capacity for 8 modes. It may be convenient, for example, for an embodiment to define fewer than 8 modes, and to leave one or more entries in the table in reserve for future extension.
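Fixed-length index signalling of this kind can be sketched as below, assuming a most-significant-bit-first convention and a simple list of bits standing in for the bitstream; both are illustrative assumptions.

```python
def write_index(bits, index, n_bits=3):
    """Append a fixed-length mode index to a bit list (MSB first)."""
    for i in reversed(range(n_bits)):
        bits.append((index >> i) & 1)

def read_index(bits, pos, n_bits=3):
    """Read a fixed-length mode index; returns (index, new position)."""
    index = 0
    for i in range(n_bits):
        index = (index << 1) | bits[pos + i]
    return index, pos + n_bits
```

Three bits address up to 8 table entries; as noted above, a table may define fewer modes and hold the remaining codewords in reserve.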
In some embodiments, determination of the partitioning used for combining the inter-picture merge and intra-picture prediction mode to use on a given block may depend on information extracted from neighbouring blocks. In one embodiment, the partitioning may be determined depending on whether the block on the top-left of the current block is intra-predicted or not. In one embodiment, the partitioning may be determined depending on whether the block on the bottom-right of the current block is intra-predicted or not. In one embodiment, the partitioning may be determined depending on whether both the block on the bottom-right of the current block and the block on the top-left of the current block are intra-predicted or not. In one embodiment, the usage of an intra-predicted candidate or an inter-predicted candidate in one of the two partitions may be determined depending on whether the neighbouring blocks are intra-predicted or not.
In another embodiment, rather than communicating an index to a LUT entry, the combined inter-picture merge and intra-picture prediction mode can be directly extracted from the bitstream. In some embodiments, the determination of the combined inter-picture merge and intra-picture prediction mode may depend on the detection of a flag indicating usage of combined inter-picture merge and intra-picture prediction. In some embodiments, the determination of the combined inter-picture merge and intra-picture prediction mode may depend on the detection of a flag indicating usage of the conventional combined inter-picture merge and intra-picture prediction using uniform weighting. In some embodiments, the determination of the combined inter-picture merge and intra-picture prediction mode may depend on signalling that is extracted from the bitstream only upon the determination of flags indicating that conventional combined inter-picture merge and intra-picture prediction is not used.
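One possible ordering of the flag cascade just described is sketched below. The parse order and the mode labels are assumptions for illustration; `read_flag` and `read_mode_index` stand in for bitstream-reading callbacks.

```python
def parse_ciip_mode(read_flag, read_mode_index):
    """Sketch of a possible flag cascade for CIIP mode determination.

    First flag: is combined inter-picture merge and intra-picture
    prediction used at all? Second flag: is the conventional
    uniform-weighting variant used? Only if both paths are excluded
    is an explicit mode index read from the bitstream.
    """
    if not read_flag():
        return ("no_ciip", None)
    if read_flag():
        return ("ciip_uniform", None)
    return ("ciip_mode", read_mode_index())
```

Reading the mode index only after the other flags rule out the conventional variants keeps the common cases cheap to signal.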
Figure 1 illustrates an arrangement comprising a schematic video communication network 10, in which an emitter 20 and a receiver 30 are in communication via a communications channel 40. In practice, the communications channel 40 may comprise a satellite communications channel, a cable network, a ground-based radio broadcast network, a POTS-implemented communications channel, such as used for provision of internet services to domestic and small business premises, fibre optic communications systems, or a combination of any of the above and any other conceivable communications medium.
Furthermore, the disclosure also extends to communication, by physical transfer, of a storage medium on which is stored a machine readable record of an encoded bitstream, for passage to a suitably configured receiver capable of reading the medium and obtaining the bitstream therefrom. An example of this is the provision of a digital versatile disk (DVD) or equivalent. The following description focuses on signal transmission, such as by electronic or electromagnetic signal carrier, but should not be read as excluding the aforementioned approach involving storage media.
As shown in figure 2, the emitter 20 is a computer apparatus, in structure and function. It may share, with general purpose computer apparatus, certain features, but some features may be implementation specific, given the specialised purpose to which the emitter 20 is to be put. The reader will understand which features can be of general purpose type, and which may be required to be configured specifically for use in a video emitter.
The emitter 20 thus comprises a graphics processing unit 202 configured for specific use in processing graphics and similar operations. The emitter 20 also comprises one or more other processors 204, either generally provisioned, or configured for other purposes such as mathematical operations, audio processing, managing a communications channel, and so on.
An input interface 206 provides a facility for receipt of user input actions. Such user input actions could, for instance, be caused by user interaction with a specific input unit including one or more control buttons and/or switches, a keyboard, a mouse or other pointing device, a speech recognition unit enabled to receive and process speech into control commands, a signal processor configured to receive and control processes from another device such as a tablet or smartphone, or a remote-control receiver. This list will be appreciated to be non-exhaustive and other forms of input, whether user initiated or automated, could be envisaged by the reader.
Likewise, an output interface 214 is operable to provide a facility for output of signals to a user or another device. Such output could include a display signal for driving a local video display unit (VDU) or any other device.
A communications interface 208 implements a communications channel, whether broadcast or end-to-end, with one or more recipients of signals. In the context of the present embodiment, the communications interface is configured to cause emission of a signal bearing a bitstream defining a video signal, encoded by the emitter 20.
The processors 204, and specifically for the benefit of the present disclosure, the GPU 202, are operable to execute computer programs, in operation of the encoder. In doing this, recourse is made to data storage facilities provided by a mass storage device 208 which is implemented to provide large-scale data storage albeit on a relatively slow access basis, and will store, in practice, computer programs and, in the current context, video presentation data, in preparation for execution of an encoding process.
A Read Only Memory (ROM) 210 is preconfigured with executable programs designed to provide the core of the functionality of the emitter 20, and a Random Access Memory 212 is provided for rapid access and storage of data and program instructions in the pursuit of execution of a computer program.
The function of the emitter 20 will now be described, with reference to figure 3. Figure 3 shows a processing pipeline performed by an encoder implemented on the emitter 20 by means of executable instructions, on a data file representing a video presentation comprising a plurality of frames for sequential display as a sequence of pictures.
The data file may also comprise audio playback information, to accompany the video presentation, and further supplementary information such as electronic programme guide information, subtitling, or metadata to enable cataloguing of the presentation. The processing of these aspects of the data file are not relevant to the present disclosure.
Referring to Figure 3, the current picture or frame in a sequence of pictures is passed to a partitioning module 230 where it is partitioned into rectangular blocks of a given size for processing by the encoder. This processing may be sequential or parallel. The approach may depend on the processing capabilities of the specific implementation.
Each block is then input to a prediction module 232, which seeks to discard temporal and spatial redundancies present in the sequence and obtain a prediction signal using previously coded content. Information enabling computation of such a prediction is encoded in the bitstream. This information should comprise sufficient information to enable computation, including the possibility of inference at the receiver of other information necessary to complete the prediction.
The prediction signal is subtracted from the original signal to obtain a residual signal. This is then input to a transform module 234, which attempts to further reduce spatial redundancies within a block by using a more suitable representation of the data. The reader will note that, in some embodiments, domain transformation may be an optional stage and may be dispensed with entirely. Employment of domain transformation, or otherwise, may be signalled in the bitstream.
The resulting signal is then typically quantised by quantisation module 236, and finally the resulting data formed of the coefficients and the information necessary to compute the prediction for the current block is input to an entropy coding module 238, which makes use of statistical redundancy to represent the signal in a compact form by means of short binary codes. Again, the reader will note that entropy coding may, in some embodiments, be an optional feature and may be dispensed with altogether in certain cases. The employment of entropy coding may be signalled in the bitstream, together with information to enable decoding, such as an index to a mode of entropy coding (for example, Huffman coding) and/or a code book.
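The quantisation stage of the pipeline can be sketched with a toy uniform scalar quantiser; the truncation-toward-zero rule here is an illustrative choice, not the quantiser of any particular standard.

```python
def quantise(residual, qstep):
    """Uniform scalar quantisation of a 2-D residual (illustrative).

    Magnitudes are divided by the quantisation step and truncated
    toward zero, preserving the sign of each coefficient.
    """
    out = []
    for row in residual:
        out.append([(abs(v) // qstep) * (1 if v >= 0 else -1) for v in row])
    return out

def dequantise(levels, qstep):
    """Inverse step: scale quantised levels back up (lossy)."""
    return [[v * qstep for v in row] for row in levels]
```

Quantisation is where the information loss occurs: dequantising does not in general reproduce the original residual, only an approximation within one quantisation step.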
By repeated action of the encoding facility of the emitter 20, a bitstream of block information elements can be constructed for transmission to a receiver or a plurality of receivers, as the case may be. The bitstream may also bear information elements which apply across a plurality of block information elements and are thus held in bitstream syntax independent of block information elements. Examples of such information elements include configuration options, parameters applicable to a sequence of frames, and parameters relating to the video presentation as a whole.
The prediction module 232 will now be described in further detail, with reference to figure 4. As will be understood, this is but an example, and other approaches, within the scope of the present disclosure and the appended claims, could be contemplated.
The following process is performed on each motion compensated block in an inter-predicted frame.
The prediction module 232 is configured to determine, for a given block partitioned from a frame, whether it is advantageous to apply combined inter-picture merge and intra-picture prediction to the block and, if so, to generate a combined inter-picture merge and intra-picture prediction for the block, together with combination information to enable signalling to a decoder as to the manner in which the block has been subjected to combined inter-picture merge and intra-picture prediction and how the resulting prediction information is then to be decoded. The prediction module then applies the selected mode of combined inter-picture merge and intra-picture prediction, if applicable, and then determines a prediction, on the basis of which residuals can then be generated as previously noted. The prediction employed is signalled in the bitstream, for receipt and interpretation by a suitably configured decoder. In case the encoder determines that it is not advantageous to apply combined inter-picture merge and intra-picture prediction to the block, conventional prediction methods may be employed to predict the content of the block, including conventional inter-prediction and/or conventional intra-prediction techniques. The encoder will signal, by means of a flag on the bitstream, whether or not combined inter-picture merge and intra-picture prediction has been employed.

Turning therefore to the encoder-side algorithm illustrated in figure 4, in step S102 a set of candidate combined inter-picture merge and intra-prediction modes is assembled for the block in question. Candidates are assembled using any of the techniques as previously described. Candidates may include the conventional way of performing combined inter-picture merge and intra-prediction as previously described, plus any other combined inter-picture merge and intra-prediction modes which may be identified as suitable.
This may include modes that make use of blending masks, and/or modes that correspond to partitioning the block following a geometrical partitioning. The candidates for a given block may be obtained by analysing information from neighbouring blocks, or information extracted from the current block. The candidates for a given block may be determined using the same parameters that are used to determine the geometrical partitioning mode.
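By way of illustration only, the application of such a blending mask to an inter-picture merge prediction and an intra-picture prediction might be sketched as follows. The parameterisation of the mask by an angle and an offset follows the later discussion in the claims, but the exact distance-to-weight mapping (here a simple 4-sample linear ramp) is an assumption for the sketch, not a normative definition.

```python
import numpy as np

def make_geometric_mask(h, w, angle_deg, offset):
    """Build a soft blending mask for an h x w block.

    Samples on one side of a line (defined by an angle and an offset
    from the block centre) receive weight 1 (inter prediction); samples
    on the other side receive weight 0 (intra prediction), with a
    narrow blended transition along the partition boundary.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    theta = np.deg2rad(angle_deg)
    # Signed distance of each sample from the partition line.
    d = (xs - w / 2) * np.cos(theta) + (ys - h / 2) * np.sin(theta) - offset
    # Map distance to a weight in [0, 1] with a 4-sample ramp.
    return np.clip(0.5 + d / 4.0, 0.0, 1.0)

def blend_predictions(inter_pred, intra_pred, mask):
    """Per-sample weighted combination of the two predictions."""
    return mask * inter_pred + (1.0 - mask) * intra_pred
```

With an angle of 0 and zero offset, the mask splits the block vertically: samples well to the left of centre take the intra prediction in full, samples well to the right take the inter prediction in full, and only the samples near the boundary are blended.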
Then a loop commences in step S104, with operations carried out on each candidate. For each combined inter-picture merge and intra-prediction mode candidate, in step S106, a prediction is determined using the mode associated with that candidate. In step S108, a quality measure is determined for that prediction, comprising a score of the accuracy of the prediction with respect to the original data. Step S110 signifies the closure of the loop.
Thus, when all candidates have been considered, in step S112, the candidate with the best quality score is selected. The attributes of this candidate are then encoded, for instance using an encoding with a fixed number of bits, an encoding established in a look-up table, or a Golomb code as described above. Other techniques may be used for signalling. These attributes are added to the bitstream for transmission.
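The search of steps S102 to S112 can be sketched as a cost-based selection over the candidate modes. The function names and the use of a sum-of-absolute-differences score are illustrative assumptions; the patent leaves the quality measure and the prediction function open.

```python
def select_best_candidate(block, candidates, predict, cost):
    """Encoder-side candidate search (steps S104-S112, sketched).

    `predict(block, mode)` returns the prediction for a candidate mode;
    `cost(block, prediction)` returns a quality score (lower is better).
    """
    best_mode, best_cost = None, float("inf")
    for mode in candidates:                   # loop S104..S110
        prediction = predict(block, mode)     # step S106
        c = cost(block, prediction)           # step S108
        if c < best_cost:
            best_mode, best_cost = mode, c
    return best_mode, best_cost               # selection, step S112

def sad(block, prediction):
    """Sum of absolute differences: a simple accuracy score."""
    return sum(abs(a - b) for a, b in zip(block, prediction))
```

For example, with candidates that offset the block samples by a constant, the zero-offset candidate is selected because it reproduces the original data exactly.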
The structural architecture of the receiver is illustrated in figure 5. It has the elements of a computer-implemented apparatus. The receiver 30 thus comprises a graphics processing unit 302 configured for specific use in processing graphics and similar operations. The receiver 30 also comprises one or more other processors 304, either generally provisioned, or configured for other purposes such as mathematical operations, audio processing, managing a communications channel, and so on. As the reader will recognise, the receiver 30 may be implemented in the form of a set-top box, a handheld personal electronic device, a personal computer, or any other device suitable for the playback of video presentations.
An input interface 306 provides a facility for receipt of user input actions. Such user input actions could, for instance, be caused by user interaction with a specific input unit including one or more control buttons and/or switches, a keyboard, a mouse or other pointing device, a speech recognition unit enabled to receive and process speech into control commands, a signal processor configured to receive and control processes from another device such as a tablet or smartphone, or a remote-control receiver. This list will be appreciated to be non-exhaustive and other forms of input, whether user initiated or automated, could be envisaged by the reader.
Likewise, an output interface 314 is operable to provide a facility for output of signals to a user or another device. Such output could include a television signal, in suitable format, for driving a local television device.
A communications interface 308 implements a communications channel, whether broadcast or end-to-end, with one or more sources of signals. In the context of the present embodiment, the communications interface is configured to receive a signal bearing a bitstream defining a video signal, for decoding by the receiver 30.
The processors 304, and specifically for the benefit of the present disclosure, the GPU 302, are operable to execute computer programs, in operation of the receiver. In doing this, recourse is made to data storage facilities provided by a mass storage device 308 which is implemented to provide large-scale data storage albeit on a relatively slow access basis, and will store, in practice, computer programs and, in the current context, video presentation data, resulting from execution of a receiving process.
A Read Only Memory (ROM) 310 is preconfigured with executable programs designed to provide the core of the functionality of the receiver 30, and a Random Access Memory 312 is provided for rapid access and storage of data and program instructions in the pursuit of execution of a computer program. The function of the receiver 30 will now be described, with reference to figure 6. Figure 6 shows a processing pipeline performed by a decoder implemented on the receiver 30 by means of executable instructions, on a bitstream received at the receiver 30 comprising structured information from which a video presentation can be derived, comprising a reconstruction of the frames encoded by the encoder functionality of the emitter 20.
The decoding process illustrated in figure 6 aims to reverse the process performed at the encoder. The reader will appreciate that this does not imply that the decoding process is an exact inverse of the encoding process.
A received bit stream comprises a succession of encoded information elements, each element being related to a block. A block information element is decoded in an entropy decoding module 330 to obtain a block of coefficients and the information necessary to compute the prediction for the current block. The block of coefficients is typically de-quantised in dequantisation module 332 and typically inverse transformed to the spatial domain by transform module 334.
As noted above, the reader will recognise that entropy decoding, dequantisation and inverse transformation would only need to be employed at the receiver if entropy encoding, quantisation and transformation, respectively, had been employed at the emitter.
A prediction signal is generated as before, from previously decoded samples from current or previous frames and using the information decoded from the bit stream, by prediction module 336. A reconstruction of the original picture block is then derived from the decoded residual signal and the calculated prediction block in the reconstruction block 338. The prediction module 336 is responsive to information, on the bitstream, signalling the use of combined inter-picture merge and intra-prediction and, if such information is present, reading from the bitstream the mode under which that prediction has been implemented and thus which prediction technique should be employed in reconstruction of a block information sample. By repeated action of the decoding functionality on successively received block information elements, picture blocks can be reconstructed into frames which can then be assembled to produce a video presentation for playback.
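The reconstruction path of figure 6 (dequantisation, inverse transformation, addition of the prediction) might be sketched as below. The scalar quantisation step and the identity "inverse transform" are placeholders for illustration only; a real codec applies an inverse DCT/DST-like transform here.

```python
def inverse_transform(residuals):
    # Placeholder: a real codec applies an inverse DCT/DST here.
    return list(residuals)

def reconstruct_block(coeffs, prediction, qstep):
    """Decoder-side reconstruction (figure 6, sketched).

    De-quantise the decoded coefficients (module 332), inverse-transform
    them to spatial-domain residuals (module 334), and add the prediction
    from the prediction module (reconstruction block 338).
    """
    residuals = [c * qstep for c in coeffs]       # dequantisation
    spatial = inverse_transform(residuals)        # to spatial domain
    return [p + r for p, r in zip(prediction, spatial)]
```

As noted in the text, each stage is only needed if the corresponding encoder-side stage was applied; with quantisation disabled the `qstep` factor would simply be 1.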
An exemplary decoder algorithm, complementing the encoder algorithm described earlier, is illustrated in figure 7.
As noted previously, the decoder functionality of the receiver 30 extracts from the bitstream a succession of block information elements, as encoded by the encoder facility of the emitter 20, defining block information and accompanying configuration information.
In general terms, the decoder avails itself of information from prior predictions, in constructing a prediction for a present block. In doing so, the decoder may combine the knowledge from inter-prediction, i.e. from a prior frame, and intra-prediction, i.e. from another block in the same frame.
Thus, for a merge-predicted block, in step S202, the information enabling formation of a prediction candidate is extracted from the bitstream. This can be in the form of a flag, which may be binary in syntactical form, indicating whether or not combined inter-picture merge and intra-prediction has been used.
In step S204, a decision is taken dependent on the value of this flag. If combined inter-picture merge and intra-prediction is to be used for the merge-predicted block, then in step S206 a look-up table containing a list of possible modes is considered. This list may be pre-determined, or may depend on information inferred from available information (such as the size of the block, or the manner in which neighbouring blocks have been decoded). It may be pre-stored at the receiver, or it may be transmitted thereto on the bitstream. This transmission of look-up table information may be at the commencement of the current bitstream transmission, or it may be, for instance, in a pre-configuration transmission to the receiver to configure the receiver to be capable of decoding bitstreams encoded to a particular specification.
In step S208, an index is extracted from the bitstream to signal which item in the look-up table is to be employed in generating a prediction. In step S210, the look-up table is consulted, in accordance with the index, to obtain a set of attributes defining the combined inter-picture and intra-prediction mode to be used. The attributes may be considered, collectively, as prediction configuration attributes, which can be used by the decoder to configure the way the decoder constructs a prediction of the block samples, whether combined inter-picture and intra-prediction is to be employed, and, if so, the manner in which combined inter-picture and intra-prediction is to be performed to reconstruct the block. The attributes may for instance also specify how the combination should be implemented, such as including weight parameters, or an index to another table of pre-determined weight parameters. In step S212, a prediction is generated using the specific characteristics determined in step S210.
In the alternative, if combined inter-picture merge and intra-prediction has not been signalled, using the previously described flag, then conventional techniques are used, in step S220, to generate a prediction of the merge-predicted block.
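The branching of steps S202 to S220 might be sketched as follows. The bitstream-reading callbacks and the string entries standing in for the prediction configuration attributes are assumptions for the sketch; the actual syntax and attribute contents are as described above.

```python
def decode_prediction_mode(read_flag, read_bits, mode_table, b):
    """Decoder-side mode selection (steps S202-S220, sketched).

    `read_flag()` returns the combined inter-picture merge and
    intra-prediction flag from the bitstream (step S202); `read_bits(b)`
    returns a b-bit index (step S208) into a look-up table of up to
    2**b candidate modes (steps S206/S210).
    """
    if not read_flag():            # decision, step S204
        return "conventional"      # conventional prediction, step S220
    index = read_bits(b)           # index extraction, step S208
    return mode_table[index]       # attributes for prediction, step S210
```

A usage example with a toy bitstream: a set flag followed by the two-bit index `01` selects the second entry of a four-entry look-up table.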
It will be understood that the invention is not limited to the embodiments above-described and various modifications and improvements can be made without departing from the concepts described herein. Except where mutually exclusive, any of the features may be employed separately or in combination with any other features and the disclosure extends to and includes all combinations and sub-combinations of one or more features described herein.

CLAIMS:
1. A video decoder for decoding encoded video data, the decoder comprising a combined inter-picture merge and intra-picture prediction facility operable to predict samples of a block of video sample data, the combined inter-picture merge and intra-picture prediction facility being operable in one of a plurality of modes wherein, in at least one of said modes, said decoder is operable to generate a prediction of samples of a block of video sample data by the application of a blending mask, the blending mask governing partitioning of the block into two parts, wherein the blending mask is applied to a first prediction generated by an inter-picture merge prediction process and to a second prediction generated by an intra-picture prediction process.
2. A video decoder in accordance with claim 1 operable to produce a prediction of at least some samples in a first of said parts fully by means of inter-prediction.
3. A video decoder in accordance with claim 1 or claim 2 operable to produce a prediction of at least some samples in a second of said parts fully by means of intra-prediction.
4. A video decoder in accordance with any one of the preceding claims, operable to detect a mode indication flag, and responsive to said mode indication flag to generate the prediction.
5. A video decoder in accordance with claim 4 wherein the detection of the mode indication flag depends upon the detection of a flag indicating usage of combined inter-picture merge and intra-picture prediction.
6. A video decoder in accordance with any of the preceding claims wherein the decoder is operable to detect blending mask information, and responsive to the blending mask information to determine a blending mask to be employed on the two parts of the block.
7. A video decoder in accordance with any one of claims 1 to 5 wherein the decoder is operable to infer, from the encoded video data, a blending mask to be employed on the two parts of the block.
8. A video decoder in accordance with any preceding claim wherein the decoder is operable to determine a list of candidate blending masks for an extracted block, from a plurality of possible blending masks.
9. A video decoder in accordance with claim 8 operable to determine the list of candidate blending masks depending on characteristics of the current block.
10. A video decoder in accordance with claim 8 or claim 9 operable to determine the list of candidate blending masks depending on the width or height of the current block.
11. A video decoder in accordance with any one of claims 8 to 10 operable to determine the list of candidate blending masks depending on whether the width of the block is greater than, equal to or smaller than the height of the block.
12. A video decoder in accordance with any one of claims 8 to 11 operable to determine the list of candidate blending masks depending on characteristics of one or more blocks neighbouring the current block.
13. A video decoder in accordance with claim 12 operable to determine the list of candidate blending masks depending on whether the one or more neighbouring blocks are intra-predicted.
14. A video decoder in accordance with any one of claims 8 to 13, operable to detect a blending mask index from the bitstream whereby the blending mask index indicates which of the blending masks in the list of candidate blending masks should be used to operate the combined inter-picture merge and intra-picture prediction facility.
15. A video decoder in accordance with claim 14 wherein the decoder is operable to decode b bits from the bitstream indicative of a blending mask index, the blending mask index indicating which blending mask to use of the list of candidate blending masks, the list comprising up to 2^b candidate blending masks.
16. A video decoder in accordance with any preceding claim, operable to detect a facility usage flag, the decoder being responsive thereto to cause implementation of said combined inter-picture merge and intra-picture prediction facility, and to configure said decoder to be operable to detect said mode indication flag.
17. A video decoder in accordance with any preceding claim, wherein the decoder is operable to detect partition information, and responsive to the partition information to determine the partition of the block.
18. A video decoder in accordance with any one of the preceding claims wherein the decoder is operable to infer, from the encoded video data, the partition of the block.
19. A video decoder in accordance with any preceding claim, wherein the blending mask to be applied is defined by an angle and an offset.
20. A video decoder in accordance with claim 19, wherein the angle and offset defining the blending mask determine the weights to be used when combining the first prediction and the second prediction.
21. A video decoder in accordance with any one of claims 1 to 18 wherein the combination of the first prediction and the second prediction is governed by weights dependent on an angle and an offset.
22. A video decoder operable to decode encoded video data, the decoder comprising a combined inter-picture merge and intra-picture prediction facility operable to predict samples of a block of video sample data, the combined inter-picture merge and intra-picture prediction facility being operable in one of a plurality of modes, at least one of the plurality of modes being operable in accordance with a geometrical partitioning scheme.
23. A video decoder in accordance with claim 22, wherein the combined inter-picture merge and intra-picture prediction facility is operable in one of a plurality of geometrical partitioning modes.
24. A video decoder in accordance with claim 23 wherein the decoder is operable to determine an ordered list, from the plurality of geometrical partitioning modes, the order of the list being inferred from at least one characteristic of the block and/or of at least one neighbouring block.
25. A video decoder in accordance with any one of claims 22 to 24 wherein the decoder is operable to determine the geometrical partitioning scheme at least partially as a result of an inference process as a function of the width of the block.
26. A video decoder in accordance with any one of claims 22 to 25 wherein the decoder is operable to determine the geometrical partitioning scheme inferentially as a function of the height of the block.
27. A video decoder in accordance with any one of claims 22 to 25 wherein the decoder is operable to determine the geometrical partitioning scheme inferentially as a function of the ratio of the width of the block to the height of the block.
28. A video decoder in accordance with any one of claims 22 to 27, the geometrical partitioning scheme resulting in the block being partitioned into two parts, wherein the prediction for one of the parts of the block is an inter-prediction, and wherein the partition is determined on the basis of a merge candidate used for the inter-prediction.
29. A video decoder in accordance with any one of claims 22 to 27, the geometrical partitioning scheme resulting in the block being partitioned into two parts, wherein the decoder is operable to determine the partition inferentially on the basis of the type of prediction used to encode one neighbouring block.
30. A method of decoding encoded video data, comprising predicting samples of a block of video sample data, employing a combined inter-picture merge and intra- picture prediction process operable in one of a plurality of modes wherein, in at least one of said modes, said combined inter-picture merge and intra-picture prediction process comprises generating a prediction of samples of a block of video sample data by the application of a blending mask, the blending mask governing partitioning of the block into two parts, wherein the blending mask is applied to a first prediction generated by an inter-picture merge prediction process and to a second prediction generated by an intra-picture prediction process.
31. A method in accordance with claim 30 comprising producing a prediction of at least some samples in a first of said parts fully by means of inter-prediction.
32. A method in accordance with claim 30 or claim 31 comprising producing a prediction of at least some samples in a second of said parts fully by means of intra-prediction.
33. A method in accordance with any one of claims 30 to 32, comprising detecting a mode indication flag, and, responsive to said mode indication flag, generating the prediction.
34. A method in accordance with claim 33 wherein the detecting of the mode indication flag depends upon detecting a flag indicating usage of combined inter-picture merge and intra-picture prediction.
35. A method in accordance with any one of claims 30 to 34 comprising detecting blending mask information, and responding to the blending mask information by determining a blending mask to be employed on the two parts of the block.
36. A method in accordance with any one of claims 30 to 34 comprising inferring, from the encoded video data, a blending mask to be employed on the two parts of the block.
37. A method in accordance with any one of claims 30 to 36 comprising determining a list of candidate blending masks for an extracted block, from a plurality of possible blending masks.
38. A method in accordance with claim 37 comprising determining the list of candidate blending masks depending on characteristics of the current block.
39. A method in accordance with claim 37 or claim 38 comprising determining the list of candidate blending masks depending on the width or height of the current block.
40. A method in accordance with any one of claims 37 to 39 comprising determining the list of candidate blending masks depending on whether the width of the block is greater than, equal to or smaller than the height of the block.
41. A method in accordance with any one of claims 37 to 40 comprising determining the list of candidate blending masks depending on characteristics of one or more blocks neighbouring the current block.
42. A method in accordance with claim 41 comprising determining the list of candidate blending masks depending on whether the one or more neighbouring blocks are intra-predicted.
43. A method in accordance with any one of claims 37 to 42, comprising detecting a blending mask index from the bitstream whereby the blending mask index indicates which of the blending masks in the list of candidate blending masks should be used to operate the combined inter-picture merge and intra-picture prediction facility.
44. A method in accordance with claim 43 comprising decoding b bits from the bitstream indicative of a blending mask index, the blending mask index indicating which blending mask to use of the list of candidate blending masks, the list comprising up to 2^b candidate blending masks.
45. A method in accordance with any one of claims 30 to 44, comprising detecting a facility usage flag, and, responsive thereto, causing implementation of said combined inter-picture merge and intra-picture prediction process, and configuring said method to seek detection of said mode indication flag.
46. A method in accordance with any one of claims 30 to 45, comprising detecting partition information, and, responsive to detected partition information, determining the partition of the block.
47. A method in accordance with any one of claims 30 to 46 comprising inferring, from the encoded video data, the partition of the block.
48. A method of decoding encoded video data, the method comprising a combined inter-picture merge and intra-picture prediction process for predicting samples of a block of video sample data, the combined inter-picture merge and intra-picture prediction process being performed in one of a plurality of modes, at least one of the plurality of modes being in accordance with a geometrical partitioning scheme.
49. A method in accordance with claim 48, wherein the combined inter-picture merge and intra-picture prediction process is performed in one of a plurality of geometrical partitioning modes.
50. A method in accordance with claim 49 comprising determining an ordered list, from the plurality of geometrical partitioning modes, including inferring the order of the list from at least one characteristic of the block and/or of at least one neighbouring block.
51. A method in accordance with any one of claims 48 to 50 comprising determining the geometrical partitioning scheme at least partially as a result of an inference process as a function of the width of the block.
52. A method in accordance with any one of claims 48 to 51 comprising determining the geometrical partitioning scheme inferentially as a function of the height of the block.
53. A method in accordance with any one of claims 48 to 51 comprising determining the geometrical partitioning scheme inferentially as a function of the ratio of the width of the block to the height of the block.
54. A method in accordance with any one of claims 48 to 53, the geometrical partitioning scheme resulting in the block being partitioned into two parts, wherein the prediction for one of the parts of the block is an inter-prediction, and wherein the partition is determined on the basis of a merge candidate used for the inter-prediction.
55. A method in accordance with any one of claims 48 to 53, the geometrical partitioning scheme resulting in the block being partitioned into two parts, including determining the partition inferentially on the basis of the type of prediction used to encode one neighbouring block.
56. A video encoder for encoding video data, the encoder comprising a combined inter-picture merge and intra-picture prediction facility operable to generate a prediction of samples of a block of video data, the combined inter-picture merge and intra-picture prediction facility being operable in one of a plurality of modes wherein, in at least one of said modes, said encoder is operable to generate a prediction of samples of a block of video sample data by the application of a blending mask, the blending mask governing partitioning of the block into two parts, wherein the blending mask is applied to a first prediction generated by an inter-picture merge prediction process and to a second prediction generated by an intra-picture prediction process.
57. A video encoder in accordance with claim 56: operable to produce a prediction of at least some samples in a first of said parts fully by means of inter-prediction; and/or operable to produce a prediction of at least some samples in a second of said parts fully by means of intra-prediction; and/or operable to generate a mode indication flag, indicating use of the mode of the combined inter-picture merge and intra-picture prediction facility in generating the prediction, including, optionally, to generate a facility usage flag indicative of use of combined inter-picture merge and intra-picture prediction; and/or operable to include, with the encoded video data, blending mask information indicative of a blending mask to be used at a corresponding decoder, optionally including, with the encoded video data, a blending mask index indicating which of the blending masks in a list of candidate blending masks should be used to operate a combined inter-picture merge and intra-picture prediction facility at a decoder, wherein the blending mask index optionally comprises b bits, the blending mask index indicating which blending mask to use of the list of candidate blending masks, the list comprising up to 2^b candidate blending masks; and/or operable to include, with the encoded video data, partition information to indicate the partition of the block.
58. A video encoder operable to encode video data, the encoder comprising a combined inter-picture merge and intra-picture prediction facility operable to generate a prediction of samples of a block of video sample data, the combined inter-picture merge and intra-picture prediction facility being operable in one of a plurality of modes, at least one of the plurality of modes being operable in accordance with a geometrical partitioning scheme.
59. A video encoder in accordance with claim 58, wherein the combined inter-picture merge and intra-picture prediction facility is operable in one of a plurality of geometrical partitioning modes.
60. A method of encoding video data, comprising predicting samples of a block of video sample data, employing a combined inter-picture merge and intra-picture prediction process operable in one of a plurality of modes wherein, in at least one of said modes, said combined inter-picture merge and intra-picture prediction process comprises generating a prediction of samples of a block of video sample data by the application of a blending mask, the blending mask governing partitioning of the block into two parts, wherein the blending mask is applied to a first prediction generated by an inter-picture merge prediction process and to a second prediction generated by an intra-picture prediction process.
61. A method of encoding video data, the method comprising a combined inter-picture merge and intra-picture prediction process for predicting samples of a block of video sample data, the combined inter-picture merge and intra-picture prediction process being performed in one of a plurality of modes, at least one of the plurality of modes being in accordance with a geometrical partitioning scheme.
62. A computer program product, comprising processor executable instructions which, when executed by a general purpose computer, cause the computer to perform a method in accordance with any one of claims 30 to 55 or claims 60 or 61.
63. A signal bearing processor executable instructions which, when executed by a general purpose computer, cause the computer to perform a method in accordance with any one of claims 30 to 55 or claims 60 or 61.
64. A computer readable storage medium bearing encoded video data encoded by a method in accordance with any one of claims 30 to 55 or claims 60 or 61.
65. A computer readable signal bearing a bitstream of encoded video data encoded by a method in accordance with any one of claims 30 to 55 or claims 60 or 61.

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023197433A1 (en) * 2022-04-12 2023-10-19 Oppo广东移动通信有限公司 Video encoding method, apparatus and device, video decoding method, apparatus and device, video encoding and decoding system, and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111951189B (en) * 2020-08-13 2022-05-06 神思电子技术股份有限公司 Data enhancement method for multi-scale texture randomization


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023197433A1 (en) * 2022-04-12 2023-10-19 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Video encoding method, apparatus and device, video decoding method, apparatus and device, video encoding and decoding system, and storage medium

Also Published As

Publication number Publication date
CN114868385A (en) 2022-08-05
GB201918431D0 (en) 2020-01-29
EP4074031A1 (en) 2022-10-19
GB2589932A (en) 2021-06-16

Similar Documents

Publication Publication Date Title
US11778201B2 (en) Video encoding and video decoding
TWI692245B (en) Video decoding apparatus, video encoding method and apparatus, and computer-readable storage medium
US20220303536A1 (en) Method of signalling in a video codec
US11350104B2 (en) Method for processing a set of images of a video sequence
CN104604237A (en) Method and apparatus for encoding video and method and apparatus for decoding video determining inter-prediction reference picture list depending on block size
US20200169744A1 (en) Method and apparatus for processing video signal using affine prediction
EP4074031A1 (en) Video encoding and video decoding
WO2021156587A1 (en) Chroma intra prediction in video coding and decoding
US11589038B2 (en) Methods for video encoding and video decoding
US20220377342A1 (en) Video encoding and video decoding
EA043408B1 (en) VIDEO ENCODING AND VIDEO DECODING
GB2587363A (en) Method of signalling in a video codec
US20220166967A1 (en) Intra coding mode signalling in a video codec
GB2596394A (en) Method of signalling in a video codec

Legal Events

Date Code Title Description

121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20780738; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020780738; Country of ref document: EP; Effective date: 20220713