WO2020002762A1 - Method and apparatus for motion compensation with non-square sub-blocks in video coding

Publication number: WO2020002762A1
Authority: WIPO (PCT)
Application number: PCT/FI2019/050464
Other languages: French (fr)
Inventor: Jani Lainema
Original Assignee: Nokia Technologies Oy
Application filed by Nokia Technologies Oy
Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/119: Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/176: Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/30: Coding using hierarchical techniques, e.g. scalability
    • H04N19/46: Embedding additional information in the video signal during the compression process
    • H04N19/51: Motion estimation or motion compensation

Definitions

  • An example embodiment relates generally to video encoding and decoding.
  • a video codec consists of an encoder that transforms an input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form.
  • An encoder generally discards some information in the original video sequence in order to represent the video in a more compact form.
  • Various hybrid video codecs, for example video codecs that operate in accordance with the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) H.263 and H.264 standards, encode the video information in two phases. First, pixel values in a certain picture area (or "block") are predicted, for example by motion compensation, which includes finding and indicating an area in one of the previously coded video frames that corresponds closely to the block being coded, or by spatial means, which includes using the pixel values around the block to be coded in a specified manner. Second, the prediction error, that is, the difference between the predicted block of pixels and the original block of pixels, is coded.
  • One common way of performing motion compensation is block based affine motion compensation.
  • In block based affine motion compensation, the minimum block size used in the motion compensation process dictates the worst case computational complexity and memory bandwidth requirements for the operation. Finding the right minimum block size is a tricky issue: a larger minimum block size helps to control the memory bandwidth required for performing motion compensation, but has a negative impact on the coding efficiency of block based affine motion compensation.
  • a method, apparatus and computer program product are provided in accordance with an example embodiment in order to algorithmically select a sub-block shape and size for high-order motion compensation and to utilize the selected sub-blocks in the motion compensation process of video encoding or decoding.
  • In one example embodiment, a method includes receiving a bitstream comprising one or more coding units.
  • the method further includes additional operations performed for each coding unit.
  • the additional operations include determining if the coding unit is to be motion compensated by using a translational motion model or a higher order motion model.
  • the method further includes additional operations in response to determining that the coding unit is to be motion compensated by using a translational motion model or a higher order motion model.
  • the additional operations include determining a dimension of the coding unit.
  • the additional operations further include determining a shape and size for a set of sub-blocks in the coding unit. The set of sub-blocks fully cover an area of the coding unit based on the dimension of the coding unit.
  • the additional operations further include performing motion compensated prediction for the set of sub-blocks.
  • the additional operations further include storing an output of the motion compensated prediction.
  • the coding unit comprises one or more prediction units and determining the shape and size for the set of sub-blocks in the coding unit utilizes one or more dimensions of the one or more prediction units. In some embodiments, determining the shape and size for the set of sub-blocks in the coding unit utilizes one or more selected prediction types associated with the one or more sub-blocks. In some embodiments, the selected prediction type includes one of: a uni-prediction or a bi-prediction. In some embodiments, the set of sub-blocks includes sub-blocks of size M samples in one direction and N samples in another direction.
  • values of M and N are predefined and M is larger than N, and wherein N corresponds to a vertical direction if a motion vector difference is larger in the vertical direction compared to a horizontal direction.
  • M is equal to N, and the prediction unit associated with the set of sub-blocks is a square prediction unit.
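The sub-block shape selection described above can be sketched in code. The following is a hypothetical illustration only: the function name, the predefined values M = 8 and N = 4, and the exact decision rule are assumptions, not taken from the disclosure. It assigns the shorter dimension N to the direction in which the motion vector difference is larger, and falls back to square sub-blocks for a square prediction unit with symmetric motion.

```python
def select_subblock_shape(cu_width, cu_height, mvd_horizontal, mvd_vertical,
                          m=8, n=4):
    """Choose (sub-block width, sub-block height) for a coding unit.

    Sketch only: M and N are predefined with M > N; the shorter side N
    is assigned to the direction with the larger motion vector
    difference across the coding unit.
    """
    if cu_width == cu_height and mvd_horizontal == mvd_vertical:
        # Square prediction unit with symmetric motion: square sub-blocks.
        return (n, n)
    if mvd_vertical > mvd_horizontal:
        # Larger vertical motion variation: N samples vertically, M horizontally.
        return (m, n)
    # Otherwise the shorter side N goes in the horizontal direction.
    return (n, m)
```

Under these assumptions, a 16x16 coding unit whose motion varies more in the vertical direction would be tiled with 8x4 sub-blocks.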
  • In another example embodiment, an apparatus includes at least one processor and at least one memory including computer program code for one or more programs, with the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to receive a bitstream comprising one or more coding units.
  • the computer program code is configured to, with the at least one processor, cause the apparatus to perform additional operations for each coding unit.
  • the additional operations include determining if the coding unit is to be motion compensated by using a translational motion model or a higher order motion model.
  • the computer program code is configured to, with the at least one processor, cause the apparatus to perform additional operations in response to determining that the coding unit is to be motion compensated by using a translational motion model or a higher order motion model.
  • the additional operations include determining a dimension of the coding unit.
  • the additional operations further include determining a shape and size for a set of sub-blocks in the coding unit.
  • the set of sub-blocks fully cover an area of the coding unit based on the dimension of the coding unit.
  • the additional operations further include performing motion compensated prediction for the set of sub-blocks.
  • the additional operations further include storing an output of the motion compensated prediction.
  • the coding unit comprises one or more prediction units and determining the shape and size for the set of sub-blocks in the coding unit utilizes one or more dimensions of the one or more prediction units. In some embodiments, determining the shape and size for the set of sub-blocks in the coding unit utilizes one or more selected prediction types associated with the one or more sub-blocks. In some embodiments, the selected prediction type includes one of: a uni-prediction or a bi-prediction. In some embodiments, the set of sub-blocks includes sub-blocks of size M samples in one direction and N samples in another direction.
  • values of M and N are predefined and M is larger than N, and wherein N corresponds to a vertical direction if a motion vector difference is larger in the vertical direction compared to a horizontal direction.
  • M is equal to N, and the prediction unit associated with the set of sub-blocks is a square prediction unit.
  • In another example embodiment, an apparatus includes means for receiving a bitstream comprising one or more coding units.
  • the apparatus further includes means for performing additional operations for each coding unit.
  • the additional operations include determining if the coding unit is to be motion compensated by using a translational motion model or a higher order motion model.
  • the apparatus further includes means for performing additional operations in response to determining that the coding unit is to be motion compensated by using a translational motion model or a higher order motion model.
  • the additional operations include determining a dimension of the coding unit.
  • the additional operations further include determining a shape and size for a set of sub-blocks in the coding unit. The set of sub-blocks fully cover an area of the coding unit based on the dimension of the coding unit.
  • the additional operations further include performing motion compensated prediction for the set of sub-blocks.
  • the additional operations further include storing an output of the motion compensated prediction.
  • the coding unit comprises one or more prediction units and determining the shape and size for the set of sub-blocks in the coding unit utilizes one or more dimensions of the one or more prediction units. In some embodiments, determining the shape and size for the set of sub-blocks in the coding unit utilizes one or more selected prediction types associated with the one or more sub-blocks. In some embodiments, the selected prediction type includes one of: a uni-prediction or a bi-prediction. In some embodiments, the set of sub-blocks includes sub-blocks of size M samples in one direction and N samples in another direction.
  • values of M and N are predefined and M is larger than N, and wherein N corresponds to a vertical direction if a motion vector difference is larger in the vertical direction compared to a horizontal direction.
  • M is equal to N, and the prediction unit associated with the set of sub-blocks is a square prediction unit.
  • In another example embodiment, a computer program product includes at least one non-transitory computer-readable storage medium having computer executable program code instructions stored therein, with the computer executable program code instructions comprising program code instructions configured, upon execution, to receive a bitstream comprising one or more coding units.
  • the computer executable program code instructions further include program code instructions configured, upon execution, to perform additional operations for each coding unit.
  • the additional operations include determining if the coding unit is to be motion compensated by using a translational motion model or a higher order motion model.
  • the computer executable program code instructions further include program code instructions configured, upon execution, to perform additional operations in response to determining that the coding unit is to be motion compensated by using a translational motion model or a higher order motion model.
  • the additional operations include determining a dimension of the coding unit.
  • the additional operations further include determining a shape and size for a set of sub-blocks in the coding unit.
  • the set of sub-blocks fully cover an area of the coding unit based on the dimension of the coding unit.
  • the additional operations further include performing motion compensated prediction for the set of sub-blocks.
  • the additional operations further include storing an output of the motion compensated prediction.
  • the coding unit comprises one or more prediction units and determining the shape and size for the set of sub-blocks in the coding unit utilizes one or more dimensions of the one or more prediction units. In some embodiments, determining the shape and size for the set of sub-blocks in the coding unit utilizes one or more selected prediction types associated with the one or more sub-blocks. In some embodiments, the selected prediction type includes one of: a uni-prediction or a bi-prediction. In some embodiments, the set of sub-blocks includes sub-blocks of size M samples in one direction and N samples in another direction.
  • values of M and N are predefined and M is larger than N, and wherein N corresponds to a vertical direction if a motion vector difference is larger in the vertical direction compared to a horizontal direction.
  • M is equal to N, and the prediction unit associated with the set of sub-blocks is a square prediction unit.
  • Figures 1 A and 1B illustrate examples of coding units and prediction units
  • Figures 2A and 2B illustrate examples of segmentations of a prediction unit
  • Figure 3 is a block diagram of an apparatus that may be specifically configured in accordance with an example embodiment of the present disclosure
  • Figure 4 is a flowchart illustrating a set of operations performed, such as by the apparatus of Figure 3, in accordance with an example embodiment of the present disclosure
  • Figures 5A and 5B illustrate examples of sub-block divisions
  • Figure 6 illustrates bi-prediction using two reference pictures.
  • As used herein, "circuitry" refers to (a) hardware-only circuit implementations (e.g., implementations in analog circuitry and/or digital circuitry); (b) combinations of circuits and computer program product(s) comprising software and/or firmware instructions stored on one or more computer readable memories that work together to cause an apparatus to perform one or more functions described herein; and (c) circuits, such as, for example, a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation even if the software or firmware is not physically present.
  • This definition of "circuitry" applies to all uses of this term herein, including in any claims.
  • the term "circuitry" also includes an implementation comprising one or more processors and/or portion(s) thereof and accompanying software and/or firmware.
  • the term "circuitry" as used herein also includes, for example, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, other network device, and/or other computing device.
  • a method, apparatus and computer program product are provided in accordance with an example embodiment to algorithmically select a sub-block shape and size for high- order motion compensation and to use the selected sub-blocks in the motion compensation process in a video encoding or decoding process.
  • Various different criteria can be used in the selection process, for example, selecting directionality of the non-square sub-block to be used in the motion compensation process based on shape, size and motion parameters of the coding unit, and the like.
  • the method, apparatus and computer program product may be utilized in conjunction with a variety of video codec formats including the High Efficiency Video Coding standard (HEVC or H.265/HEVC), the International Organization for Standardization (ISO) base media file format (ISO/IEC 14496-12, which may be abbreviated as ISOBMFF), the Moving Picture Experts Group (MPEG)-4 file format (ISO/IEC 14496-14, also known as the MP4 format), file formats for NAL (Network Abstraction Layer) unit structured video (ISO/IEC 14496-15) and the 3rd Generation Partnership Project (3GPP) file format (3GPP Technical Specification 26.244, also known as the 3GP format).
  • a video codec consists of an encoder that transforms the input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form.
  • An encoder generally discards some information in the original video sequence in order to represent the video in a more compact form (that is, at a lower bitrate).
  • Various hybrid video codecs, for example ITU-T H.263 and H.264, encode the video information in two phases. First, pixel values in a certain picture area (or "block") are predicted, for example by motion compensation (finding and indicating an area in one of the previously coded video frames that corresponds closely to the block being coded) or spatially (using the pixel values around the block to be coded in a specified manner).
  • the prediction error, that is, the difference between the predicted block of pixels and the original block of pixels, is then coded. This coding may be done by transforming the difference in pixel values using a specified transform (e.g., the Discrete Cosine Transform (DCT) or a variant of DCT), quantizing the coefficients and entropy coding the quantized coefficients.
  • an encoder can control the balance between the accuracy of the pixel representation (picture quality) and the size of the resulting coded video representation (file size or transmission bitrate).
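The quality/size trade-off described above is governed largely by the quantization step. A minimal, hypothetical sketch follows (uniform scalar quantization with made-up step sizes; real codecs use QP-driven scaling and apply entropy coding to the quantized levels):

```python
def quantize(coeffs, step):
    """Uniform scalar quantization of transform coefficients."""
    return [round(c / step) for c in coeffs]

def dequantize(levels, step):
    """Reconstruct approximate coefficients from quantized levels."""
    return [level * step for level in levels]

coeffs = [103.0, -41.0, 12.0, -3.0]
# A larger step discards more information (smaller bitstream, lower quality);
# a smaller step preserves more detail (larger bitstream, higher quality).
coarse = dequantize(quantize(coeffs, 20), 20)
fine = dequantize(quantize(coeffs, 4), 4)
```

The reconstruction error of `coarse` is visibly larger than that of `fine`, which is exactly the lever the encoder uses to balance picture quality against bitrate.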
  • Figure 1 A illustrates a coding unit or a prediction unit consisting of one translational prediction block.
  • a single motion vector (in the case of uni-prediction) defines the motion of the block.
  • Figure 1B illustrates a prediction unit consisting of four motion compensated sub-blocks. Motion vectors for the sub-blocks are typically derived individually, for example from a higher order motion model associated with the prediction unit.
  • video pictures are divided into coding units (CU) covering the area of the picture.
  • a CU consists of one or more prediction units (PU) defining the prediction process for the samples within the CU and one or more transform units (TU) defining the prediction error coding process for the samples in the said CU.
  • a CU consists of a square block of samples with a size selectable from a predefined set of possible CU sizes.
  • a CU with the maximum allowed size may be named an LCU (largest coding unit) or a CTU (coding tree unit), and the video picture is divided into non-overlapping CTUs.
  • a CTU can be further split into a combination of smaller CUs, e.g., by recursively splitting the CTU and the resultant CUs.
  • Each resulting CU may have at least one PU and at least one TU associated with the CU.
  • Each PU and TU can be further split into smaller PUs and TUs in order to increase granularity of the prediction and prediction error coding processes, respectively.
  • Each PU has prediction information associated with the PU defining what kind of a prediction is to be applied for the pixels within that PU (e.g., motion vector information for inter predicted PUs and intra prediction directionality information for intra predicted PUs).
  • each TU is associated with information describing the prediction error decoding process for the samples within the said TU (including, e.g., DCT coefficient information).
  • It may be signaled at the CU level whether prediction error coding is applied for each CU. In the case that there is no prediction error residual associated with the CU, it can be considered that there are no TUs for the CU.
  • the division of the image into CUs, and division of CUs into PUs and TUs may be signaled in the bitstream, thereby allowing the decoder to reproduce the intended structure of these units.
  • prediction units and transform units can be defined to be always equal to their encapsulating coding unit. In such an embodiment, there is naturally no need to signal further divisions for coding units.
  • the term“coding unit” can also be used to describe prediction units or transform units.
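The recursive splitting of a CTU into CUs described above can be sketched as a quad-tree walk. This is an illustrative sketch only: the function name and the split-decision callback are assumptions; in a real decoder the decision is read from split flags signaled in the bitstream, and in an encoder it comes from rate-distortion decisions.

```python
def split_ctu(x, y, size, min_cu_size, should_split):
    """Recursively split a square CTU region into CUs with a quad-tree.

    should_split(x, y, size) is a caller-supplied decision function
    standing in for signaled split flags or encoder decisions.
    """
    if size <= min_cu_size or not should_split(x, y, size):
        return [(x, y, size)]  # leaf CU: top-left position and size
    half = size // 2
    cus = []
    for dy in (0, half):
        for dx in (0, half):
            cus.extend(split_ctu(x + dx, y + dy, half, min_cu_size, should_split))
    return cus

# Split a 64x64 CTU where only the top-left 32x32 quadrant splits further:
# four 16x16 CUs plus three 32x32 CUs, covering the CTU without overlap.
cus = split_ctu(0, 0, 64, 8,
                lambda x, y, s: s == 64 or (x < 32 and y < 32 and s == 32))
```

The resulting CUs tile the CTU exactly, which is what allows the decoder to reproduce the intended partitioning from the signaled split structure.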
  • a decoder reconstructs the output video by applying prediction techniques similar to the encoder to form a predicted representation of the pixel blocks (using the motion or spatial information created by the encoder and stored in the compressed representation) and prediction error decoding (the inverse operation of the prediction error coding so as to recover the quantized prediction error signal in the spatial pixel domain). After applying prediction and prediction error decoding, the decoder sums the prediction and prediction error signals (pixel values) to form the output video frame.
  • the decoder (and encoder) can also apply additional filtering to improve the quality of the output video before passing the video for display and/or storing the video as prediction reference for the forthcoming frames in the video sequence.
  • Figures 2A and 2B illustrate example segmentations of a prediction unit into rectangular non-square motion compensated sub-blocks when the height of the prediction unit is larger than its width ( Figure 2A) and when the width of a prediction unit is larger than its height ( Figure 2B).
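The segmentations of Figures 2A and 2B can be sketched as tiling a prediction unit with rectangular sub-blocks. The function name and the concrete sizes below are illustrative assumptions; the sub-block dimensions are assumed to divide the prediction unit dimensions evenly, so that the set of sub-blocks fully covers the prediction unit's area.

```python
def segment_pu(pu_width, pu_height, sb_width, sb_height):
    """Tile a prediction unit with sb_width x sb_height sub-blocks.

    Returns (x, y, width, height) tuples that fully cover the PU.
    """
    assert pu_width % sb_width == 0 and pu_height % sb_height == 0
    return [(x, y, sb_width, sb_height)
            for y in range(0, pu_height, sb_height)
            for x in range(0, pu_width, sb_width)]

# Tall PU (height > width): tall non-square sub-blocks, as in Figure 2A.
tall = segment_pu(8, 16, 4, 8)
# Wide PU (width > height): wide non-square sub-blocks, as in Figure 2B.
wide = segment_pu(16, 8, 8, 4)
```

In both cases the four sub-blocks cover the prediction unit exactly, with the longer sub-block side aligned to the longer side of the prediction unit.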
  • Color palette based coding can also be utilized in addition to utilizing samples from a CU.
  • Palette based coding is a family of coding approaches in which a palette, that is, a set of colors and associated indexes, is defined. The value for each sample within a coding unit is expressed by indicating its index in the palette.
  • Palette based coding can achieve good coding efficiency in coding units with a relatively small number of colors (such as image areas which are representing computer screen content, such as text or simple graphics).
  • different kinds of palette index prediction approaches can be utilized, or the palette indices can be run-length coded to be able to represent larger homogenous image areas efficiently.
  • escape coding can be utilized. Escape coded samples are transmitted without referring to any of the palette indices. Instead, values of escaped coded samples are indicated individually for each escape coded sample.
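The palette index and escape coding described above can be sketched as follows. The palette contents, the ESCAPE marker value, and the function names are illustrative assumptions; run-length coding of the indices, mentioned earlier, is omitted for brevity.

```python
PALETTE = {0: (255, 255, 255), 1: (0, 0, 0), 2: (200, 30, 30)}
ESCAPE = -1  # hypothetical marker for escape-coded samples

def palette_encode(samples, palette):
    """Map each sample to its palette index; samples not in the palette
    are escape coded and their values transmitted individually."""
    inverse = {color: idx for idx, color in palette.items()}
    indices, escapes = [], []
    for s in samples:
        if s in inverse:
            indices.append(inverse[s])
        else:
            indices.append(ESCAPE)
            escapes.append(s)  # value sent individually, no palette reference
    return indices, escapes

def palette_decode(indices, escapes, palette):
    """Invert palette_encode: look up indices, consume escape values in order."""
    out, pending = [], iter(escapes)
    for idx in indices:
        out.append(next(pending) if idx == ESCAPE else palette[idx])
    return out
```

A round trip through encode and decode reproduces the original samples, including the escape-coded one that had no palette entry.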
  • the motion information is indicated with motion vectors associated with each motion compensated image block.
  • Each of the motion vectors represents the displacement between the image block in the picture to be coded (on the encoder side) or decoded (on the decoder side) and the prediction source block in one of the previously coded or decoded pictures.
  • the motion vectors may be coded differentially with respect to block specific predicted motion vectors.
  • the predicted motion vectors are created in a predefined way, for example, by calculating the median of the encoded or decoded motion vectors of the adjacent blocks.
  • Another way to create motion vector predictions is to generate a list of candidate predictions from adjacent blocks and/or co-located blocks in temporal reference pictures and signaling the chosen candidate as the motion vector predictor.
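Differential motion vector coding with a median predictor, as described above, can be sketched like this (the function names are hypothetical; real codecs operate on fixed-point motion vectors and entropy code the signaled difference):

```python
def median_mv_predictor(neighbor_mvs):
    """Predict a motion vector as the component-wise median of the
    encoded/decoded motion vectors of adjacent blocks."""
    def median(values):
        ordered = sorted(values)
        return ordered[len(ordered) // 2]  # middle element, odd-length list
    return (median([mv[0] for mv in neighbor_mvs]),
            median([mv[1] for mv in neighbor_mvs]))

def code_mv_differentially(mv, neighbor_mvs):
    """Return the motion vector difference the encoder would signal."""
    pred = median_mv_predictor(neighbor_mvs)
    return (mv[0] - pred[0], mv[1] - pred[1])
```

When the block moves consistently with its neighbors, the signaled difference collapses toward zero, which is exactly why differential coding saves bits.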
  • the reference index of previously coded/decoded picture can be predicted.
  • the reference index may be predicted from adjacent blocks and/or co-located blocks in a temporal reference picture.
  • various high efficiency video codecs employ an additional motion information coding/decoding mechanism, often called merging/merge mode, where all the motion field information, which includes motion vectors and a corresponding reference picture index for each available reference picture list, is predicted and used without any modification/correction.
  • predicting the motion field information is carried out using the motion field information of adjacent blocks and/or co- located blocks in temporal reference pictures and the used motion field information is signaled among a list of motion field candidates filled with motion field information of available adjacent/co-located blocks.
  • Various video codecs support motion compensated prediction from one source image (uni-prediction) and two sources (bi-prediction).
  • uni-prediction a single motion vector is applied whereas in the case of bi-prediction two motion vectors are signaled and the motion compensated predictions from two sources are averaged to create the final sample prediction.
  • weighted prediction the relative weights of the two predictions can be adjusted, or a signaled offset can be added to the prediction signal.
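Bi-prediction and weighted prediction as described above can be sketched with integer arithmetic. This is a simplification: real codecs also add a rounding offset before the shift and clip the result to the valid sample range.

```python
def bi_predict(pred0, pred1, w0=1, w1=1, shift=1, offset=0):
    """Combine two motion compensated predictions sample by sample.

    With the defaults (w0 = w1 = 1, shift = 1, offset = 0) this is a
    plain average of the two sources; weighted prediction adjusts the
    relative weights and/or adds a signaled offset.
    """
    return [((w0 * p0 + w1 * p1) >> shift) + offset
            for p0, p1 in zip(pred0, pred1)]

average = bi_predict([100, 120], [110, 130])                        # plain bi-prediction
weighted = bi_predict([100, 120], [110, 130], w0=3, w1=1, shift=2)  # biased toward source 0
```

Biasing the weights toward one reference is useful, for example, when one reference picture is temporally much closer to the current picture than the other.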
  • In intra block copying, a displacement vector indicates from where within the same picture a block of samples can be copied to form a prediction of the block to be coded or decoded.
  • This kind of intra block copying method can improve the coding efficiency substantially in the presence of repeating structures within the frame - such as text or other graphics.
  • the prediction residual after motion compensation or intra prediction is first transformed with a transform kernel (such as DCT) and then coded.
  • the transformation may reduce some correlation among the residual and provide more efficient coding.
  • Various video encoders utilize Lagrangian cost functions to find optimal coding modes, e.g., the desired macroblock mode and associated motion vectors.
  • This type of cost function uses a weighting factor λ to tie together the (exact or estimated) image distortion due to lossy coding methods and the (exact or estimated) amount of information that is required to represent the pixel values in an image area.
  • An example Lagrangian cost function is provided below:
  • C = D + λR
  • where C is the Lagrangian cost to be minimized, D is the image distortion (e.g., Mean Squared Error) with the mode and motion vectors considered, and R is the number of bits needed to represent the required data to reconstruct the image block in the decoder (including the amount of data to represent the candidate motion vectors).
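The mode decision driven by this cost function can be sketched as follows (the mode names and the distortion/rate values are made up purely for illustration):

```python
def lagrangian_cost(distortion, rate, lam):
    """Lagrangian cost C = D + lambda * R."""
    return distortion + lam * rate

def choose_mode(candidates, lam):
    """Pick the coding mode with the minimum Lagrangian cost.

    candidates: list of (mode_name, distortion, rate_in_bits) tuples.
    """
    return min(candidates, key=lambda c: lagrangian_cost(c[1], c[2], lam))[0]

# A cheap-but-distorted mode versus an accurate-but-costly one:
# the winner flips as lambda weights the rate more heavily.
modes = [("merge", 120.0, 10), ("amvp", 40.0, 60)]
```

With a small λ the encoder favors the low-distortion mode; with a large λ (typical at low bitrates) it accepts more distortion to save bits.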
  • Scalable video coding refers to coding structure where one bitstream can contain multiple representations of the content at different bitrates, resolutions or frame rates. In these cases the receiver can extract the desired representation depending on its characteristics (e.g., the resolution that matches best the display device). Alternatively, a server or a network element can extract the portions of the bitstream to be transmitted to the receiver depending on, e.g., the network characteristics or processing capabilities of the receiver.
  • a scalable bitstream typically consists of a“base layer” providing the lowest quality video available and one or more enhancement layers that enhance the video quality when received and decoded together with the lower layers. In order to improve coding efficiency for the enhancement layers, the coded representation of that layer typically depends on the lower layers. For example, the motion and mode information of the enhancement layer can be predicted from lower layers. Similarly, the pixel data of the lower layers can be used to create a prediction for the enhancement layer.
  • A scalable video codec for quality scalability (also known as Signal-to-Noise or SNR scalability) and/or spatial scalability may be implemented as follows.
  • For a base layer, a conventional non-scalable video encoder and decoder is used.
  • the reconstructed/decoded pictures of the base layer are included in the reference picture buffer for an enhancement layer.
  • the base layer decoded pictures may be inserted into a reference picture list(s) for coding/decoding of an enhancement layer picture similarly to the decoded reference pictures of the enhancement layer.
  • the encoder may choose a base-layer reference picture as inter prediction reference and indicate its use typically with a reference picture index in the coded bitstream.
  • the decoder decodes from the bitstream, for example from a reference picture index, that a base-layer picture is used as an inter prediction reference for the enhancement layer.
  • a decoded base-layer picture is used as a prediction reference for an enhancement layer, it is referred to as an inter-layer reference picture.
  • The following scalability modes may be provided: (1) Spatial scalability: base layer pictures are coded at a lower resolution than enhancement layer pictures; (2) Bit-depth scalability: base layer pictures are coded at a lower bit-depth (e.g., 8 bits) than enhancement layer pictures (e.g., 10 or 12 bits); (3) Chroma format scalability: enhancement layer pictures provide higher fidelity in chroma (e.g., coded in 4:4:4 chroma format) than base layer pictures (e.g., 4:2:0 format).
  • base layer information could be used to code the enhancement layer to minimize the additional bitrate overhead.
  • Scalability can be enabled in two basic ways. The first way is by introducing new coding modes for performing prediction of pixel values or syntax from lower layers of the scalable representation, while the second way is by placing the lower layer pictures in the reference picture buffer/decoded picture buffer (DPB) of the higher layer.
  • the first approach is more flexible and thus can provide better coding efficiency in most cases.
  • the second approach, reference-frame-based scalability, can be implemented efficiently with minimal changes to single-layer codecs while still achieving the majority of the available coding efficiency gains.
  • a reference frame based scalability codec can be implemented by utilizing the same hardware or software implementation for all the layers, just taking care of the DPB management by external means.
  • images can be split into independently codable and decodable image segments (slices or tiles).
  • Slices typically refer to image segments constructed of a certain number of basic coding units that are processed in default coding or decoding order, while tiles typically refer to image segments that have been defined as rectangular image regions that are processed at least to some extent as individual frames.
  • the apparatus of an example embodiment may be provided by any of a wide variety of computing devices including, for example, a video decoder, a video encoder, a computer workstation, a server or the like, or by any of various mobile computing devices, such as a mobile terminal, e.g., a smartphone, a tablet computer, a video game player, or the like.
  • the apparatus 10 of an example embodiment includes, is associated with or is otherwise in communication with processing circuitry 12, a memory 14, a communication interface 16 and optionally, a user interface 18 as shown in Figure 3.
  • the processing circuitry 12 may be in communication with the memory device 14 via a bus for passing information among components of the apparatus 10.
  • the memory device may be non-transitory and may include, for example, one or more volatile and/or non-volatile memories.
  • the memory device may be an electronic storage device (e.g., a computer readable storage medium) comprising gates configured to store data (e.g., bits) that may be retrievable by a machine (e.g., a computing device such as the processing circuitry).
  • the memory device may be configured to store information, data, content, applications, instructions, or the like for enabling the apparatus to carry out various functions in accordance with an example embodiment of the present disclosure.
  • the memory device could be configured to buffer input data for processing by the processing circuitry. Additionally or alternatively, the memory device could be configured to store instructions for execution by the processing circuitry.
  • the apparatus 10 may, in some embodiments, be embodied in various computing devices as described above. However, in some embodiments, the apparatus may be embodied as a chip or chip set. In other words, the apparatus may comprise one or more physical packages (e.g., chips) including materials, components and/or wires on a structural assembly (e.g., a baseboard). The structural assembly may provide physical strength, conservation of size, and/or limitation of electrical interaction for component circuitry included thereon. The apparatus may therefore, in some cases, be configured to implement an embodiment of the present disclosure on a single chip or as a single “system on a chip.” As such, in some cases, a chip or chipset may constitute means for performing one or more operations for providing the functionalities described herein.
  • the processing circuitry 12 may be embodied in a number of different ways.
  • the processing circuitry may be embodied as one or more of various hardware processing means such as a coprocessor, a microprocessor, a controller, a digital signal processor (DSP), a processing element with or without an accompanying DSP, or various other circuitry including integrated circuits such as, for example, an ASIC (application specific integrated circuit), an FPGA (field programmable gate array), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like.
  • the processing circuitry may include one or more processing cores configured to perform independently.
  • a multi-core processing circuitry may enable multiprocessing within a single physical package.
  • the processing circuitry may include one or more processors configured in tandem via the bus to enable independent execution of instructions, pipelining and/or multithreading.
  • the processing circuitry 12 may be configured to execute instructions stored in the memory device 14 or otherwise accessible to the processing circuitry. Alternatively or additionally, the processing circuitry may be configured to execute hard coded functionality. As such, whether configured by hardware or software methods, or by a combination thereof, the processing circuitry may represent an entity (e.g., physically embodied in circuitry) capable of performing operations according to an embodiment of the present disclosure while configured accordingly. Thus, for example, when the processing circuitry is embodied as an ASIC, FPGA or the like, the processing circuitry may be specifically configured hardware for conducting the operations described herein.
  • the processing circuitry when the processing circuitry is embodied as an executor of instructions, the instructions may specifically configure the processor to perform the algorithms and/or operations described herein when the instructions are executed.
  • the processing circuitry may be a processor of a specific device (e.g., an image or video processing system) configured to employ an embodiment of the present disclosure by further configuration of the processing circuitry by instructions for performing the algorithms and/or operations described herein.
  • the processing circuitry may include, among other things, a clock, an arithmetic logic unit (ALU) and logic gates configured to support operation of the processing circuitry.
  • the communication interface 16 may be any means such as a device or circuitry embodied in either hardware or a combination of hardware and software that is configured to receive and/or transmit data, including video bitstreams.
  • the communication interface may include, for example, an antenna (or multiple antennas) and supporting hardware and/or software for enabling communications with a wireless communication network. Additionally or alternatively, the communication interface may include the circuitry for interacting with the antenna(s) to cause transmission of signals via the antenna(s) or to handle receipt of signals received via the antenna(s). In some environments, the communication interface may alternatively or also support wired communication.
  • the communication interface may include a communication modem and/or other hardware/software for supporting communication via cable, digital subscriber line (DSL), universal serial bus (USB) or other mechanisms.
  • the apparatus 10 may optionally include a user interface 18 that may, in turn, be in communication with the processing circuitry 12 to provide output to a user, such as by outputting an encoded video bitstream and, in some embodiments, to receive an indication of a user input.
  • the user interface may include a display and, in some embodiments, may also include a keyboard, a mouse, a joystick, a touch screen, touch areas, soft keys, a microphone, a speaker, or other input/output mechanisms.
  • the processing circuitry may comprise user interface circuitry configured to control at least some functions of one or more user interface elements such as a display and, in some embodiments, a speaker, ringer, microphone and/or the like.
  • the processing circuitry and/or user interface circuitry comprising the processing circuitry may be configured to control one or more functions of one or more user interface elements through computer program instructions (e.g., software and/or firmware) stored on a memory accessible to the processing circuitry (e.g., memory device 14, and/or the like).
  • the operations performed, such as by the apparatus 10 of Figure 3, in order to algorithmically select a sub-block shape and size for high-order motion compensation and to utilize the selected sub-blocks in the motion compensation process of a video encoding or decoding process in accordance with an example embodiment are depicted.
  • Various different criteria can be used in the selection process, for example, selecting directionality of the non-square sub-block to be used in the motion compensation process based on shape, size and motion parameters of the coding unit, and the like.
  • the apparatus includes means, such as the processing circuitry 12, the communication interface 16 or the like, for receiving a bitstream comprising one or more coding units.
  • the bitstream may be a bitstream formatted in accordance with various video formats previously described.
  • the apparatus 10 further includes means, such as the processing circuitry 12, as shown in block 404, for determining if a coding unit needs to be motion compensated by using a translational motion model or a higher order motion model. The determination may be performed for each coding unit in the bitstream. The determination may be made by checking a metadata file indicating whether the coding unit is to be motion compensated by using a translational motion model or a higher order motion model.
  • the apparatus 10 further includes means, such as the processing circuitry 12, as shown in block 404, for determining a dimension of the coding unit in response to determining that the coding unit is to be motion compensated by using a translational motion model or a higher order motion model.
  • the dimension of the coding unit may be indicated in a metadata file associated with the coding unit or the bitstream.
  • the apparatus 10 further includes means, such as the processing circuitry 12, as shown in block 406, for determining a shape and size for a set of sub-blocks in the coding unit, wherein the set of sub-blocks fully cover an area of the coding unit based on the dimension of the coding unit.
  • the prediction units or coding units are divided into sub-blocks. Each sub-block will have its own motion vector v(x, y) and the equations above can be rewritten accordingly, with one motion vector per sub-block.
  • the prediction unit or coding unit can be split into sub-blocks of selected size and an individual motion vector can be generated for each sub-block instead of each individual sample.
  • the set of sub-blocks may take the form of non-square MxN and NxM sub-blocks, where M is larger than N, as the minimum sub-block unit for motion compensation with higher order motion models.
  • Block dimensions M and N can be selected to be, e.g., 8 and 4, respectively, for the luminance component and 4 and 2, respectively, for the chrominance component of the video.
  • alternatively, the chrominance channels can use the same minimum sub-block division as the luminance channels.
  • the apparatus 10 may determine whether to use MxN or NxM sub-blocks for a prediction unit in different ways. For example, the determination can be based on the shape of the prediction unit. If the dimensions of the prediction unit are given as WxH, with W referring to the width and H referring to the height of the prediction unit, the apparatus may determine that wider MxN (M > N) sub-blocks may be used instead of narrower NxM sub-blocks if the width W is smaller than the height H of the prediction unit. This kind of a setting would allow finer granularity of motion difference to be represented in the direction of the larger dimension of the prediction unit.
  • the motion vector difference in the horizontal and vertical directions may be compared in order to determine whether to utilize MxN or NxM sub-blocks. For example, if the motion vector difference is larger in the horizontal direction, the narrower NxM (M > N) sub-blocks can be selected to give a finer representation of the motion vector in the direction of more significant motion.
  • the motion vector difference in the horizontal and vertical directions can be evaluated in different ways. One example of such evaluation is to compare the change in absolute values of the motion vector components when taking a defined step in different directions. For some motion models, such as an affine motion model, these differences are constants over a prediction unit using the same set of affine motion parameters and can be calculated directly from those motion parameters.
  • the motion vectors may be evaluated differently depending on the active motion model and the motion vector differences can be calculated in various different ways. For example, in some cases the motion in a prediction unit may be represented with translational motion vectors of two or more corners of the prediction unit. In some embodiments, a motion vector in a top-left corner of the prediction unit and a motion vector in a top-right corner of the prediction unit may be used.
  • the motion vector difference parameters diffHor and diffVer can be calculated using such corner motion vectors and the size of the prediction unit.
  • the minimum sub-block size and shape can change according to additional criteria.
  • non-square MxN and NxM sub-blocks can be used for non-square prediction units
  • square MxM or NxN sub-blocks can be used for square prediction units.
  • MxN sub-blocks are used if the height H of the prediction unit is equal to N
  • NxM sub-blocks are used if the width W of the prediction unit is equal to N
  • MxM sub-blocks are used otherwise.
  • the apparatus 10, such as the processing circuitry 12, can also be configured to select between MxM, MxN and NxM with another set of rules.
  • the apparatus, such as the processing circuitry, may be configured to always select MxN sub-blocks instead of MxM sub-blocks, or NxM sub-blocks instead of MxM sub-blocks.
  • Figures 5A and 5B illustrate examples of sub-block divisions.
  • Figure 5A illustrates an example where a minimum dimension N is equal to the width W of a prediction unit.
  • Figure 5B illustrates an example where a minimum dimension N is equal to the height H of a prediction unit.
  • the shape and size of the motion compensation sub-block is determined based on the dimensions of the coding unit or prediction unit to which the sub-blocks belong. In some embodiments, a different motion compensation sub-block shape and size may be used depending on whether bi-prediction or uni-prediction is indicated for the encapsulating coding unit or prediction unit. For example, a sub-block size of 8x4 (or 4x8, or 8x8) can be used if the coding unit or prediction unit is indicated to be bi-predicted and a sub-block size of 4x4 can be used if the coding unit or prediction unit is indicated to be uni-predicted.
  • Figure 6 illustrates bi-prediction using two reference pictures (one prediction using a reference picture from list P and the other using a reference picture from list Q).
  • the motion compensation sub-blocks of size NxN are used for list P prediction and motion compensation sub-blocks of size MxM are used for list Q prediction.
  • the motion compensation sub-block shape and size is determined differently for motion predictions with different reference picture lists. For example, a sub-block size of 4x4 can be used for reference picture list 0 and a sub-block size of 8x8 can be used for reference picture list 1. In another example a sub-block size of 8x4 is used for reference picture list 0 and a sub-block size of 4x8 is used for reference picture list 1. In some embodiments, the determination can further depend on whether bi-prediction or uni-prediction is used.
  • uni-predicted sub-blocks can use 4x4 minimum block size
  • bi-predicted sub-blocks with list 0 motion can use 4x4 minimum block size
  • bi-predicted sub-blocks with list 1 motion can use 8x8 minimum block size.
  • the apparatus 10 further includes means, such as the processing circuitry 12, as shown in block 408, for performing motion compensated prediction for the set of sub-blocks.
  • the apparatus further includes means, such as the processing circuitry 12, as shown in block 410, for storing an output of the motion compensated prediction.
  • the output of the motion compensated prediction is stored in the bitstream, such as in a video, picture, or slice parameter set in the bitstream.
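The sub-block shape selection described in the points above can be sketched in a few lines of Python. This is an illustrative sketch only, not the claimed implementation: it assumes a six-parameter affine model given by three corner motion vectors (top-left v0, top-right v1, bottom-left v2), since for a four-parameter model the horizontal and vertical motion differences coincide; the function names and the default values M = 8, N = 4 are taken from the examples above.

```python
def affine_mv(v0, v1, v2, w, h, x, y):
    """Six-parameter affine motion vector at sample (x, y) of a WxH
    prediction unit, from corner MVs v0 (top-left), v1 (top-right)
    and v2 (bottom-left)."""
    return (v0[0] + (v1[0] - v0[0]) * x / w + (v2[0] - v0[0]) * y / h,
            v0[1] + (v1[1] - v0[1]) * x / w + (v2[1] - v0[1]) * y / h)

def choose_subblock_shape(v0, v1, v2, w, h, m=8, n=4):
    """Return (sub_width, sub_height) for motion-compensation sub-blocks.

    Rules sketched from the text: a degenerate prediction-unit dimension
    forces the non-square shape that still fits; otherwise the smaller
    dimension N is placed along the direction with the larger per-sample
    motion vector difference (diffHor vs diffVer)."""
    if h == n:                     # PU only N samples tall -> MxN
        return (m, n)
    if w == n:                     # PU only N samples wide -> NxM
        return (n, m)
    # Per-sample change of the MV when stepping horizontally / vertically.
    diff_hor = (abs(v1[0] - v0[0]) + abs(v1[1] - v0[1])) / w
    diff_ver = (abs(v2[0] - v0[0]) + abs(v2[1] - v0[1])) / h
    # Larger horizontal difference -> narrower NxM blocks, and vice versa.
    return (n, m) if diff_hor > diff_ver else (m, n)
```

For example, with v0 = (0, 0), v1 = (8, 0) and v2 = (0, 2) on a 16x16 prediction unit, the motion changes faster horizontally, so the narrower 4x8 sub-blocks would be selected.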

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method, apparatus and computer program product select a sub-block shape and size for high-order motion compensation for the purpose of utilizing the selected sub-blocks in the motion compensation process in a video encoding or decoding process. The method, apparatus and computer program product receive a bitstream comprising one or more coding units. For each coding unit, the method, apparatus and computer program product determine if the coding unit is to be motion compensated by using a translational motion model or a higher order motion model. In response to determining that the coding unit is to be motion compensated by using a translational motion model or a higher order motion model, the method, apparatus and computer program product determine a dimension of the coding unit; determine a shape and size for a set of sub-blocks in the coding unit; and perform motion compensated prediction for the set of sub-blocks.

Description

METHOD AND APPARATUS FOR MOTION COMPENSATION WITH NON-SQUARE
SUB-BLOCKS IN VIDEO CODING
TECHNICAL FIELD
[0001] An example embodiment relates generally to video encoding and decoding.
BACKGROUND
[0002] A video codec consists of an encoder that transforms an input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form. An encoder generally discards some information in the original video sequence in order to represent the video in a more compact form.
[0003] Various hybrid video codecs, for example, video codecs that operate in accordance with the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) H.263 and H.264, encode the video information in two phases. Firstly, pixel values in a certain picture area (or “block”) are predicted, for example by motion compensation, which includes finding and indicating an area in one of the previously coded video frames that corresponds closely to the block being coded, or by spatial means, which includes using the pixel values around the block to be coded in a specified manner. Secondly, the prediction error, that is, the difference between the predicted block of pixels and the original block of pixels, is coded.
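The two-phase structure can be illustrated with a minimal sketch, assuming integer-pel translational motion and omitting the transform, quantization and entropy coding that a real codec applies to the residual; all names below are illustrative.

```python
def predict_block(ref, x, y, mv, bs=8):
    """Phase 1: translational motion compensation at integer-pel accuracy.
    Copies the displaced bs x bs block from the reference frame."""
    dx, dy = mv
    return [row[x + dx:x + dx + bs] for row in ref[y + dy:y + dy + bs]]

def encode_block(cur, ref, x, y, mv, bs=8):
    """Phase 2: form the prediction error (residual) for the block.
    A real encoder would transform, quantize and entropy code it."""
    pred = predict_block(ref, x, y, mv, bs)
    return [[cur[y + j][x + i] - pred[j][i] for i in range(bs)]
            for j in range(bs)]

def decode_block(ref, x, y, mv, residual, bs=8):
    """The decoder repeats the prediction and adds the residual back."""
    pred = predict_block(ref, x, y, mv, bs)
    return [[pred[j][i] + residual[j][i] for i in range(bs)]
            for j in range(bs)]
```

Because the residual is not quantized here, decode_block reconstructs the original block exactly; a real encoder discards part of that information to reach a lower bitrate.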
[0004] One common way of performing motion compensation is block based affine motion compensation. In block based affine motion compensation, the minimum block size used in the motion compensation process dictates the worst case computational complexity and memory bandwidth requirements for the operation. Finding the right minimum block size is a tricky issue; a larger minimum block size helps to control the memory bandwidth required for performing motion compensation but has a negative impact on the coding efficiency of the block based affine motion compensation.
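The memory-bandwidth effect of the minimum block size can be quantified with a back-of-the-envelope sketch: with a T-tap separable interpolation filter, predicting a W x H block requires fetching (W + T − 1) x (H + T − 1) reference samples, so smaller blocks fetch more reference samples per predicted sample. The 8-tap default is a typical luma filter length (as in HEVC), used here purely for illustration.

```python
def fetch_ratio(bw, bh, taps=8):
    """Reference samples fetched per predicted sample for one
    motion-compensated bw x bh block with a `taps`-tap separable filter."""
    fetched = (bw + taps - 1) * (bh + taps - 1)
    return fetched / (bw * bh)
```

For example, fetch_ratio(4, 4) is about 7.56 while fetch_ratio(8, 8) is about 3.52, and a non-square 8x4 block falls in between at about 5.16, which is why enlarging the minimum sub-block in only one direction already relieves the worst case.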
BRIEF SUMMARY
[0005] A method, apparatus and computer program product are provided in accordance with an example embodiment in order to algorithmically select a sub-block shape and size for high-order motion compensation for the purpose of utilizing the selected sub-blocks in the motion compensation process in a video encoding or decoding process.
[0006] In one example embodiment, a method is provided that includes receiving a bitstream comprising one or more coding units. The method further includes additional operations performed for each coding unit. The additional operations include determining if the coding unit is to be motion compensated by using a translational motion model or a higher order motion model. The method further includes additional operations in response to determining that the coding unit is to be motion compensated by using a translational motion model or a higher order motion model. The additional operations include determining a dimension of the coding unit. The additional operations further include determining a shape and size for a set of sub-blocks in the coding unit. The set of sub-blocks fully cover an area of the coding unit based on the dimension of the coding unit. The additional operations further include performing motion compensated prediction for the set of sub-blocks. The additional operations further include storing an output of the motion compensated prediction.
[0007] In some implementations of such a method, the coding unit comprises one or more prediction units and determining the shape and size for the set of sub-blocks in the coding unit utilizes one or more dimensions of the one or more prediction units. In some embodiments, determining the shape and size for the set of sub-blocks in the coding unit utilizes one or more selected prediction type associated with the one or more sub-block. In some embodiments, the selected prediction type includes one of: a uni-prediction or a bi-prediction. In some embodiments, the set of sub-blocks includes sub-blocks of size M samples in one direction and N samples in another direction. In some embodiments, values of M and N are predefined and M is larger than N, and wherein N corresponds to a vertical direction if a motion vector difference is larger in the vertical direction compared to a horizontal direction. In some embodiments, M is equal to N, and the prediction unit associated with the set of sub-blocks is a square prediction unit.
[0008] In another example embodiment, an apparatus is provided that includes at least one processor and at least one memory including computer program code for one or more programs with the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to receive a bitstream comprising one or more coding units. The computer program code is configured to, with the at least one processor, cause the apparatus to perform additional operations for each coding unit. The additional operations include determining if the coding unit is to be motion compensated by using a translational motion model or a higher order motion model. The computer program code is configured to, with the at least one processor, cause the apparatus to perform additional operations in response to determining that the coding unit is to be motion compensated by using a translational motion model or a higher order motion model. The additional operations include determining a dimension of the coding unit. The additional operations further include determining a shape and size for a set of sub-blocks in the coding unit. The set of sub-blocks fully cover an area of the coding unit based on the dimension of the coding unit. The additional operations further include performing motion compensated prediction for the set of sub-blocks. The additional operations further include storing an output of the motion compensated prediction.
[0009] In some implementations of such an apparatus, the coding unit comprises one or more prediction units and determining the shape and size for the set of sub-blocks in the coding unit utilizes one or more dimensions of the one or more prediction units. In some embodiments, determining the shape and size for the set of sub-blocks in the coding unit utilizes one or more selected prediction type associated with the one or more sub-block. In some embodiments, the selected prediction type includes one of: a uni-prediction or a bi-prediction. In some embodiments, the set of sub-blocks includes sub-blocks of size M samples in one direction and N samples in another direction. In some embodiments, values of M and N are predefined and M is larger than N, and wherein N corresponds to a vertical direction if a motion vector difference is larger in the vertical direction compared to a horizontal direction. In some embodiments, M is equal to N, and the prediction unit associated with the set of sub-blocks is a square prediction unit.
[0010] In another example embodiment, an apparatus is provided that includes means for receiving a bitstream comprising one or more coding units. The apparatus further includes means for performing additional operations for each coding unit. The additional operations include determining if the coding unit is to be motion compensated by using a translational motion model or a higher order motion model. The apparatus further includes means for performing additional operations in response to determining that the coding unit is to be motion compensated by using a translational motion model or a higher order motion model. The additional operations include determining a dimension of the coding unit. The additional operations further include determining a shape and size for a set of sub-blocks in the coding unit. The set of sub-blocks fully cover an area of the coding unit based on the dimension of the coding unit. The additional operations further include performing motion compensated prediction for the set of sub-blocks. The additional operations further include storing an output of the motion compensated prediction.
[0011] In some implementations of such an apparatus, the coding unit comprises one or more prediction units and determining the shape and size for the set of sub-blocks in the coding unit utilizes one or more dimensions of the one or more prediction units. In some embodiments, determining the shape and size for the set of sub-blocks in the coding unit utilizes one or more selected prediction type associated with the one or more sub-block. In some embodiments, the selected prediction type includes one of: a uni-prediction or a bi-prediction. In some embodiments, the set of sub-blocks includes sub-blocks of size M samples in one direction and N samples in another direction. In some embodiments, values of M and N are predefined and M is larger than N, and wherein N corresponds to a vertical direction if a motion vector difference is larger in the vertical direction compared to a horizontal direction. In some embodiments, M is equal to N, and the prediction unit associated with the set of sub-blocks is a square prediction unit.
[0012] In another example embodiment, a computer program product is provided that includes at least one non-transitory computer-readable storage medium having computer executable program code instructions stored therein with the computer executable program code instructions comprising program code instructions configured, upon execution, to receive a bitstream comprising one or more coding units. The computer executable program code instructions further include program code instructions configured, upon execution, to perform additional operations for each coding unit. The additional operations include determining if the coding unit is to be motion compensated by using a translational motion model or a higher order motion model. The computer executable program code instructions further include program code instructions configured, upon execution, to perform additional operations in response to determining that the coding unit is to be motion compensated by using a translational motion model or a higher order motion model. The additional operations include determining a dimension of the coding unit. The additional operations further include determining a shape and size for a set of sub-blocks in the coding unit. The set of sub-blocks fully cover an area of the coding unit based on the dimension of the coding unit. The additional operations further include performing motion compensated prediction for the set of sub-blocks. The additional operations further include storing an output of the motion compensated prediction.
[0013] In some implementations of such a computer program product, the coding unit comprises one or more prediction units and determining the shape and size for the set of sub-blocks in the coding unit utilizes one or more dimensions of the one or more prediction units. In some embodiments, determining the shape and size for the set of sub-blocks in the coding unit utilizes one or more selected prediction type associated with the one or more sub-block. In some embodiments, the selected prediction type includes one of: a uni-prediction or a bi-prediction. In some embodiments, the set of sub-blocks includes sub-blocks of size M samples in one direction and N samples in another direction. In some embodiments, values of M and N are predefined and M is larger than N, and wherein N corresponds to a vertical direction if a motion vector difference is larger in the vertical direction compared to a horizontal direction. In some embodiments, M is equal to N, and the prediction unit associated with the set of sub-blocks is a square prediction unit.
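The requirement that the set of sub-blocks fully cover the coding unit area can be sketched as a simple tiling, assuming (as is typical for power-of-two block sizes) that the coding unit dimensions are multiples of the chosen sub-block dimensions; the function name is illustrative, not from any standard.

```python
def tile_subblocks(cu_w, cu_h, sb_w, sb_h):
    """Enumerate (x, y, w, h) sub-blocks tiling a cu_w x cu_h coding unit.
    Assumes cu_w % sb_w == 0 and cu_h % sb_h == 0, so the tiles cover
    the coding unit area exactly, with no gaps and no overlap."""
    assert cu_w % sb_w == 0 and cu_h % sb_h == 0
    return [(x, y, sb_w, sb_h)
            for y in range(0, cu_h, sb_h)
            for x in range(0, cu_w, sb_w)]
```

Motion compensated prediction is then performed once per tile with that tile's own motion vector, rather than once per individual sample.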
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] Having thus described certain example embodiments of the present disclosure in general terms, reference will hereinafter be made to the accompanying drawings, which are not necessarily drawn to scale, and wherein:
[0015] Figures 1 A and 1B illustrate examples of coding units and prediction units;
[0016] Figures 2A and 2B illustrate examples of segmentations of a prediction unit;
[0017] Figure 3 is a block diagram of an apparatus that may be specifically configured in accordance with an example embodiment of the present disclosure;
[0018] Figure 4 is a flowchart illustrating a set of operations performed, such as by the apparatus of Figure 3, in accordance with an example embodiment of the present disclosure;
[0019] Figures 5A and 5B illustrate examples of sub-block divisions; and
[0020] Figure 6 illustrates bi-prediction using two reference pictures.
DETAILED DESCRIPTION
[0021] Some embodiments will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. Indeed, various embodiments of the invention may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. Like reference numerals refer to like elements throughout. As used herein, the terms “data,” “content,” “information,” and similar terms may be used interchangeably to refer to data capable of being transmitted, received and/or stored in accordance with embodiments of the present invention. Thus, use of any such terms should not be taken to limit the spirit and scope of embodiments of the present invention.
[0022] Additionally, as used herein, the term ‘circuitry’ refers to (a) hardware-only circuit implementations (e.g., implementations in analog circuitry and/or digital circuitry); (b) combinations of circuits and computer program product(s) comprising software and/or firmware instructions stored on one or more computer readable memories that work together to cause an apparatus to perform one or more functions described herein; and (c) circuits, such as, for example, a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation even if the software or firmware is not physically present. This definition of ‘circuitry’ applies to all uses of this term herein, including in any claims. As a further example, as used herein, the term ‘circuitry’ also includes an implementation comprising one or more processors and/or portion(s) thereof and accompanying software and/or firmware. As another example, the term ‘circuitry’ as used herein also includes, for example, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, other network device, and/or other computing device.
[0023] As defined herein, a “computer-readable storage medium,” which refers to a non-transitory physical storage medium (e.g., volatile or non-volatile memory device), can be differentiated from a “computer-readable transmission medium,” which refers to an electromagnetic signal.
[0024] A method, apparatus and computer program product are provided in accordance with an example embodiment to algorithmically select a sub-block shape and size for high-order motion compensation and to use the selected sub-blocks in the motion compensation process in a video encoding or decoding process. Various different criteria can be used in the selection process, for example, selecting directionality of the non-square sub-block to be used in the motion compensation process based on shape, size and motion parameters of the coding unit, and the like. The method, apparatus and computer program product may be utilized in conjunction with a variety of video codec formats including the High Efficiency Video Coding standard (HEVC or H.265/HEVC), International Standards Organization (ISO) base media file format (ISO/IEC 14496-12, which may be abbreviated as ISOBMFF), Moving Picture Experts Group (MPEG)-4 file format (ISO/IEC 14496-14, also known as the MP4 format), file formats for NAL (Network Abstraction Layer) unit structured video (ISO/IEC 14496-15) and 3rd Generation Partnership Project (3GPP file format) (3GPP Technical Specification 26.244, also known as the 3GP format). An example embodiment is described in conjunction with the HEVC. The aspects of the disclosure are not limited to the HEVC, but rather the description is given for one possible basis on top of which an example embodiment of the present disclosure may be partly or fully realized.
[0025] A video codec consists of an encoder that transforms the input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form. An encoder generally discards some information in the original video sequence in order to represent the video in a more compact form (that is, at a lower bitrate).
[0026] Various hybrid video codecs, for example ITU-T H.263 and H.264, encode the video information in two phases. Firstly, pixel values in a certain picture area (or“block”) are predicted for example by motion compensation such as finding and indicating an area in one of the previously coded video frames that corresponds closely to the block being coded or spatially by using the pixel values around the block to be coded in a specified manner.
Secondly, the prediction error, that is, the difference between the predicted block of pixels and the original block of pixels, is coded. This coding may be done by transforming the difference in pixel values using a specified transform (e.g., Discrete Cosine Transform (DCT) or a variant of DCT), quantizing the coefficients and entropy coding the quantized coefficients. By varying the fidelity of the quantization process, an encoder can control the balance between the accuracy of the pixel representation (picture quality) and the size of the resulting coded video representation (file size or transmission bitrate). Figure 1A illustrates a coding unit or a prediction unit consisting of one translational prediction block. A single motion vector (in the case of uni-prediction) defines the motion of the block. Figure 1B illustrates a prediction unit consisting of four motion compensated sub-blocks. Motion vectors for the sub-blocks are typically derived from higher order motion parameters indicated in a video bitstream.
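The transform-quantize-reconstruct trade-off described above can be sketched in a few lines. This is a minimal floating-point illustration using a textbook DCT-II basis rather than the integer transforms actual coding standards specify, and the helper names (`dct_matrix`, `code_residual`, `decode_residual`) are hypothetical:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis; a stand-in for the integer
    transforms real codecs specify."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] /= np.sqrt(2.0)
    return basis

def code_residual(block, prediction, qstep):
    """Transform the prediction error and quantize the coefficients.
    A larger qstep gives a more compact representation but a coarser
    reconstruction -- the quality/bitrate balance described above."""
    basis = dct_matrix(block.shape[0])
    residual = block - prediction
    coeffs = basis @ residual @ basis.T
    return np.round(coeffs / qstep)

def decode_residual(quantized, prediction, qstep):
    """Inverse: dequantize, inverse-transform, add the prediction."""
    basis = dct_matrix(quantized.shape[0])
    residual = basis.T @ (quantized * qstep) @ basis
    return prediction + residual
```

A fine quantizer (qstep = 1) reconstructs the block nearly losslessly, while a coarse one (e.g., qstep = 64) trades reconstruction accuracy for fewer significant coefficients.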
[0027] In some video codecs, such as H.265/HEVC, video pictures are divided into coding units (CU) covering the area of the picture. A CU consists of one or more prediction units (PU) defining the prediction process for the samples within the CU and one or more transform units (TU) defining the prediction error coding process for the samples in the said CU. A CU consists of a square block of samples with a size selectable from a predefined set of possible CU sizes. A CU with the maximum allowed size may be referred to as an LCU (largest coding unit) or CTU (coding tree unit) and the video picture is divided into non-overlapping CTUs. A CTU can be further split into a combination of smaller CUs, e.g., by recursively splitting the CTU and the resultant CUs. Each resulting CU may have at least one PU and at least one TU associated with the CU. Each PU and TU can be further split into smaller PUs and TUs in order to increase granularity of the prediction and prediction error coding processes, respectively. Each PU has prediction information associated with the PU defining what kind of a prediction is to be applied for the pixels within that PU (e.g., motion vector information for inter predicted PUs and intra prediction directionality information for intra predicted PUs). Similarly, each TU is associated with information describing the prediction error decoding process for the samples within the said TU (including, e.g., DCT coefficient information). Whether prediction error coding is applied or not may be signaled at CU level for each CU. In the case there is no prediction error residual associated with the CU, it can be considered that there are no TUs for the CU. The division of the image into CUs, and division of CUs into PUs and TUs may be signaled in the bitstream, thereby allowing the decoder to reproduce the intended structure of these units.
In some embodiments, prediction units and transform units can be defined to be always equal to their encapsulating coding unit. In such an embodiment, there is naturally no need to signal further divisions for coding units. The term “coding unit” can also be used to describe prediction units or transform units.
[0028] A decoder reconstructs the output video by applying prediction techniques similar to the encoder to form a predicted representation of the pixel blocks (using the motion or spatial information created by the encoder and stored in the compressed representation) and prediction error decoding (the inverse operation of the prediction error coding so as to recover the quantized prediction error signal in the spatial pixel domain). After applying prediction and prediction error decoding, the decoder sums the prediction and prediction error signals (pixel values) to form the output video frame. The decoder (and encoder) can also apply additional filtering to improve the quality of the output video before passing the video for display and/or storing the video as prediction reference for the forthcoming frames in the video sequence. Figures 2A and 2B illustrate example segmentations of a prediction unit into rectangular non-square motion compensated sub-blocks when the height of the prediction unit is larger than its width (Figure 2A) and when the width of a prediction unit is larger than its height (Figure 2B).
[0029] Color palette based coding can also be utilized in addition to utilizing samples from a CU. Palette based coding is a family of coding approaches in which a palette, that is, a set of colors and associated indexes, is defined. The value for each sample within a coding unit is expressed by indicating its index in the palette. Palette based coding can achieve good coding efficiency in coding units with a relatively small number of colors (such as image areas representing computer screen content, such as text or simple graphics). In order to improve the coding efficiency of palette coding, different kinds of palette index prediction approaches can be utilized, or the palette indices can be run-length coded to be able to represent larger homogenous image areas efficiently. Also, in the case where the CU contains sample values that are not recurring within the CU, escape coding can be utilized. Escape coded samples are transmitted without referring to any of the palette indices. Instead, values of escape coded samples are indicated individually for each escape coded sample.
[0030] In various video codecs, the motion information is indicated with motion vectors associated with each motion compensated image block. Each of the motion vectors represents the displacement of the image block in the picture to be coded (on the encoder side) or decoded (on the decoder side) and the prediction source block in one of the previously coded or decoded pictures. In order to represent motion vectors efficiently, the motion vectors may be coded differentially with respect to block specific predicted motion vectors. In various video codecs, the predicted motion vectors are created in a predefined way, for example, by calculating the median of the encoded or decoded motion vectors of the adjacent blocks. Another way to create motion vector predictions is to generate a list of candidate predictions from adjacent blocks and/or co-located blocks in temporal reference pictures and signaling the chosen candidate as the motion vector predictor. In addition to predicting the motion vector values, the reference index of a previously coded/decoded picture can be predicted. The reference index may be predicted from adjacent blocks and/or co-located blocks in a temporal reference picture. Moreover, various high efficiency video codecs employ an additional motion information coding/decoding mechanism, often called merging/merge mode, where all the motion field information, which includes motion vectors and a corresponding reference picture index for each available reference picture list, is predicted and used without any modification/correction. Similarly, predicting the motion field information is carried out using the motion field information of adjacent blocks and/or co-located blocks in temporal reference pictures and the used motion field information is signaled among a list of motion field candidates filled with motion field information of available adjacent/co-located blocks.
[0031] Various video codecs support motion compensated prediction from one source image (uni-prediction) and two sources (bi-prediction). In the case of uni-prediction a single motion vector is applied whereas in the case of bi-prediction two motion vectors are signaled and the motion compensated predictions from two sources are averaged to create the final sample prediction. In the case of weighted prediction the relative weights of the two predictions can be adjusted, or a signaled offset can be added to the prediction signal.
[0032] In addition to applying motion compensation for inter picture prediction, similar approaches can be applied to intra picture prediction. For intra picture prediction, the displacement vector indicates from where within the same picture a block of samples can be copied to form a prediction of the block to be coded or decoded. This kind of intra block copying method can improve the coding efficiency substantially in the presence of repeating structures within the frame - such as text or other graphics.
[0033] In various video codecs, the prediction residual after motion compensation or intra prediction is first transformed with a transform kernel (such as DCT) and then coded. The transformation may reduce some correlation among the residual and provide more efficient coding.
[0034] Various video encoders utilize Lagrangian cost functions to find optimal coding modes, e.g. the desired macroblock mode and associated motion vectors. This type of cost function uses a weighting factor λ to tie together the (exact or estimated) image distortion due to lossy coding methods and the (exact or estimated) amount of information that is required to represent the pixel values in an image area. An example Lagrangian cost function is provided below:
C = D + λR
[0035] C is the Lagrangian cost to be minimized, D is the image distortion (e.g., Mean Squared Error) with the mode and motion vectors considered, and R is the number of bits needed to represent the required data to reconstruct the image block in the decoder (including the amount of data to represent the candidate motion vectors).
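A minimal sketch of using this cost function for mode decision follows; the candidate modes and their distortion/rate numbers are purely hypothetical:

```python
def best_mode(candidates, lam):
    """Rate-distortion mode decision: pick the candidate minimizing
    C = D + lambda * R.  Each candidate is (name, distortion, rate_bits);
    lam is the Lagrange multiplier (the weighting factor lambda above)."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])

# Hypothetical candidate modes for one block.
modes = [
    ("skip",        900.0,  2),   # cheapest to signal, highest distortion
    ("inter_16x16", 400.0, 30),
    ("intra",       380.0, 60),   # lowest distortion, most bits
]
```

A small λ weights distortion heavily (favoring "intra" here), while a large λ weights rate (favoring "skip"); intermediate values pick the balanced "inter_16x16" mode.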
[0036] Scalable video coding refers to coding structure where one bitstream can contain multiple representations of the content at different bitrates, resolutions or frame rates. In these cases the receiver can extract the desired representation depending on its characteristics (e.g., the resolution that matches best the display device). Alternatively, a server or a network element can extract the portions of the bitstream to be transmitted to the receiver depending on, e.g., the network characteristics or processing capabilities of the receiver. A scalable bitstream typically consists of a “base layer” providing the lowest quality video available and one or more enhancement layers that enhance the video quality when received and decoded together with the lower layers. In order to improve coding efficiency for the enhancement layers, the coded representation of that layer typically depends on the lower layers. For example, the motion and mode information of the enhancement layer can be predicted from lower layers. Similarly, the pixel data of the lower layers can be used to create a prediction for the enhancement layer.
[0037] A scalable video codec for quality scalability (also known as Signal-to-Noise or SNR) and/or spatial scalability may be implemented as follows. For a base layer, a conventional non-scalable video encoder and decoder is used. The reconstructed/decoded pictures of the base layer are included in the reference picture buffer for an enhancement layer. In H.264/AVC, HEVC, and similar codecs using reference picture list(s) for inter prediction, the base layer decoded pictures may be inserted into a reference picture list(s) for coding/decoding of an enhancement layer picture similarly to the decoded reference pictures of the enhancement layer. Consequently, the encoder may choose a base-layer reference picture as inter prediction reference and indicate its use typically with a reference picture index in the coded bitstream. The decoder decodes from the bitstream, for example from a reference picture index, that a base-layer picture is used as an inter prediction reference for the enhancement layer. When a decoded base-layer picture is used as a prediction reference for an enhancement layer, it is referred to as an inter-layer reference picture.
[0038] In addition to quality scalability, the following scalability modes may be provided: (1) Spatial scalability: Base layer pictures are coded at a lower resolution than enhancement layer pictures; (2) Bit-depth scalability: Base layer pictures are coded at lower bit-depth (e.g. 8 bits) than enhancement layer pictures (e.g. 10 or 12 bits); and (3) Chroma format scalability: Enhancement layer pictures provide higher fidelity in chroma (e.g. coded in 4:4:4 chroma format) than base layer pictures (e.g. 4:2:0 format).
[0039] In all of the above scalability cases, base layer information could be used to code the enhancement layer to minimize the additional bitrate overhead. Scalability can be enabled in two basic ways. The first way is by introducing new coding modes for performing prediction of pixel values or syntax from lower layers of the scalable representation, while the second way is by placing the lower layer pictures in the reference picture buffer/decoded picture buffer (DPB) of the higher layer. The first approach is more flexible and thus can provide better coding efficiency in most cases. However, the second, reference frame based scalability, approach can be implemented efficiently with minimal changes to single layer codecs while still achieving the majority of the coding efficiency gains available. Essentially, a reference frame based scalability codec can be implemented by utilizing the same hardware or software implementation for all the layers, just taking care of the DPB management by external means.
[0040] In order to be able to utilize parallel processing, images can be split into independently codable and decodable image segments (slices or tiles). Slices typically refer to image segments constructed of a certain number of basic coding units that are processed in default coding or decoding order, while tiles typically refer to image segments that have been defined as rectangular image regions that are processed at least to some extent as individual frames. Regardless of the file format of the video bitstream, the apparatus of an example embodiment may be provided by any of a wide variety of computing devices including, for example, a video decoder, a video encoder, a computer workstation, a server or the like, or by any of various mobile computing devices, such as a mobile terminal, e.g., a smartphone, a tablet computer, a video game player, or the like.
[0041] Regardless of the computing device that embodies the apparatus, the apparatus 10 of an example embodiment includes, is associated with or is otherwise in communication with processing circuitry 12, a memory 14, a communication interface 16 and optionally, a user interface 18 as shown in Figure 3.
[0042] The processing circuitry 12 may be in communication with the memory device 14 via a bus for passing information among components of the apparatus 10. The memory device may be non-transitory and may include, for example, one or more volatile and/or non-volatile memories. In other words, for example, the memory device may be an electronic storage device (e.g., a computer readable storage medium) comprising gates configured to store data (e.g., bits) that may be retrievable by a machine (e.g., a computing device such as the processing circuitry). The memory device may be configured to store information, data, content, applications, instructions, or the like for enabling the apparatus to carry out various functions in accordance with an example embodiment of the present disclosure. For example, the memory device could be configured to buffer input data for processing by the processing circuitry. Additionally or alternatively, the memory device could be configured to store instructions for execution by the processing circuitry.
[0043] The apparatus 10 may, in some embodiments, be embodied in various computing devices as described above. However, in some embodiments, the apparatus may be embodied as a chip or chip set. In other words, the apparatus may comprise one or more physical packages (e.g., chips) including materials, components and/or wires on a structural assembly (e.g., a baseboard). The structural assembly may provide physical strength, conservation of size, and/or limitation of electrical interaction for component circuitry included thereon. The apparatus may therefore, in some cases, be configured to implement an embodiment of the present disclosure on a single chip or as a single “system on a chip.” As such, in some cases, a chip or chipset may constitute means for performing one or more operations for providing the functionalities described herein.
[0044] The processing circuitry 12 may be embodied in a number of different ways. For example, the processing circuitry may be embodied as one or more of various hardware processing means such as a coprocessor, a microprocessor, a controller, a digital signal processor (DSP), a processing element with or without an accompanying DSP, or various other circuitry including integrated circuits such as, for example, an ASIC (application specific integrated circuit), an FPGA (field programmable gate array), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like. As such, in some embodiments, the processing circuitry may include one or more processing cores configured to perform independently. A multi-core processing circuitry may enable multiprocessing within a single physical package. Additionally or alternatively, the processing circuitry may include one or more processors configured in tandem via the bus to enable independent execution of instructions, pipelining and/or multithreading.
[0045] In an example embodiment, the processing circuitry 12 may be configured to execute instructions stored in the memory device 14 or otherwise accessible to the processing circuitry. Alternatively or additionally, the processing circuitry may be configured to execute hard coded functionality. As such, whether configured by hardware or software methods, or by a combination thereof, the processing circuitry may represent an entity (e.g., physically embodied in circuitry) capable of performing operations according to an embodiment of the present disclosure while configured accordingly. Thus, for example, when the processing circuitry is embodied as an ASIC, FPGA or the like, the processing circuitry may be specifically configured hardware for conducting the operations described herein.
Alternatively, as another example, when the processing circuitry is embodied as an executor of instructions, the instructions may specifically configure the processor to perform the algorithms and/or operations described herein when the instructions are executed. However, in some cases, the processing circuitry may be a processor of a specific device (e.g., an image or video processing system) configured to employ an embodiment of the present disclosure by further configuration of the processing circuitry by instructions for performing the algorithms and/or operations described herein. The processing circuitry may include, among other things, a clock, an arithmetic logic unit (ALU) and logic gates configured to support operation of the processing circuitry.
[0046] The communication interface 16 may be any means such as a device or circuitry embodied in either hardware or a combination of hardware and software that is configured to receive and/or transmit data, including video bitstreams. In this regard, the communication interface may include, for example, an antenna (or multiple antennas) and supporting hardware and/or software for enabling communications with a wireless communication network. Additionally or alternatively, the communication interface may include the circuitry for interacting with the antenna(s) to cause transmission of signals via the antenna(s) or to handle receipt of signals received via the antenna(s). In some environments, the
communication interface may alternatively or also support wired communication. As such, for example, the communication interface may include a communication modem and/or other hardware/software for supporting communication via cable, digital subscriber line (DSL), universal serial bus (USB) or other mechanisms.
[0047] In some embodiments, such as in instances in which the apparatus 10 is configured to encode the video bitstream, the apparatus 10 may optionally include a user interface 18 that may, in turn, be in communication with the processing circuitry 12 to provide output to a user, such as by outputting an encoded video bitstream and, in some embodiments, to receive an indication of a user input. As such, the user interface may include a display and, in some embodiments, may also include a keyboard, a mouse, a joystick, a touch screen, touch areas, soft keys, a microphone, a speaker, or other input/output mechanisms. Alternatively or additionally, the processing circuitry may comprise user interface circuitry configured to control at least some functions of one or more user interface elements such as a display and, in some embodiments, a speaker, ringer, microphone and/or the like. The processing circuitry and/or user interface circuitry comprising the processing circuitry may be configured to control one or more functions of one or more user interface elements through computer program instructions (e.g., software and/or firmware) stored on a memory accessible to the processing circuitry (e.g., memory device 14, and/or the like).
[0048] Referring now to Figure 4, the operations performed, such as by the apparatus 10 of Figure 3, in order to algorithmically select a sub-block shape and size for high-order motion compensation for the purpose of utilizing the selected sub-blocks in the motion compensation process in a video encoding or decoding process in accordance with an example embodiment are depicted. Various different criteria can be used in the selection process, for example, selecting directionality of the non-square sub-block to be used in the motion compensation process based on shape, size and motion parameters of the coding unit, and the like. As shown in block 402, the apparatus includes means, such as the processing circuitry 12, the communication interface 16 or the like, for receiving a bitstream comprising one or more coding units. The bitstream may be a bitstream formatted in accordance with various video formats previously described.
[0049] The apparatus 10 further includes means, such as the processing circuitry 12, as shown in block 404, for determining if a coding unit needs to be motion compensated by using a translational motion model or a higher order motion model. The determination may be performed for each coding unit in the bitstream. The determination may be made by checking a metadata file indicating whether the coding unit is to be motion compensated by using a translational motion model or a higher order motion model.
[0050] The apparatus 10 further includes means, such as the processing circuitry 12, as shown in block 404, for determining a dimension of the coding unit in response to determining that the coding unit needs to be motion compensated by using a translational motion model or a higher order motion model. The dimension of the coding unit may be indicated in a metadata file associated with the coding unit or the bitstream.
[0051] The apparatus 10 further includes means, such as the processing circuitry 12, as shown in block 406, for determining a shape and size for a set of sub-blocks in the coding unit, wherein the set of sub-blocks fully cover an area of the coding unit based on the dimension of the coding unit.
[0052] For translational motion compensation, the motion vector used in motion compensation is constant for the entire prediction unit or coding unit and is typically indicated with a two-dimensional motion vector v = (dx, dy). Therefore, the apparatus 10, such as the processing circuitry 12, can calculate the location of the reference pixels (rx, ry) in a reference frame using the motion vector (dx, dy) and coordinates (x, y) in the current frame with the following set of equations:

rx = x + dx
ry = y + dy
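The translational case can be illustrated with a short sketch; this assumes integer-pel motion only, ignoring the fractional-pel interpolation real codecs perform, and the function name is hypothetical:

```python
import numpy as np

def predict_block_translational(ref, x, y, w, h, mv):
    """Integer-pel translational motion compensation: every sample of
    the w x h block at (x, y) uses the same displacement mv = (dx, dy),
    i.e. rx = x + dx and ry = y + dy for all samples."""
    dx, dy = mv
    # Copy the displaced block from the reference frame (row = y, col = x).
    return ref[y + dy : y + dy + h, x + dx : x + dx + w]
```

Because a single (dx, dy) pair serves the whole block, one memory fetch pattern covers all samples, which is what makes translational compensation cheap relative to the per-sub-block models below.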
[0053] For high order motion compensation, such as affine or higher order polynomial models, the prediction units or coding units are divided into sub-blocks. Each sub-block will have its own motion vector v(x, y) and the equations above can be rewritten as:

rx = x + dx(x, y)
ry = y + dy(x, y)
[0054] In order to optimize the computations required for higher order motion compensation, the prediction unit or coding unit can be split into sub-blocks of selected size and an individual motion vector can be generated for each sub-block instead of each individual sample.
[0055] In order to provide finer granularity control for this trade-off, in some embodiments, the set of sub-blocks may take the form of non-square MxN and NxM sub-blocks, where M is larger than N, as the minimum sub-block unit for motion compensation with higher order motion models. Block dimensions M and N can be selected to be, e.g., 8 and 4, respectively, for the luminance component and 4 and 2, respectively, for the chrominance component of the video. If the chrominance is not sub-sampled, the chrominance channels can also use the same minimum sub-block division as the luminance channels.
[0056] The apparatus 10 may determine whether to use MxN or NxM sub-blocks for a prediction unit in different ways. For example, the determination can be based on the shape of the prediction unit. If the dimensions of the prediction unit are given as WxH, with W referring to the width and H referring to the height of the prediction unit, the apparatus may determine that wider MxN (M > N) sub-blocks may be used instead of narrower NxM sub-blocks if the width W is smaller than the height H of the prediction unit. This kind of a setting would allow finer granularity of motion difference to be represented in the direction of the larger dimension of the prediction unit.
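One possible reading of this shape-based rule as code; the function name is hypothetical and the defaults M=8, N=4 are the example luma values, with the rule itself being only one of the options the text describes:

```python
def select_subblock_shape(pu_w, pu_h, m=8, n=4):
    """Shape-based sub-block selection: if the prediction unit is taller
    than it is wide (W < H), use wider MxN sub-blocks, which stack more
    sub-block rows along the height and so represent motion differences
    more finely in the larger dimension; otherwise use narrower NxM.
    Returns (width, height) of the minimum sub-block."""
    if pu_w < pu_h:
        return (m, n)
    return (n, m)
```

For example, a 16x32 prediction unit would use 8x4 sub-blocks (eight rows of them along the height), while a 32x16 prediction unit would use 4x8 sub-blocks.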
[0057] In some embodiments, the motion vector difference in the horizontal and vertical directions may be compared in order to determine whether to utilize MxN or NxM sub-blocks. For example, if the motion vector difference is larger in the horizontal direction, the narrower NxM (M > N) sub-blocks can be selected to give a finer representation of the motion vector in the direction of more significant motion. In this embodiment, the motion vector difference in the horizontal and vertical directions can be evaluated in different ways. One example of such evaluation is to compare the change in absolute values of the motion vector components when taking a defined step in different directions. For some motion models, such as an affine motion model, these differences are constants over a prediction unit using the same set of affine motion parameters and can be calculated for example as:
diffHor = abs( dx(x + 1, y) - dx(x, y) ) + abs( dy(x + 1, y) - dy(x, y) )
diffVer = abs( dx(x, y + 1) - dx(x, y) ) + abs( dy(x, y + 1) - dy(x, y) )

wherein diffHor indicates the motion vector difference in the horizontal direction, diffVer indicates the motion vector difference in the vertical direction and abs indicates absolute value. In some embodiments, the motion vectors may be evaluated differently from the active motion model and the motion vector differences can be calculated in various different ways. For example, in some cases the motion in a prediction unit may be represented with translational motion vectors of two or more corners of the prediction unit. In some embodiments, if a 4-parameter affine motion model is used, a motion vector in a top-left corner of the prediction unit and a motion vector in a top-right corner of the prediction unit may be used. In such an embodiment, the motion vector difference parameters diffHor and diffVer can be calculated using such corner motion vectors and the size of the prediction unit.
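A sketch of this motion-based selection, assuming the diffHor/diffVer formulation above; the helper names are hypothetical, and the affine-constant property is exploited by sampling the motion field at a single point:

```python
def motion_differences(mv_field, x=0.0, y=0.0):
    """diffHor / diffVer as defined above: the change in the motion
    vector components for a unit step horizontally and vertically.
    For an affine model these are constant over the prediction unit,
    so evaluating at one point suffices."""
    dx0, dy0 = mv_field(x, y)
    dxh, dyh = mv_field(x + 1, y)
    dxv, dyv = mv_field(x, y + 1)
    diff_hor = abs(dxh - dx0) + abs(dyh - dy0)
    diff_ver = abs(dxv - dx0) + abs(dyv - dy0)
    return diff_hor, diff_ver

def shape_from_motion(mv_field, m=8, n=4):
    """Select narrower NxM sub-blocks (width N) when horizontal motion
    varies more, wider MxN (width M) otherwise.  Returns (width, height)."""
    diff_hor, diff_ver = motion_differences(mv_field)
    return (n, m) if diff_hor > diff_ver else (m, n)
```

A horizontally-stretching field thus gets narrow sub-blocks (more columns of motion vectors across the width), and a vertically-stretching field gets wide ones.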
[0058] In some embodiments, the minimum sub-block size and shape can change according to additional criteria. For example, non-square MxN and NxM sub-blocks can be used for non-square prediction units, whereas square MxM or NxN sub-blocks can be used for square prediction units. In another example, MxN sub-blocks are used if the height H of the prediction unit is equal to N, NxM sub-blocks are used if the width W of the prediction unit is equal to N, and MxM sub-blocks are used otherwise. Instead of using MxM sub-blocks in the latter case, the apparatus 10, such as the processing circuitry 12, can also be configured to select between MxM, MxN and NxM with another set of rules. Alternatively, the apparatus, such as the processing circuitry, may be configured to always select MxN sub-blocks instead of MxM sub-blocks, or the apparatus, such as the processing circuitry, may be configured to always select NxM sub-blocks instead of MxM sub-blocks. Figures 5A and 5B illustrate examples of sub-block divisions. Figure 5A illustrates an example where a minimum dimension N is equal to the width W of a prediction unit. Figure 5B illustrates an example where a minimum dimension N is equal to the height H of a prediction unit.
[0059] In some embodiments, the shape and size of the motion compensation sub-block is determined based on the dimensions of the coding unit or prediction unit to which the sub-blocks belong. In some embodiments, different motion compensation sub-block shape and size may be used depending on whether bi-prediction or uni-prediction is indicated for the encapsulating coding unit or prediction unit. For example, a sub-block size of 8x4 (or 4x8, or 8x8) can be used if the coding unit or prediction unit is indicated to be bi-predicted and a sub-block size of 4x4 can be used if the coding unit or prediction unit is indicated to be uni-predicted. Figure 6 illustrates bi-prediction using two reference pictures (one prediction using a reference picture from list P and one prediction using a reference picture from list Q). In this example, motion compensation sub-blocks of size NxN are used for list P prediction and motion compensation sub-blocks of size MxM are used for list Q prediction.
[0060] In some embodiments, the motion compensation sub-block shape and size is determined differently for motion predictions with different reference picture lists. For example, a sub-block size of 4x4 can be used for reference picture list 0 and a sub-block size of 8x8 can be used for reference picture list 1. In another example, a sub-block size of 8x4 is used for reference picture list 0 and a sub-block size of 4x8 is used for reference picture list 1. In some embodiments, the determination can further depend on whether bi-prediction or uni-prediction is used. For example, uni-predicted sub-blocks can use a 4x4 minimum block size, bi-predicted sub-blocks with list 0 motion can use a 4x4 minimum block size, whereas bi-predicted sub-blocks with list 1 motion can use an 8x8 minimum block size.
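The last example of paragraph [0060], where the minimum size depends on both the prediction type and the reference picture list, can be sketched as follows. The function and argument names are hypothetical; only the 4x4/8x8 minimum sizes come from the example.

```python
def min_subblock_size(bi_predicted, ref_list):
    """Minimum sub-block size per the last example of paragraph [0060]:
    uni-predicted sub-blocks use a 4x4 minimum, bi-predicted sub-blocks
    with list 0 motion use a 4x4 minimum, and bi-predicted sub-blocks
    with list 1 motion use an 8x8 minimum."""
    if bi_predicted and ref_list == 1:
        return (8, 8)
    return (4, 4)
```

Constraining only the list 1 prediction of bi-predicted blocks to a larger minimum is one way to cap worst-case memory bandwidth, since each bi-predicted sample already requires two motion compensated fetches.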
[0061] The apparatus 10 further includes means, such as the processing circuitry 12, as shown in block 408, for performing motion compensated prediction for the set of sub-blocks. The apparatus further includes means, such as the processing circuitry 12, as shown in block 410, for storing an output of the motion compensated prediction. In some embodiments, the output of the motion compensated prediction is stored in the bitstream, such as a video, picture, or slice parameter set file in the bitstream.
[0062] Many modifications and other embodiments of the inventions set forth herein will come to mind to one skilled in the art to which these inventions pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the inventions are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Moreover, although the foregoing descriptions and the associated drawings describe example embodiments in the context of certain example combinations of elements and/or functions, it should be appreciated that different
combinations of elements and/or functions may be provided by alternative embodiments without departing from the scope of the appended claims. In this regard, for example, different combinations of elements and/or functions than those explicitly described above are also contemplated as may be set forth in some of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims

THAT WHICH IS CLAIMED:
1. A method comprising:
receiving a bitstream comprising one or more coding units;
for each coding unit:
determining if the coding unit is to be motion compensated by using a translational motion model or a higher order motion model; and
in response to determining that the coding unit is to be motion compensated by using a translational motion model or a higher order motion model:
determining a dimension of the coding unit;
determining a shape and size for a set of sub-blocks in the coding unit, wherein the set of sub-blocks fully cover an area of the coding unit based on the dimension of the coding unit;
performing motion compensated prediction for the set of sub-blocks; and
storing an output of the motion compensated prediction.
2. A method according to Claim 1, wherein the coding unit comprises one or more prediction units, and wherein determining the shape and size for the set of sub-blocks in the coding unit utilizes one or more dimensions of the one or more prediction units.
3. A method according to any of Claims 1 to 2, wherein determining the shape and size for the set of sub-blocks in the coding unit utilizes one or more selected prediction types associated with the one or more sub-blocks.
4. A method according to Claim 3, wherein the selected prediction type includes one of: a uni-prediction or a bi-prediction.
5. A method according to any of Claims 1 to 4, wherein the set of sub-blocks includes sub-blocks of size M samples in one direction and N samples in another direction.
6. A method according to Claim 5, wherein values of M and N are predefined and M is larger than N, and wherein N corresponds to a vertical direction if a motion vector difference is larger in the vertical direction compared to a horizontal direction.
7. A method according to Claim 5, wherein M is equal to N, and wherein the prediction unit associated with the set of sub-blocks is a square prediction unit.
8. An apparatus comprising at least one processor and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to:
receive a bitstream comprising one or more coding units;
for each coding unit:
determine if the coding unit is to be motion compensated by using a translational motion model or a higher order motion model; and
in response to determining that the coding unit is to be motion compensated by using a translational motion model or a higher order motion model:
determine a dimension of the coding unit;
determine a shape and size for a set of sub-blocks in the coding unit, wherein the set of sub-blocks fully cover an area of the coding unit based on the dimension of the coding unit;
perform motion compensated prediction for the set of sub-blocks; and
store an output of the motion compensated prediction.
9. An apparatus according to Claim 8, wherein the coding unit comprises one or more prediction units, and wherein determining the shape and size for the set of sub-blocks in the coding unit utilizes one or more dimensions of the one or more prediction units.
10. An apparatus according to any of Claims 8 to 9, wherein determining the shape and size for the set of sub-blocks in the coding unit utilizes one or more selected prediction types associated with the one or more sub-blocks.
11. An apparatus according to Claim 10, wherein the selected prediction type includes one of: a uni-prediction or a bi-prediction.
12. An apparatus according to any of Claims 8 to 11, wherein the set of sub-blocks includes sub-blocks of size M samples in one direction and N samples in another direction.
13. An apparatus according to Claim 12, wherein values of M and N are predefined and M is larger than N, and wherein N corresponds to a vertical direction if a motion vector difference is larger in the vertical direction compared to a horizontal direction.
14. An apparatus according to Claim 12, wherein M is equal to N, and wherein the prediction unit associated with the set of sub-blocks is a square prediction unit.
15. A computer program product comprising at least one non-transitory computer-readable storage medium having computer executable program code instructions stored therein, the computer executable program code instructions comprising program code instructions configured, upon execution, to:
receive a bitstream comprising one or more coding units;
for each coding unit:
determine if the coding unit is to be motion compensated by using a translational motion model or a higher order motion model; and
in response to determining that the coding unit is to be motion compensated by using a translational motion model or a higher order motion model:
determine a dimension of the coding unit;
determine a shape and size for a set of sub-blocks in the coding unit, wherein the set of sub-blocks fully cover an area of the coding unit based on the dimension of the coding unit;
perform motion compensated prediction for the set of sub-blocks; and
store an output of the motion compensated prediction.
16. A computer program product according to Claim 15, wherein the coding unit comprises one or more prediction units, and wherein determining the shape and size for the set of sub-blocks in the coding unit utilizes one or more dimensions of the one or more prediction units.
17. A computer program product according to any of Claims 15 to 16, wherein determining the shape and size for the set of sub-blocks in the coding unit utilizes one or more selected prediction types associated with the one or more sub-blocks.
18. A computer program product according to Claim 17, wherein the selected prediction type includes one of: a uni-prediction or a bi-prediction.
19. A computer program product according to any of Claims 15 to 18, wherein the set of sub-blocks includes sub-blocks of size M samples in one direction and N samples in another direction.
20. A computer program product according to Claim 19, wherein values of M and N are predefined and M is larger than N, and wherein N corresponds to a vertical direction if a motion vector difference is larger in the vertical direction compared to a horizontal direction.
21. A computer program product according to Claim 19, wherein M is equal to N, and wherein the prediction unit associated with the set of sub-blocks is a square prediction unit.
PCT/FI2019/050464 2018-06-28 2019-06-18 Method and apparatus for motion compensation with non-square sub-blocks in video coding WO2020002762A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862691258P 2018-06-28 2018-06-28
US62/691,258 2018-06-28

Publications (1)

Publication Number Publication Date
WO2020002762A1 true WO2020002762A1 (en) 2020-01-02


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040062309A1 (en) * 2000-05-10 2004-04-01 Alexander Romanowski Method for transformation-coding full motion image sequences
WO2017146526A1 (en) * 2016-02-25 2017-08-31 주식회사 케이티 Video signal processing method and device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SULLIVAN, G. J. ET AL.: "Overview of the High Efficiency Video Coding (HEVC) Standard", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, vol. 22, no. 12, 28 September 2012 (2012-09-28), XP011486324, Retrieved from the Internet <URL:https://ieeexplore.ieee.org/document/6316136> [retrieved on 20191121] *


Legal Events

Code Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19825027; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 19825027; Country of ref document: EP; Kind code of ref document: A1)