GB2585021A - Video coding and decoding - Google Patents

Video coding and decoding

Info

Publication number
GB2585021A
Authority
GB
United Kingdom
Prior art keywords
motion information
image
predictor
information predictor
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1909056.2A
Other versions
GB201909056D0 (en)
Inventor
Gisquet Christophe
Onno Patrice
Laroche Guillaume
Taquet Jonathan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to GB1909056.2A priority Critical patent/GB2585021A/en
Publication of GB201909056D0 publication Critical patent/GB201909056D0/en
Publication of GB2585021A publication Critical patent/GB2585021A/en
Withdrawn legal-status Critical Current

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 - Selection of coding mode or of prediction mode
    • H04N19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/134 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157 - Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159 - Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/169 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N19/174 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, the region being a slice, e.g. a line of blocks or a group of blocks
    • H04N19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 - Motion estimation or motion compensation
    • H04N19/513 - Processing of motion vectors
    • H04N19/517 - Processing of motion vectors by encoding
    • H04N19/52 - Processing of motion vectors by encoding by predictive encoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method of deriving a motion information predictor candidate for a current image portion of an image. The method has the steps of obtaining a History-based Motion Vector Predictor (HMVP) from a set of one or more motion information predictor(s), where each motion information predictor has been used to process an image portion, and deriving the candidate using the motion information from the obtained HMVP. The method may further include obtaining, when available, motion information associated with a first position and/or a second position in a first reference picture from a first set of one or more reference image(s) and, when available, motion information associated with a third position and/or a fourth position in a second reference picture from a second set of one or more reference image(s). The HMVP candidate may use a history list which is generated/updated with a First-In First-Out (FIFO) process and redundancy/duplicate checking and pruning. The video coding method may be used in Versatile Video Coding (VVC). The HMVP/history list may be used in the derivation process for a Temporal Motion Vector Predictor (TMVP) candidate, or in the derivation process of a constructed Affine candidate, e.g. for a sub-block Merge candidate list.

Description

VIDEO CODING AND DECODING
Field of invention
The present invention relates to video coding and decoding.
Background
Recently, the Joint Video Experts Team (JVET), a collaborative team formed by MPEG and ITU-T Study Group 16's VCEG, commenced work on a new video coding standard referred to as Versatile Video Coding (VVC). The goal of VVC is to provide significant improvements in compression performance over the existing High Efficiency Video Coding (HEVC) standard (i.e. typically twice as much as before) and to be completed in 2020. The main target applications and services include, but are not limited to, 360-degree and high-dynamic-range (HDR) videos. In total, JVET evaluated responses from 32 organizations using formal subjective tests conducted by independent test labs. Some proposals demonstrated compression efficiency gains of typically 40% or more when compared to using HEVC. Particular effectiveness was shown on ultra-high definition (UHD) video test material. Thus, we may expect compression efficiency gains well beyond the targeted 50% for the final standard.
The JVET test model (VTM) for the VVC standard uses all the HEVC tools. Some of the HEVC tools have been extended or modified to support new uses/functionalities. In particular, the HEVC merge list, in which motion information from several spatial neighbouring blocks and a temporally collocated block are used as candidates for a motion information predictor, has seen a number of extensions/modifications. In addition, a history list of past motion information is maintained, i.e. stored, making the stored motion information (which was used when processing a previously processed block) available for use with later processed blocks. Past motion information in the history list may be redundant in light of the motion information from spatial neighbouring blocks, but it can also include motion information from areas which are much further away.
In addition, the temporal motion vector predictor (TMVP) derivation process of HEVC has undergone some modifications. Furthermore, a new Subblock Merge mode, in which a constructed Affine candidate is used, has been introduced. In the following description, this constructed Affine candidate is described as a kind of TMVP because it uses a control point which obtains its associated motion information from a reference picture, i.e. it uses "temporally collocated/neighbouring/obtained/derived motion information" - temporally collocated/neighbouring because the motion information is obtained from a "collocated or temporally neighbouring" position/block in the reference picture (i.e. from a position/block corresponding to, or associated with, the current position/current block), and temporally obtained/derived because the motion information is obtained/derived from the reference picture which is different from the current picture. Temporally collocated/neighbouring/obtained/derived motion information (i.e. motion information for temporal prediction, or motion information obtained from a corresponding/associated image portion of a reference image, the reference image being different from the current image) is now used in both a regular Merge mode as a TMVP candidate ((regular) Merge candidate), and in a Subblock Merge mode to derive one or more constructed Affine candidate(s) (one or more subblock Merge candidate(s)). While the regular Merge mode considers only one piece of motion information for the current block, the Subblock Merge mode considers separate motion information for each subblock of said block.
Despite these modifications, it remains the case that when obtaining this "temporally collocated/neighbouring/obtained/derived motion information", the first reference picture from a reference picture list is selected when performing the prediction. However, it has been observed that, when various reference pictures are close together temporally, the reference picture selection can vary considerably across different portions of a (en)coded picture. It is therefore inefficient to always target the first reference picture of a reference picture list, as it may offer a less efficient prediction (e.g. a larger residual due to lower accuracy), especially when trying to encode/decode a picture with moving objects. On the other hand, this first reference picture selection has the benefit of always being available, and is generally selected by an encoder as the one providing a better coding efficiency gain on average.
Accordingly, a solution to at least one of the aforementioned problems is desirable. According to aspects of the present invention there are provided a method, an apparatus/device, a program, and a carrier medium as set forth in the appended claims. Other features of the invention will be apparent from the dependent claims, and the description.
According to a first aspect of the present invention, there is provided a method of deriving a motion information predictor candidate for a current image portion of an image, the method comprising: obtaining a history-based motion vector predictor from a set of one or more motion information predictor(s), each motion information predictor having been used to process an image portion; and deriving motion information for the motion information predictor candidate using motion information from the obtained history-based motion vector predictor.
Suitably, the motion information from the obtained history-based motion vector predictor comprises temporally collocated/neighbouring motion information, i.e. motion information associated with a "collocated or temporally neighbouring" position/block in the reference picture/image (i.e. from a position/block corresponding to, or associated with, the current position/current block). Suitably, the derived motion information for the motion information predictor candidate comprises temporally obtained/derived motion information, i.e. motion information obtained/derived from the reference picture/image which is different from the current picture/image portion.
Suitably, the deriving motion information for the motion information predictor candidate comprises: obtaining, when available, motion information associated with a first position and/or a second position in a first reference picture from a first set of one or more reference image(s) and, when available, motion information associated with a third position and/or a fourth position in a second reference picture from a second set of one or more reference image(s); and deriving the motion information for the motion information predictor candidate using the obtained motion information if the obtained motion information meets a condition.
Suitably, the deriving the motion information for the motion information predictor candidate comprises setting the motion information for the motion information predictor candidate to the obtained motion information if the obtained motion information meets the condition. Suitably, the deriving the motion information for the motion information predictor candidate comprises using the obtained motion information to obtain information for deriving the motion information predictor candidate.
Suitably, the first/third position is the position at, or in the vicinity of, the bottom right of the collocated block in the reference picture (e.g. the H position in Figure 6). Suitably, the second/fourth position is the position associated with a block within the collocated block in a reference picture (e.g. the center position in Figure 6).
Suitably, the condition comprises the motion information not being long term (i.e. the temporal (position) distance between the image and the reference image from which the motion information is obtained does not exceed a threshold), i.e. the image and the reference image are within a certain temporal distance, i.e. temporally not too far apart. Suitably, the deriving the motion information for the motion information predictor candidate comprises scaling motion information (e.g. one or more components of a motion vector) using a temporal distance (e.g. between the image and the reference image).
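To make the scaling operation mentioned above concrete, the following sketch (not part of the claimed method; the function name, the use of picture order counts for temporal distances and the rounding are illustrative assumptions) scales a motion vector by the ratio of two temporal distances, in the spirit of the temporal candidate scaling used in HEVC/VVC.

```python
def scale_motion_vector(mv, poc_current, poc_current_ref, poc_col, poc_col_ref):
    """Scale a motion vector (mvx, mvy) by the ratio of temporal distances.

    tb: distance between the current picture and its reference picture.
    td: distance between the collocated picture and the reference picture
        of the collocated motion information.
    Illustrative sketch only; real codecs use clipped fixed-point arithmetic.
    """
    tb = poc_current - poc_current_ref
    td = poc_col - poc_col_ref
    if td == 0:
        return mv  # nothing to scale
    scale = tb / td
    return (round(mv[0] * scale), round(mv[1] * scale))

# Example: current picture POC 8 predicting from POC 4, collocated motion
# pointing from POC 12 to POC 4 -> tb = 4, td = 8, so the vector is halved.
print(scale_motion_vector((16, -8), 8, 4, 12, 4))  # (8, -4)
```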
Suitably, the derived motion information comprises information for identifying a reference image/picture from a set of one or more reference image(s). Suitably, the information for identifying a reference image/picture comprises a reference index for identifying a reference image/picture from the set of one or more reference image(s). Suitably, the set of one or more reference image(s) is an ordered set or a list. Suitably, the set of one or more reference image(s) is a reference picture list. Suitably, the reference picture list consists of reference image(s) that are available for use when an image portion (block/coding unit) is processed using temporal prediction, motion information of said image portion comprising the reference picture list. Suitably, the reference index is for identifying a reference image/picture from the reference picture list.
Suitably, each motion information of the set of one or more motion information predictor(s) is selected or used to process a previous image portion preceding the current image portion in a processing order of image portions. Suitably, each motion information predictor of the set of one or more motion information predictor(s) is used to process an image portion in a reference image.
Suitably, the method further comprises adding a motion information predictor to the set of one or more motion information predictor(s) after the motion information predictor has been selected or used to process a previous image portion, the previous image portion preceding the current image portion in a processing order of image portions. Suitably, this adding generates and/or updates the set of one or more motion information predictor(s). Suitably, the number of one or more motion information predictor(s) includable in the set of one or more motion information predictor(s) is set so that once the set reaches this number, the set size is maintained by performing one or more of: not adding the newest motion information predictor in consideration for addition; checking for any candidate motion information predictor for removal from the set (e.g. based on whether there are any similar or identical motion information among the one or more motion information predictor(s) and/or with the newest motion information predictor in consideration for addition); and removing one of the motion information predictors already included in the set and then adding the newest motion information predictor.
Suitably, the motion information predictor is added to the set of one or more motion information predictor(s) on a First-In, First-Out basis. Suitably, this adding updates the set of one or more motion information predictor(s).
Suitably, the method further comprises: determining whether the set of one or more motion information predictor(s) comprises a motion information predictor with the same motion information as the motion information predictor being considered for adding; and adding the motion information predictor to the set of one or more motion information predictor(s) when the set does not comprise the motion information predictor with the same motion information. Suitably, this adding generates and/or updates the set of one or more motion information predictor(s).
Suitably, the method further comprises: checking whether the set of one or more motion information predictor(s) comprises any duplicate motion information; and removing one or more motion information predictor(s) with duplicate motion information from the set.
Suitably, this removing updates the set of one or more motion information predictor(s).
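The list maintenance described in the preceding paragraphs (a capped set, First-In First-Out addition, duplicate checking and removal) can be illustrated with the following minimal sketch; the maximum size of 5 and the representation of motion information as tuples are assumptions made for the example only.

```python
MAX_HISTORY_SIZE = 5  # assumed cap, for illustration only

def update_history_list(history, new_motion_info):
    """Update an HMVP-style history list on a FIFO basis with pruning.

    If the new motion information duplicates an existing entry, the
    duplicate is removed first so that the new entry becomes the most
    recent one. If the list is full, the oldest entry (the front of the
    list) is dropped before the new entry is appended.
    """
    if new_motion_info in history:
        history.remove(new_motion_info)      # prune the duplicate
    elif len(history) >= MAX_HISTORY_SIZE:
        history.pop(0)                       # drop the oldest (FIFO)
    history.append(new_motion_info)          # newest entry goes last
    return history

# Example usage with motion information represented as (mvx, mvy, ref_idx).
hist = []
for mi in [(4, 0, 0), (2, 2, 1), (4, 0, 0), (0, 8, 0)]:
    update_history_list(hist, mi)
print(hist)  # [(2, 2, 1), (4, 0, 0), (0, 8, 0)]
```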
Suitably, the set of one or more motion information predictor(s) is initialised when a first image portion among a row of image portions is processed. Suitably, the first image portion among a row of image portions is a first block/CTB in a row of blocks/CTBs.
Suitably, the processing order is the order in which the image portions are encoded and/or decoded.
Suitably, the history-based motion vector predictor comprises one or more of: a motion vector; information for identifying a set of one or more reference image(s); an index for identifying a reference image in the set of one or more reference image(s); and a flag for indicating whether a motion vector or a reference image is available.
Suitably, the motion information predictor candidate is a Temporal Motion Vector Predictor candidate. Suitably, the Temporal Motion Vector Predictor candidate is derived for use with a regular Merge candidate list.
Suitably, the motion information predictor candidate is a constructed Affine candidate.
Suitably, motion information from the obtained history-based motion vector predictor is used to obtain/determine/derive a control point for obtaining/determining/deriving the constructed Affine candidate. Suitably, the constructed Affine candidate is derived for use with a Subblock Merge candidate list.
According to a second aspect of the present invention, there is provided a method of encoding an image comprising one or more image portions, the method comprising: obtaining a set of motion information predictor candidates; selecting, as a motion information predictor for an image portion, a motion information predictor candidate from the set; and encoding the image portion using the selected motion information prediction, wherein the set of motion information predictor candidates comprises a motion information predictor candidate derived according to the first aspect.
Suitably, the method further comprises: encoding information for identifying the selected motion information predictor; and providing, in a bitstream, data for obtaining the encoded information and/or the encoded image portion. Suitably, the set of motion information predictor candidates is a list of candidates, and the information for identifying the selected motion information predictor is an index for this list.
According to a third aspect of the present invention, there is provided a method of decoding an image comprising one or more image portions, the method comprising: obtaining a set of motion information predictor candidates; selecting, as a motion information predictor for an image portion, a motion information predictor candidate from the set; and decoding the image portion using the selected motion information prediction, wherein the set of motion information predictor candidates comprises a motion information predictor candidate derived according to the first aspect.
Suitably, the method further comprises: obtaining, from a bitstream, data for decoding information for identifying the selected motion information predictor; and decoding the information. Suitably, the set of motion information predictor candidates is a list of candidates, and the information for identifying the selected motion information predictor is an index for this list.
According to a fourth aspect of the present invention, there is provided a device comprising means for performing a method of deriving a motion information predictor candidate according to the first aspect.
According to a fifth aspect of the present invention, there is provided a device comprising means for performing a method of encoding an image according to the second aspect.
According to a sixth aspect of the present invention, there is provided a device for encoding an image comprising one or more image portions, the device comprising: means for obtaining a set of motion information predictor candidates; means for selecting, as a motion information predictor for an image portion, a motion information predictor candidate from the set; and means for encoding the image portion using the selected motion information prediction, wherein the set of motion information predictor candidates comprises a motion information predictor candidate derived according to the first aspect.
Suitably, the device further comprises: means for encoding information for identifying the selected motion information predictor; and means for providing, in a bitstream, data for obtaining the encoded information and/or the encoded image portion. Suitably, the set of motion information predictor candidates is a list of candidates, and the information for identifying the selected motion information predictor is an index for this list.
According to a seventh aspect of the present invention, there is provided a device comprising means for performing a method of decoding an image according to the third aspect.
According to an eighth aspect of the present invention, there is provided a device for decoding an image comprising one or more image portions, the device comprising: means for obtaining a set of motion information predictor candidates; means for selecting, as a motion information predictor for an image portion, a motion information predictor candidate from the set; and means for decoding the image portion using the selected motion information prediction, wherein the set of motion information predictor candidates comprises a motion information predictor candidate derived according to the first aspect.
Suitably, the device further comprises: means for obtaining, from a bitstream, data for decoding information for identifying the selected motion information predictor; and means for decoding the information. Suitably, the set of motion information predictor candidates is a list of candidates, and the information for identifying the selected motion information predictor is an index for this list.
According to a ninth aspect of the present invention, there is provided a program which, when run on a computer or processor, causes the computer or processor to carry out the method according to the first aspect, the second aspect, or the third aspect.
According to a tenth aspect of the present invention, there is provided a carrier medium carrying the program according to the ninth aspect.
According to aforementioned aspects, suitably, the set of one or more motion information predictor(s) is an ordered set or a list. According to aforementioned aspects, suitably, the order of the elements in the set is based on their processing order or on a magnitude of motion information (which can be beneficial when checking for a duplicate in the set).
According to aforementioned aspects, suitably, the set of one or more reference image(s) is an ordered set or a list. According to aforementioned aspects, suitably, the set of one or more reference image(s) is a reference picture list.
Yet further aspects of the present invention relate to programs which when executed by a computer or processor cause the computer or processor to carry out any of the methods of the aforementioned aspects. The program may be provided on its own or may be carried on, by or in a carrier medium. The carrier medium may be non-transitory, for example a storage medium, in particular a computer-readable storage medium. The carrier medium may also be transitory, for example a signal or other transmission medium. The signal may be transmitted via any suitable network, including the Internet.
Yet further aspects of the present invention relate to a camera comprising a device according to any of the aforementioned device aspects. In one embodiment the camera further comprises zooming means. In one embodiment the camera is adapted to indicate when said zooming means is operational and signal a prediction mode in dependence on said indication that the zooming means is operational. In another embodiment the camera further comprises panning means. In another embodiment the camera is adapted to indicate when said panning means is operational and signal a prediction mode in dependence on said indication that the panning means is operational.
According to yet another aspect of the present invention there is provided a mobile device comprising a camera embodying any of the camera aspects above. In one embodiment the mobile device further comprises at least one positional sensor adapted to sense a change in orientation of the mobile device. In one embodiment the mobile device is adapted to signal a prediction mode in dependence on said sensing a change in orientation of the mobile device.
Further features of the invention are characterised by the other independent and dependent claims.
Any feature in one aspect of the invention may be applied to other aspects of the invention, in any appropriate combination. In particular, method aspects may be applied to apparatus aspects, and vice versa. Furthermore, features implemented in hardware may be implemented in software, and vice versa. Any reference to software and hardware features herein should be construed accordingly. Any apparatus feature as described herein may also be provided as a method feature, and vice versa. As used herein, means plus function features may be expressed alternatively in terms of their corresponding structure, such as a suitably programmed processor and associated memory.
It should also be appreciated that particular combinations of the various features described and defined in any aspects of the invention can be implemented and/or supplied and/or used independently.
Reference will now be made, by way of example, to the accompanying drawings, in which:
Figure 1 is a diagram for use in explaining a coding structure used in HEVC;
Figure 2 is a block diagram schematically illustrating a data communication system in which one or more embodiments of the invention may be implemented;
Figure 3 is a block diagram illustrating components of a processing device in which one or more embodiments of the invention may be implemented;
Figure 4 is a flow chart illustrating steps of an encoding method according to embodiments of the invention;
Figure 5 is a flow chart illustrating steps of a decoding method according to embodiments of the invention;
Figures 6a and 6b illustrate spatial and temporal blocks that can be used to determine a location/position as well as derive motion information;
Figure 7 is a flow chart illustrating steps of a derivation process for motion information according to an embodiment of the invention;
Figure 8 is a schematic of a generation/update process for a history-based motion vector predictor (HMVP) candidate list according to an embodiment of the invention;
Figure 9 is a flow chart illustrating a derivation process for motion information according to an embodiment of the invention;
Figure 10 is a flow chart illustrating a determination process for determining reference indices from the history list according to an embodiment of the invention;
Figure 11 is a flow chart illustrating a history list updating process for use with the reference index derivation according to an embodiment of the invention;
Figure 12 is a schematic block diagram of a computing device for implementation of one or more embodiments of the invention;
Figure 13 is a schematic block diagram of a computing device for implementation of one or more embodiments of the invention;
Figure 14 is a diagram illustrating a network camera system for implementation of one or more embodiments of the invention;
Figure 15 is a diagram illustrating a smart phone for implementation of one or more embodiments of the invention; and
Figure 16 is a flowchart illustrating a derivation process for a constructed Affine candidate of a Subblock Merge mode according to an embodiment of the invention.
Detailed description
Embodiments of the present invention described below relate to improving the derivation of a motion information predictor from the motion information of previously processed (e.g. encoded or decoded) image portions or pictures. Before describing the embodiments, video encoding and decoding techniques and related encoders and decoders which may implement the embodiments/variants of the present invention will be described.
It is understood that motion information comprises at least one of a motion vector, a reference index for identifying a reference image/picture, or a combination index (for identifying a selected combination mechanism for combining more than one motion information). A combination index is also known as a "bi-prediction weight index". It is also understood that a prediction or method described in this specification as being performed with/for a motion vector may also be performed with/for any motion information.
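Purely as an illustration of the terminology above, one possible container for such motion information is sketched below; the field names and the tuple representation of motion vectors are assumptions, not syntax defined in this specification.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class MotionInformation:
    """Illustrative container for the motion information discussed above."""
    mv_l0: Optional[Tuple[int, int]] = None   # motion vector for list L0
    mv_l1: Optional[Tuple[int, int]] = None   # motion vector for list L1
    ref_idx_l0: int = -1                      # reference index into L0 (-1 = unused)
    ref_idx_l1: int = -1                      # reference index into L1 (-1 = unused)
    bcw_idx: int = 0                          # combination (bi-prediction weight) index

# A bi-predicted block using the first picture of each reference list:
mi = MotionInformation(mv_l0=(3, -1), mv_l1=(-2, 4), ref_idx_l0=0, ref_idx_l1=0, bcw_idx=2)
```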
In this specification, 'signalling' may refer to inserting into (providing/including/encoding in), or extracting/obtaining (decoding) from, a bitstream information about one or more syntax elements representing use, disuse, enabling or disabling of a mode (e.g. an inter prediction mode) or other information (such as information about a selection).
Figure 1 relates to a coding structure used in the High Efficiency Video Coding (HEVC) video standard. A video sequence 1 is made up of a succession of digital images i. Each such digital image is represented by one or more matrices. The matrix coefficients represent pixels.
An image 2 of the sequence may be divided into slices 3. A slice may in some instances constitute an entire image. These slices are divided into non-overlapping Coding Tree Units (CTUs). A Coding Tree Unit (CTU) is the basic processing unit of the High Efficiency Video Coding (HEVC) video standard and conceptually corresponds in structure to macroblock units that were used in several previous video standards. A CTU is also sometimes referred to as a Largest Coding Unit (LCU). A CTU has luma and chroma component parts, each of which component parts is called a Coding Tree Block (CTB). These different color components are not shown in Figure 1.
A CTU is generally of size 64 pixels x 64 pixels for HEVC, yet for VVC this size can be 128 pixels x 128 pixels. Each CTU may in turn be iteratively divided into smaller variable-size Coding Units (CUs) 5 using a quadtree decomposition.
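A minimal sketch of such a quadtree decomposition is given below, assuming a caller-supplied split decision (in a real encoder this decision is made by rate-distortion optimisation); the function and parameter names are illustrative only.

```python
def quadtree_split(x, y, size, should_split, min_size=8):
    """Recursively split a square block into four quadrants.

    should_split(x, y, size) is an assumed, caller-supplied decision;
    returns the list of leaf coding units as (x, y, size) tuples.
    """
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_split(x + dx, y + dy, half, should_split, min_size)
    return leaves

# Example: split a 64x64 CTU once, keeping the resulting four 32x32 CUs.
print(quadtree_split(0, 0, 64, lambda x, y, s: s > 32))
```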
Coding units are the elementary coding elements and are constituted by two kinds of sub-unit called a Prediction Unit (PU) and a Transform Unit (TU). The maximum size of a PU or TU is equal to the CU size. A Prediction Unit corresponds to the partition of the CU for prediction of pixel values. Various different partitions of a CU into PUs are possible as shown by 606, including a partition into 4 square PUs and two different partitions into 2 rectangular PUs. A Transform Unit is an elementary unit that is subjected to spatial transformation using DCT. A CU can be partitioned into TUs based on a quadtree representation 607. So a slice, a tile, a CTU/LCU, a CTB, a CU, a PU, a TU, or a block of pixels/samples may be referred to as an image portion, i.e. a portion of the image 2 of the sequence.
Each slice is embedded in one Network Abstraction Layer (NAL) unit. In addition, the coding parameters of the video sequence are stored in dedicated NAL units called parameter sets. In HEVC and H.264/AVC two kinds of parameter set NAL units are employed: first, a Sequence Parameter Set (SPS) NAL unit that gathers all parameters that are unchanged during the whole video sequence. Typically, it handles the coding profile, the size of the video frames and other parameters. Secondly, a Picture Parameter Set (PPS) NAL unit includes parameters that may change from one image (or frame) to another of a sequence. HEVC also includes a Video Parameter Set (VPS) NAL unit which contains parameters describing the overall structure of the bitstream. The VPS is a new type of parameter set defined in HEVC, and applies to all of the layers of a bitstream. A layer may contain multiple temporal sub-layers, and all version 1 bitstreams are restricted to a single layer. HEVC has certain layered extensions for scalability and multiview and these will enable multiple layers, with a backwards compatible version 1 base layer.
Figure 2 and Figure 12 illustrate data communication systems in which one or more embodiments of the invention may be implemented. The data communication system comprises a transmission device, e.g. a server 201 in Figure 2 or a content provider 150 in Figure 12, which is operable to transmit data packets of a data stream 204 (or bitstream 101 in Figure 12) to a receiving device, e.g. a client terminal 202 in Figure 2 or a content consumer 100 in Figure 12, via a data communication network 200. The data communication network 200 may be a Wide Area Network (WAN) or a Local Area Network (LAN). Such a network may be for example a wireless network (Wifi / 802.11a or b or g), an Ethernet network, an Internet network or a mixed network composed of several different networks. In a particular embodiment of the invention the data communication system may be a digital television broadcast system in which the server 201 sends the same data content to multiple clients.
The data stream 204 (or bitstream 101) provided by the server 201 (or the content provider 150) may be composed of multimedia data representing video and audio data. Audio and video data streams may, in some embodiments of the invention, be captured by the server 201 (or the content provider 150) using a microphone and a camera respectively. In some embodiments data streams may be stored on the server 201 (or the content provider 150) or received by the server 201 (or the content provider 150) from another data provider, or generated at the server 201 (or the content provider 150). The server 201 (or the content provider 150) is provided with an encoder for encoding video and audio streams (e.g. original sequence of images 151 in Figure 12) in particular to provide a compressed bitstream 204, 101 for transmission that is a more compact representation of the data presented as input to the encoder.
In order to obtain a better ratio of the quality of transmitted data to quantity of transmitted data, the compression of the video data may be for example in accordance with the HEVC format or H.264/AVC format or VVC format.
The client 202 (or the content consumer 100) receives the transmitted bitstream and decodes the reconstructed bitstream to reproduce video images (e.g. video signal 109 in Figure 12) on a display device and the audio data by a loud speaker.
Although a streaming scenario is considered in the example of Figure 2, it will be appreciated that in some embodiments of the invention the data communication between an encoder and a decoder may be performed using for example a media storage device such as an optical disc.
In one or more embodiments of the invention a video image may be transmitted with data representative of compensation offsets for application to reconstructed pixels of the image to provide filtered pixels in a final image.
Figure 3 schematically illustrates a processing device 300 configured to implement at least one embodiment of the present invention. The processing device 300 may be a device such as a micro-computer, a workstation or a light portable device. The device 300 comprises a communication bus 313 connected to:
- a central processing unit 311, such as a microprocessor, denoted CPU;
- a read only memory 307, denoted ROM, for storing computer programs for implementing the invention;
- a random access memory 312, denoted RAM, for storing the executable code of the method of embodiments of the invention as well as the registers adapted to record variables and parameters necessary for implementing the method of encoding a sequence of digital images and/or the method of decoding a bitstream according to embodiments of the invention; and
- a communication interface 302 connected to a communication network 303 over which digital data to be processed are transmitted or received.
Optionally, the apparatus 300 may also include the following components:
- a data storage means 304, such as a hard disk, for storing computer programs for implementing methods of one or more embodiments of the invention and data used or produced during the implementation of one or more embodiments of the invention;
- a disk drive 305 for a disk 306, the disk drive being adapted to read data from the disk 306 or to write data onto said disk; and
- a screen 309 for displaying data and/or serving as a graphical interface with the user, by means of a keyboard 310 or any other pointing means (e.g. a mouse) or input means (e.g. a touch screen).
The apparatus 300 can be connected to various peripherals, such as for example a digital camera 320 or a microphone 308, each being connected to an input/output card (not shown) so as to supply multimedia data to the apparatus 300.
The communication bus provides communication and interoperability between the various elements included in the apparatus 300 or connected to it. The representation of the bus is not limiting and in particular the central processing unit is operable to communicate instructions to any element of the apparatus 300 directly or by means of another element of the apparatus 300.
The disk 306 can be replaced by any information medium such as for example a compact disk (CD-ROM), rewritable or not, a ZIP disk or a memory card and, in general terms, by an information storage means that can be read by a microcomputer or by a microprocessor, integrated or not into the apparatus, possibly removable and adapted to store one or more programs whose execution enables the method of encoding a sequence of digital images and/or the method of decoding a bitstream according to the invention to be implemented.
The executable code may be stored either in read only memory 307, on the hard disk 304 or on a removable digital medium such as for example a disk 306 as described previously.
According to a variant, the executable code of the programs can be received by means of the communication network 303, via the interface 302, in order to be stored in one of the storage means of the apparatus 300 before being executed, such as the random access memory 312 or the hard disk 304.
The central processing unit 311 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to the invention, instructions that are stored in one of the aforementioned storage means. On powering up, the program or programs that are stored in a non-volatile memory, for example on the hard disk 304, the disk 306 or in the read only memory 307, are transferred into the random access memory 312, which then contains the executable code of the program or programs, as well as registers for storing the variables and parameters necessary for implementing the invention. In this embodiment, the apparatus is a programmable apparatus which uses software to implement the invention. However, alternatively, the present invention may be implemented in hardware (for example, in the form of an Application Specific Integrated Circuit or ASIC).
Figure 4 illustrates a block diagram of an encoder according to at least one embodiment of the invention. The encoder is represented by connected modules, each module being adapted to implement, for example in the form of programming instructions to be executed by the CPU 311 of device 300, at least one corresponding step of a method implementing at least one embodiment of encoding an image of a sequence of images according to one or more embodiments of the invention.
An original sequence of digital images i0 to in 401 is received as an input by the encoder 400. Each digital image is represented by a set of samples, sometimes also referred to as pixels (hereinafter, they are referred to as pixels).
A bitstream 410 is output by the encoder 400 after implementation of the encoding process. The bitstream 410 comprises a plurality of encoding units (or coding units) or slices, each slice comprising a slice header for transmitting encoding values of encoding parameters used to encode the slice and a slice body comprising encoded video data.
The input digital images i0 to in 401 are divided into blocks of pixels by module 402.
The blocks correspond to image portions and may be of variable sizes (e.g. 4x4, 8x8, 16x16, 32x32, 64x64, 128x128 pixels, and several rectangular block sizes can also be considered). A coding mode is selected for each input block. Two families of coding modes are provided: coding modes based on spatial prediction coding (Intra prediction), and coding modes based on temporal prediction (Inter coding, Merge, SKIP). The possible coding modes are tested.
Module 403 implements an Intra prediction process, in which the given block to be encoded is predicted by a predictor computed from pixels of the neighbourhood of said block to be encoded. An indication of the selected Intra predictor and the difference between the given block and its predictor is encoded to provide a residual if the Intra coding is selected.
Temporal prediction is implemented by motion estimation module 404 and motion compensation module 405. Firstly a reference image from among a set of reference images 416 is selected, and a portion of the reference image, also called reference area or image portion, which is the closest area (closest in terms of pixel value similarity) to the given block to be encoded, is selected by the motion estimation module 404.
It should be noted that when temporal prediction is used for either encoding or decoding an image portion/block, up to two lists of reference pictures, typically termed L0 (the L0 list) and L1 (the L1 list), are usually maintained (e.g. generated and stored as motion information for future access), with one or more pictures in L0 also possibly being included in L1 (i.e. it is possible to have a duplicate between the lists). Video codecs performing video coding/decoding can often encode/decode images/pictures in an order different from the temporal order in the video.
However, there is a processing order (e.g. an encoding order or a decoding order) in which the images/pictures are processed (e.g. encoded or decoded), thereby introducing the concept of a past (previous) or future (forward) reference picture, defined in relation to the temporal position of an image/picture currently being encoded/decoded (i.e. a current image/picture).
Traditionally, L0 is intended for including past reference pictures and L1 is intended for including future reference pictures, but this is no longer the case. However, the framework behind the video coding/decoding remains that a P-frame (made of one or more P-slices and using prediction from a reference picture from a single past reference picture list, e.g. L0) solely refers to the L0 list, while a B-frame (using either one or both of the past and future reference picture lists, e.g. L0 and L1) has access to both the L0 and L1 lists.
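As a small illustration of the two reference picture lists just discussed, a P-slice restricted to L0 and a B-slice with access to both lists might be represented as follows; the picture order count (POC) values are invented for the example.

```python
# Reference picture lists identified by their picture order counts (POCs),
# for a current picture with POC 8; the values are invented for illustration.
ref_lists_p_slice = {"L0": [7, 6, 5, 4]}            # P-slice: only L0 is available
ref_lists_b_slice = {"L0": [7, 6, 5, 4],            # B-slice: both lists available,
                     "L1": [9, 10, 7]}              # possibly with duplicates across lists

def reference_poc(ref_lists, list_name, ref_idx):
    """Resolve a (list, reference index) pair to the picture it identifies."""
    return ref_lists[list_name][ref_idx]

print(reference_poc(ref_lists_b_slice, "L1", 0))    # 9
```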
Motion compensation module 405 then predicts the block to be encoded using the selected area. The difference between the selected reference area and the given block, also called a residual block, is computed by the motion compensation module 405. The selected reference area is indicated using a motion vector and a reference index, corresponding to the index of the corresponding reference picture in its L0 or L1 list.
Bidirectional motion compensation, which consists in combining predictors (i.e. motion information) coming from two blocks, one from each reference picture list, therefore requires one motion vector and reference index from each reference list. The combination of the predictors has traditionally been achieved by averaging the predictors. VVC proposes various other combination mechanisms such as linear combinations of the two predictors, and furthermore proposes signalling the selected combination mechanism (e.g. the selected linear combination and/or its associated parameters) among a plurality of possible combination mechanisms. It is therefore understood that motion information comprises at least one of a motion vector, a reference index or a combination index (for identifying the selected combination mechanism), and the motion information may also comprise more than one of each, for example two motion vectors and two reference indices from L0 and L1, and a combination index.
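A hedged sketch of the combination mechanisms mentioned above is given below, contrasting the traditional average with a signalled weighted (linear) combination; the weight table is an assumption for illustration and is not the normative VVC table.

```python
# Candidate weights applied to the first (L0) predictor; the second predictor
# receives (1 - w). The table below is illustrative only.
BI_WEIGHTS = [0.5, 0.375, 0.625, 0.25, 0.75]

def combine_predictors(pred_l0, pred_l1, combination_index=0):
    """Combine two prediction sample arrays according to a combination index.

    Index 0 corresponds to the traditional average; other indices select an
    alternative linear combination of the two predictors.
    """
    w = BI_WEIGHTS[combination_index]
    return [w * a + (1.0 - w) * b for a, b in zip(pred_l0, pred_l1)]

print(combine_predictors([100, 120], [80, 140]))        # average  -> [90.0, 130.0]
print(combine_predictors([100, 120], [80, 140], 2))     # weighted -> [92.5, 127.5]
```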
Thus, in both cases (spatial and temporal prediction), a residual is computed by subtracting the predictor from the original block when it is not in the SKIP mode.
In the INTRA prediction implemented by module 403, a prediction direction is encoded. In the Inter prediction implemented by modules 404, 405, 416, 418, 417, at least one motion vector or data for identifying such motion vector is encoded for the temporal prediction.
Information relevant to the motion vector and the residual block is encoded if the Inter prediction is selected. To further reduce the bitrate, assuming that motion is homogeneous, the motion vector is encoded as a difference with respect to a motion vector predictor. Motion vector predictors from a set of motion information predictor candidates are obtained from the motion vector field 418 by a motion vector prediction and coding module 417. It is understood that any motion information may also be predicted in a similar manner.
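The differential coding just described can be summarised by the following sketch: only the difference between the motion vector and its predictor is signalled, and the decoder reverses the operation (the function names are illustrative).

```python
def encode_mvd(mv, mv_predictor):
    """Motion vector difference actually signalled in the bitstream."""
    return (mv[0] - mv_predictor[0], mv[1] - mv_predictor[1])

def decode_mv(mvd, mv_predictor):
    """Decoder side: the predictor plus the signalled difference."""
    return (mv_predictor[0] + mvd[0], mv_predictor[1] + mvd[1])

mvd = encode_mvd((14, -6), (12, -4))
print(mvd, decode_mv(mvd, (12, -4)))  # (2, -2) (14, -6)
```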
The encoder 400 further comprises a selection module 406 for selection of the coding mode by applying an encoding cost criterion, such as a rate-distortion criterion. In order to further reduce redundancies a transform (such as DCT) is applied by transform module 407 to the residual block, the transformed data obtained is then quantized by quantization module 408 and entropy encoded by entropy encoding module 409. Finally, the encoded residual block of the current block being encoded is inserted into the bitstream 410 when it is not in the SKIP mode and the mode requires a residual block to be encoded in the bitstream.
The encoder 400 also performs decoding of the encoded image in order to produce a reference image (e.g. those in Reference images/pictures 416) for the motion estimation of the subsequent images. This enables the encoder and the decoder receiving the bitstream to have the same reference frames (reconstructed images or image portions are used). The inverse quantization ("dequantization") module 411 performs inverse quantization ("dequantization") of the quantized data, followed by an inverse transform by inverse transform module 412. The intra prediction module 413 uses the motion prediction information to determine which predictor to use for a given block and the motion compensation module 414 actually adds the residual obtained by module 412 to the reference area obtained from the set of reference images 416. It is understood that any motion information may also be predicted in a similar manner.
Post filtering is then applied by module 415 to filter the reconstructed frame (image or image portions) of pixels.
Figure 5 illustrates a block diagram of a decoder 60 which may be used to receive data from an encoder according to an embodiment of the invention. The decoder is represented by connected modules, each module being adapted to implement, for example in the form of programming instructions to be executed by the CPU 311 of device 300, a corresponding step of a method implemented by the decoder 60.
The decoder 60 receives a bitstream 61 comprising encoded units (e.g. data corresponding to one or more image portion(s), block(s) or coding unit(s)), each one being composed of a header containing information on encoding parameters and a body containing the encoded video data. As explained with respect to Figure 4, the encoded video data is entropy encoded, and the motion vector predictors' indexes are encoded, for a given image portion (e.g. a block or a CU) on a predetermined number of bits, for example. The received encoded video data is entropy decoded by module 62. The residual data are then dequantized by module 63 and then an inverse transform is applied by module 64 to obtain pixel values. The mode data indicating the coding mode are also entropy decoded and based on the mode, an INTRA type decoding or an INTER type decoding is performed on the encoded blocks (units/sets/groups) of image data.
In the case of INTRA mode, an INTRA predictor is determined by intra prediction module 65 based on the intra prediction mode specified in the bitstream.
If the mode is INTER, the (motion) prediction information is extracted from the bitstream so as to find (identify) the reference area used by the encoder. The motion prediction information comprises the reference frame index and the motion vector residual. The motion vector predictor is added to the motion vector residual by motion vector decoding module 70 in order to obtain the motion vector.
Motion vector decoding module 70 applies motion vector decoding for each image portion (e.g. current block or CU) encoded by motion prediction. Once an index of the motion vector predictor for the current block has been obtained, the actual value of the motion vector associated with the image portion (e.g. current block or CU) can be decoded and used to apply motion compensation by module 66. The reference image portion indicated by the decoded motion vector is extracted from a reference image 68 to apply the motion compensation 66. The motion vector field data 71 is updated with the decoded motion vector in order to be used for the prediction of subsequent decoded motion vectors.
Finally, a decoded block is obtained. Where appropriate, post filtering is applied by post filtering module 67. A decoded video signal 69 is finally obtained and provided by the decoder 60.
Coding/decoding of motion information
HEVC uses 3 different INTER modes: the Inter mode (Advanced Motion Vector Prediction (AMVP), which signals a motion information difference), the "classical" Merge mode (i.e. the "non-Subblock Merge mode", also known as the "regular" Merge mode, which does not signal a motion information difference) and the "classical" Merge Skip mode (i.e. the "non-Subblock Merge Skip" mode, also known as the "regular" Merge Skip mode, which does not signal a motion information difference and also does not signal residual data for a sample value).
The main difference between these modes is the data signalling in the bitstream. For the motion vector coding, the current HEVC standard includes a competition based scheme for motion vector prediction which was not present in earlier versions of the standard. It means that several candidates compete, using the rate-distortion criterion at the encoder side, in order to find the best motion vector predictor or the best motion information for respectively the Inter coding mode (AMVP) or the Merge modes (i.e. the "classical/regular" Merge mode or the "classical/regular" Merge Skip mode). An index or a flag corresponding to the best predictor or the best candidate of the motion information is then inserted in the bitstream. The decoder can derive the same set of predictors or candidates and uses the best one according to the decoded index/flag.
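The candidate competition described above can be sketched as follows: the encoder evaluates a rate-distortion cost for every candidate and signals only the index of the winner, while the decoder rebuilds the same list and reads that index; the cost callback shown is a placeholder, not the actual HEVC/VVC cost function.

```python
def select_best_candidate(candidates, rd_cost):
    """Encoder-side competition: pick the candidate with the lowest RD cost.

    rd_cost(candidate, index) is an assumed callback returning a
    rate-distortion cost (distortion + lambda * rate); only the winning
    index needs to be signalled, since the decoder derives the same list.
    """
    costs = [rd_cost(c, i) for i, c in enumerate(candidates)]
    best_index = min(range(len(candidates)), key=costs.__getitem__)
    return best_index, candidates[best_index]

# Toy example: cost = magnitude of the candidate plus a small index penalty.
cands = [(4, 2), (0, 1), (8, 8)]
idx, best = select_best_candidate(cands, lambda c, i: abs(c[0]) + abs(c[1]) + 0.5 * i)
print(idx, best)  # 1 (0, 1)
```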
The design of the derivation of predictors and candidates is important in achieving the best coding efficiency without a disproportionate impact on complexity. In HEVC two motion vector derivations are used: one for the Inter mode (Advanced Motion Vector Prediction (AMVP)) and one for the Merge modes (the Merge derivation process, for the classical Merge mode and the classical Merge Skip mode). In VVC, the Merge mode derivation is extended to include, in addition to the INTER mode, other modes such as a TRIANGLE mode. The TRIANGLE mode splits an image portion (e.g. a block or a CU) into 2 parts (partitions or smaller image portions) which, contrary to traditional splitting such as HEVC or other VVC partitioning, are neither vertical nor horizontal splits, but an oblique (diagonal) split of the image portion, resulting in two triangular smaller image portions (e.g. triangular prediction units for use with the prediction).
According to an embodiment of the invention, the INTER modes use a list of motion information predictors or motion information predictor candidates. From the list, a motion information predictor for processing/predicting/encoding/decoding an image portion is selected, information/data for identifying the selected motion information predictor from the list (e.g. an index or a flag) is signalled/encoded/decoded, e.g. via the bitstream, and the selected motion information predictor is used to process/predict/encode/decode the image portion (at the decoder side, the selected motion information predictor is identified/obtained using this signalled/encoded/decoded information/data). According to the embodiment, how this list is used/generated/updated has been modified from HEVC. For example, the temporal motion vector predictor (TMVP) candidate derivation process has been modified, and a new candidate called a history motion vector predictor (HMVP) has been added as one of the potential motion information predictor candidates. The HMVP uses a history list, which is generated/updated using a process similar to a first-in first-out (FIFO) with redundancy (e.g. duplicate) checking and pruning. The following describes processes according to variants of this embodiment which use this HMVP/history list in the TMVP (candidate for the regular Merge candidate list) or a constructed Affine candidate (for the subblock Merge candidate list) derivation process.
The Subblock Merge candidate list can include, as a candidate, a constructed Affine candidate. This candidate is determined/derived/obtained by defining up to three control points, the position and motion information of which are then used to determine parameters for deriving the motion information of each subblock of the constructed Affine candidate. One of these control points uses motion information obtained from a bottom right corner of a current block (H position - see below), which is determined/derived/obtained using a very similar process to the TMVP derivation process.
Figures 6a and 6b illustrate spatial and temporal blocks that can be used to determine positions as well as motion information for use in various motion information predictor candidate lists, such as a Subblock Merge candidate list or a (regular) Merge candidate list.
Table 1 below outlines the nomenclature used when referring to blocks in relative terms to the current block as shown in Figures 6a and 6b. This nomenclature is used as shorthand, but it should be appreciated that other systems of labelling may be used, in particular in future versions of a standard.
Block label - Relative positional description of neighbouring block
A0 - 'Below left' or 'Left corner': diagonally down and to the left of the current block
A1 - 'Left' or 'Bottom left': left of the bottom of the current block
A2 - 'Top left': left of the top of the current block
B0 - 'Above right': diagonally up and to the right of the current block
B1 - 'Above': above the top right of the current block
B2 - 'Above left' or 'Top corner': diagonally up and to the left of the current block
B3 - 'Up': above the top left of the current block
H - Bottom right of a collocated block in a reference frame
Center - A block within a collocated block in a reference frame
Table 1
It should be noted that the 'current block' may be variable in size, for example 4x4, 16x16, 32x32, 64x64, 128x128 or any size in between. The dimensions of a block are preferably factors of 2 (i.e. 2^n x 2^m where n and m are positive integers) as this results in a more efficient use of bits when using binary encoding. The current block need not be square, although this is often a preferable embodiment for coding complexity.
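As a purely illustrative aside, the preference for dimensions of the form 2^n can be checked with a simple bit test; the helper name below is an assumption and not part of any standard.

```cpp
// Returns true if a block dimension is a positive power of two (i.e. of the form 2^n).
bool isPowerOfTwoDimension(int dim)
{
    return dim > 0 && (dim & (dim - 1)) == 0;
}
```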
Turning to Figure 7, use of the above H and Center block positions is illustrated through an example of a derivation process for motion information, which can then be used to derive a TMVP candidate for the (regular) Merge candidate list, used in the INTER and TRIANGLE modes, and/or to obtain/determine/derive motion information for a control point of a constructed Affine candidate for the Subblock Merge candidate list. The following description is given in relation to the TMVP candidate, but it is understood that the same steps/processes can be used to obtain motion information for use in the derivation of a control point of a constructed Affine candidate because it is obtained/determined/derived using the same information.
For each reference picture list (i.e. each reference list) available for the current slice, two positions are checked (as potential candidates from which motion information may be obtained) and, if usable/available, the motion information associated with that usable/available position is used to derive the (collocated) motion information for the TMVP candidate (which is an example of "temporally collocated/neighbouring/obtained/derived motion information").
To do this, the H and Center positions (in the collocated frame/picture, i.e. in a reference picture) are determined (e.g. using coordinate values of the position of the current block or the current CU). The first reference picture list is selected at step 701, namely L0.
Next, step 703 determines whether the H position is available (for providing the motion information), and selects the H position to proceed to step 704 if it is available. Availability of motion information from the collocated frame/picture at the H position (for the current reference picture list L0) is checked at step 704.
If it is not available, or there is no motion information associated with the H position, the next position (i.e. the Center position) is checked for motion information at step 706. Otherwise (i.e. the motion information associated with the H position is available), step 705 checks whether the first reference picture of the current reference picture list (L0) or the reference picture indicated by the motion information at the H position is long term. A reference picture is considered to be long term if its temporal position is very different from the current picture's temporal position (i.e. the reference picture is a long distance away from the current picture in terms of its temporal position, e.g. the picture order count difference between the two is large or above a certain threshold value). If only one of the two is long term, the motion information at the current position (i.e. H position) in the collocated picture cannot be reused and processing proceeds/skips to the next position (i.e. the Center position) at step 706. This step 706 checks, in a similar manner as it checked for H, whether there is available motion information at the Center position in the collocated frame/picture (i.e. checks the availability of the motion information associated with the Center position). Please note that, by definition, motion information associated with the Center position is always available (as otherwise the current CU would also have the motion information associated with its center unavailable). If there is no motion information or, similarly to step 705, if the long term match fails at step 707, then step 714 sets the (temporally collocated/neighbouring) motion information (e.g. for the TMVP) as having no motion information available for the current reference picture list.
Otherwise (i.e. when the motion information is available, e.g. for deriving the (temporally collocated/neighbouring) motion information for the TMVP), the motion information for the current position (of the current block/CU) and the reference picture list associated with the current position can be exploited. Therefore, step 708 checks whether the first reference picture of the current reference picture list is a long-term reference picture. Such reference pictures can be extremely distant in time or encoded long ago, for example for a static or special-purpose area (because such areas with no or small changes are not re-encoded during the encoding process, the reference pictures used for these areas are likely to be long-term reference pictures). In this case, the motion vector part of the motion information is not rescaled, and is directly reused (i.e. used without any modification) at step 711 to set this motion vector, the reference index 0 and the motion information availability (e.g. indicated by a flag) as the motion information for the current reference picture list.
If the reference picture is not a long-term one, then rescaling according to temporal distances must be performed. To do so, two temporal distances, in terms of picture order count (which relates to the pictures' positions in the sequence of pictures forming a video, and hence is different from the picture coding order), are measured at step 709:
* between the current picture and said first reference picture;
* between the collocated reference picture and its reference picture indicated by the reference index.
These two distances are used to determine the scaling factor applied to each component of the motion vector. Once it is known, step 710 scales each component of the motion vector, and the processing continues to the already described step 711.
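By way of illustration, the scaling of steps 709 and 710 may be sketched as follows, using the HEVC-style integer approximation of mv * tb / td, where tb is the picture order count distance between the current picture and the selected reference picture and td is the distance between the collocated picture and the picture its motion vector points to; the clipping ranges shown are those of HEVC and are given here only as an assumption (other standards may use different ranges).

```cpp
#include <algorithm>
#include <cstdlib>

// Illustrative sketch of temporal motion vector scaling (one component).
// tb and td are picture order count (POC) distances as described above.
int scaleMvComponent(int mv, int tb, int td)
{
    // Fixed-point approximation of 1/td, then of tb/td, as in HEVC.
    const int tx = (16384 + (std::abs(td) >> 1)) / td;
    const int distScaleFactor = std::clamp((tb * tx + 32) >> 6, -4096, 4095);
    const int scaled = distScaleFactor * mv;
    const int sign = (scaled < 0) ? -1 : 1;
    return std::clamp(sign * ((std::abs(scaled) + 127) >> 8), -32768, 32767);
}
```

For example, with tb = 2 and td = 4 the scaled component is roughly half of the collocated motion vector component, as expected for a reference picture twice as close.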
Whether motion information for the current reference picture list has been set at step 711, or not set at step 714, the processing for said current reference picture list is done, and so step 712 checks whether there is another list. According to a variant, this basically amounts to checking the type of the current slice: if it is a P-Slice, only L0 needs to be processed and the TMVP has been fully determined, and processing proceeds to step 715 to end/terminate the process. Please note that before actually ending/terminating the process, some additional processing steps may also be performed at step 715. Similarly, if the current reference picture list is L1 (and thus the current slice is of type B-Slice), then processing also goes to step 715. In the case where the slice is a B-Slice and the current reference picture list is L0, processing continues on to step 713 to select the next reference picture list, i.e. L1, since only two reference picture lists L0 and L1 exist.
Concerning step 715, as the availability of motion information for all the available reference picture lists (i.e. L0 or L0 & L1) is known, it can be additionally checked whether the motion information is available at all. Step 715 may thus indicate that, if no motion information is available for the TMVP, the TMVP itself is unavailable. This availability information of the TMVP can then be used, e.g., to decide/determine whether to add the TMVP candidate to a motion information predictor list such as the Merge or AMVP lists. It is understood that, similarly, for the constructed Affine candidates in the Subblock Merge list, if no (temporally collocated/neighbouring) motion information is available, then the control point associated with said (temporally collocated/neighbouring) motion information is considered unavailable.
Figure 8 is a schematic of a generation/update process (also referred to as an "updating process") for a history-based motion vector predictor (HMVP - may also be referred to as a history-based motion information predictor or HMIP) candidate list according to an embodiment of the present invention. An HMVP is a predictor obtained from a set of motion information which is updated, and stored (i.e. maintained), as each image portion (e.g. current block or current CU) and its associated motion information is processed/encoded/decoded according to the processing/encoding/decoding order of image portions. This HMVP list is generated/updated in a FIFO (first-in, first-out) manner, with an additional excluding/pruning of at least one redundant motion information to ensure it includes diverse motion information, which is a notable departure from the traditional spatial neighbour motion information which does not consider any diversity, e.g. motion information associated with A0, A1, B0, B1, or B2 in Figure 6.
At first step 801, new motion information (MI) is input to the process. This MI comprises 2 sets of a list utilization flag, a motion vector and a reference index (one set for each reference list, the reference index being indicative of the respective reference picture in said reference list). The flag indicates whether the related motion vector and reference index are valid and exist (i.e. are available for use): for instance, in a P-slice, if the reference list is L1, then the flag always indicates no use (i.e. is not available for use). As the process consists in updating motion information from actually decoded motion information (i.e. not an INTRA-coded block), one of the L0 or L1 flags is always TRUE (i.e. there is always motion information available from at least one of L0 or L1). For a B-frame, a bi-prediction weight index is present.
To start the updating process, a number of elements of the list are initialized. In particular, whether an identical motion information (i.e. a duplicate) already exists in the list (and thus needs to be pruned/excluded), which is indicated/represented with a variable identicalCandExist for example, and at what position in the list the duplicate is present, which is indicated/represented with a variable removeIdx for example, are set at the start. These pieces of information, and the related variables, allow determining if, and which, motion information item needs to be removed from the list. Therefore, the variables identicalCandExist and removeIdx are set (i.e. initialized) to FALSE and 0 respectively when there is no duplicate in the list. Otherwise, the history list is searched for the duplicate motion information so that it can be pruned/excluded.
Therefore, at step 803, if the history list is empty, then no pruning/excluding is needed, and processing goes directly to step 810. Otherwise, the first/oldest (in terms of when it was added to the list) motion information in the list is selected at step 804.
At step 805, if the current motion information is identical to the input motion information (i.e., if the motion vectors, utilization flags and reference indices are equal), then it will be pruned/excluded from the list: at step 806, removeIdx is set to the index of the current motion information, identicalCandExist is set to TRUE, and the "805->807->808->805" loop is interrupted by going to step 809. Otherwise, the loop is continued, and whether this was the last (i.e. newest) motion information of the list or not is checked at step 807: if yes, then the loop is finished and the processing continues to step 809, otherwise the next motion information in the list is selected for processing at step 808.
Step 809 removes/excludes the identified duplicate motion information from the list: if identicalCandExist is TRUE, then the slot at index removeIdx is emptied (i.e. the motion information associated with the index removeIdx is removed/excluded from the list). If it is FALSE, then the item (i.e. the element of the list/set) at index 0 (i.e. the oldest motion information in the list) is removed instead. Then all following motion information items whose index is greater than removeIdx are moved to the slot with one lower index (i.e. their index value is decreased by one). The end result is that the last slot (the newest motion information) is emptied to make room for the latest input motion information.
Step 810 then inserts said input motion information as the new motion information at the back of the list (i.e. at a slot with the highest index). Processing can then stop at 811, i.e. the list is generated/updated.
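The updating process of Figure 8 may be sketched as follows; the maximum list size, the exact equality test, and the behaviour when the list is not yet full are assumptions made for the example only (in particular, the oldest entry is only dropped here when the list is full and no duplicate was found).

```cpp
#include <cstddef>
#include <deque>

// Motion information as described for step 801: one set of flag / motion vector /
// reference index per reference picture list (index 0 for L0, index 1 for L1).
struct MotionInfo {
    bool used[2];
    int  mvX[2], mvY[2];
    int  refIdx[2];

    bool operator==(const MotionInfo& o) const {
        for (int l = 0; l < 2; ++l) {
            if (used[l] != o.used[l]) return false;
            if (used[l] && (mvX[l] != o.mvX[l] || mvY[l] != o.mvY[l] || refIdx[l] != o.refIdx[l]))
                return false;
        }
        return true;
    }
};

// FIFO update with duplicate pruning, in the spirit of steps 801 to 811.
void updateHistoryList(std::deque<MotionInfo>& history,
                       const MotionInfo& newMi,
                       std::size_t maxSize = 5 /* assumed size */)
{
    // Search for an identical entry (steps 803 to 808).
    bool identicalCandExist = false;
    std::size_t removeIdx = 0;
    for (std::size_t i = 0; i < history.size(); ++i) {
        if (history[i] == newMi) { identicalCandExist = true; removeIdx = i; break; }
    }

    // Step 809: remove the duplicate, or the oldest entry if the list is full.
    if (identicalCandExist)
        history.erase(history.begin() + static_cast<std::ptrdiff_t>(removeIdx));
    else if (history.size() >= maxSize)
        history.pop_front();

    // Step 810: insert the new motion information at the back (newest position).
    history.push_back(newMi);
}
```

Later sketches below that need motion information reuse this MotionInfo structure and history list layout.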
According to a variant of the embodiment, the list is fully emptied at the start of processing each row of blocks/CUs/CTBs/CTUs (i.e. the list is generated/updated for each row of blocks/CUs/CTBs/CTUs in the processing/encoding/decoding order, and the list is initialized to an empty list/set at the beginning of each row): this is because a motion in the rightmost part of an image/picture being processed/encoded/decoded is often very different from a motion in the leftmost part, and therefore motion information for the former is ineffective in predicting motion information for the latter.
It is understood that, at the start of any one or more of a slice, a tile, an image/picture, the list/set may be empty because no motion information may be available.
Other embodiments
Embodiments of the invention will now be described with reference to the remaining Figures. It should be noted that the embodiments may be combined unless explicitly stated otherwise; for example certain combinations of embodiments may improve coding efficiency at increased complexity, but this may be acceptable in certain use cases.
Figure 9 is a modification of the motion information derivation process of Figure 7 according to an embodiment of the invention.
In the following, a TMVP derivation variant of this embodiment is described, in relation to the modification made for the TMVP derivation process (i.e. the obtained/determined/derived motion information is used to derive a TMVP candidate for the (regular) Merge candidate list, used in the INTER and TRIANGLE modes). It is understood that the same steps/processes can also be used to obtain/determine/derive motion information for use in deriving a control point for a constructed Affine candidate (for the subblock Merge candidate list), and the same comments also apply to such use of the "temporally collocated/neighbouring/obtained/derived motion information".
All steps from 901 to 915 are almost identical to their counterparts 701 to 715 in Figure 7, except for an additional step 920. The modification enables changing of the reference index, which is used for the TMVP (and which is an example of "temporally collocated/neighbouring/obtained/derived motion information"), and this can be beneficial for coding efficiency. Historically, spatial neighbours (e.g. spatially neighbouring blocks of a collocated block of the current block) were used for this purpose. However, this dependency on the spatial neighbours was removed from HEVC, and not proposed again in VVC, despite VVC's more generous restriction on the complexities involved in processing these, because of the increased complexity involved in using such spatial neighbours. Indeed, said spatial neighbours may be unavailable, or subject to rules/conditions and constraints such as ensuring they are located in specific areas or are unique/distinct/different from each other (and therefore incur increased complexity from requiring comparison with other spatial neighbours). This causes additional delay in the processing, and making the TMVP dependent on them further increases this delay.
Therefore, the additional step 920 determines/derives/obtains the reference index (instead of it being assumed to be 0, whereby the first reference picture is always selected/targeted) for the TMVP using the HMVP from the history list generated/updated/stored using the process of Figure 8. This is particularly advantageous because this history list is already generated/considered after the TMVP candidate has been determined/derived for the VVC merge list for the regular inter and/or TRIANGLE inter modes, so any impact on the complexity/processing load from this dependency on the history list is minimal, and in particular, is not an additional one. The same can also be said when the additional step 920 is used to determine/derive/obtain the reference index for obtaining motion information (e.g. a motion vector) for a control point for use in generating/determining/deriving a constructed Affine candidate (for the subblock merge candidate list).
According to a preferred embodiment, the reference index determination/derivation at step 920 comprises considering/accessing/checking the reference indices of the items in the history list (i.e. elements of the history list/set) from the newest item to the oldest item (i.e. from the last item to the first item for the history list, which may be a list compiled/generated/updated in a FIFO manner).
Figure 10 illustrates this reference index determination/derivation at step 920 according to the preferred embodiment. In the preferred embodiment, only the 2 most recent (and available) items (e.g. reference indices) are considered/accessed/checked. Checking more items has two potential drawbacks: requiring more checks, and accessing items that are too old and hence less likely to be relevant to the current block. However, it is understood that considering/accessing/checking fewer (i.e. 1 most recent item) or more items is also possible according to an alternative embodiment.
At step 1000, the reference index refidx (or refIdx) is initialized. According to a variant, refidx is initialized/set to a default value, typically 0, as there is always at least one reference picture in a reference picture list. According to another variant, refidx is initialized/set to a default value, and this default value is signalled, for example at the slice-level or in another structure such as a Picture Parameter Set (PPS) or a Sequence Parameter Set (SPS). According to yet another variant, it is initialized/set to a value which indicates that it was not set for use, e.g. -1, so that it can be set to another value later on.
Next, step 1001 checks whether the history list is empty. If it is, no item is available for determining/adapting the reference indices for the TMVP candidate (or indeed the "temporally collocated/neighbouring/obtained/derived motion information") so the processing goes to step 1007, where it ends. Otherwise (i.e. the list is not empty), the first item of the history list is selected. Then, at step 1003, whether the current item refItem (i.e. the selected first item) has motion information for the considered reference picture list is checked/determined. If it does not have appropriate motion information for the considered reference picture list, then refidx cannot be updated (note that the update involves accumulating observations/occurrences of refidx by always taking the larger of refidx and refItem) and processing proceeds to step 1005. Otherwise (i.e. the current item has the appropriate motion information for the considered reference picture list), the reference index refItem of the current item is considered for updating refidx at step 1004. In the preferred variant, this updating of refidx at step 1004 comprises taking the maximum value (i.e. the larger value) of refidx and refItem.
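A minimal sketch of this preferred variant of step 920 / Figure 10 is given below, reusing the MotionInfo structure and history list sketched above; the default value of 0, and the choice to count only items that actually contribute a reference index towards the limit of 2, are assumptions made for the example.

```cpp
#include <algorithm>
#include <deque>

// Derive the reference index used for the temporally derived motion information
// (e.g. the TMVP candidate) by inspecting the newest items of the history list.
// 'list' selects the reference picture list under consideration (0 for L0, 1 for L1).
int deriveRefIdxFromHistory(const std::deque<MotionInfo>& history,
                            int list,
                            int defaultRefIdx = 0,
                            int maxItemsToCheck = 2)
{
    int refIdx = defaultRefIdx;
    int observed = 0;
    // Walk from the newest item (back of the FIFO) towards the oldest one.
    for (auto it = history.rbegin(); it != history.rend() && observed < maxItemsToCheck; ++it) {
        if (!it->used[list])
            continue; // no motion information for this list: refIdx cannot be updated
        refIdx = std::max(refIdx, it->refIdx[list]); // step 1004: keep the larger value
        ++observed;
    }
    return refIdx;
}
```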
In a variant, a count of items having refItem as reference index is maintained/stored/updated. This can be used to identify the reference index which is most frequently included in the history list, which is likely to be the most probable and promising one for the TMVP candidate.
Once refidx for the "temporally collocated/neighbouring/obtained/derived motion information" (e.g. TMVP candidate) has been updated/set using the current item of the history list, processing proceeds to step 1005, where it is checked whether the last item to consider from the history list has been processed.
If it is determined that the last item has been processed, the processing proceeds to step 1007, ending/terminating the process. For example, in the preferred embodiment, if both of the two items have been processed, then the second one would have been the last item and the processing proceeds to step 1007. Another example where the processing proceeds to step 1007 to terminate the process would be simply when the history list has only one item, and hence that one and only item has been considered. A further example of a case where the processing might proceed to step 1007 to terminate the process would be when all items of the history list have been processed. In another embodiment, step 1004 also updates a count of items (e.g. reference index values) that were processed by it, so as to use said count at step 1005 to determine whether the currently processed item is the last item, e.g. when 2 items have been processed through 1004 (i.e. 2 reference indices have been observed).
In another embodiment, the number of items to consider for use with the TMVP candidate depends on a property of (or a parameter relating to) the block being processed/encoded/decoded, such as the prediction mode (e.g. the TRIANGLE mode). In the preferred variant, when the block is being processed/encoded/decoded in the TRIANGLE mode, no updating/maintaining of the history list takes place, so the whole update process (including step 1004, e.g. looking for the maximum reference index value) does not take place. One way of doing this is, at step 1001, when in the TRIANGLE mode, to consider the list as being empty so that the process immediately ends/terminates by proceeding to step 1007. Another variant makes said number dependent on a size-based criterion (e.g. one based on the height and/or the width, or their sum, or the number of samples), which is assessed on the block being processed/encoded/decoded. For example, at least 2 motion information items (e.g. reference index values) may always be processed if available, and a third one, if available, checked if the sum of the block width and height is greater than a certain value. According to a further variant, this certain value is 24.
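By way of illustration only, the number of history items to consider under these variants could be computed as follows; the threshold of 24 is the value of the further variant above, and the helper name and the exact behaviour for the TRIANGLE mode are assumptions.

```cpp
// Number of history items to consider for the reference index derivation.
// Returning 0 for the TRIANGLE mode corresponds to treating the list as empty at step 1001.
int numHistoryItemsToCheck(int blockWidth, int blockHeight, bool isTriangleMode)
{
    if (isTriangleMode)
        return 0;
    // Always at least 2 items; a third one for larger blocks (width + height > 24).
    return (blockWidth + blockHeight > 24) ? 3 : 2;
}
```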
If it is determined that the current item is not the last item to consider from the history list, then the next item to consider from the history list is selected at step 1006, and processing loops back to step 1003 to check/determine whether the current item refItem (i.e. the selected next item) has motion information for the considered reference picture list.
As mentioned previously, different variants have different step 1007. In some variants, if no reference index has been observed in the end, then refidx for the "temporally collocated/neighbouring/obtained/derived motion information" (e.g. refidx of the TMVP candidate) takes a default value.
In other variants which can identify the most frequently included reference index (e.g. using the count of items having refItem as a reference index), refidx for the "temporally collocated/neighbouring/obtained/derived motion information" (e.g. refidx of the TMVP candidate) is set to the most frequently included reference index.
Referring back to Figure 9, as a consequence of refidx being adaptively determined using the history list, the determination of the "temporally collocated/neighbouring/obtained/derived motion information" (such as the TMVP candidate) can also consider the refidx value derived/obtained at step 920, instead of considering only the first reference picture of the reference picture list. This affects the following steps of the TMVP derivation process of Figure 9:
- Step 909: the temporal distance between the current picture and the first reference picture in the current reference picture list (described with reference to step 709), where the latter is replaced by the reference picture indicated by refidx;
- Step 911, where the stored reference index used to set the motion information for the current reference picture list (described with reference to step 711) is refidx.
Figure 11 illustrates a modification of the generation/update process for the history list (HMVP candidate list) of Figure 8, which uses the reference index determination/derivation of step 920. All steps 1101 to 1111 are identical to their counterparts 801 to 811 in Figure 8, except for the addition of step 1120, which updates/obtains/derives/determines data/information related to the derivation of the reference indices for the "temporally collocated/neighbouring/obtained/derived motion information" (Temporal MI) (such as the TMVP candidate).
In an embodiment, step 1120 considers the last item in the history list as well as the new Motion Information of step 1101, for instance taking the maximum reference indices for each reference picture list over the 2 Motion Information items. As a consequence, step 1120 simply produces the reference indices that step 920 will reuse as is.
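A possible sketch of step 1120 under this embodiment is shown below, again reusing the MotionInfo structure and history list from the earlier sketch; keeping a per-list maximum and treating -1 as "not set" are assumptions made for the example only.

```cpp
#include <algorithm>
#include <deque>

// Step 1120: produce, for each reference picture list, the reference index that
// step 920 will reuse, here as the maximum over the newest history item and the
// newly added motion information. temporalRefIdx[0]/[1] hold the L0/L1 values.
void updateTemporalRefIdx(const std::deque<MotionInfo>& history,
                          const MotionInfo& newMi,
                          int temporalRefIdx[2])
{
    for (int list = 0; list < 2; ++list) {
        int refIdx = -1; // -1 means "not set for use"
        if (!history.empty() && history.back().used[list])
            refIdx = history.back().refIdx[list];
        if (newMi.used[list])
            refIdx = std::max(refIdx, newMi.refIdx[list]);
        if (refIdx >= 0)
            temporalRefIdx[list] = refIdx;
    }
}
```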
It should be noted that all the preceding embodiments/variants may be further controlled by properties or coding parameters of the block for which TMVP is derived. For instance, in the case of the TRIANGLE mode, the history list is not updated. As a consequence, the derivation of the reference indices for TMVP candidate using HMVP is only invoked if the prediction mode is not TRIANGLE mode. Besides a complexity reduction, this offers the benefit of a competition between the traditionally obtained/derived/determined indices and the ones obtained/derived/determined by an embodiment/variant of the present invention.
According to an embodiment of the invention, motion information from AFFINE candidates (including the SbTMVP) is not used to generate/update/maintain the history list for the HMVP. This is because those temporal derivation processes are related to dynamic, yet continuous, modifications of the motion field. Embodiments/variants which use the determination of the most frequently included reference index may therefore be used with such AFFINE candidates (including the SbTMVP) to obtain/derive/determine their motion information. In some variants of these embodiments/variants, this may comprise comparing the 2 newest items in the history list and, if their reference index matches, using it; and if they do not, setting the reference index to 0.
A variant of any foregoing embodiment, which obtains/determines/derives a control point for generating/obtaining/deriving/determining a constructed Affine candidate (for the subblock Merge candidate list) is described below with reference to Figure 16, which illustrates how control points are used to derive constructed Affine candidates.
A constructed Affine candidate for a current block is constructed (generated/determined/derived) using various parameters to determine a motion information field. Such parameters include a set of several pieces of motion information (associated with subblocks of the current block), the position of each subblock, and affine model parameters.
So, the derivation process for a constructed Affine candidate starts with determining four control points at steps 1601 to 1604 in Figure 16. Each step uses one or more (i.e. a set) of the block positions described in relation to Figures 6a/6b above, and at step 1605, the availability of the motion information corresponding to/associated with each set of position(s) is checked. For example, according to a variant, each of the positions is checked in succession (in the order of 1601-1604) until one position is determined to be available and has motion information associated with it which is available for use/access. For instance, the top-left control point 1601 is defined by checking B2, then B3, then A2 (as shown in Figure 6b). The first motion information available is retrieved from these positions and then associated with the control point (i.e. set as the control point). If no motion information is available, the control point is marked as unavailable.
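The scanning of candidate positions for a control point may be sketched as follows; the position labels, the lookup callback and the use of std::optional are assumptions for the example, and the MotionInfo structure is the one sketched earlier.

```cpp
#include <functional>
#include <optional>
#include <vector>

// Derive a control point by checking a set of candidate positions in succession
// (e.g. B2, then B3, then A2 for the top-left control point of Figure 16).
// 'motionAt' is a caller-supplied lookup returning the motion information at a
// given position label if that position is available, and std::nullopt otherwise.
std::optional<MotionInfo> deriveControlPoint(
    const std::vector<int>& positionsInOrder,
    const std::function<std::optional<MotionInfo>(int)>& motionAt)
{
    for (int position : positionsInOrder) {
        if (auto mi = motionAt(position))
            return mi; // first available motion information is set as the control point
    }
    return std::nullopt; // control point is marked as unavailable
}
```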
It is understood that other number(s) of control points may also be used depending on how the current block is divided into subblocks, but for this particular embodiment, four is considered. It is also understood that a different set of block positions (from that shown in Figure 16) may be considered for each control point.
In step 1604, "temporally collocated/neighbouring/obtained/derived motion information" (i.e. Temporal motion information (Temporal MI)) obtained/determined/derived using the motion information derivation process steps of Figure 7 and/or Figure 9 are used. According to a variant, only the motion information derivation steps/process using the H position, not the Center position, are used to obtain/determine/derive motion information at step 1604. According to a yet another variant, various modifications and determination steps describes in relation to Figure 7, Figure 8, Figure 9, Figure 10, & Figure 11 above, e.g. the reference index determination/derivation step of Figure 10 or the modification for the generation/update process for the history list (HMVP candidate list) described in relation to Figure 11, are also used.
After step 1605, each of the control points either has motion information associated with it, or is marked as being unavailable. The use of the control points is illustrated starting from step 1606, where a first (affine) model is selected. Then, for the current model selected (e.g. the first model selected), between 2 and 3 control points, depending on the model, are further selected at step 1607. If any of the selected control points is missing (i.e. unavailable), step 1608 proceeds to step 1615 to check whether it was the last affine model for consideration, as a constructed Affine candidate cannot be constructed/generated/determined/derived when any control point is missing (i.e. unavailable).
If all control points are available for the current model, processing continues with step 1609. At step 1609, the first available reference picture list is selected. Step 1610 then checks that all control points have, available for use, motion information for the current reference picture list. If any of the control points has its motion information for the current reference picture list unavailable, all the required model parameters cannot be derived, and the current model is marked as being unavailable for the current reference picture list; processing then proceeds to step 1612 to consider another reference picture list.
Otherwise, step 1611 computes/determines/derives/obtains the affine model parameters for the current reference picture list from all selected control points (from step 1607) and marks the current model as available. There are then, for the current reference picture list, 4 parameters computed for 2 control points (as motion information, for a given reference picture list, comprises a motion vector with 2 components), and 6 for 3 control points. They define the parametrization, according to position, of the motion information field.
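For illustration, the parametrization of the motion information field may be written as in the sketch below, which uses the commonly described 4-parameter (2 control points) and 6-parameter (3 control points) affine models; floating point is used for readability, whereas an actual codec would use fixed-point arithmetic, so this is a simplified sketch rather than the exact derivation of the standard.

```cpp
// Motion vector of a subblock located at (x, y) relative to the top-left corner of
// a block of size w x h, derived from control-point motion vectors cp0 (top-left),
// cp1 (top-right) and, for the 6-parameter model, cp2 (bottom-left).
struct Mv { double x, y; };

Mv affineSubblockMv(Mv cp0, Mv cp1, Mv cp2, bool sixParam,
                    double x, double y, double w, double h)
{
    if (!sixParam) {
        // 4-parameter model: 2 control points give 4 parameters (a, b, cp0.x, cp0.y).
        const double a = (cp1.x - cp0.x) / w;
        const double b = (cp1.y - cp0.y) / w;
        return { a * x - b * y + cp0.x,
                 b * x + a * y + cp0.y };
    }
    // 6-parameter model: 3 control points give 6 parameters.
    return { (cp1.x - cp0.x) / w * x + (cp2.x - cp0.x) / h * y + cp0.x,
             (cp1.y - cp0.y) / w * x + (cp2.y - cp0.y) / h * y + cp0.y };
}
```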
Step 1612 checks whether the currently considered reference picture list is the last reference picture list, e.g. if the current reference picture list is L0 for a P-slice or L1 for a B-slice, then the current reference picture list is the last one. When there is at least one reference picture list left to process (e.g. for a P-slice, none after the first and only one, and L1 for a B-slice after the first one), step 1612 selects the next reference picture list before looping back to step 1610.
Otherwise, all reference picture lists have been processed. As these parameters are all that is needed to define an Affine candidate (i.e. to construct/generate/determine/derive/obtain a constructed Affine candidate), step 1613 checks whether any model is available. Indeed, for each list, whether the model is available for a given reference picture list has been determined by step 1610. If, for all reference picture lists, the model is unavailable, then step 1613 considers the model as empty/unavailable. If that is not the case, then the Affine candidate is marked as available and stored, otherwise it is marked as being not available. This ends the processing for the current model: therefore the next step 1615 checks whether the current model is the last model: if it is not the last model, the next model is selected at step 1617 before the model processing loops back to step 1607 to select 2 to 3 control points for the new current model. Otherwise, all models have been processed, and all available constructed Affine candidates have been constructed/generated/determined/derived/obtained.
It is understood that, according to a variant, a motion information predictor derivation process, an encoding process, or a decoding process determines/derives/obtains one or both of a TMVP candidate (e.g. for a regular Merge candidate list) and/or one or more control point(s) for a constructed Affine candidate (e.g. for the subblock Merge candidate list) using motion information from the HMVP/history list according to any of the foregoing embodiments/variants, and the process uses the determined/derived/obtained candidate when selecting, encoding and/or decoding a motion information predictor for processing/encoding/decoding an image portion (or an image).
Implementation of embodiments of the invention
One or more of the foregoing embodiments or variants are implemented by the processor 311 of a processing device 300 in Figure 3, or corresponding functional module(s)/unit(s) of the encoder 400 in Figure 4, or of the decoder 60 in Figure 5, which perform the method steps of the one or more foregoing embodiments/variants.
Figure 13 is a schematic block diagram of a computing device 2000 for implementation of one or more embodiments of the invention. The computing device 2000 may be a device such as a micro-computer, a workstation or a light portable device. The computing device 2000 comprises a communication bus connected to:
- a central processing unit (CPU) 2001, such as a microprocessor;
- a random access memory (RAM) 2002 for storing the executable code of the method of embodiments of the invention as well as the registers adapted to record variables and parameters necessary for implementing the method for encoding or decoding at least part of an image according to embodiments of the invention, the memory capacity thereof can be expanded by an optional RAM connected to an expansion port for example;
- a read only memory (ROM) 2003 for storing computer programs for implementing embodiments of the invention;
- a network interface (NET) 2004 which is typically connected to a communication network over which digital data to be processed are transmitted or received. The network interface (NET) 2004 can be a single network interface, or composed of a set of different network interfaces (for instance wired and wireless interfaces, or different kinds of wired or wireless interfaces). Data packets are written to the network interface for transmission or are read from the network interface for reception under the control of the software application running in the CPU 2001;
- a user interface (UI) 2005 which may be used for receiving inputs from a user or to display information to a user;
- a hard disk (HD) 2006 which may be provided as a mass storage device;
- an Input/Output module (IO) 2007 which may be used for receiving/sending data from/to external devices such as a video source or display.
The executable code may be stored either in the ROM 2003, on the HD 2006 or on a removable digital medium such as, for example, a disk. According to a variant, the executable code of the programs can be received by means of a communication network, via the NET 2004, in order to be stored in one of the storage means of the communication device 2000, such as the HD 2006, before being executed.
The CPU 2001 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to embodiments of the invention, which instructions are stored in one of the aforementioned storage means. After powering on, the CPU 2001 is capable of executing instructions from main RAM memory 2002 relating to a software application after those instructions have been loaded from the program ROM 2003 or the HD 2006, for example. Such a software application, when executed by the CPU 2001, causes the steps of the method according to the invention to be performed.
It is also understood that according to another embodiment of the present invention, a decoder according to an aforementioned embodiment is provided in a user terminal such as a computer, a mobile phone (a cellular phone), a tablet or any other type of a device (e.g. a display apparatus) capable of providing/displaying a content to a user. According to yet another embodiment, an encoder according to an aforementioned embodiment is provided in an image capturing apparatus which also comprises a camera, a video camera or a network camera (e.g. a closed-circuit television or video surveillance camera) which captures and provides the content for the encoder to encode. Two such examples are provided below with reference to Figures 14 and 15.
Figure 14 is a diagram illustrating a network camera system 2100 including a network camera 2102 and a client apparatus 2104.
The network camera 2102 includes an imaging unit 2106, an encoding unit 2108, a communication unit 2110, and a control unit 2112.
The network camera 2102 and the client apparatus 2104 are mutually connected to be able to communicate with each other via the network 200.
The imaging unit 2106 includes a lens and an image sensor (e.g., a charge coupled device (CCD) or a complementary metal oxide semiconductor (CMOS) sensor), and captures an image of an object and generates image data based on the image. This image can be a still image or a video image. The imaging unit may also comprise zooming means and/or panning means which are adapted to zoom or pan (either optically or digitally) respectively.
The encoding unit 2108 encodes the image data by using the encoding methods explained in one or more of the foregoing embodiments. The encoding unit 2108 uses at least one of the encoding methods explained in the foregoing embodiments. For another instance, the encoding unit 2108 can use a combination of the encoding methods explained in the foregoing embodiments.
The communication unit 2110 of the network camera 2102 transmits the encoded image data encoded by the encoding unit 2108 to the client apparatus 2104. Further, the communication unit 2110 receives commands from the client apparatus 2104. The commands include commands to set parameters for the encoding of the encoding unit 2108.
The control unit 2112 controls other units in the network camera 2102 in accordance with the commands received by the communication unit 2110.
The client apparatus 2104 includes a communication unit 2114, a decoding unit 2116, and a control unit 2118. The communication unit 2114 of the client apparatus 2104 transmits the commands to the network camera 2102. Further, the communication unit 2114 of the client apparatus 2104 receives the encoded image data from the network camera 2102.
The decoding unit 2116 decodes the encoded image data by using the decoding methods explained in one or more of the foregoing embodiments. For another instance, the decoding unit 2116 can use a combination of the decoding methods explained in the foregoing embodiments.
The control unit 2118 of the client apparatus 2104 controls other units in the client apparatus 2104 in accordance with the user operation or commands received by the communication unit 2114. The control unit 2118 of the client apparatus 2104 controls a display apparatus 2120 so as to display an image decoded by the decoding unit 2116. The control unit 2118 of the client apparatus 2104 also controls the display apparatus 2120 so as to display a GUI (Graphical User Interface) to designate values of the parameters for the network camera 2102, including the parameters for the encoding of the encoding unit 2108.
The control unit 2118 of the client apparatus 2104 also controls other units in the client apparatus 2104 in accordance with user operation input to the GUI displayed by the display apparatus 2120. The control unit 2118 of the client apparatus 2104 controls the communication unit 2114 of the client apparatus 2104 so as to transmit to the network camera 2102 the commands which designate values of the parameters for the network camera 2102, in accordance with the user operation input to the GUI displayed by the display apparatus 2120. The network camera system 2100 may determine whether the camera 2102 utilizes zoom or pan during the recording of video, and such information may be used when encoding a video stream.
Figure 15 is a diagram illustrating a smart phone 2200.
The smart phone 2200 includes a communication unit 2202, a decoding/encoding unit 2204, a control unit 2206 and a display unit 2208.
The communication unit 2202 receives the encoded image data via network.
The decoding/encoding unit 2204 decodes the encoded image data received by the communication unit 2202. The decoding/encoding unit 2204 decodes the encoded image data by using the decoding methods explained in one or more of the foregoing embodiments. The decoding/encoding unit 2204 can use at least one of the decoding methods explained in the foregoing embodiments. For another instance, the decoding/encoding unit 2204 can use a combination of the decoding or encoding methods explained in the foregoing embodiments. The control unit 2206 controls other units in the smart phone 2200 in accordance with a user operation or commands received by the communication unit 2202. For example, the control unit 2206 controls the display unit 2208 so as to display an image decoded by the decoding/encoding unit 2204.
The smart phone may further comprise an image recording device 2210 (for example a digital camera and associated circuitry) to record images or videos. Such recorded images or videos may be encoded by the decoding/encoding unit 2204 under instruction of the control unit 2206. The smart phone may further comprise sensors 2212 adapted to sense the orientation of the mobile device. Such sensors could include an accelerometer, gyroscope, compass, global positioning (GPS) unit or similar positional sensors. Such sensors 2212 can determine if the smart phone changes orientation and such information may be used when encoding a video stream.
Alternatives and modifications
It will be appreciated that an object of the present invention is to ensure that (temporally collocated/neighbouring/obtained/derived) motion information (e.g. in temporal prediction such as Temporal Motion Vector Prediction or a constructed Affine candidate) is utilised in a most efficient manner, and certain examples discussed above relate to signalling the use of temporal motion information prediction in dependence on a perceived likelihood of the temporal motion information prediction being useful. A further example of this may apply to encoders when it is known that complex motion (where temporal prediction such as Temporal Motion Vector Prediction or use of the constructed Affine candidate may be particularly efficient) is being encoded. Examples of such cases include:
a) a camera zooming in / out;
b) a portable camera (e.g. a mobile phone) changing orientation during filming (i.e. a rotational movement);
c) a 'fisheye' lens camera panning (e.g. a stretching / distortion of a portion of the image).
As such, an indication of complex motion may be raised during the recording process so that the Temporal Motion Vector Prediction may be given a higher likelihood of being used for the slice, sequence of frames or indeed the entire video stream.
In a further example, the temporal (motion information) prediction (such as Temporal Motion Vector Prediction) or use of the constructed Affine candidate may be given a higher likelihood of being used depending on a feature or functionality of the device used to record the video. For example, a mobile device may be more likely to change orientation than (say) a fixed security camera so the Temporal Motion Vector Prediction may be more appropriate for encoding video from the former. Examples of features or functionality include: the presence/use of zooming means, the presence/use of a positional sensor, the presence/use of panning means, whether or not the device is portable, or a user-selection on the device.
While the present invention has been described with reference to embodiments, it is to be understood that the invention is not limited to the disclosed embodiments. It will be appreciated by those skilled in the art that various changes and modification might be made without departing from the scope of the invention, as defined in the appended claims. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.
It is understood that, according to an embodiment of the invention, the same processes and method steps for deriving/obtaining TMVP candidate for the (regular) Merge list in any foregoing embodiments/variants can also be used to obtain/determine/derive motion information for a control point for a constructed Affine candidate for the subblock Merge candidate list because they are obtained/determined/derived using the same information (e.g. reference indices).
It is also understood that any result of comparison, determination, assessment, selection, execution, performing, or consideration described above, for example a selection made during an encoding or filtering process, may be indicated in or determinable/inferable from data in a bitstream, for example a flag or data indicative of the result, so that the indicated or determined/inferred result can be used in the processing instead of actually performing the comparison, determination, assessment, selection, execution, performing, or consideration, for example during a decoding process.
In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. The mere fact that different features are recited in mutually different dependent claims does not indicate that a combination of these features cannot be advantageously used.
Reference numerals appearing in the claims are by way of illustration only and shall have no limiting effect on the scope of the claims.
In the preceding embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit.
Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media which is non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to non-transient, tangible storage media. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

Claims (26)

1. A method of deriving a motion information predictor candidate for a current image portion of an image, the method comprising: obtaining a history-based motion vector predictor from a set of one or more motion information predictor(s), each motion information predictor having been used to process an image portion; and deriving motion information for the motion information predictor candidate using motion information from the obtained history-based motion vector predictor.
2. The method of claim 1, wherein the deriving motion information for the motion information predictor candidate comprises: obtaining, when available, motion information associated with a first position and/or a second position in a first reference picture from a first set of one or more reference image(s) and, when available, motion information associated with a third position and/or a fourth position in a second reference picture from a second set of one or more reference image(s); and deriving the motion information for the motion information predictor candidate using the obtained motion information if the obtained motion information meets a condition.
3. The method of any preceding claim, wherein the derived motion information comprises information for identifying a reference image/picture from a set of one or more reference image(s).
4. The method of any preceding claim, wherein each motion information of the set of one or more motion information predictor(s) is selected or used to process a previous image portion preceding the current image portion in a processing order of image portions.
  5. The method of any preceding claim further comprising adding a motion information predictor to the set of one or more motion information predictor(s) after the motion information predictor has been selected or used to process a previous image portion, the previous image portion preceding the current image portion in a processing order of image portions.
6. The method of claim 5, wherein the motion information predictor is added to the set of one or more motion information predictor(s) on a First-In, First-Out basis.
  7. The method of claim 5 or 6 further comprising: determining whether the set of one or more motion information predictor(s) comprises a motion information predictor with the same motion information as the motion information predictor being considered for adding; and adding the motion information predictor to the set of one or more motion information predictor(s) when the set does not comprise the motion information predictor with the same motion information.
  8. The method of any one of claims 5 to 7, further comprising: checking whether the set of one or more motion information predictor(s) comprises any duplicate motion information; and removing one or more motion information predictor(s) with duplicate motion information from the set.
9. The method of any one of claims 5 to 8, wherein the set of one or more motion information predictor(s) is initialised when a first image portion among a row of image portions is processed.
10. The method of any one of claims 4 to 9, wherein the processing order is the order in which the image portions are encoded and/or decoded.
11. The method of any preceding claim, wherein the history-based motion information vector predictor comprises one or more of: a motion vector; information for identifying a set of one or more reference image(s); an index for identifying a reference image in the set of one or more reference image(s); and a flag for indicating whether a motion vector or a reference image is available.
12. The method of any preceding claim, wherein the motion information predictor candidate is a Temporal Motion Vector Predictor candidate.
13. The method of any one of claims 1 to 11, wherein the motion information predictor candidate is a constructed Affine candidate.
14. A method of encoding an image comprising one or more image portions, the method comprising: obtaining a set of motion information predictor candidates; selecting, as a motion information predictor for an image portion, a motion information predictor candidate from the set; and encoding the image portion using the selected motion information prediction, wherein the set of motion information predictor candidates comprises a motion information predictor candidate derived according to any one of claims 1 to 13.
15. The method of claim 14 further comprising: encoding information for identifying the selected motion information predictor; and providing, in a bitstream, data for obtaining the encoded information and/or the encoded image portion.
16. A method of decoding an image comprising one or more image portions, the method comprising: obtaining a set of motion information predictor candidates; selecting, as a motion information predictor for an image portion, a motion information predictor candidate from the set; and decoding the image portion using the selected motion information prediction, wherein the set of motion information predictor candidates comprises a motion information predictor candidate derived according to any one of claims 1 to 13.
17. The method of claim 16 further comprising: obtaining, from a bitstream, data for decoding information for identifying the selected motion information predictor; and decoding the information.
18. A device comprising means for performing a method of deriving a motion information predictor candidate according to any one of claims 1 to 13.
19. A device comprising means for performing a method of encoding an image according to claim 14 or 15.
20. A device for encoding an image comprising one or more image portions, the device comprising: means for obtaining a set of motion information predictor candidates; means for selecting, as a motion information predictor for an image portion, a motion information predictor candidate from the set; and means for encoding the image portion using the selected motion information predictor, wherein the set of motion information predictor candidates comprises a motion information predictor candidate derived according to any one of claims 1 to 13.
21. The device of claim 20 further comprising: means for encoding information for identifying the selected motion information predictor; and means for providing, in a bitstream, data for obtaining the encoded information and/or the encoded image portion.
22. A device comprising means for performing a method of decoding an image according to claim 16 or 17.
23. A device for decoding an image comprising one or more image portions, the device comprising: means for obtaining a set of motion information predictor candidates; means for selecting, as a motion information predictor for an image portion, a motion information predictor candidate from the set; and means for decoding the image portion using the selected motion information predictor, wherein the set of motion information predictor candidates comprises a motion information predictor candidate derived according to any one of claims 1 to 13.
24. The device of claim 23 further comprising: means for obtaining, from a bitstream, data for decoding information for identifying the selected motion information predictor; and means for decoding the information.
25. A program which, when run on a computer or processor, causes the computer or processor to carry out the method of any one of claims 1 to 17.
26. A carrier medium carrying the program of claim 25.
GB1909056.2A 2019-06-24 2019-06-24 Video coding and decoding Withdrawn GB2585021A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1909056.2A GB2585021A (en) 2019-06-24 2019-06-24 Video coding and decoding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1909056.2A GB2585021A (en) 2019-06-24 2019-06-24 Video coding and decoding

Publications (2)

Publication Number Publication Date
GB201909056D0 GB201909056D0 (en) 2019-08-07
GB2585021A true GB2585021A (en) 2020-12-30

Family

ID=67511738

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1909056.2A Withdrawn GB2585021A (en) 2019-06-24 2019-06-24 Video coding and decoding

Country Status (1)

Country Link
GB (1) GB2585021A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110944190B (en) * 2018-09-22 2023-01-10 上海天荷电子信息有限公司 Encoding method and device, decoding method and device for image compression

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
(JVET-K0104) Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11; "CE4-related: History-based Motion Vector Prediction"; 11th Meeting: Ljubljana, SI, 10-18 July 2018; < http://phenix.it-sudparis.eu/jvet/ > *
(JVET-L0266-v1) Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11; "CE4: History-based Motion Vector Prediction (Test 4.4.7)"; 12th Meeting: Macao, CN, 3-12 Oct. 2018; < http://phenix.it-sudparis.eu/jvet/ > *
(JVET-M0125) Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11; "CE2: History Based Affine Motion Candidate (Test 2.2.3)"; 13th Meeting: Marrakech, MA, 9-18 Jan. 2019; < http://phenix.it-sudparis.eu/jvet/ > *
(JVET-M0126) Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11; "CE4: Modification on History-based Motion Vector Prediction"; 13th Meeting: Marrakech, MA, 9-18 Jan. 2019; < http://phenix.it-sudparis.eu/jvet/ > *
(JVET-M0266) Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11; "CE2-related: History-based affine merge candidates"; 13th Meeting: Marrakech, MA, 9-18 Jan. 2019; < http://phenix.it-sudparis.eu/jvet/ > *
(JVET-N0263-v1) Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11; "CE2-5.5: History-parameter-based affine model inheritance"; 14th Meeting: Geneva, CH, 19-27 March 2019; < http://phenix.it-sudparis.eu/jvet/ > *

Also Published As

Publication number Publication date
GB201909056D0 (en) 2019-08-07

Similar Documents

Publication Publication Date Title
CN113196769B (en) Encoding and decoding information related to motion information predictors
KR102408765B1 (en) Video coding and decoding
CN113056910B (en) Motion vector predictor index coding for video coding
GB2606281A (en) Video coding and decoding
US11849138B2 (en) Video coding and decoding
GB2585021A (en) Video coding and decoding
CN112868231B (en) Video encoding and decoding
GB2585022A (en) Video coding and decoding
WO2023198701A2 (en) Video coding and decoding
GB2597616A (en) Video coding and decoding
GB2589735A (en) Video coding and decoding
GB2606278A (en) Video coding and decoding
GB2606280A (en) Video coding and decoding

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)