WO2022117369A1 - Content adaptive collocated reference picture selection - Google Patents

Content adaptive collocated reference picture selection

Info

Publication number
WO2022117369A1
Authority
WO
WIPO (PCT)
Prior art keywords
reference picture
picture
blocks
pictures
collocated
Prior art date
Application number
PCT/EP2021/082325
Other languages
English (en)
Inventor
Krit Panusopone
Seungwook Hong
Limin Wang
Original Assignee
Nokia Technologies Oy
Priority date
Filing date
Publication date
Application filed by Nokia Technologies Oy
Publication of WO2022117369A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/105Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/109Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136Incoming video signal characteristics or properties
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136Incoming video signal characteristics or properties
    • H04N19/137Motion inside a coding unit, e.g. average field, frame or block difference
    • H04N19/139Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • H04N19/517Processing of motion vectors by encoding
    • H04N19/52Processing of motion vectors by encoding by predictive encoding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • H04N19/521Processing of motion vectors for estimating the reliability of the determined motion vectors or motion vector field, e.g. for smoothing the motion vector field or for correcting motion vectors

Definitions

  • An example embodiment relates generally to techniques in video coding, and, more particularly, to techniques for efficient management of temporal motion vector prediction processes.
  • VVC Versatile Video Coding
  • JVET Joint Video Experts Team
  • HEVC High Efficiency Video Coding
  • pictures that have been decoded earlier can be used for prediction of the image data of later pictures so that only the difference needs to be encoded.
  • To make this prediction (e.g., temporal prediction) possible, the earlier decoded pictures need to be stored, often temporarily, in memory so that they can be quickly retrieved for prediction.
  • older reference pictures need to be dropped from the memory, e.g., to make space for new ones. That is, some new pictures that have been decoded are marked to indicate that they are to be used for reference, and kept in the reference picture memory.
  • Some decoded pictures are marked unused for reference and they can be dropped from the memory. To carry out this process effectively, the pictures to be used for reference can be signaled in the video transmission. However, additional signaling in the video transmission adds to the amount of data that needs to be sent.
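  • As a rough illustration of this reference picture marking, the following C++ sketch keeps decoded pictures in a small buffer and evicts those marked "unused for reference"; the structure and function names are hypothetical and only illustrate the marking concept, not any normative decoded picture buffer process.

```cpp
#include <cstdint>
#include <list>

// Hypothetical decoded-picture record: only the fields this sketch needs.
struct DecodedPicture {
    std::int64_t poc = 0;         // picture order count
    bool usedForReference = true; // marking derived from / carried in the bitstream
};

// Drop every picture marked "unused for reference" so its memory can be
// reused for newly decoded pictures.
inline void evictUnusedPictures(std::list<DecodedPicture>& dpb) {
    dpb.remove_if([](const DecodedPicture& p) { return !p.usedForReference; });
}
```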
  • a coded video sequence typically consists of intra coded pictures (e.g., I picture) and inter coded pictures (e.g., P and B pictures).
  • Intra coded pictures usually use many more bits than inter coded pictures.
  • Inter coded pictures are often more efficient since they employ temporal prediction to exploit temporal correlation across multiple pictures.
  • MV motion vectors
  • TMVP Temporal Motion Vector Prediction
  • VVC and HEVC support TMVP by storing the MVs of a decoded picture in a buffer to be used in future pictures.
  • Both VVC and HEVC also include syntax to indicate the reference picture to be used as the collocated reference picture and semantics to determine the collocated block position so that its MV is used as the MV for TMVP.
  • Although VVC and HEVC allow any reference picture stored in the reference buffer to be used as the collocated reference picture, the same collocated reference picture must be used for the entire picture.
  • a method, apparatus, and computer program product provide for content adaptive collocated reference picture selection.
  • simplified selection criteria are used to determine a level of temporal motion vector prediction (TMVP) support provided by two or more reference pictures from a picture sequence.
  • selection of a reference picture as the collocated reference picture may be based at least on a comparison of the number of inter coded blocks (e.g., 4x4 blocks having valid motion vector data) of the reference pictures, where the reference picture having the greatest number of inter coded blocks may be selected as the collocated reference picture.
  • selection may also or alternatively comprise a comparison of a number of bi-prediction coded blocks and/or a temporal distance of reference pictures from a current picture of the picture sequence, quantization parameters, a reference list, sequential selection processes, weighted sum values, or the like.
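  • A minimal sketch of the block-counting criterion above, assuming the encoder can query, per reference picture, how many of its 4x4 block positions hold a valid (inter coded) motion vector; all type and function names are illustrative and not taken from any reference software.

```cpp
#include <cstddef>
#include <vector>

// Illustrative per-reference-picture statistics gathered while that picture
// was encoded or reconstructed.
struct RefPictureStats {
    int refIdx = -1;              // index into the reference picture list
    std::size_t interBlocks = 0;  // 4x4 blocks carrying a valid motion vector
};

// Content adaptive selection: choose the reference picture with the greatest
// number of inter coded blocks as the collocated reference picture.
inline int selectCollocatedRefIdx(const std::vector<RefPictureStats>& stats) {
    int best = -1;
    std::size_t bestCount = 0;
    for (const RefPictureStats& s : stats) {
        if (best < 0 || s.interBlocks > bestCount) {
            best = s.refIdx;
            bestCount = s.interBlocks;
        }
    }
    return best;  // -1 when no reference picture is available
}
```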
  • an apparatus includes at least one processor and at least one memory including computer program code.
  • the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to determine, for a first reference picture in a picture sequence, one or more characteristics; determine, for a second reference picture in the picture sequence, the one or more characteristics; and select, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, the collocated reference picture.
  • a picture sequence may comprise a set of reference pictures comprising the first reference picture and the second reference picture.
  • said selecting may comprise selecting, from among the set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the set of reference pictures may comprise all or substantially all reference pictures in the picture sequence.
  • the set of reference pictures may comprise the first reference picture, the second reference picture, and one or more other reference pictures from the picture sequence.
  • the set of reference pictures may comprise only the first reference picture and the second reference picture.
  • apparatus may be configured to carry out operations such as described herein more than once or iteratively.
  • the apparatus may carry out the selecting step, in a first instance, based upon only a first portion of all reference pictures of the picture sequence and then carry out the selecting step, in a second instance, based upon the first portion of all reference pictures of the picture sequence and a second portion of all reference pictures of the picture sequence.
  • the apparatus may be configured to carry out the determining or selecting operations iteratively using the same or different set of reference pictures.
  • the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus to: generate an indication of the collocated reference picture; and include the indication of the collocated reference picture in metadata associated with the picture sequence.
  • the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus to: transmit, to a codec or a playback device, the collocated reference picture in metadata associated with the picture sequence.
  • the one or more characteristics comprises a number of coding blocks having valid motion vectors in reference pictures.
  • the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus to: initiate a counter to count the number of coding blocks having valid motion vectors in reference pictures.
  • the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus to: compare the number of coding blocks having valid motion vectors in the first reference picture to the number of coding blocks having valid motion vectors in the second reference picture; and select, based upon said comparison of the number of coding blocks having valid motion vectors in the first reference picture to the number of coding blocks having valid motion vectors in the second reference picture, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus to: in an instance in which the first and second reference pictures are to be coded as I slice, determine that blocks of the first and second reference pictures are to be intra coded and determine that the first and second reference pictures should not be selected as a collocated reference picture.
  • the coding blocks are intra coded blocks.
  • the coding blocks are inter coded blocks.
  • the coding blocks are bi-prediction coded blocks.
  • the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus to: determine which of the first and second reference pictures is a shortest distance in the picture sequence from a current picture of the picture sequence; and select, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, and further based at least upon said determination of which of the first and second reference pictures is the shortest distance in the picture sequence from the current picture of the picture sequence, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
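  • One way to realize this distance criterion is to break ties between equally good candidates by their picture order count (POC) distance from the current picture; the sketch below is an assumption about how such a tie-break could be combined with the block count, not a normative rule.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

struct CandidateRef {
    int refIdx = -1;
    long long poc = 0;            // picture order count of the reference picture
    std::size_t interBlocks = 0;  // blocks with valid motion vectors
};

// Prefer the larger inter-block count; on a tie, prefer the reference picture
// temporally closest to the current picture.
inline int selectByCountThenDistance(const std::vector<CandidateRef>& cands,
                                     long long currentPoc) {
    int best = -1;
    std::size_t bestCount = 0;
    long long bestDist = 0;
    for (const CandidateRef& c : cands) {
        const long long dist = std::llabs(currentPoc - c.poc);
        const bool better =
            (best < 0) || (c.interBlocks > bestCount) ||
            (c.interBlocks == bestCount && dist < bestDist);
        if (better) {
            best = c.refIdx;
            bestCount = c.interBlocks;
            bestDist = dist;
        }
    }
    return best;
}
```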
  • the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus to: assign predetermined weights to one or more block statistics and coding information associated with the one or more blocks of the first reference picture and the one or more blocks of the second reference picture to determine a weighted sum value associated with each of the first and second reference pictures; and select, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, and further based at least upon the weighted sum value of each of the first and second reference pictures, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus to: determine a quantization parameter for each of the first and second reference pictures; and select, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, and further based at least upon the quantization parameter of each of the first and second reference pictures, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
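  • A sketch of the weighted-sum variant, assuming hypothetical weights that fold several block statistics, the temporal distance, and the picture quantization parameter into a single score; the weight values shown are arbitrary placeholders, not values taken from the disclosure.

```cpp
#include <vector>

// Illustrative per-reference-picture information for the weighted-sum score.
struct RefPictureInfo {
    int refIdx = -1;
    double interBlocks = 0;   // count of inter coded blocks with valid MVs
    double biPredBlocks = 0;  // count of bi-prediction coded blocks
    double pocDistance = 0;   // |POC(current) - POC(reference)|
    double qp = 0;            // quantization parameter of the reference picture
};

// Placeholder weights: a shorter distance and a lower QP are treated as
// favorable, hence the negative weights.
struct Weights {
    double wInter = 1.0;
    double wBiPred = 0.5;
    double wDistance = -0.25;
    double wQp = -0.1;
};

inline int selectByWeightedSum(const std::vector<RefPictureInfo>& refs,
                               const Weights& w = Weights{}) {
    int best = -1;
    double bestScore = 0.0;
    for (const RefPictureInfo& r : refs) {
        const double score = w.wInter * r.interBlocks + w.wBiPred * r.biPredBlocks +
                             w.wDistance * r.pocDistance + w.wQp * r.qp;
        if (best < 0 || score > bestScore) {
            best = r.refIdx;
            bestScore = score;
        }
    }
    return best;
}
```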
  • an apparatus comprises: means for determining, for a first reference picture in a picture sequence, one or more characteristics; means for determining, for a second reference picture in the picture sequence, the one or more characteristics; and means for selecting, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the apparatus can further comprise: means for generating an indication of the collocated reference picture; and means for including the indication of the collocated reference picture in metadata associated with the picture sequence.
  • the apparatus can further comprise: means for transmitting, to a codec or a playback device, the collocated reference picture in metadata associated with the picture sequence.
  • the one or more characteristics comprises a number of coding blocks having valid motion vectors in reference pictures.
  • the apparatus can further comprise: means for initiating a counter to count the number of coding blocks having valid motion vectors in reference pictures.
  • the apparatus can further comprise: means for comparing the number of coding blocks having valid motion vectors in the first reference picture to the number of coding blocks having valid motion vectors in the second reference picture; and means for selecting, based upon said comparison of the number of coding blocks having valid motion vectors in the first reference picture to the number of coding blocks having valid motion vectors in the second reference picture, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the apparatus can further comprise: means for, in an instance in which the first and second reference pictures are to be coded as I slice, determining that blocks of the first and second reference pictures are to be intra coded and determining that the first and second reference pictures should not be selected as a collocated reference picture.
  • the coding blocks are intra coded blocks.
  • the coding blocks are inter coded blocks.
  • the coding blocks are bi-prediction coded blocks.
  • the apparatus can further comprise: means for determining which of the first and second reference pictures is a shortest distance in the picture sequence from a current picture of the picture sequence; and means for selecting, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, and further based at least upon said determination of which of the first and second reference pictures is the shortest distance in the picture sequence from the current picture of the picture sequence, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the apparatus can further comprise: means for assigning predetermined weights to one or more block statistics and coding information associated with the one or more blocks of the first reference picture and the one or more blocks of the second reference picture to determine a weighted sum value associated with each of the first and second reference pictures; and means for selecting, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, and further based at least upon the weighted sum value of each of the first and second reference pictures, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the apparatus can further comprise: means for determining a quantization parameter for each of the first and second reference pictures; and means for selecting, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, and further based at least upon the quantization parameter of each of the first and second reference pictures, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • a method comprises: determining, for a first reference picture in a picture sequence, one or more characteristics; determining, for a second reference picture in the picture sequence, the one or more characteristics; and selecting, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the method can further comprise: generating an indication of the collocated reference picture; and including the indication of the collocated reference picture in metadata associated with the picture sequence.
  • the method can further comprise: transmitting, to a codec or a playback device, the collocated reference picture in metadata associated with the picture sequence.
  • the one or more characteristics comprises a number of coded blocks having valid motion vectors in reference pictures.
  • the method can further comprise: initiating a counter to count the number of coded blocks having valid motion vectors in reference pictures.
  • the method can further comprise: comparing the number of coded blocks having valid motion vectors in the first reference picture to the number of coded blocks having valid motion vectors in the second reference picture; and selecting, based upon said comparison of the number of coded blocks having valid motion vectors in the first reference picture to the number of coded blocks having valid motion vectors in the second reference picture, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the method can further comprise: in an instance in which the first and second reference pictures are to be coded as I slice, determining that blocks of the first and second reference pictures are to be intra coded and determining that the first and second reference pictures should not be selected as a collocated reference picture.
  • the coding blocks are intra coded blocks.
  • the coding blocks are inter coded blocks.
  • the coding blocks are bi-prediction coded blocks.
  • the method can further comprise: determining which of the first and second reference pictures is a shortest distance in the picture sequence from a current picture of the picture sequence; and selecting, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, and further based at least upon said determination of which of the first and second reference pictures is the shortest distance in the picture sequence from the current picture of the picture sequence, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the method can further comprise: assigning predetermined weights to one or more block statistics and coding information associated with the one or more blocks of the first reference picture and the one or more blocks of the second reference picture to determine a weighted sum value associated with each of the first and second reference pictures; and selecting, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, and further based at least upon the weighted sum value of each of the first and second reference pictures, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the method can further comprise: determining a quantization parameter for each of the first and second reference pictures; and selecting, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, and further based at least upon the quantization parameter of each of the first and second reference pictures, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • a computer program product comprises: a non-transitory computer readable storage medium having program code portions stored thereon.
  • the program code portions are configured, upon execution, to: determine, for a first reference picture in a picture sequence, one or more characteristics; determine, for a second reference picture in the picture sequence, the one or more characteristics; and select, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the program code portions are configured, upon execution, to: generate an indication of the collocated reference picture; and include the indication of the collocated reference picture in metadata associated with the picture sequence.
  • the program code portions are configured, upon execution, to: transmit, to a codec or a playback device, the collocated reference picture in metadata associated with the picture sequence.
  • the one or more characteristics comprises a number of coding blocks having valid motion vectors in reference pictures.
  • the program code portions are configured, upon execution, to: initiate a counter to count the number of coding blocks having valid motion vectors in reference pictures.
  • the program code portions are configured, upon execution, to: compare the number of coding blocks having valid motion vectors in the first reference picture to the number of coding blocks having valid motion vectors in the second reference picture; and select, based upon said comparison of the number of coding blocks having valid motion vectors in the first reference picture to the number of coding blocks having valid motion vectors in the second reference picture, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the program code portions are configured, upon execution, to: in an instance in which the first and second reference pictures are to be coded as I slice, determine that blocks of the first and second reference pictures are to be intra coded and determine that the first and second reference pictures should not be selected as a collocated reference picture.
  • the coding blocks are intra coded blocks.
  • the coding blocks are inter coded blocks.
  • the coding blocks are bi-prediction coded blocks.
  • the program code portions are configured, upon execution, to: determine which of the first and second reference pictures is a shortest distance in the picture sequence from a current picture of the picture sequence; and select, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, and further based at least upon said determination of which of the first and second reference pictures is the shortest distance in the picture sequence from the current picture of the picture sequence, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the program code portions are configured, upon execution, to: assign predetermined weights to one or more block statistics and coding information associated with the one or more blocks of the first reference picture and the one or more blocks of the second reference picture to determine a weighted sum value associated with each of the first and second reference pictures; and select, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, and further based at least upon the weighted sum value of each of the first and second reference pictures, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the program code portions are configured, upon execution, to: determine a quantization parameter for each of the first and second reference pictures; and select, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, and further based at least upon the quantization parameter of each of the first and second reference pictures, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • an apparatus is provided that comprises at least one processor; and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to: determine, for each reference picture of a set of reference pictures from a picture sequence, one or more characteristics; and select, based upon at least said one or more characteristics, from among the set of reference pictures, a particular reference picture to be a collocated reference picture.
  • an encoder may compute a characteristic for each reference picture in a reference picture buffer of a current picture. In some embodiments, this computation can be done in any order (sequentially or in parallel).
  • the encoder may then select a particular reference picture with an optimal characteristic among all reference pictures in a reference picture buffer to be a collocated reference picture.
  • the apparatus may be configured for storing, in a buffer associated with a current picture from among the picture sequence, the collocated reference picture or an indication of the collocated reference picture.
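  • In its generic form, the procedure computes one characteristic per reference picture in the current picture's reference buffer (in any order) and then keeps the picture with the optimal value; the callback-based structure below only illustrates that control flow and is not drawn from any reference implementation.

```cpp
#include <functional>
#include <vector>

// Minimal stand-in for a reference picture held in the reference buffer.
struct ReferencePicture {
    int refIdx = -1;
    // ... reconstructed samples, motion field, coding statistics, etc.
};

// Compute a characteristic for every reference picture in the buffer (here
// sequentially; the computations are independent and could run in parallel),
// then select the picture with the largest value as the collocated reference.
inline int selectCollocated(
    const std::vector<ReferencePicture>& refBuffer,
    const std::function<double(const ReferencePicture&)>& characteristic) {
    int best = -1;
    double bestValue = 0.0;
    for (const ReferencePicture& ref : refBuffer) {
        const double value = characteristic(ref);
        if (best < 0 || value > bestValue) {
            best = ref.refIdx;
            bestValue = value;
        }
    }
    return best;
}
```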
  • an apparatus comprises means for determining, for each reference picture of a set of reference pictures from a picture sequence, one or more characteristics; and means for selecting, based upon at least said one or more characteristics, from among the set of reference pictures, a particular reference picture to be a collocated reference picture.
  • an encoder may compute a characteristic for each reference picture in a reference picture buffer of a current picture. In some embodiments, this computation can be done in any order (sequentially or in parallel). In some embodiments, after the characteristic of all reference pictures in a reference picture buffer are computed, the encoder may then select a particular reference picture with an optimal characteristic among all reference pictures in a reference picture buffer to be a collocated reference picture.
  • the apparatus may further comprise means for storing, in a buffer associated with a current picture from among the picture sequence, the collocated reference picture or an indication of the collocated reference picture.
  • a method can be carried out that comprises: determining, for each reference picture of a set of reference pictures from a picture sequence, one or more characteristics; and selecting, based upon at least said one or more characteristics, from among the set of reference pictures, a particular reference picture to be a collocated reference picture.
  • an encoder may compute a characteristic for each reference picture in a reference picture buffer of a current picture. In some embodiments, this computation can be done in any order (sequentially or in parallel).
  • the encoder may then select a particular reference picture with an optimal characteristic among all reference pictures in a reference picture buffer to be a collocated reference picture.
  • the method can further comprise storing, in a buffer associated with a current picture from among the picture sequence, the collocated reference picture or an indication of the collocated reference picture.
  • a computer program product comprises a non-transitory computer readable storage medium having program code portions stored thereon, the program code portions configured, upon execution, to: determine, for each reference picture of a set of reference pictures from a picture sequence, one or more characteristics; and select, based upon at least said one or more characteristics, from among the set of reference pictures, a particular reference picture to be a collocated reference picture.
  • an encoder may compute a characteristic for each reference picture in a reference picture buffer of a current picture. In some embodiments, this computation can be done in any order (sequentially or in parallel).
  • the encoder may then select a particular reference picture with an optimal characteristic among all reference pictures in a reference picture buffer to be a collocated reference picture.
  • the computer program product may comprise program code portions further configured, upon execution, to store, in a buffer associated with a current picture from among the picture sequence, the collocated reference picture or an indication of the collocated reference picture.
  • FIG. 1 is a block diagram of an example apparatus that may be specifically configured in accordance with an example embodiment of the present disclosure
  • FIG. 2 is an illustration of an example apparatus that may be specifically configured in accordance with an example embodiment of the present disclosure
  • FIG. 3 is a block diagram of an example apparatus that may be specifically configured in accordance with an example embodiment of the present disclosure
  • FIG. 4 is an illustration of an example system that may comprise one or more apparatus, such as those illustrated in FIGs. 1-4, that are specifically configured in accordance with an example embodiment of the present disclosure
  • FIG. 5A is an illustration of a picture sequence comprising reference pictures and a current picture, in accordance with an example embodiment of the present disclosure
  • FIG. 5B is an illustration of a reference picture comprising multiple coding blocks, in accordance with an example embodiment of the present disclosure
  • FIG. 5C is an illustration of a subset of a picture sequence comprising two reference pictures having a sequential distance in the picture sequence from a current picture, in accordance with an example embodiment of the present disclosure
  • FIG. 6 is a flow diagram illustrating operations performed in accordance with an example embodiment
  • FIG. 7 is a flow diagram illustrating operations performed in accordance with an example embodiment
  • FIG. 8 is a flow diagram illustrating operations performed in accordance with an example embodiment
  • FIG. 9 is a flow diagram illustrating operations performed in accordance with an example embodiment
  • FIG. 10 is a flow diagram illustrating operations performed in accordance with an example embodiment
  • FIG. 11 is a flow diagram illustrating operations performed in accordance with an example embodiment
  • FIG. 12 is a flow diagram illustrating operations performed in accordance with an example embodiment.
  • FIG. 13 is a flow diagram illustrating operations performed in accordance with an example embodiment.
  • circuitry refers to (a) hardware-only circuit implementations (e.g., implementations in analog circuitry and/or digital circuitry); (b) combinations of circuits and computer program product(s) comprising software and/or firmware instructions stored on one or more computer readable memories that work together to cause an apparatus to perform one or more functions described herein; and (c) circuits, such as, for example, a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation even if the software or firmware is not physically present.
  • This definition of ‘circuitry’ applies to all uses of this term herein, including in any claims.
  • circuitry also includes an implementation comprising one or more processors and/or portion(s) thereof and accompanying software and/or firmware.
  • circuitry as used herein also includes, for example, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, other network device (such as a core network apparatus), field programmable gate array, and/or other computing device.
  • VVC Versatile Video Coding
  • AVC Advanced Video Coding
  • HEVC High Efficiency Video Coding
  • all coded pictures may have a similar number of bits so that the encoder-to-decoder delay can be reduced to around one (1) picture interval.
  • intra coded pictures are generally not ideal for (ultra) low delay applications.
  • an intra coded picture is needed at a random access point.
  • the H.264/AVC standard was developed by the Joint Video Team (JVT) of the Video Coding Experts Group (VCEG) of the Telecommunications Standardisation Sector of International Telecommunication Union (ITU-T) and the Moving Picture Experts Group (MPEG) of International Standardisation Organisation (ISO) / International Electrotechnical Commission (IEC).
  • JVT Joint Video Team
  • VCEG Video Coding Experts Group
  • MPEG Moving Picture Experts Group
  • ISO International Standardisation Organisation
  • IEC International Electrotechnical Commission
  • the H.264/AVC standard is published by both parent standardization organizations, and it is referred to as ITU-T Recommendation H.264 and ISO/IEC International Standard 14496-10, also known as MPEG-4 Part 10 Advanced Video Coding (AVC).
  • AVC MPEG-4 Part 10 Advanced Video Coding
  • SVC Scalable Video Coding
  • MVC Multiview Video Coding
  • HEVC High Efficiency Video Coding
  • JCT-VC Joint Collaborative Team - Video Coding
  • Some key definitions, bitstream and coding structures, and concepts of H.264/AVC and HEVC are described in this section as an example of a video encoder, decoder, encoding method, decoding method, and a bitstream structure, wherein the embodiments may be implemented.
  • Some of the key definitions, bitstream and coding structures, and concepts of H.264/AVC are the same as in the current working draft of HEVC - hence, they are described below jointly.
  • the aspects of the invention are not limited to H.264/AVC, VVC, or HEVC, but rather the description is given for one possible basis on top of which the invention may be partly or fully realized.
  • bitstream syntax and semantics as well as the decoding process for error-free bitstreams are specified in H.264/AVC and HEVC.
  • the encoding process is not specified, but encoders must generate conforming bitstreams.
  • Bitstream and decoder conformance can be verified with the Hypothetical Reference Decoder (HRD).
  • HRD Hypothetical Reference Decoder
  • the standards contain coding tools that help in coping with transmission errors and losses, but the use of the tools in encoding is optional and no decoding process has been specified for erroneous bitstreams.
  • the elementary unit for the input to an H.264/AVC or HEVC encoder and the output of an H.264/AVC or HEVC decoder, respectively, is a picture.
  • a picture may either be a frame or a field.
  • a picture is a frame.
  • a frame comprises a matrix of luma samples and corresponding chroma samples.
  • a field is a set of alternate sample rows of a frame and may be used as encoder input.
  • Chroma pictures may be subsampled when compared to luma pictures. For example, in the 4:2:0 sampling pattern the spatial resolution of chroma pictures is half of that of the luma picture along both coordinate axes.
  • a macroblock is a 16x16 block of luma samples and the corresponding blocks of chroma samples. For example, in the 4:2:0 sampling pattern, a macroblock contains one 8x8 block of chroma samples per each chroma component.
  • a picture is partitioned to one or more slice groups, and a slice group contains one or more slices.
  • a slice consists of an integer number of macroblocks ordered consecutively in the raster scan within a particular slice group.
  • CU coding unit
  • a CU consists of one or more prediction units (PU) defining the prediction process for the samples within the CU and one or more transform units (TU) defining the prediction error coding process for the samples in the said CU.
  • PU prediction units
  • TU transform units
  • a CU consists of a square block of samples with a size selectable from a predefined set of possible CU sizes.
  • a CU with the maximum allowed size is typically named LCU (largest coding unit) and the video picture is divided into non-overlapping LCUs.
  • An LCU can be further split into a combination of smaller CUs, e.g. by recursively splitting the LCU and resultant CUs.
  • Each resulting CU typically has at least one PU and at least one TU associated with it.
  • Each PU and TU can be further split into smaller PUs and TUs in order to increase granularity of the prediction and prediction error coding processes, respectively.
  • the PU splitting can be realized by splitting the CU into four equal size square PUs or splitting the CU into two rectangle PUs vertically or horizontally in a symmetric or asymmetric way.
  • the division of the image into CUs, and division of CUs into PUs and TUs is typically signalled in the bitstream allowing the decoder to reproduce the intended structure of these units.
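  • A simplified sketch of the recursive LCU-to-CU quadtree splitting described above; the split decision is delegated to a hypothetical callback, which in a real encoder would come from rate-distortion optimization and in a decoder from split flags parsed from the bitstream.

```cpp
#include <functional>

// Visit the coding units produced by recursively splitting a square LCU.
// shouldSplit() stands in for the encoder's RD decision or the decoder's
// parsed split flag; onCu() receives each leaf CU as (x, y, size).
inline void traverseCuTree(int x, int y, int size, int minCuSize,
                           const std::function<bool(int, int, int)>& shouldSplit,
                           const std::function<void(int, int, int)>& onCu) {
    if (size > minCuSize && shouldSplit(x, y, size)) {
        const int half = size / 2;
        traverseCuTree(x,        y,        half, minCuSize, shouldSplit, onCu);
        traverseCuTree(x + half, y,        half, minCuSize, shouldSplit, onCu);
        traverseCuTree(x,        y + half, half, minCuSize, shouldSplit, onCu);
        traverseCuTree(x + half, y + half, half, minCuSize, shouldSplit, onCu);
    } else {
        onCu(x, y, size);  // leaf CU; it would carry its own PUs and TUs
    }
}
```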
  • a picture can be partitioned in tiles, which are rectangular and contain an integer number of LCUs.
  • the partitioning to tiles forms a regular grid, where the heights and widths of tiles differ from each other by at most one LCU.
  • a slice consists of an integer number of CUs. The CUs are scanned in the raster scan order of LCUs within tiles or within a picture, if tiles are not in use. Within an LCU, the CUs have a specific scan order.
  • in-picture prediction may be disabled across slice boundaries.
  • slices can be regarded as a way to split a coded picture into independently decodable pieces, and slices are therefore often regarded as elementary units for transmission.
  • encoders may indicate in the bitstream which types of in-picture prediction are turned off across slice boundaries, and the decoder operation takes this information into account for example when concluding which prediction sources are available. For example, samples from a neighboring macroblock or CU may be regarded as unavailable for intra prediction, if the neighboring macroblock or CU resides in a different slice.
  • NAL Network Abstraction Layer
  • For transport over packet-oriented networks or storage into structured files, NAL units are typically encapsulated into packets or similar structures.
  • a bytestream format has been specified in H.264/AVC and HEVC for transmission or storage environments that do not provide framing structures. The bytestream format separates NAL units from each other by attaching a start code in front of each NAL unit.
  • encoders run a byte-oriented start code emulation prevention algorithm, which adds an emulation prevention byte to the NAL unit payload if a start code would have occurred otherwise.
  • start code emulation prevention is always performed, regardless of whether the bytestream format is in use.
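  • The emulation prevention rule can be sketched as follows: whenever two consecutive zero bytes in the payload would be followed by a byte in the range 0x00-0x03, an extra 0x03 byte is inserted so that the payload can never imitate a start code. The helper below illustrates that rule and is not taken from any particular reference implementation.

```cpp
#include <cstdint>
#include <vector>

// Insert emulation prevention bytes (0x03) into a raw NAL unit payload so that
// the byte patterns 0x000000, 0x000001, 0x000002 and 0x000003 cannot occur.
inline std::vector<std::uint8_t>
addEmulationPrevention(const std::vector<std::uint8_t>& rbsp) {
    std::vector<std::uint8_t> out;
    out.reserve(rbsp.size());
    int zeroRun = 0;
    for (std::uint8_t b : rbsp) {
        if (zeroRun >= 2 && b <= 0x03) {
            out.push_back(0x03);  // emulation prevention byte
            zeroRun = 0;
        }
        out.push_back(b);
        zeroRun = (b == 0x00) ? zeroRun + 1 : 0;
    }
    return out;
}
```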
  • NAL units consist of a header and payload.
  • the NAL unit header indicates the type of the NAL unit and whether a coded slice contained in the NAL unit is a part of a reference picture or a non-reference picture.
  • H.264/AVC includes a 2-bit nal_ref_idc syntax element, which when equal to 0 indicates that a coded slice contained in the NAL unit is a part of a non-reference picture and when greater than 0 indicates that a coded slice contained in the NAL unit is a part of a reference picture.
  • the draft HEVC includes a 1-bit nal_ref_idc syntax element, which when equal to 0 indicates that a coded slice contained in the NAL unit is a part of a non-reference picture and when equal to 1 indicates that a coded slice contained in the NAL unit is a part of a reference picture.
  • the header for SVC and MVC NAL units additionally contains various indications related to the scalability and multiview hierarchy.
  • the NAL unit header includes the temporal_id syntax element, which specifies a temporal identifier for the NAL unit.
  • the bitstream created by excluding all VCL NAL units having a temporal_id greater than or equal to a selected value and including all other VCL NAL units remains conforming.
  • a picture having temporal_id equal to TID does not use any picture having a temporal_id greater than TID as an inter prediction reference.
  • the reference picture list initialization is limited to only reference pictures marked as “used for reference” and having a temporal_id less than or equal to the temporal_id of the current picture.
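  • Temporal scalability can thus be exercised by a simple filter that keeps all non-VCL NAL units and only those VCL NAL units whose temporal_id is below the selected value; the sketch below assumes a minimal NAL unit record and is purely illustrative.

```cpp
#include <vector>

// Minimal NAL unit record for the temporal filtering sketch.
struct NalUnit {
    bool isVcl = false;  // Video Coding Layer NAL unit?
    int temporalId = 0;  // temporal_id from the NAL unit header
    // ... payload omitted
};

// Keep all non-VCL NAL units plus the VCL NAL units whose temporal_id is
// below the selected value; the extracted sub-bitstream remains conforming.
inline std::vector<NalUnit> extractTemporalSubBitstream(
    const std::vector<NalUnit>& in, int selectedValue) {
    std::vector<NalUnit> out;
    for (const NalUnit& nal : in) {
        if (!nal.isVcl || nal.temporalId < selectedValue) {
            out.push_back(nal);
        }
    }
    return out;
}
```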
  • NAL units can be categorized into Video Coding Layer (VCL) NAL units and non-VCL NAL units.
  • VCL NAL units are typically coded slice NAL units.
  • coded slice NAL units contain syntax elements representing one or more coded macroblocks, each of which corresponds to a block of samples in the uncompressed picture.
  • coded slice NAL units contain syntax elements representing one or more CU.
  • a coded slice NAL unit can be indicated to be a coded slice in an Instantaneous Decoding Refresh (IDR) picture or coded slice in a non-IDR picture.
  • IDR Instantaneous Decoding Refresh
  • a coded slice NAL unit can be indicated to be a coded slice in a Clean Decoding Refresh (CDR) picture (which may also be referred to as a Clean Random Access picture).
  • CDR Clean Decoding Refresh
  • a non-VCL NAL unit may be for example one of the following types: a sequence parameter set, a picture parameter set, a supplemental enhancement information (SEI) NAL unit, an access unit delimiter, an end of sequence NAL unit, an end of stream NAL unit, or a filler data NAL unit.
  • SEI Supplemental Enhancement Information
  • Parameter sets are essential for the reconstruction of decoded pictures, whereas many of the other non-VCL NAL units are not necessary for the reconstruction of decoded sample values.
  • Parameters that remain unchanged through a coded video sequence are included in a sequence parameter set.
  • the sequence parameter set may optionally contain video usability information (VUI), which includes parameters that are important for buffering, picture output timing, rendering, and resource reservation.
  • VUI video usability information
  • a picture parameter set contains such parameters that are likely to be unchanged in several coded pictures.
  • APS Adaptation Parameter Set
  • H.264/AVC and HEVC syntax allows many instances of parameter sets, and each instance is identified with a unique identifier.
  • each slice header includes the identifier of the picture parameter set that is active for the decoding of the picture that contains the slice, and each picture parameter set contains the identifier of the active sequence parameter set. Consequently, the transmission of picture and sequence parameter sets does not have to be accurately synchronized with the transmission of slices. Instead, it is sufficient that the active sequence and picture parameter sets are received at any moment before they are referenced, which allows transmission of parameter sets using a more reliable transmission mechanism compared to the protocols used for the slice data.
  • parameter sets can be included as a parameter in the session description for Real-time Transport Protocol (RTP) sessions. If parameter sets are transmitted in-band, they can be repeated to improve error robustness.
  • RTP Real-time Transport Protocol
  • An SEI NAL unit contains one or more SEI messages, which are not required for the decoding of output pictures but assist in related processes, such as picture output timing, rendering, error detection, error concealment, and resource reservation.
  • SEI messages are specified in H.264/AVC and HEVC, and the user data SEI messages enable organizations and companies to specify SEI messages for their own use.
  • H.264/AVC and HEVC contain the syntax and semantics for the specified SEI messages but no process for handling the messages in the recipient is defined.
  • encoders are required to follow the H.264/AVC standard or the HEVC standard when they create SEI messages, and decoders conforming to the H.264/AVC standard or the HEVC standard, respectively, are not required to process SEI messages for output order conformance.
  • One of the reasons to include the syntax and semantics of SEI messages in H.264/AVC and HEVC is to allow different system specifications to interpret the supplemental information identically and hence interoperate. It is intended that system specifications can require the use of particular SEI messages both in the encoding end and in the decoding end, and additionally the process for handling particular SEI messages in the recipient can be specified.
  • a coded picture is a coded representation of a picture.
  • a coded picture in H.264/AVC consists of the VCL NAL units that are required for the decoding of the picture.
  • a coded picture can be a primary coded picture or a redundant coded picture.
  • a primary coded picture is used in the decoding process of valid bitstreams, whereas a redundant coded picture is a redundant representation that should only be decoded when the primary coded picture cannot be successfully decoded. In the draft HEVC, no redundant coded picture has been specified.
  • an access unit consists of a primary coded picture and those NAL units that are associated with it.
  • the appearance order of NAL units within an access unit is constrained as follows.
  • An optional access unit delimiter NAL unit may indicate the start of an access unit. It is followed by zero or more SEI NAL units.
  • the coded slices of the primary coded picture appear next, followed by coded slices for zero or more redundant coded pictures.
  • a coded video sequence is defined to be a sequence of consecutive access units in decoding order from an IDR access unit, inclusive, to the next IDR access unit, exclusive, or to the end of the bitstream, whichever appears earlier.
  • a group of pictures (GOP) and its characteristics may be defined as follows.
  • a GOP can be decoded regardless of whether any previous pictures were decoded.
  • An open GOP is such a group of pictures in which pictures preceding the initial intra picture in output order might not be correctly decodable when the decoding starts from the initial intra picture of the open GOP.
  • pictures of an open GOP may refer (in inter prediction) to pictures belonging to a previous GOP.
  • An H.264/AVC decoder can recognize an intra picture starting an open GOP from the recovery point SEI message in an H.264/AVC bitstream.
  • An HEVC decoder can recognize an intra picture starting an open GOP, because a specific NAL unit type, the CDR NAL unit type, is used for its coded slices.
  • a closed GOP is such a group of pictures in which all pictures can be correctly decoded when the decoding starts from the initial intra picture of the closed GOP. In other words, no picture in a closed GOP refers to any pictures in previous GOPs.
  • a closed GOP starts from an IDR access unit.
  • A closed GOP structure has more error resilience potential than an open GOP structure, at the cost of a possible reduction in compression efficiency.
  • An open GOP coding structure is potentially more efficient in compression, due to greater flexibility in the selection of reference pictures.
  • the bitstream syntax of H.264/AVC and HEVC indicates whether a particular picture is a reference picture for inter prediction of any other picture.
  • Pictures of any coding type (I, P, B) can be reference pictures or non-reference pictures in H.264/AVC and HEVC.
  • the NAL unit header indicates the type of the NAL unit and whether a coded slice contained in the NAL unit is a part of a reference picture or a non-reference picture.
  • pixel or sample values in a certain picture area or “block” are predicted. These pixel or sample values can be predicted, for example, by motion compensation mechanisms, which involve finding and indicating an area in one of the previously encoded video frames that corresponds closely to the block being coded. Additionally, pixel or sample values can be predicted by spatial mechanisms which involve finding and indicating a spatial region relationship.
  • Prediction approaches using image information from a previously coded image can also be called inter prediction methods, which may also be referred to as temporal prediction and motion compensation. Prediction approaches using image information within the same image can also be called intra prediction methods.
  • the second phase is one of coding the error between the predicted block of pixels or samples and the original block of pixels or samples. This may be accomplished by transforming the difference in pixel or sample values using a specified transform. This transform may be a Discrete Cosine Transform (DCT) or a variant thereof. After transforming the difference, the transformed difference is quantized and entropy encoded.
  • DCT Discrete Cosine Transform
  • the encoder can control the balance between the accuracy of the pixel or sample representation (i.e. the visual quality of the picture) and the size of the resulting encoded video representation (i.e. the file size or transmission bit rate).
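  • As a toy illustration of this quality/bitrate trade-off, the snippet below applies uniform quantization to transform coefficients with a step size derived from a QP value; the QP-to-step mapping shown (step doubling every six QP units, in the spirit of H.264/AVC and HEVC) and its constant are only there to make the example concrete.

```cpp
#include <cmath>
#include <vector>

// Map a quantization parameter to a step size that doubles every six QP units
// (in the spirit of H.264/AVC and HEVC; the constant is illustrative only).
inline double qpToStep(int qp) { return 0.625 * std::pow(2.0, qp / 6.0); }

// Uniform quantization of transform coefficients: a larger QP gives coarser
// levels, fewer bits after entropy coding, and a larger reconstruction error.
inline std::vector<int> quantize(const std::vector<double>& coeffs, int qp) {
    const double step = qpToStep(qp);
    std::vector<int> levels;
    levels.reserve(coeffs.size());
    for (double c : coeffs) {
        levels.push_back(static_cast<int>(std::lround(c / step)));
    }
    return levels;
}
```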
  • the decoder reconstructs the output video by applying a prediction mechanism similar to that used by the encoder in order to form a predicted representation of the pixel or sample blocks (using the motion or spatial information created by the encoder and stored in the compressed representation of the image) and prediction error decoding (the inverse operation of the prediction error coding to recover the quantized prediction error signal in the spatial domain).
  • the decoder After applying pixel or sample prediction and error decoding processes the decoder combines the prediction and the prediction error signals (the pixel or sample values) to form the output video frame.
  • the decoder may also apply additional filtering processes in order to improve the quality of the output video before passing it for display and/or storing as a prediction reference for the forthcoming pictures in the video sequence.
  • motion information is indicated by motion vectors associated with each motion compensated image block.
  • Each of these motion vectors represents the displacement of the image block in the picture to be coded (in the encoder) or decoded (at the decoder) and the prediction source block in one of the previously coded or decoded images (or pictures).
  • H.264/AVC and HEVC, like many other video compression standards, divide a picture into a mesh of rectangles, for each of which a similar block in one of the reference pictures is indicated for inter prediction. The location of the prediction block is coded as a motion vector that indicates the position of the prediction block relative to the block being coded.
  • Inter prediction process may be characterized using one or more of the following factors.
  • motion vectors may be of quarter-pixel accuracy, and sample values in fractional-pixel positions may be obtained using a finite impulse response (FIR) filter.
  • Block partitioning for inter prediction: Many coding standards, including H.264/AVC and HEVC, allow selection of the size and shape of the block for which a motion vector is applied for motion-compensated prediction in the encoder, and indicating the selected size and shape in the bitstream so that decoders can reproduce the motion-compensated prediction done in the encoder.
  • the sources of inter prediction are previously decoded pictures.
  • Many coding standards including H.264/AVC and HEVC, enable storage of multiple reference pictures for inter prediction and selection of the used reference picture on block basis. For example, reference pictures may be selected on macroblock or macroblock partition basis in H.264/AVC and on PU or CU basis in HEVC.
  • Many coding standards, such as H.264/AVC and HEVC include syntax structures in the bitstream that enable decoders to create one or more reference picture lists.
  • a reference picture index to a reference picture list may be used to indicate which one of the multiple reference pictures is used for inter prediction for a particular block.
  • a reference picture index may be coded by an encoder into the bitstream in some inter coding modes, or it may be derived (by an encoder and a decoder), for example using neighboring blocks, in some other inter coding modes.
  • Motion vector prediction: In order to represent motion vectors efficiently in bitstreams, motion vectors may be coded differentially with respect to a block-specific predicted motion vector. In many video codecs, the predicted motion vectors are created in a predefined way, for example by calculating the median of the encoded or decoded motion vectors of the adjacent blocks. Another way to create motion vector predictions is to generate a list of candidate predictions from adjacent blocks and/or co-located blocks in temporal reference pictures and signal the chosen candidate as the motion vector predictor. In addition to predicting the motion vector values, the reference index of a previously coded/decoded picture can be predicted. The reference index is typically predicted from adjacent blocks and/or co-located blocks in the temporal reference picture. Differential coding of motion vectors is typically disabled across slice boundaries.
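  • As an illustration of median-based motion vector prediction, below is a minimal Python sketch; the function and variable names are hypothetical, and actual codecs additionally define neighbour availability and candidate rules:

        def median_mv_predictor(mv_left, mv_above, mv_above_right):
            """Component-wise median of three neighbouring motion vectors.

            Each motion vector is an (x, y) tuple; the median is taken
            separately for the horizontal and vertical components.
            """
            xs = sorted(v[0] for v in (mv_left, mv_above, mv_above_right))
            ys = sorted(v[1] for v in (mv_left, mv_above, mv_above_right))
            return (xs[1], ys[1])

        # Only the difference between the actual MV and the predictor is coded:
        mv = (5, -2)
        pred = median_mv_predictor((4, -2), (6, -1), (2, -3))
        mvd = (mv[0] - pred[0], mv[1] - pred[1])  # motion vector difference to be entropy coded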
  • Multi-hypothesis motion-compensated prediction H.264/AVC and HEVC enable the use of a single prediction block in P slices (herein referred to as uni-predictive slices) or a linear combination of two motion-compensated prediction blocks for bi-predictive slices, which are also referred to as B slices.
  • Individual blocks in B slices may be bi-predicted, uni-predicted, or intra-predicted, and individual blocks in P slices may be uni-predicted or intra-predicted.
  • the reference pictures for a bi-predictive picture are not limited to be the subsequent picture and the previous picture in output order, but rather any reference pictures can be used.
  • reference picture list 0 In many coding standards, such as H.264/AVC and HEVC, one reference picture list, referred to as reference picture list 0, is constructed for P slices, and two reference picture lists, list 0 and list 1, are constructed for B slices.
  • In B slices, prediction in the forward direction may refer to predicting from a reference picture in reference picture list 0, and prediction in the backward direction may refer to predicting from a reference picture in reference picture list 1, even though the reference pictures for prediction may have any decoding or output order relation to each other or to the current picture.
  • Weighted prediction: Many coding standards use a prediction weight of 1 for prediction blocks of inter (P) pictures and 0.5 for each prediction block of a B picture (resulting in averaging). H.264/AVC allows weighted prediction for both P and B slices. In implicit weighted prediction, the weights are proportional to picture order counts, while in explicit weighted prediction, prediction weights are explicitly indicated.
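  • As an illustration of implicit weighted prediction, below is a simplified Python sketch in which the weights are derived from POC distances; the actual H.264/AVC derivation uses clipped fixed-point arithmetic and fallback rules, so this is only a sketch under the assumption that the current picture lies between the two references:

        def implicit_biprediction_weights(poc_cur, poc_ref0, poc_ref1):
            """Derive bi-prediction weights proportional to POC distances (illustrative only)."""
            td = poc_ref1 - poc_ref0      # POC distance between the two reference pictures
            tb = poc_cur - poc_ref0       # POC distance from the list-0 reference to the current picture
            if td == 0:
                return 0.5, 0.5           # degenerate case: fall back to plain averaging
            w1 = tb / td                  # weight of the list-1 prediction block
            return 1.0 - w1, w1

        def weighted_biprediction(pred0, pred1, w0, w1):
            # Linear combination of the two motion-compensated prediction blocks (flattened samples).
            return [w0 * a + w1 * b for a, b in zip(pred0, pred1)]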
  • the prediction residual after motion compensation is first transformed with a transform kernel (like DCT) and then coded.
  • each PU has prediction information associated with it defining what kind of a prediction is to be applied for the pixels within that PU (e.g. motion vector information for inter predicted PUs and intra prediction directionality information for intra predicted PUs).
  • each TU is associated with information describing the prediction error decoding process for the samples within the said TU (including e.g. DCT coefficient information). It is typically signalled at CU level whether prediction error coding is applied or not for each CU. In the case there is no prediction error residual associated with the CU, it can be considered there are no TUs for the said CU.
  • the prediction weight may be scaled according to the POC difference between the POC of the current picture and the POC of the reference picture.
  • a default prediction weight may be used, such as 0.5 in implicit weighted prediction for bi-predicted blocks.
  • Some video coding formats include the frame_num syntax element, which is used for various decoding processes related to multiple reference pictures.
  • the value of frame_num for IDR pictures is 0.
  • the value of frame_num for non-IDR pictures is equal to the frame_num of the previous reference picture in decoding order incremented by 1 (in modulo arithmetic, i.e., the value of frame_num wraps over to 0 after a maximum value of frame_num).
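  • For example, a minimal sketch of the modulo increment in Python; max_frame_num here stands for the maximum value derived from the sequence parameter set:

        def next_frame_num(prev_frame_num, max_frame_num):
            """frame_num of the next non-IDR reference picture: previous value plus one, in modulo arithmetic."""
            return (prev_frame_num + 1) % max_frame_num

        # With max_frame_num = 16 the value wraps over to 0 after 15:
        assert next_frame_num(15, 16) == 0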
  • H.264/AVC and HEVC include a concept of picture order count (POC).
  • a value of POC is derived for each picture and is non-decreasing with increasing picture position in output order. POC therefore indicates the output order of pictures.
  • POC may be used in the decoding process for example for implicit scaling of motion vectors in the temporal direct mode of bi-predictive slices, for implicitly derived weights in weighted prediction, and for reference picture list initialization. Furthermore, POC may be used in the verification of output order conformance. In H.264/AVC, POC is specified relative to the previous IDR picture or a picture containing a memory management control operation marking all pictures as “unused for reference”.
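  • As an illustration of POC-based implicit motion vector scaling, below is a minimal Python sketch; real codecs perform this scaling in clipped fixed-point arithmetic, so floating point is used here only for clarity:

        def scale_mv_by_poc(mv_col, poc_cur, poc_cur_ref, poc_col, poc_col_ref):
            """Scale a collocated motion vector by the ratio of POC distances.

            mv_col points from the collocated picture to its reference; the
            returned vector points from the current picture to its reference.
            """
            td = poc_col - poc_col_ref    # POC distance spanned by the collocated MV
            tb = poc_cur - poc_cur_ref    # POC distance to be spanned by the current block's MV
            if td == 0:
                return mv_col
            scale = tb / td
            return (mv_col[0] * scale, mv_col[1] * scale)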
  • H.264/AVC and the draft HEVC specify the process for decoded reference picture marking in order to control the memory consumption in the decoder.
  • the maximum number of reference pictures used for inter prediction referred to as M, is determined in the sequence parameter set.
  • When a reference picture is decoded, it is marked as “used for reference”. If the decoding of the reference picture caused more than M pictures to be marked as “used for reference”, at least one picture is marked as “unused for reference”.
  • the operation mode for decoded reference picture marking is selected on picture basis.
  • the adaptive memory control enables explicit signalling which pictures are marked as “unused for reference” and may also assign long-term indices to short-term reference pictures.
  • the adaptive memory control requires the presence of memory management control operation (MMCO) parameters in the bitstream. If the sliding window operation mode is in use and there are M pictures marked as “used for reference”, the short-term reference picture that was the first decoded picture among those short-term reference pictures that are marked as “used for reference” is marked as “unused for reference”. In other words, the sliding window operation mode results in a first-in-first-out buffering operation among short-term reference pictures.
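  • As an illustration of the sliding window operation mode, below is a minimal Python sketch; it is a simplified model that tracks only short-term reference pictures, and long-term references and MMCO commands are omitted:

        from collections import deque
        from dataclasses import dataclass

        @dataclass
        class Picture:
            poc: int
            marking: str = "unused for reference"

        class SlidingWindow:
            def __init__(self, max_refs):
                self.max_refs = max_refs      # M: maximum number of pictures marked "used for reference"
                self.short_term = deque()     # short-term reference pictures in decoding order

            def mark_decoded_reference(self, picture):
                picture.marking = "used for reference"
                self.short_term.append(picture)
                # If more than M pictures are now marked, the earliest decoded
                # short-term reference picture is marked "unused for reference".
                while len(self.short_term) > self.max_refs:
                    oldest = self.short_term.popleft()
                    oldest.marking = "unused for reference"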
  • One of the memory management control operations in H.264/AVC and HEVC causes all reference pictures except for the current picture to be marked as “unused for reference”.
  • An instantaneous decoding refresh (IDR) picture contains only intra-coded slices and causes a similar “reset” of reference pictures.
  • a Decoded Picture Buffer (DPB) may be used in the encoder and/or in the decoder. There are two reasons to buffer decoded pictures: for references in inter prediction and for reordering decoded pictures into output order. As H.264/AVC and HEVC provide a great deal of flexibility for both reference picture marking and output reordering, separate buffers for reference picture buffering and output picture buffering may waste memory resources. Hence, the DPB may include a unified decoded picture buffering process for reference pictures and output reordering. A decoded picture may be removed from the DPB when it is no longer used as a reference and is no longer needed for output.
  • the reference picture for inter prediction is indicated with an index to a reference picture list.
  • the index is coded with variable length coding, i.e., the smaller the index is, the shorter the corresponding syntax element becomes.
  • Two reference picture lists (reference picture list 0 and reference picture list 1) are generated for each bi-predictive (B) slice, and one reference picture list (reference picture list 0) is formed for each inter-coded (P) slice.
  • Typical high efficiency video codecs such as the draft HEVC codec employ an additional motion information coding/decoding mechanism, often called merging/merge mode/process/mechanism, where all the motion information of a block/PU is predicted and used without any modification/correction.
  • the aforementioned motion information for a PU comprises: 1) the information whether ‘the PU is uni-predicted using only reference picture list 0’ or ‘the PU is uni-predicted using only reference picture list 1’ or ‘the PU is bi-predicted using both reference picture list 0 and list 1’; 2) the motion vector value corresponding to reference picture list 0; 3) the reference picture index in reference picture list 0; 4) the motion vector value corresponding to reference picture list 1; 5) the reference picture index in reference picture list 1.
  • predicting the motion information is carried out using the motion information of adjacent blocks and/or co-located blocks in temporal reference pictures.
  • a list is constructed by including motion prediction candidates associated with available adjacent/co-located blocks and the index of selected motion prediction candidate in the list is signaled. Then the motion information of the selected candidate is copied to the motion information of the current PU.
  • this type of coding/decoding of the CU is typically named skip mode (or skip direct mode) or merge based skip mode.
  • the merge mechanism is also employed for individual PUs (not necessarily the whole CU as in skip mode) and in this case, prediction residual may be utilized to improve prediction quality.
  • This type of prediction mode is typically named inter-merge mode.
  • a reference picture list such as reference picture list 0 and reference picture list 1 is typically constructed in two steps: First, an initial reference picture list is generated.
  • the initial reference picture list may be generated, for example, on the basis of frame_num, POC, temporal_id, or information on the prediction hierarchy such as GOP structure, or any combination thereof.
  • the initial reference picture list may be reordered by reference picture list reordering (RPLR) commands contained in slice headers.
  • the RPLR commands indicate the pictures that are ordered to the beginning of the respective reference picture list.
  • This second step may also be referred to as the reference picture list modification process, and the RPLR commands may be included in a reference picture list modification syntax structure.
  • the merge list may be generated on the basis of reference picture list 0 and/or reference picture list 1 for example using the reference picture lists combination syntax structure included in the slice header syntax.
  • There may be a reference picture lists combination syntax structure, created into the bitstream by an encoder and decoded from the bitstream by a decoder, which indicates the contents of the merge list.
  • the syntax structure may indicate that the reference picture list 0 and the reference picture list 1 are combined to be an additional reference picture lists combination used for the prediction units being uni-directionally predicted.
  • the syntax structure may include a flag which, when equal to a certain value, indicates that the reference picture list 0 and reference picture list 1 are identical, and thus reference picture list 0 is used as the reference picture lists combination.
  • the syntax structure may include a list of entries, each specifying a reference picture list (list 0 or list 1) and a reference index to the specified list, where an entry specifies a reference picture to be included in the merge list.
  • a syntax structure for reference picture marking may exist in a video coding system.
  • the decoded reference picture marking syntax structure, if present, may be used to adaptively mark pictures as “unused for reference” or “used for long-term reference”. If the decoded reference picture marking syntax structure is not present and the number of pictures marked as “used for reference” can no longer increase, a sliding window reference picture marking may be used, which basically marks the earliest (in decoding order) decoded reference picture as unused for reference.
  • the reference picture marking arrangement may be modified in a certain manner so that in the same process, at least one reference picture list may be constructed and/or managed.
  • mere addition of a list management system to the marking arrangement would lead to a system with inefficiencies in terms of coding.
  • efficiencies have surprisingly been achieved by employing a synergy between the reference picture marking process and a list management process and/or by employing various other coding improvements.
  • a reference picture lists syntax structure may include three parts, reference picture list 0 description for P and B slices, reference picture list 1 description for B slices, and idle reference picture list description for any slices including those reference pictures that are not included in either reference picture list 0 or 1 but are still to be kept marked as “used for reference”.
  • there may e.g. be one syntax structure (instead of more than one) that provides the information for both reference picture marking and reference picture list construction.
  • the reference picture lists syntax structure may be parsed.
  • the syntax structure includes a reference picture list description for list 0, which is decoded.
  • the reference picture list description syntax structure may list pictures identified by their picture order count (POC) value in the order they appear in the reference picture list.
  • the reference picture lists syntax structure may include a reference picture list description for list 1, which is decoded.
  • a reference picture list initialization process and/or reference picture list modification process may be omitted, and the reference picture lists may be directly described in the syntax structures.
  • the reference picture lists syntax structure may include a reference picture list description for an idle reference picture list, which, if present, is decoded.
  • Pictures that are in any of the reference picture lists may be marked as “used for reference”. Pictures that are in no reference picture list may be marked as “unused for reference”.
  • a reference picture list construction and reference picture marking processes and syntax structures may be handled in a single unified process and syntax structure.
  • a coded video sequence consists of intra coded pictures (e.g. I picture) and inter coded pictures (e.g. P and B pictures). Intra coded pictures usually use many more bits than inter coded pictures. Inter coded pictures are more efficient since they employ temporal prediction to exploit temporal correlation across multiple pictures.
  • HEVC High Efficiency Video Coding
  • motion vectors (MV) from a similar location (collocated position) in a temporally close picture to the current coding block position may be highly correlated with the MV of the current coding block, since motion is generally similar across multiple pictures.
  • a difference between MV from a collocated position and MV of a current coding block may be small.
  • Temporal Motion Vector Prediction (TMVP) is an inter coding tool that exploits this behavior by setting a motion vector prediction (MVP) candidate based on the MV at the collocated position.
  • VVC and HEVC support TMVP by storing the MVs of a decoded picture in a buffer to be used in future pictures. Both VVC and HEVC also include syntax to indicate the reference picture to be used as the collocated reference picture and semantics to determine the collocated block position so that its MV is used as the MV for TMVP. Although VVC and HEVC allow any reference picture stored in the reference buffer to be used as the collocated reference picture, the same collocated reference picture must be used for the entire picture.
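  • As an illustration of the stored motion field used by TMVP, below is a minimal Python sketch of a per-picture MV buffer on a 4x4 grid; the grid granularity and the use of None for blocks without a valid MV are assumptions made only for illustration:

        class MotionFieldBuffer:
            """Motion vectors of a decoded picture, stored per 4x4 block for later TMVP use."""

            def __init__(self, width, height):
                self.cols = width // 4
                self.rows = height // 4
                # One entry per 4x4 block: (mvx, mvy, ref_idx), or None for blocks without a valid MV.
                self.grid = [[None] * self.cols for _ in range(self.rows)]

            def store(self, x, y, mv):
                self.grid[y // 4][x // 4] = mv

            def collocated_mv(self, x, y):
                """MV stored at the collocated 4x4 block position in this reference picture, if any."""
                return self.grid[y // 4][x // 4]

        # After a reference picture is decoded its motion field is kept in the buffer, and the
        # current picture reads the MV at the same (collocated) position as a TMVP candidate:
        buf = MotionFieldBuffer(1920, 1080)
        buf.store(64, 32, (3, -1, 0))
        tmvp_candidate = buf.collocated_mv(64, 32)   # (3, -1, 0)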
  • VVC Versatile Video Coding
  • VTM VVC Test Model
  • A selection criterion based on QP is simple, yet it does not find the optimal collocated reference picture in cases where the QP information is unreliable or is constant for all pictures in a sequence.
  • rate control is necessary to ensure that the bit rate does not exceed the allocated network bandwidth.
  • a video encoder must adjust the bit rate to meet the target bit rate, and QP is normally adjusted at the coding unit (CU) level and slice level to meet this goal.
  • the picture-level QP used in the VTM reference software is unreliable since the picture-level QP is used as the initial QP, not the final QP, in the video encoder.
  • Using QP as the decision criterion for collocated reference picture selection is hence suboptimal.
  • Methods, apparatuses, and computer program products are described herein for improving coding efficiency of TMVP by selecting a collocated reference picture adaptively according to content characteristic(s).
  • simple selection criteria can be used to determine the collocated reference picture. For instance, instead of relying on QP, which can be unreliable in some applications, the ratio of blocks coded using inter coding modes may be used.
  • While example apparatuses are described herein for carrying out the described methods, it should be understood that other apparatuses may be configured to carry out the described methods, and that the present disclosure entirely contemplates these other apparatuses configured to carry out the described methods.
  • FIG. 1 illustrates an example apparatus 100 that may be configured to carry out operations in accordance with embodiments described herein.
  • the apparatus includes, is associated with, or is in communication with processing circuitry 102, a memory 104 and a communication interface 106.
  • the processing circuitry 102 may be in communication with the memory via a bus for passing information among components of the apparatus.
  • the memory may be non-transitory and may include, for example, one or more volatile and/or non-volatile memories.
  • the memory may be an electronic storage device (e.g., a computer readable storage medium) comprising gates configured to store data (e.g., bits) that may be retrievable by a machine (e.g., a computing device like the processing circuitry).
  • the memory may be configured to store information, data, content, applications, instructions, or the like for enabling the apparatus to carry out various functions in accordance with an example embodiment of the present disclosure.
  • the memory could be configured to buffer input data for processing by the processing circuitry. Additionally, or alternatively, the memory could be configured to store instructions for execution by the processing circuitry.
  • the apparatus 100 may, in some embodiments, be embodied in various computing devices. However, in some embodiments, the apparatus may be embodied as a chip or chip set. In other words, the apparatus may comprise one or more physical packages (e.g., chips) including materials, components and/or wires on a structural assembly (e.g., a baseboard). The structural assembly may provide physical strength, conservation of size, and/or limitation of electrical interaction for component circuitry included thereon. The apparatus may therefore, in some cases, be configured to implement an embodiment of the present disclosure on a single chip or as a single “system on a chip.” As such, in some cases, a chip or chipset may constitute means for performing one or more operations for providing the functionalities described herein.
  • the processing circuitry 102 may be embodied in a number of different ways.
  • the processing circuitry may be embodied as one or more of various hardware processing means such as a coprocessor, a microprocessor, a controller, a digital signal processor (DSP), a processing element with or without an accompanying DSP, or various other circuitry including integrated circuits such as, for example, an ASIC (application specific integrated circuit), an FPGA (field programmable gate array), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like.
  • the processing circuitry may include one or more processing cores configured to perform independently.
  • a multi-core processing circuitry may enable multiprocessing within a single physical package.
  • the processing circuitry may include one or more processors configured in tandem via the bus to enable independent execution of instructions, pipelining and/or multithreading.
  • the processing circuitry 102 may be configured to execute instructions stored in the memory device 104 or otherwise accessible to the processing circuitry. Alternatively, or additionally, the processing circuitry may be configured to execute hard coded functionality. As such, whether configured by hardware or software methods, or by a combination thereof, the processing circuitry may represent an entity (e.g., physically embodied in circuitry) capable of performing operations according to an embodiment of the present disclosure while configured accordingly. Thus, for example, when the processing circuitry is embodied as an ASIC, FPGA or the like, the processing circuitry may be specifically configured hardware for conducting the operations described herein.
  • the processing circuitry when the processing circuitry is embodied as an executor of instructions, the instructions may specifically configure the processor to perform the algorithms and/or operations described herein when the instructions are executed.
  • the processing circuitry may be a processor of a specific device (e.g., an image or video processing system) configured to employ one embodiment by further configuration of the processing circuitry by instructions for performing the algorithms and/or operations described herein.
  • the processing circuitry may include, among other things, a clock, an arithmetic logic unit (ALU) and logic gates configured to support operation of the processing circuitry.
  • the communication interface 106 may be any means such as a device or circuitry embodied in either hardware or a combination of hardware and software that is configured to receive and/or transmit data, including media content in the form of video or image files, one or more audio tracks or the like.
  • the communication interface may include, for example, an antenna (or multiple antennas) and supporting hardware and/or software for enabling communications with a wireless communication network.
  • the communication interface may include the circuitry for interacting with the antenna(s) to cause transmission of signals via the antenna(s) or to handle receipt of signals received via the antenna(s).
  • the communication interface may alternatively or also support wired communication.
  • the communication interface may include a communication modem and/or other hardware/software for supporting communication via cable, digital subscriber line (DSL), universal serial bus (USB) or other mechanisms.
  • the apparatus 100 may be configured in accordance with an architecture for providing for video encoding, decoding, and/or compression.
  • the apparatus 100 may be configured as a video coding device.
  • the apparatus 100 may be configured to code video in accordance with one or more video compression standards, such as, for example, the VVC Specification. While certain embodiments herein refer to operations associated with the VVC standard, it is to be appreciated that the processes discussed herein may be utilized for any video coding standard.
  • a method is provided for VVC or HEVC that allows for setting a (e.g., one) reference picture in a reference buffer to be used as the collocated reference picture for TMVP.
  • a good reference picture for collocated reference picture may allow for a higher chance of TMVP to be used.
  • efficient algorithms may be described for analyzing one or more reference pictures to determine a picture that encourages TMVP usage to be used as a collocated reference picture.
  • a simple statistic may be used to determine a level of TMVP support by one or more reference pictures.
  • a method may comprise determining (e.g., measuring) a number of inter coded blocks that have a valid MV, as an indicator or metric for facilitating collocated reference picture selection.
  • methods described herein may comprise counting a number of 4x4 blocks in a reference picture (e.g., each reference picture) that are inter coded.
  • a counter may be provided for counting the number of 4x4 blocks in a reference picture.
  • the counter for the one or more reference pictures is then compared against corresponding counters (of 4x4 block counts) for one or more other reference pictures.
  • the method can further comprise determining a picture with a highest number of coded blocks (e.g., inter coded blocks, 4x4 inter coded blocks, biprediction coded blocks, etc.), which may be selected as the collocated reference picture.
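  • For example, below is a minimal Python sketch of this selection, assuming each reference picture exposes a per-4x4-block coding mode map; all names are hypothetical and not part of the specification:

        def count_inter_blocks(block_modes):
            """Number of 4x4 blocks coded with an inter mode (and hence carrying a valid MV)."""
            return sum(1 for mode in block_modes if mode == "inter")

        def select_collocated_reference(reference_pictures):
            """Pick the reference picture with the highest number of inter coded 4x4 blocks.

            reference_pictures: iterable of (picture_id, block_modes) pairs, where block_modes
            lists the coding mode of every 4x4 block, e.g. "inter", "intra" or "skip".
            """
            best_id, best_count = None, -1
            for picture_id, block_modes in reference_pictures:
                count = count_inter_blocks(block_modes)
                if count > best_count:
                    best_id, best_count = picture_id, count
            return best_id

        # The second reference picture has more inter coded blocks and is selected:
        refs = [(0, ["intra"] * 6 + ["inter"] * 2),
                (1, ["inter"] * 7 + ["intra"])]
        assert select_collocated_reference(refs) == 1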
  • the determination of which reference picture to select as the collocation reference picture may be further facilitated or based upon at least one or more other characteristics of the reference pictures, such as their relative distance in the picture sequence from a current picture, a weighted sum value associated with the reference pictures, and/or other suitable characteristics.
  • the described method may be effective at least because it adjusts in accordance with the nature of the content of each inter coded picture. For example, if the reference picture is coded as an I slice, then all its blocks (e.g., 4x4 blocks) are intra coded and using this reference picture as the collocated reference picture may not have meaning at least because TMVP typically is not used with this collocated reference picture.
  • the 4x4 block measurement (and related approach for the determination of the collocated reference picture) may, in some embodiments, be independent of the QP setting and may be reliably used with rate control functionality or with fixed QP for all pictures in a picture sequence.
  • a related statistic such as a number of intra coded blocks, a number of bi-prediction coded blocks, or the like, can be used instead of a number of inter coded blocks.
  • a number of intra coded blocks can be used for the determination or selection of a collocated reference picture according to a content adaptive collocated reference picture selection algorithm, such as those described herein.
  • a number of inter prediction blocks, a number of bi-prediction coded blocks can be used for the determination or selection of a collocated reference picture according to a content adaptive collocated reference picture selection algorithm, such as those described herein.
  • a number of skip blocks may be used, alone or together with one or more other statistics or metrics discussed herein, for the determination or selection of a collocated reference picture according to a content adaptive collocated reference picture selection algorithm.
  • One of skill in the art, however, will understand that other metrics, indications, or values may be used to determine a reference picture from a picture sequence to be used as the collocated reference picture.
  • the method can further comprise storing the collocated reference picture or an indication of the collocated reference picture.
  • an apparatus can be provided to carry out one or more elements of the method(s) described.
  • the apparatus can be configured to cause storage of the collocated reference picture or an indication of the collocated reference picture in a non-volatile or volatile memory or a buffer.
  • the method can further comprise transmitting the sequence of pictures or a portion thereof, with an indication of the collocated reference picture from the sequence of pictures.
  • a collocated reference picture selection algorithm can be improved by combining one or more block statistics (e.g., a number of inter coded blocks) with other coding information.
  • factors impacting TMVP usage may include but are not limited to: a temporal distance of a reference picture from a reference point (e.g., from a current picture) in a picture sequence, quantization parameters (QP), a reference list, or the like.
  • Two non-limiting examples of multiple-factor collocated reference picture selection are sequential selection and weighted sum decision.
  • a sequential selection method effectively decides on the collocated reference picture by performing a series of comparisons in order, according to some embodiments.
  • An example comparison order is to pick the reference picture with the shortest temporal distance to the current picture. If there is more than one such picture, then pick the reference picture with the highest number of inter coded blocks. If there is more than one picture with the highest number of inter coded blocks, then pick the reference picture with the highest QP. If there is more than one picture with the highest QP, then pick the reference picture with the lowest index in reference list 0.
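  • Below is a minimal Python sketch of this sequential selection order; the per-picture summary record and its field names are hypothetical:

        from dataclasses import dataclass

        @dataclass
        class RefPictureStats:
            list0_index: int          # index of the picture in reference list 0
            temporal_distance: int    # absolute POC distance to the current picture
            inter_block_count: int    # number of inter coded 4x4 blocks
            qp: int

        def sequential_selection(candidates):
            """Ordered tie-breaking: shortest temporal distance, then most inter coded
            blocks, then highest QP, then lowest index in reference list 0."""
            return min(candidates,
                       key=lambda c: (c.temporal_distance,
                                      -c.inter_block_count,
                                      -c.qp,
                                      c.list0_index))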
  • a weighted sum decision method decides on the collocated reference picture by measuring a combined sum for each reference picture, and the one with the highest combined sum is selected to be the collocated reference picture, according to some embodiments.
  • a combined sum is computed by assigning predetermined weights to a block statistic and coding information and then adding all the weighted values.
  • an example of predetermined weights is 1 for the block statistic, (the number of 4x4 blocks in a picture)/64 for QP, and the number of 4x4 blocks in a picture for the absolute temporal distance.
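  • Below is a minimal Python sketch of the weighted sum decision using those example weights; the sign conventions, i.e. that a shorter temporal distance and a higher QP raise the score, are assumptions made only for illustration:

        def weighted_sum_score(inter_block_count, qp, abs_temporal_distance, blocks_per_picture):
            """Combined sum for one reference picture.

            Example weights: 1 for the block statistic, blocks_per_picture/64 for QP
            and blocks_per_picture for the absolute temporal distance.
            """
            w_blocks = 1
            w_qp = blocks_per_picture / 64
            w_dist = blocks_per_picture
            return (w_blocks * inter_block_count
                    + w_qp * qp
                    - w_dist * abs_temporal_distance)

        def weighted_sum_selection(candidates):
            """candidates: iterable of (picture_id, inter_block_count, qp, abs_temporal_distance, blocks_per_picture).
            The reference picture with the highest combined sum becomes the collocated reference picture."""
            return max(candidates, key=lambda c: weighted_sum_score(*c[1:]))[0]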
  • an initial collocation reference picture may be selected from among the reference pictures identified in the first portion of the picture sequence, the first portion of the picture sequence may be encoded and the collocation reference picture or an identifier of the collocation reference picture may be stored in a reference buffer or identified in metadata associated with the encoded portion of the picture sequence, and then one or more subsequent collocation reference pictures may be selected from among the reference pictures identified in the one or more subsequent portions of the picture sequence once the one or more subsequent portions of the picture sequence is/are received, extracted, or provided for encoding.
  • FIG. 2 shows a layout of an apparatus 50 according to an example embodiment.
  • FIG. 3 shows a block diagram of the apparatus 50 according to an example embodiment.
  • FIG. 4 shows a block diagram of a video coding system according to an example embodiment as a schematic block diagram of an exemplary apparatus or electronic device (e.g., the apparatus 50), which may incorporate a codec, playback device, or the like, according to at least some of the embodiments described herein.
  • the elements of FIGs. 2-4 will be explained next.
  • the electronic device 50 may for example be a mobile terminal or user equipment of a wireless communication system. However, it would be appreciated that embodiments of the invention may be implemented within any electronic device or apparatus which may require encoding and decoding or encoding or decoding video images.
  • the apparatus 50 may comprise a housing 30 for incorporating and protecting the device.
  • the apparatus 50 further may comprise a display 32 in the form of a liquid crystal display.
  • the display may be any suitable display technology suitable to display an image or video.
  • the apparatus 50 may further comprise a keypad 34.
  • any suitable data or user interface mechanism may be employed.
  • the user interface may be implemented as a virtual keyboard or data entry system as part of a touch-sensitive display.
  • the apparatus may comprise a microphone 36 or any suitable audio input which may be a digital or analogue signal input.
  • the apparatus 50 may further comprise an audio output device which, in some embodiments, may be any one of: an earpiece 38, speaker, or an analogue audio or digital audio output connection.
  • the apparatus 50 may also comprise a battery 40 (or, in other embodiments, the device may be powered by any suitable mobile energy device such as solar cell, fuel cell or clockwork generator).
  • the apparatus may further comprise an infrared port 42 for short range line of sight communication to other devices.
  • the apparatus 50 may further comprise any suitable short range communication solution such as for example a Bluetooth wireless connection or a USB/firewire wired connection.
  • the apparatus 50 may comprise a controller 56 or processor for controlling the apparatus 50.
  • the controller 56 may be connected to memory 58 which, in some embodiments, may store both data in the form of image and audio data and/or may also store instructions for implementation on the controller 56.
  • the controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and decoding of audio and/or video data or assisting in coding and decoding carried out by the controller 56.
  • the apparatus 50 may further comprise a card reader 48 and a smart card 46, for example a UICC and UICC reader for providing user information and being suitable for providing authentication information for authentication and authorization of the user at a network.
  • the apparatus 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals for example for communication with a cellular communications network, a wireless communications system or a wireless local area network.
  • the apparatus 50 may further comprise an antenna 44 connected to the radio interface circuitry 52 for transmitting radio frequency signals generated at the radio interface circuitry 52 to other apparatus(es) and for receiving radio frequency signals from other apparatus(es).
  • the apparatus 50 comprises a camera capable of recording or detecting individual frames which are then passed to the codec 54 or controller for processing.
  • the apparatus may receive the video image data for processing from another device prior to transmission and/or storage.
  • the apparatus 50 may receive either wirelessly or by a wired connection the image for coding/decoding.
  • the system 10 may comprise an arrangement for video coding comprising a plurality of apparatuses, networks and network elements according to an example embodiment.
  • the system 10 comprises multiple communication devices which can communicate through one or more networks.
  • the system 10 may comprise any combination of wired or wireless networks including, but not limited to a wireless cellular telephone network (such as a GSM, UMTS, CDMA network, etc.), a wireless local area network (WLAN) such as defined by any of the IEEE 802.x standards, a Bluetooth personal area network, an Ethernet local area network, a token ring local area network, a wide area network, and the Internet.
  • the system 10 may include both wired and wireless communication devices or apparatus 50 suitable for implementing embodied methods described herein.
  • the system shown in FIG. 4 shows a mobile telephone network 11 and a representation of the internet 28.
  • Connectivity to the internet 28 may include, but is not limited to, long range wireless connections, short range wireless connections, and various wired connections including, but not limited to, telephone lines, cable lines, power lines, and similar communication pathways.
  • the example communication devices shown in the system 10 may include, but are not limited to, an electronic device or apparatus 50, a combination of a personal digital assistant (PDA) and a mobile telephone 14, a PDA 16, an integrated messaging device (IMD) 18, a desktop computer 20, a notebook computer 22.
  • the apparatus 50 may be stationary or mobile when carried by an individual who is moving.
  • the apparatus 50 may also be located in a mode of transport including, but not limited to, a car, a truck, a taxi, a bus, a train, a boat, an airplane, a bicycle, a motorcycle or any similar suitable mode of transport.
  • Some or further apparatuses may send and receive calls and messages and communicate with service providers through a wireless connection 25 to a base station 24.
  • the base station 24 may be connected to a network server 26 that allows communication between the mobile telephone network 11 and the internet 28.
  • the system may include additional communication devices and communication devices of various types.
  • the communication devices may communicate using various transmission technologies including, but not limited to, code division multiple access (CDMA), global system for mobile communications (GSM), universal mobile telecommunications system (UMTS), time division multiple access (TDMA), frequency division multiple access (FDMA), transmission control protocol/internet protocol (TCP-IP), short messaging service (SMS), multimedia messaging service (MMS), email, instant messaging service (IMS), Bluetooth, IEEE 802.11 and any similar wireless communication technology.
  • a communications device involved in implementing various embodiments of the present disclosure may communicate using various media including, but not limited to, radio, infrared, laser, cable connections, and any suitable connection.
  • FIGs. 5A-5C illustrate an example approach for adaptive selection of a collocation reference picture from among a picture sequence 200 comprising a plurality of pictures 202.
  • a current picture 204 may be identified and multiple reference pictures 206 may be identified, such as 206a, 206b, 206c, and 206d.
  • the reference pictures may comprise coding blocks 208, such as 4x4 coded blocks, which may be inter coding blocks, intra coding blocks, or the like.
  • the coding blocks 208 may support prediction methods, such as temporal motion vector prediction (TMVP).
  • an approach such as that described elsewhere herein, may be used to compare the number of coding blocks 208 in each reference picture 206 to select a reference picture 206 (e.g., 206c) to be the collocation reference picture.
  • other values, factors, characteristics, or considerations may be used in addition to or instead of coding block count to select a reference picture 206 to be the collocation reference picture, such as a sequential distance of each reference picture 206 from the current picture 204 in the picture sequence 200.
  • the reference picture 206c is distance D1 from the current picture 204 while reference picture 206d is distance D2 from the current picture 204, distance D2 being greater than distance D1.
  • the reference picture 206c may be selected as the collocation reference picture.
  • a flowchart includes operations of a method 300 that may be carried out by the apparatus 100, in some embodiments.
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for a first reference picture in a picture sequence, one or more characteristics.
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for a second reference picture in the picture sequence, the one or more characteristics.
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for selecting, based upon at least said one or more characteristics of the first and second reference pictures in the picture sequence, from among a set of reference blocks comprising at least the first reference picture and the second reference picture, a collocated reference picture.
  • the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for one or more other reference pictures in the picture sequence, the one or more characteristics.
  • the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for all reference pictures in the picture sequence, the one or more characteristics.
  • the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for one or more of the first reference picture, the second reference picture, and the one or more other reference pictures in the picture sequence, one or more other characteristics.
  • the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for all reference pictures in the picture sequence, the one or more other characteristics.
  • a flowchart includes operations of a method 400 that may be carried out by the apparatus 100, in some embodiments.
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for a first reference picture in a picture sequence, one or more characteristics.
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for a second reference picture in the picture sequence, the one or more characteristics.
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for selecting, based upon at least said one or more characteristics of the first and second reference pictures in the picture sequence, from among a set of reference blocks comprising at least the first reference picture and the second reference picture, a collocated reference picture.
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for generating an indication of the collocated reference picture.
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for including the indication of the collocated reference picture in metadata associated with the picture sequence.
  • the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for one or more other reference pictures in the picture sequence, the one or more characteristics. In some embodiments, the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for all reference pictures in the picture sequence, the one or more characteristics. In some embodiments, the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for one or more of the first reference picture, the second reference picture, and the one or more other reference pictures in the picture sequence, one or more other characteristics. In some embodiments, the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for all reference pictures in the picture sequence, the one or more other characteristics.
  • a flowchart includes operations of a method 500 that may be carried out by the apparatus 100, in some embodiments.
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for a first reference picture in a picture sequence, one or more characteristics.
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for a second reference picture in the picture sequence, the one or more characteristics.
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for selecting, based upon at least said one or more characteristics of the first and second reference pictures in the picture sequence, from among a set of reference blocks comprising at least the first reference picture and the second reference picture, a collocated reference picture.
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for transmitting, to a codec or a playback device, the collocated reference picture or an indication of the collocated reference picture in metadata associated with the picture sequence.
  • the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for one or more other reference pictures in the picture sequence, the one or more characteristics. In some embodiments, the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for all reference pictures in the picture sequence, the one or more characteristics. In some embodiments, the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for one or more of the first reference picture, the second reference picture, and the one or more other reference pictures in the picture sequence, one or more other characteristics. In some embodiments, the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for all reference pictures in the picture sequence, the one or more other characteristics.
  • a flowchart includes operations of a method 600 that may be carried out by the apparatus 100, in some embodiments.
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for a first reference picture in a picture sequence, a number of coding blocks that support a prediction method, such as temporal motion vector prediction (TMVP).
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for a second reference picture in the picture sequence, the number of coding blocks that support the prediction method, e.g., TMVP.
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for comparing the number of coding blocks supporting the prediction method, e.g., TMVP, in the first and second reference pictures.
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for selecting, based upon said comparison of the number of coding blocks supporting the prediction method, e.g., TMVP, in the first and second reference pictures, from among a set of reference blocks comprising at least the first reference picture and the second reference picture, a collocated reference picture.
  • the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for one or more other reference pictures in the picture sequence, the number of coding blocks that support the prediction method, e.g., TMVP.
  • the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for all reference pictures in the picture sequence, the number of coding blocks supporting the prediction method, e.g., TMVP.
  • the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for one or more of the first reference picture, the second reference picture, and the one or more other reference pictures in the picture sequence, one or more other characteristics.
  • the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for all reference pictures in the picture sequence, the one or more other characteristics.
  • a flowchart includes operations of a method 700 that may be carried out by the apparatus 100, in some embodiments.
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for each of a first and second reference picture in a picture sequence, a number of coding blocks that support prediction methods, such as temporal motion vector prediction (TMVP).
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining which of the first and second reference pictures is a shortest distance in the picture sequence from a current picture of the picture sequence.
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for selecting, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, and further based at least upon said determination of which of the first and second reference pictures is the shortest distance in the picture sequence from the current picture of the picture sequence, from among a set of reference blocks comprising at least the first reference picture and the second reference picture, a collocated reference picture.
  • the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for one or more other reference pictures in the picture sequence, a distance in the picture sequence from the current picture of the picture sequence, and selecting, based upon which of the reference pictures is the shortest distance in the picture sequence from the current picture, a particular reference picture as the collocated reference picture.
  • the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for all reference pictures in the picture sequence, the distance in the picture sequence from the current picture of the picture sequence, and selecting, based upon which of the reference pictures is the shortest distance in the picture sequence from the current picture, a particular reference picture as the collocated reference picture.
  • the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for one or more of the first reference picture, the second reference picture, and the one or more other reference pictures in the picture sequence, one or more other characteristics.
  • the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for all reference pictures in the picture sequence, the one or more other characteristics. In some embodiments, the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for selecting, based upon at least said one or more other characteristics, a particular reference picture as the collocated reference picture.
  • a flowchart includes operations of a method 800 that may be carried out by the apparatus 100, in some embodiments.
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for each of a first and second reference picture in a picture sequence, a number of coding blocks that support a prediction method, such as temporal motion vector prediction (TMVP).
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for assigning predetermined weights to one or more block statistics and coding information associated with the one or more blocks of the first reference picture and the one or more blocks of the second reference picture to determine a weighted sum value associated with each of the first and second reference pictures.
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for selecting, based upon at least the number of coding blocks of the first and second reference pictures that support the prediction method, e.g., TMVP, and the weighted sum values of at least the first and second reference pictures, from among a set of reference blocks comprising at least the first reference picture and the second reference picture, a collocated reference picture.
  • the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for one or more other reference pictures in the picture sequence, the weighted sum value associated with the one or more other reference pictures, and selecting, based upon said weighted sum values associated with the first, second, and one or more other reference pictures, a particular reference picture as the collocated reference picture.
  • the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for all reference pictures in the picture sequence, respective weighted sum values associated with all reference pictures in the picture sequence, and selecting, based upon said weighted sum values associated with all reference pictures in the picture sequence, a particular reference picture as the collocated reference picture.
  • the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for one or more of the first reference picture, the second reference picture, and the one or more other reference pictures in the picture sequence, one or more other characteristics.
  • the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for all reference pictures in the picture sequence, the one or more other characteristics. In some embodiments, the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for selecting, based upon at least said one or more other characteristics, a particular reference picture as the collocated reference picture.
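As a rough illustration of the weighted-sum criterion of method 800 described in the bullets above, the sketch below combines a few per-picture block statistics into a single score. The particular statistics, weights, and names (RefPicStats, selectByWeightedSum) are assumptions made for the example, since the disclosure leaves the choice of statistics and weights open.

```cpp
// Hypothetical sketch: weight per-picture block statistics into one score and
// pick the reference picture with the highest score.
#include <vector>
#include <cstddef>

struct RefPicStats {
  int numTmvpBlocks;    // coding blocks that can supply a TMVP candidate
  int numInterBlocks;   // inter coded blocks
  int numBiPredBlocks;  // bi-prediction coded blocks
};

int selectByWeightedSum(const std::vector<RefPicStats>& refs,
                        double wTmvp, double wInter, double wBi) {
  int best = -1;
  double bestScore = -1.0;
  for (std::size_t i = 0; i < refs.size(); ++i) {
    double score = wTmvp  * refs[i].numTmvpBlocks +
                   wInter * refs[i].numInterBlocks +
                   wBi    * refs[i].numBiPredBlocks;
    if (score > bestScore) {
      bestScore = score;
      best = static_cast<int>(i);
    }
  }
  return best;  // index of the selected collocated reference picture, -1 if empty
}
```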
  • a flowchart includes operations of a method 900 that may be carried out by the apparatus 100, in some embodiments.
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for each of a first and second reference picture in a picture sequence, a number of coding blocks that support a prediction method, such as temporal motion vector prediction (TMVP).
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining a quantization parameter for each of the first and second reference pictures.
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for selecting, based upon at least said number of coding blocks that support the prediction method, e.g., TMVP, for at least the first and second reference pictures and the quantization parameter of at least the first and second reference pictures, from among a set of reference pictures comprising at least the first reference picture and the second reference picture, a collocated reference picture.
  • the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining the quantization parameter for one or more other reference pictures in the picture sequence, and selecting, based upon said quantization parameters, from among the first, second, and one or more other reference pictures, a particular reference picture as the collocated reference picture.
  • the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining the quantization parameter for all reference pictures in the picture sequence, and selecting, based upon said quantization parameters, from among all reference pictures of the picture sequence, a particular reference picture as the collocated reference picture.
  • the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for one or more of the first reference picture, the second reference picture, and the one or more other reference pictures in the picture sequence, one or more other characteristics.
  • the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for all reference pictures in the picture sequence, the one or more other characteristics.
  • the apparatus 100 may include means, such as the processing circuitry 102, memory 104, and/or the like, configured for selecting, based upon at least said one or more other characteristics, a particular reference picture as the collocated reference picture.
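The following sketch illustrates one way the criteria of method 900 from the bullets above could be combined: prefer the reference picture with more TMVP-supporting blocks and, on a tie, the one coded with the lower quantization parameter. The tie-break direction and the names (RefPicInfo, selectByTmvpAndQp) are assumptions for illustration; the disclosure does not fix a particular combination rule.

```cpp
// Hypothetical sketch: combine the TMVP-support count with the picture-level
// quantization parameter (QP) when choosing the collocated reference picture.
#include <vector>
#include <cstddef>

struct RefPicInfo {
  int numTmvpBlocks;  // coding blocks that support TMVP in this picture
  int qp;             // quantization parameter used to code this picture
};

int selectByTmvpAndQp(const std::vector<RefPicInfo>& refs) {
  int best = -1;
  for (std::size_t i = 0; i < refs.size(); ++i) {
    if (best < 0 ||
        refs[i].numTmvpBlocks > refs[best].numTmvpBlocks ||
        (refs[i].numTmvpBlocks == refs[best].numTmvpBlocks &&
         refs[i].qp < refs[best].qp)) {  // assumed tie-break: lower QP wins
      best = static_cast<int>(i);
    }
  }
  return best;  // -1 only if the list is empty
}
```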
  • a flowchart includes operations of a method 1000 that may be carried out by the apparatus 100, in some embodiments.
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for determining, for each reference picture of a set of reference pictures from a picture sequence, one or more characteristics.
  • the apparatus 100 includes means, such as the processing circuitry 102, memory 104, and/or the like, configured for selecting, based upon at least said one or more characteristics, from among the set of reference pictures, a particular reference picture to be a collocated reference picture.
  • an encoder may compute a characteristic for each reference picture in a reference picture buffer of a current picture.
  • this computation can be done in any order (sequentially or in parallel).
  • the encoder may then select a particular reference picture with an optimal characteristic among all reference pictures in a reference picture buffer to be a collocated reference picture.
  • the apparatus 100 may, optionally, include means, such as the processing circuitry 102, memory 104, and/or the like, configured for storing, in a buffer associated with a current picture from among the picture sequence, the collocated reference picture or an indication of the collocated reference picture.
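A minimal sketch of the generic selection loop of method 1000, described in the bullets above, follows: one characteristic is computed per reference picture in the buffer (in any order) and the picture with the optimal value, here taken to be the largest, is kept. The callback-based interface and the names (ReferencePicture, selectCollocated) are illustrative assumptions.

```cpp
// Hypothetical sketch: evaluate a caller-supplied characteristic for every
// reference picture in the buffer and return the index of the best one.
#include <vector>
#include <functional>

struct ReferencePicture { int poc; /* decoded samples, motion field, ... */ };

int selectCollocated(const std::vector<ReferencePicture>& buffer,
                     const std::function<double(const ReferencePicture&)>& characteristic) {
  int best = -1;
  double bestValue = 0.0;
  for (int i = 0; i < static_cast<int>(buffer.size()); ++i) {
    double value = characteristic(buffer[i]);  // could equally be computed in parallel
    if (best < 0 || value > bestValue) {
      bestValue = value;
      best = i;
    }
  }
  return best;  // the caller may store this index (or the picture) with the current picture
}
```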
  • a method, apparatus, and computer program product are disclosed for adaptive collocation reference picture selection.
  • Benefits of this design include more optimal collocated reference picture selection, reduced complexity of collocated reference picture selection, faster encoding of a picture sequence, and improved decoding of the encoded picture sequence based upon the more optimal collocated reference picture selection.
  • FIGs. 6-13 illustrate flowcharts depicting methods according to a certain example embodiment. It will be understood that each block of the flowcharts and combination of blocks in the flowcharts may be implemented by various means, such as hardware, firmware, processor, circuitry, and/or other communication devices associated with execution of software including one or more computer program instructions. For example, one or more of the procedures described above may be embodied by computer program instructions. In this regard, the computer program instructions which embody the procedures described above may be stored by a memory device 104 of an apparatus employing an embodiment of the present disclosure and executed by a processor 102.
  • any such computer program instructions may be loaded onto a computer or other programmable apparatus (for example, hardware) to produce a machine, such that the resulting computer or other programmable apparatus implements the functions specified in the flowchart blocks.
  • These computer program instructions may also be stored in a computer-readable memory that may direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture the execution of which implements the function specified in the flowchart blocks.
  • the computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operations to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide operations for implementing the functions specified in the flowchart blocks.
  • blocks of the flowcharts support combinations of means for performing the specified functions and combinations of operations for performing the specified functions. It will also be understood that one or more blocks of the flowcharts, and combinations of blocks in the flowcharts, can be implemented by special purpose hardware-based computer systems which perform the specified functions, or combinations of special purpose hardware and computer instructions.
  • a method, apparatus, and computer program product provide for content adaptive collocated reference picture selection.
  • simplified selection criteria are used to determine a level of temporal motion vector prediction (TMVP) support provided by two or more reference pictures from a picture sequence.
  • selection of a reference picture as the collocated reference picture may be based at least on a comparison of the number of inter coded blocks (e.g., 4x4 blocks having valid motion vector data) in each reference picture, where the reference picture having the greatest number of inter coded blocks may be selected as the collocated reference picture.
  • selection may also or alternatively be based upon a comparison of a number of inter coded blocks, a number of bi-prediction coded blocks, and/or a temporal distance of reference pictures from a current picture of the picture sequence, as well as quantization parameters, a reference list, sequential selection processes, weighted sum values, or the like.
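By way of a hedged example of the basic block-counting criterion described in the two bullets above, the sketch below counts the 4x4 blocks of each reference picture that carry valid motion vector data and selects the picture with the largest count. The MotionField layout and the function names are assumptions made for the example, not the encoder's actual data structures.

```cpp
// Hypothetical sketch: count inter coded 4x4 blocks with valid motion vectors
// per reference picture and pick the picture with the largest count.
#include <vector>
#include <cstdint>

struct MotionField {
  // One flag per 4x4 block: nonzero if the block is inter coded and its
  // motion vector is available for TMVP.
  std::vector<uint8_t> hasValidMv;
};

int countValidMvBlocks(const MotionField& mf) {
  int count = 0;
  for (uint8_t v : mf.hasValidMv) {
    if (v) ++count;  // counter incremented per valid inter coded 4x4 block
  }
  return count;
}

int selectByInterBlockCount(const std::vector<MotionField>& refs) {
  int best = -1, bestCount = -1;
  for (int i = 0; i < static_cast<int>(refs.size()); ++i) {
    int c = countValidMvBlocks(refs[i]);
    if (c > bestCount) { bestCount = c; best = i; }
  }
  return best;
}
```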
  • an apparatus includes at least one processor and at least one memory including computer program code.
  • the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to determine, for a first reference picture in a picture sequence, one or more characteristics; determine, for a second reference picture in the picture sequence, the one or more characteristics; and select, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus to: generate an indication of the collocated reference picture; and include the indication of the collocated reference picture in metadata associated with the picture sequence.
  • the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus to: transmit, to a codec or a playback device, the collocated reference picture in metadata associated with the picture sequence.
  • the one or more characteristics comprises a number of coding blocks having valid motion vectors in reference pictures.
  • the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus to: initiate a counter to count the number of coding blocks having valid motion vectors in reference pictures.
  • the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus to: compare the number of coding blocks having valid motion vectors in the first reference picture to the number of coding blocks having valid motion vectors in the second reference picture; and select, based upon said comparison of the number of coding blocks having valid motion vectors in the first reference picture to the number of coding blocks having valid motion vectors in the second reference picture, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus to: in an instance in which the first and second reference pictures are to be coded as I slice, determine that blocks of the first and second reference pictures are to be intra coded and determine that the first and second reference pictures should not be selected as a collocated reference picture.
  • the coding blocks are intra coded blocks.
  • the coding blocks are inter coded blocks.
  • the coding blocks are bi-prediction coded blocks.
  • the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus to: determine which of the first and second reference pictures is a shortest distance in the picture sequence from a current picture of the picture sequence; and select, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, and further based at least upon said determination of which of the first and second reference pictures is the shortest distance in the picture sequence from the current picture of the picture sequence, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus to: assign predetermined weights to one or more block statistics and coding information associated with the one or more blocks of the first reference picture and the one or more blocks of the second reference picture to determine a weighted sum value associated with each of the first and second reference pictures; and select, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, and further based at least upon the weighted sum value of each of the first and second reference pictures, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus to: determine a quantization parameter for each of the first and second reference pictures; and select, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, and further based at least upon the quantization parameter of each of the first and second reference pictures, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • an apparatus comprises: means for determining, for a first reference picture in a picture sequence, one or more characteristics; means for determining, for a second reference picture in the picture sequence, the one or more characteristics; and means for selecting, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the apparatus can further comprise: means for generating an indication of the collocated reference picture; and means for including the indication of the collocated reference picture in metadata associated with the picture sequence.
  • the apparatus can further comprise: means for transmitting, to a codec or a playback device, the collocated reference picture in metadata associated with the picture sequence.
  • the one or more characteristics comprises a number of coding blocks having valid motion vectors in reference pictures.
  • the apparatus can further comprise: means for initiating a counter to count the number of coding blocks having valid motion vectors in reference pictures.
  • the apparatus can further comprise: means for comparing the number of coding blocks having valid motion vectors in the first reference picture to the number of coding blocks having valid motion vectors in the second reference picture; and means for selecting, based upon said comparison of the number of coding blocks having valid motion vectors in the first reference picture to the number of coding blocks having valid motion vectors in the second reference picture, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the apparatus can further comprise: means for, in an instance in which the first and second reference pictures are to be coded as an I slice, determining that blocks of the first and second reference pictures are to be intra coded and determining that the first and second reference pictures should not be selected as a collocated reference picture.
  • the coding blocks are intra coded blocks.
  • the coding blocks are inter coded blocks.
  • the coding blocks are bi-prediction coded blocks.
  • the apparatus can further comprise: means for determining which of the first and second reference pictures is a shortest distance in the picture sequence from a current picture of the picture sequence; and means for selecting, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, and further based at least upon said determination of which of the first and second reference pictures is the shortest distance in the picture sequence from the current picture of the picture sequence, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the apparatus can further comprise: means for assigning predetermined weights to one or more block statistics and coding information associated with the one or more blocks of the first reference picture and the one or more blocks of the second reference picture to determine a weighted sum value associated with each of the first and second reference pictures; and means for selecting, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, and further based at least upon the weighted sum value of each of the first and second reference pictures, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the apparatus can further comprise: means for determining a quantization parameter for each of the first and second reference pictures; and means for selecting, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, and further based at least upon the quantization parameter of each of the first and second reference pictures, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • a method comprises: determining, for a first reference picture in a picture sequence, one or more characteristics; determining, for a second reference picture in the picture sequence, the one or more characteristics; and selecting, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the method can further comprise: generating an indication of the collocated reference picture; and including the indication of the collocated reference picture in metadata associated with the picture sequence.
  • the method can further comprise: transmitting, to a codec or a playback device, the collocated reference picture in metadata associated with the picture sequence.
  • the one or more characteristics comprises a number of coding blocks having valid motion vectors in reference pictures.
  • the method can further comprise: initiating a counter to count the number of coding blocks having valid motion vectors in reference pictures.
  • the method can further comprise: comparing the number of coding blocks having valid motion vectors in the first reference picture to the number of coding blocks having valid motion vectors in the second reference picture; and selecting, based upon said comparison of the number of coding blocks having valid motion vectors in the first reference picture to the number of coding blocks having valid motion vectors in the second reference picture, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the method can further comprise: in an instance in which the first and second reference pictures are to be coded as an I slice, determining that blocks of the first and second reference pictures are to be intra coded and determining that the first and second reference pictures should not be selected as a collocated reference picture.
  • the coding blocks are intra coded blocks.
  • the coding blocks are inter coded blocks.
  • the coding blocks are bi-prediction coded blocks.
  • the method can further comprise: determining which of the first and second reference pictures is a shortest distance in the picture sequence from a current picture of the picture sequence; and selecting, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, and further based at least upon said determination of which of the first and second reference pictures is the shortest distance in the picture sequence from the current picture of the picture sequence, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the method can further comprise: assigning predetermined weights to one or more block statistics and coding information associated with the one or more blocks of the first reference picture and the one or more blocks of the second reference picture to determine a weighted sum value associated with each of the first and second reference pictures; and selecting, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, and further based at least upon the weighted sum value of each of the first and second reference pictures, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the method can further comprise: determining a quantization parameter for each of the first and second reference pictures; and selecting, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, and further based at least upon the quantization parameter of each of the first and second reference pictures, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • a computer program product comprises: a non-transitory computer readable storage medium having program code portions stored thereon.
  • the program code portions are configured, upon execution, to: determine, for a first reference picture in a picture sequence, one or more characteristics; determine, for a second reference picture in the picture sequence, the one or more characteristics; and select, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the program code portions are configured, upon execution, to: generate an indication of the collocated reference picture; and include the indication of the collocated reference picture in metadata associated with the picture sequence.
  • the program code portions are configured, upon execution, to: transmit, to a codec or a playback device, the collocated reference picture in metadata associated with the picture sequence.
  • the one or more characteristics comprises a number of coding blocks having valid motion vectors in reference pictures.
  • the program code portions are configured, upon execution, to: initiate a counter to count the number of coding blocks having valid motion vectors in reference pictures.
  • the program code portions are configured, upon execution, to: compare the number of coding blocks having valid motion vectors in the first reference picture to the number of coding blocks having valid motion vectors in the second reference picture; and select, based upon said comparison of the number of coding blocks having valid motion vectors in the first reference picture to the number of coding blocks having valid motion vectors in the second reference picture, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the program code portions are configured, upon execution, to: in an instance in which the first and second reference pictures are to be coded as I slice, determine that blocks of the first and second reference pictures are to be intra coded and determine that the first and second reference pictures should not be selected as a collocated reference picture.
  • the coding blocks are intra coded blocks.
  • the coding blocks are inter coded blocks.
  • the coding blocks are bi-prediction coded blocks.
  • the program code portions are configured, upon execution, to: determine which of the first and second reference pictures is a shortest distance in the picture sequence from a current picture of the picture sequence; and select, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, and further based at least upon said determination of which of the first and second reference pictures is the shortest distance in the picture sequence from the current picture of the picture sequence, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the program code portions are configured, upon execution, to: assign predetermined weights to one or more block statistics and coding information associated with the one or more blocks of the first reference picture and the one or more blocks of the second reference picture to determine a weighted sum value associated with each of the first and second reference pictures; and select, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, and further based at least upon the weighted sum value of each of the first and second reference pictures, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • the program code portions are configured, upon execution, to: determine a quantization parameter for each of the first and second reference pictures; and select, based upon at least said one or more characteristics of the one or more first blocks and the one or more second blocks, and further based at least upon the quantization parameter of each of the first and second reference pictures, from among a set of reference pictures comprising the first reference picture and the second reference picture, a particular reference picture to be the collocated reference picture.
  • an apparatus is provided that comprises at least one processor; and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to: determine, for each reference picture of a set of reference pictures from a picture sequence, one or more characteristics; and select, based upon at least said one or more characteristics, from among the set of reference pictures, a particular reference picture to be a collocated reference picture.
  • an encoder may compute a characteristic for each reference picture in a reference picture buffer of a current picture. In some embodiments, this computation can be done in any order (sequentially or in parallel).
  • the encoder may then select a particular reference picture with an optimal characteristic among all reference pictures in a reference picture buffer to be a collocated reference picture.
  • the apparatus may be configured for storing, in a buffer associated with a current picture from among the picture sequence, the collocated reference picture or an indication of the collocated reference picture.
  • an apparatus comprises means for determining, for each reference picture of a set of reference pictures from a picture sequence, one or more characteristics; and means for selecting, based upon at least said one or more characteristics, from among the set of reference pictures, a particular reference picture to be a collocated reference picture.
  • an encoder may compute a characteristic for each reference picture in a reference picture buffer of a current picture. In some embodiments, this computation can be done in any order (sequentially or in parallel). In some embodiments, after the characteristic of all reference pictures in a reference picture buffer are computed, the encoder may then select a particular reference picture with an optimal characteristic among all reference pictures in a reference picture buffer to be a collocated reference picture.
  • the apparatus may further comprise means for storing, in a buffer associated with a current picture from among the picture sequence, the collocated reference picture or an indication of the collocated reference picture.
  • a method can be carried out that comprises: determining, for each reference picture of a set of reference pictures from a picture sequence, one or more characteristics; and selecting, based upon at least said one or more characteristics, from among the set of reference pictures, a particular reference picture to be a collocated reference picture.
  • an encoder may compute a characteristic for each reference picture in a reference picture buffer of a current picture. In some embodiments, this computation can be done in any order (sequentially or in parallel).
  • the encoder may then select a particular reference picture with an optimal characteristic among all reference pictures in a reference picture buffer to be a collocated reference picture.
  • the method can further comprise storing, in a buffer associated with a current picture from among the picture sequence, the collocated reference picture or an indication of the collocated reference picture.
  • a computer program product comprises a non-transitory computer readable storage medium having program code portions stored thereon, the program code portions configured, upon execution, to: determine, for each reference picture of a set of reference pictures from a picture sequence, one or more characteristics; and select, based upon at least said one or more characteristics, from among the set of reference pictures, a particular reference picture to be a collocated reference picture.
  • an encoder may compute a characteristic for each reference picture in a reference picture buffer of a current picture. In some embodiments, this computation can be done in any order (sequentially or in parallel).
  • the encoder may then select a particular reference picture with an optimal characteristic among all reference pictures in a reference picture buffer to be a collocated reference picture.
  • the computer program product may comprise program code portions further configured, upon execution, to store, in a buffer associated with a current picture from among the picture sequence, the collocated reference picture or an indication of the collocated reference picture.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Methods, apparatuses, and computer program products are provided for content adaptive collocated reference picture selection. Simplified selection criteria are used to determine a level of temporal motion vector prediction (TMVP) support provided by two or more reference pictures from a picture sequence. Selection of a reference picture as the collocated reference picture may be based on the reference pictures (or a subset of the reference pictures) in the picture sequence that have the greatest number of coding blocks for which valid motion vector data is available. Selection may also or alternatively be based on a comparison of a number of inter coded blocks, a number of bi-prediction coded blocks, and/or a temporal distance of reference pictures from a current picture of the picture sequence, quantization parameters, a reference list, sequential selection processes, weighted sum values, or the like. A collocated reference picture may be stored in a buffer for the current picture.
PCT/EP2021/082325 2020-12-01 2021-11-19 Sélection d'image de référence de colocalisation adaptative au contenu WO2022117369A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063119765P 2020-12-01 2020-12-01
US63/119,765 2020-12-01

Publications (1)

Publication Number Publication Date
WO2022117369A1 true WO2022117369A1 (fr) 2022-06-09

Family

ID=78820274

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2021/082325 WO2022117369A1 (fr) 2020-12-01 2021-11-19 Sélection d'image de référence de colocalisation adaptative au contenu

Country Status (1)

Country Link
WO (1) WO2022117369A1 (fr)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018052986A1 (fr) * 2016-09-16 2018-03-22 Qualcomm Incorporated Identification de vecteur de décalage d'un indicateur prévisionnel de vecteur de mouvement temporel
US20190246103A1 (en) * 2016-10-04 2019-08-08 Electronics And Telecommunications Research Institute Method and apparatus for encoding and decoding image, and recording medium for storing bitstream
WO2020003256A1 (fr) * 2018-06-29 2020-01-02 Beijing Bytedance Network Technology Co., Ltd. Restriction de partage d'informations de mouvement
CA3113860A1 (fr) * 2018-09-24 2020-04-02 B1 Institute Of Image Technology, Inc. Procede et dispositif de codage/decodage d'image
WO2020084552A1 (fr) * 2018-10-24 2020-04-30 Beijing Bytedance Network Technology Co., Ltd. Dérivation de candidat de mouvement à base de bloc voisin spatial dans une prédiction de vecteur de mouvement de sous-bloc
CN110213590A (zh) * 2019-06-25 2019-09-06 浙江大华技术股份有限公司 时域运动矢量获取、帧间预测、视频编码的方法及设备

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
B. BROSS; J. CHEN; S. LIN; Y-K. WANG: "Versatile Video Coding", JVET-Q2001-VL3, January 2020 (2020-01-01)
GARY J SULLIVAN; JENS-RAINER OHM; WOO-JIN HAN; THOMAS WIEGAND: "Overview of the High Efficiency Video Coding (HEVC) Standard", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 1 January 2012 (2012-01-01), USA, pages 1 - 19, XP055045358, ISSN: 1051-8215, DOI: 10.1109/TCSVT.2012.2221191 *

Similar Documents

Publication Publication Date Title
US11212546B2 (en) Reference picture handling
US10264288B2 (en) Method for video coding and an apparatus, a computer-program product, a system, and a module for the same
US10771805B2 (en) Apparatus, a method and a computer program for video coding and decoding
US10397610B2 (en) Method and apparatus for video coding
US20130343459A1 (en) Method and apparatus for video coding
US20130272372A1 (en) Method and apparatus for video coding
US20140056356A1 (en) Method and apparatus for efficient signaling of weighted prediction in advanced coding schemes
US20140092977A1 (en) Apparatus, a Method and a Computer Program for Video Coding and Decoding
WO2022117369A1 (fr) Sélection d'image de référence de colocalisation adaptative au contenu

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21819079

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21819079

Country of ref document: EP

Kind code of ref document: A1