EP3942803A1 - Method, apparatus and computer program product for video coding and decoding - Google Patents

Method, apparatus and computer program product for video coding and decoding

Info

Publication number
EP3942803A1
Authority
EP
European Patent Office
Prior art keywords
picture
region
refresh
bitstream
coded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP20772663.9A
Other languages
German (de)
English (en)
Other versions
EP3942803A4 (fr)
Inventor
Limin Wang
Miska Hannuksela
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Publication of EP3942803A1
Publication of EP3942803A4


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/107 Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157 Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159 Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/167 Position within a video image, e.g. region of interest [ROI]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/174 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a slice, e.g. a line of blocks or a group of blocks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117 Filters, e.g. for pre-processing or post-processing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/80 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N19/82 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop

Definitions

  • the present invention relates to an apparatus, a method and a computer program for video coding and decoding.
  • a video coding system may comprise an encoder that transforms an input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form.
  • the encoder may discard some information in the original video sequence in order to represent the video in a more compact form, for example, to enable the storage/transmission of the video information at a lower bitrate than otherwise might be needed.
  • Video compression systems, such as the Advanced Video Coding standard (H.264/AVC), the Multiview Video Coding (MVC) extension of H.264/AVC, or scalable extensions of HEVC (High Efficiency Video Coding), can be used.
  • H.264/AVC Advanced Video Coding standard
  • MVC Multiview Video Coding
  • HEVC High Efficiency Video Coding
  • a method comprising generating a bitstream of a coded video sequence comprising pictures in a picture order; wherein the generating comprises for a first picture of the video sequence, coding a region on a corner of the picture in intra mode to generate a refresh region; repeating the following for each subsequent picture of the video sequence in a picture order: coding a region corresponding to the refresh region of a previous picture in inter mode, and coding a region that is diagonally next to the region corresponding to the refresh region of the previous picture in intra mode to expand the refresh region; indicating in a bitstream a use of a diagonal refresh; and transmitting the coded bitstream to a decoder.
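  • as a non-normative illustration of the diagonal refresh described above, the following Python sketch derives, per picture, which regions of a region grid would be coded in intra mode and which in inter mode; the grid size, region granularity and the printing are assumptions for demonstration only, not part of the claimed method:

      def diagonal_refresh_schedule(cols, rows):
          """Yield, per picture, the region sets coded in intra and inter mode.

          Picture 0 intra-codes only the top-left corner region; every
          subsequent picture inter-codes the already refreshed regions and
          intra-codes the next anti-diagonal, expanding the refresh region
          toward the opposite corner.
          """
          for pic in range(cols + rows - 1):
              intra = {(c, r) for c in range(cols) for r in range(rows) if c + r == pic}
              inter = {(c, r) for c in range(cols) for r in range(rows) if c + r < pic}
              yield pic, intra, inter

      for pic, intra, inter in diagonal_refresh_schedule(4, 3):
          print(f"picture {pic}: intra={sorted(intra)} inter={sorted(inter)}")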
  • the method further comprises indicating in a bitstream whether a tile group header signals information on a last region of the refresh region per picture or per tile group.
  • a region in a first picture is a region located on a top-left corner of the picture.
  • a method for decoding comprising receiving a bitstream of a coded video sequence comprising pictures in a picture order; decoding from the bitstream an indication of a use of a diagonal refresh; for a first picture of the video sequence, decoding a region on a corner of the picture in intra mode to generate a refresh region; repeating the following for each subsequent picture of the video sequence in a picture order: decoding a region corresponding to the refresh region of a previous picture in inter mode, and decoding a region that is diagonally next to the region corresponding to the refresh region of the previous picture in intra mode to expand the refresh region, and generating a video sequence for rendering.
  • an apparatus comprising at least one processor, and memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following: to generate a bitstream of a coded video sequence comprising pictures in a picture order; wherein the generating comprises, for a first picture of the video sequence, to code a region on a corner of the picture in intra mode to generate a refresh region; to repeat the following for each subsequent picture of the video sequence in a picture order: to code a region corresponding to the refresh region of a previous picture in inter mode, and to code a region that is diagonally next to the region corresponding to the refresh region of the previous picture in intra mode to expand the refresh region; to indicate in a bitstream a use of a diagonal refresh; and to transmit the coded bitstream to a decoder.
  • the apparatus further comprises computer program code to cause the apparatus to indicate in a bitstream whether a tile group header signals information on a last region of the refresh region per picture or per tile group.
  • a region in a first picture is a region located on a top-left corner of the picture.
  • an apparatus comprising at least one processor, and memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following: to receive a bitstream of a coded video sequence comprising pictures in a picture order; to decode from the bitstream an indication of a use of a diagonal refresh; for a first picture of the video sequence, to decode a region on a corner of the picture in intra mode to generate a refresh region; to repeat the following for each subsequent picture of the video sequence in a picture order: to decode a region corresponding to the refresh region of a previous picture in inter mode, and to decode a region that is diagonally next to the region corresponding to the refresh region of the previous picture in intra mode to expand the refresh region; and to generate a video sequence for rendering.
  • a computer program product comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to generate a bitstream of a coded video sequence comprising pictures in a picture order; wherein the generating comprises, for a first picture of the video sequence, to code a region on a corner of the picture in intra mode to generate a refresh region; to repeat the following for each subsequent picture of the video sequence in a picture order: to code a region corresponding to the refresh region of a previous picture in inter mode, and to code a region that is diagonally next to the region corresponding to the refresh region of the previous picture in intra mode to expand the refresh region; to indicate in a bitstream a use of a diagonal refresh; and to transmit the coded bitstream to a decoder.
  • a computer program product comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to receive a bitstream of a coded video sequence comprising pictures in a picture order; to decode from the bitstream an indication of a use of a diagonal refresh; for a first picture of the video sequence, to decode a region on a corner of the picture in intra mode to generate a refresh region; to repeat the following for each subsequent picture of the video sequence in a picture order: to decode a region corresponding to the refresh region of a previous picture in inter mode, and to decode a region that is diagonally next to the region corresponding to the refresh region of the previous picture in intra mode to expand the refresh region; and to generate a video sequence for rendering.
  • the computer program product is embodied on a non-transitory computer readable medium.
  • Fig. 1 shows an example of a horizontal progressive intra refresh (PIR);
  • Fig. 2 shows an example of a vertical PIR;
  • Fig. 3 shows an example of a central-diffusion PIR;
  • Fig. 4 shows an example of a wavefront PIR;
  • Fig. 5 shows another example of a wavefront PIR;
  • Fig. 6 shows an example of a horizontal-diffusion PIR;
  • Fig. 7 shows an example of a vertical-diffusion PIR;
  • Fig. 8 shows an example of a wavefront-diffusion PIR;
  • Fig. 9 shows an example of a subsample PIR;
  • Fig. 10 shows an example of a wavefront PIR for two tile groups;
  • Fig. 11 is a flowchart illustrating a method according to an embodiment;
  • Fig. 12 is a flowchart illustrating a method according to another embodiment;
  • Fig. 13 shows an apparatus according to an embodiment;
  • Fig. 14 shows an encoding process according to an embodiment;
  • Fig. 15 shows a decoding process according to an embodiment.
  • the Advanced Video Coding standard (which may be abbreviated AVC or H.264/AVC) was developed by the Joint Video Team (JVT) of the Video Coding Experts Group (VCEG) of the Telecommunications Standardization Sector of International Telecommunication Union (ITU-T) and the Moving Picture Experts Group (MPEG) of International Organization for Standardization (ISO) / International Electrotechnical Commission (IEC).
  • JVT Joint Video Team
  • MPEG Moving Picture Experts Group
  • ISO International Organization for Standardization
  • IEC International Electrotechnical Commission
  • the H.264/AVC standard is published by both parent standardization organizations, and it is referred to as ITU-T Recommendation H.264 and ISO/IEC International Standard 14496-10, also known as MPEG-4 Part 10 Advanced Video Coding (AVC).
  • H.264/AVC Advanced Video Coding
  • SVC Scalable Video Coding
  • MVC Multiview Video Coding
  • JCT-VC Joint Collaborative Team on Video Coding
  • The H.265/HEVC standard is referred to as ITU-T Recommendation H.265 and ISO/IEC International Standard 23008-2, also known as MPEG-H Part 2 High Efficiency Video Coding (HEVC).
  • Extensions to H.265/HEVC include scalable, multiview, three-dimensional, and fidelity range extensions, which may be referred to as SHVC, MV-HEVC, 3D-HEVC, and REXT, respectively.
  • SHVC scalable extension of HEVC
  • MV-HEVC multiview extension of HEVC
  • REXT fidelity range extensions of HEVC
  • the Versatile Video Coding standard (VVC, H.266, or H.266/VVC) is presently under development by the Joint Video Experts Team (JVET), which is a collaboration between the ISO/IEC MPEG and ITU-T VCEG.
  • JVET Joint Video Experts Team
  • the key definitions, bitstream and coding structures, and concepts of H.264/AVC and HEVC and some of their extensions are described in this section as an example of a video encoder, decoder, encoding method, decoding method, and a bitstream structure, wherein the embodiments may be implemented.
  • Some of the key definitions, bitstream and coding structures, and concepts of H.264/AVC are the same as in the HEVC standard - hence, they are described below jointly.
  • the aspects of various embodiments are not limited to H.264/AVC or HEVC or their extensions, but rather the description is given for one possible basis on top of which the present embodiments may be partly or fully realized.
  • A video codec may comprise an encoder that transforms the input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form.
  • the compressed representation may be referred to as a bitstream or a video bitstream.
  • a video encoder and/or a video decoder may also be separate from each other, i.e. need not form a codec.
  • the encoder may discard some information in the original video sequence in order to represent the video in a more compact form (that is, at lower bitrate).
  • Hybrid video codecs may encode the video information in two phases. At first, pixel values in a certain picture area (or "block") are predicted, for example by motion compensation means (finding and indicating an area in one of the previously coded video frames that corresponds closely to the block being coded) or by spatial means (using the pixel values around the block to be coded in a specified manner). Then, the prediction error, i.e. the difference between the predicted block of pixels and the original block of pixels, is coded. This may be done by transforming the difference in pixel values using a specified transform (e.g. the Discrete Cosine Transform, DCT).
  • DCT Discrete Cosine Transform
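  • a minimal numerical sketch of this two-phase coding, assuming SciPy's dctn/idctn for the 2D DCT; the 4x4 block content, the flat prediction and the quantization step are illustrative assumptions:

      import numpy as np
      from scipy.fft import dctn, idctn  # 2D DCT-II and its inverse

      # Illustrative original block and its prediction (e.g. from motion
      # compensation or spatial prediction).
      orig = np.array([[52, 55, 61, 66],
                       [70, 61, 64, 73],
                       [63, 59, 55, 90],
                       [67, 61, 68, 104]], dtype=float)
      pred = np.full((4, 4), 64.0)

      residual = orig - pred                 # prediction error
      coeffs = dctn(residual, norm="ortho")  # transform the prediction error
      qstep = 8.0
      quantized = np.round(coeffs / qstep)   # lossy quantization
      # Decoder side: dequantize, inverse-transform, add the prediction back.
      recon = pred + idctn(quantized * qstep, norm="ortho")
      print(np.round(recon - orig, 2))       # remaining reconstruction error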
  • In temporal prediction (also referred to as inter prediction), the sources of prediction are previously decoded pictures (a.k.a. reference pictures).
  • IBC intra block copy
  • In intra block copy (IBC), prediction is applied similarly to temporal prediction, but the reference picture is the current picture and only previously decoded samples can be referred to in the prediction process.
  • Inter- layer or inter- view prediction may be applied similarly to temporal prediction, but the reference picture is a decoded picture from another scalable layer or from another view, respectively.
  • in some cases, inter prediction may refer to temporal prediction only, while in other cases inter prediction may refer collectively to temporal prediction and any of intra block copy, inter-layer prediction, and inter-view prediction, provided that they are performed with the same or similar process as temporal prediction.
  • Inter prediction or temporal prediction may sometimes be referred to as motion compensation or motion-compensated prediction.
  • Intra prediction utilizes the fact that adjacent pixels within the same picture are likely to be correlated. Intra prediction can be performed in spatial or transform domain, i.e., either sample values or transform coefficients can be predicted. Intra prediction may be exploited in intra coding, where no inter prediction is applied.
  • One outcome of the coding procedure is a set of coding parameters, such as motion vectors and quantized transform coefficients. Many parameters can be entropy-coded more efficiently if they are predicted first from spatially or temporally neighboring parameters. For example, a motion vector may be predicted from spatially adjacent motion vectors and only the difference relative to the motion vector predictor may be coded. Prediction of coding parameters and intra prediction may be collectively referred to as in-picture prediction.
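  • a sketch of such motion vector prediction, assuming a component-wise median of three spatial neighbours (one common predictor choice) and illustrative vectors:

      from statistics import median

      def predict_mv(left, above, above_right):
          """Component-wise median of the three neighbouring motion vectors."""
          return (median([left[0], above[0], above_right[0]]),
                  median([left[1], above[1], above_right[1]]))

      mv = (5, -2)                                  # vector chosen for the block
      mvp = predict_mv((4, -1), (6, -3), (5, 0))    # predictor from neighbours
      mvd = (mv[0] - mvp[0], mv[1] - mvp[1])        # only this difference is coded
      print("predictor:", mvp, "difference:", mvd)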
  • Entropy coding/decoding may be performed in many ways. For example, context-based coding/decoding may be applied, where in both the encoder and the decoder modify the context state of a coding parameter based on previously coded/decoded coding parameters.
  • Context-based coding may for example be context adaptive binary arithmetic coding (CABAC) or context-based variable length coding (CAVLC) or any similar entropy coding.
  • Entropy coding/decoding may alternatively or additionally be performed using a variable length coding scheme, such as Huffman coding/decoding or Exp-Golomb coding/decoding. Decoding of coding parameters from an entropy-coded bitstream or codewords may be referred to as parsing.
  • Video coding standards may specify the bitstream syntax and semantics as well as the decoding process for error-free bitstreams, whereas the encoding process might not be specified, but encoders may just be required to generate conforming bitstreams. Bitstream and decoder conformance can be verified with the Hypothetical Reference Decoder (HRD).
  • HRD Hypothetical Reference Decoder
  • the standards may contain coding tools that help in coping with transmission errors and losses, but the use of the tools in encoding may be optional and decoding process for erroneous bitstreams might not have been specified.
  • a syntax element may be defined as an element of data represented in the bitstream.
  • a syntax structure may be defined as zero or more syntax elements present together in the bitstream in a specified order.
  • An elementary unit for the input to an encoder and the output of a decoder, respectively, in most cases is a picture.
  • a picture given as an input to an encoder may also be referred to as a source picture, and a picture decoded by a decoder may be referred to as a decoded picture or a reconstructed picture.
  • the source and decoded pictures are each comprised of one or more sample arrays, such as one of the following sets of sample arrays:
  • RGB Red, Green and Blue
  • these arrays may be referred to as luma (or L or Y) and chroma, where the two chroma arrays may be referred to as Cb and Cr, regardless of the actual color representation method in use.
  • the actual color representation method in use can be indicated e.g. in a coded bitstream e.g. using the Video Usability Information (VUI) syntax of HEVC or alike.
  • VUI Video Usability Information
  • a component may be defined as an array or single sample from one of the three sample arrays (luma and two chroma) or the array or a single sample of the array that compose a picture in monochrome format.
  • a picture may be defined to be either a frame or a field.
  • a frame comprises a matrix of luma samples and possibly the corresponding chroma samples.
  • a field is a set of alternate sample rows of a frame and may be used as encoder input, when the source signal is interlaced. Chroma sample arrays may be absent (and hence monochrome sampling may be in use) or chroma sample arrays may be subsampled when compared to luma sample arrays.
  • in 4:2:0 sampling, each of the two chroma arrays has half the height and half the width of the luma array.
  • in 4:2:2 sampling, each of the two chroma arrays has the same height and half the width of the luma array.
  • in 4:4:4 sampling, each of the two chroma arrays has the same height and width as the luma array.
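  • the three sampling formats above map to chroma array dimensions as in this sketch; the luma size is an arbitrary example:

      def chroma_dims(luma_w, luma_h, fmt):
          if fmt == "4:2:0":   # half width, half height
              return luma_w // 2, luma_h // 2
          if fmt == "4:2:2":   # half width, same height
              return luma_w // 2, luma_h
          if fmt == "4:4:4":   # same width and height
              return luma_w, luma_h
          raise ValueError(fmt)

      for fmt in ("4:2:0", "4:2:2", "4:4:4"):
          print(fmt, chroma_dims(1920, 1080, fmt))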
  • Coding formats or standards may allow coding sample arrays as separate color planes into the bitstream and, respectively, decoding separately coded color planes from the bitstream. When separate color planes are in use, each one of them is separately processed (by the encoder and/or the decoder) as a picture with monochrome sampling.
  • the location of chroma samples with respect to luma samples may be determined in the encoder side (e.g. as a pre-processing step or as part of encoding).
  • the chroma sample positions with respect to luma sample positions may be pre-defined for example in a coding standard, such as H.264/AVC or HEVC, or may be indicated in the bitstream for example as part of VUI of H.264/AVC or HEVC.
  • the source video sequence(s) provided as input for encoding may either represent interlaced source content or progressive source content. Fields of opposite parity have been captured at different times for interlaced source content. Progressive source content contains captured frames.
  • An encoder may encode fields of interlaced source content in two ways: a pair of interlaced fields may be coded into a coded frame or a field may be coded as a coded field.
  • an encoder may encode frames of progressive source content in two ways: a frame of progressive source content may be coded into a coded frame or a pair of coded fields.
  • a field pair or a complementary field pair may be defined as two fields next to each other in decoding and/or output order, having opposite parity (i.e. one being a top field and the other a bottom field).
  • Some video coding standards or schemes allow mixing of coded frames and coded fields in the same coded video sequence.
  • predicting a coded field from a field in a coded frame and/or predicting a coded frame for a complementary field pair may be enabled in encoding and/or decoding.
  • Partitioning may be defined as a division of a set into subsets such that each element of the set is in exactly one of the subsets.
  • a macroblock is a 16x16 block of luma samples and the corresponding blocks of chroma samples. For example, in the 4:2:0 sampling pattern, a macroblock contains one 8x8 block of chroma samples per each chroma component.
  • a picture is partitioned into one or more slice groups, and a slice group contains one or more slices.
  • a slice consists of an integer number of macroblocks ordered consecutively in the raster scan within a particular slice group.
  • a coding block may be defined as an NxN block of samples for some value of N such that the division of a coding tree block into coding blocks is a partitioning.
  • a coding tree block may be defined as an NxN block of samples for some value of N such that the division of a component into coding tree blocks is a partitioning.
  • a coding tree unit may be defined as a coding tree block of luma samples, two corresponding coding tree blocks of chroma samples of a picture that has three sample arrays, or a coding tree block of samples of a monochrome picture or a picture that is coded using three separate color planes and syntax structures used to code the samples.
  • a coding unit may be defined as a coding block of luma samples, two corresponding coding blocks of chroma samples of a picture that has three sample arrays, or a coding block of samples of a monochrome picture or a picture that is coded using three separate color planes and syntax structures used to code the samples.
  • video pictures may be divided into coding units (CU) covering the area of the picture.
  • a CU consists of one or more prediction units (PU) defining the prediction process for the samples within the CU and one or more transform units (TU) defining the prediction error coding process for the samples in the said CU.
  • the CU may consist of a square block of samples with a size selectable from a predefined set of possible CU sizes.
  • a CU with the maximum allowed size may be named as LCU (largest coding unit) or coding tree unit (CTU) and the video picture is divided into non-overlapping LCUs.
  • An LCU can be further split into a combination of smaller CUs, e.g. by recursively splitting the LCU and resultant CUs.
  • Each resulting CU may have at least one PU and at least one TU associated with it.
  • Each PU and TU can be further split into smaller PUs and TUs in order to increase granularity of the prediction and prediction error coding processes, respectively.
  • Each PU has prediction information associated with it defining what kind of a prediction is to be applied for the pixels within that PU (e.g. motion vector information for inter predicted PUs and intra prediction directionality information for intra predicted PUs).
  • Each TU can be associated with information describing the prediction error decoding process for the samples within the said TU (including e.g. DCT coefficient information). It may be signalled at CU level whether prediction error coding is applied or not for each CU. In the case there is no prediction error residual associated with the CU, it can be considered there are no TUs for the said CU.
  • the division of the image into CUs, and the division of CUs into PUs and TUs, may be signalled in the bitstream, allowing the decoder to reproduce the intended structure of these units.
  • the multi-type tree leaf nodes are called coding units (CUs).
  • CU, PU and TU have the same block size, unless the CU is too large for the maximum transform length.
  • a segmentation structure for a CTU is a quadtree with nested multi-type tree using binary and ternary splits, i.e. no separate CU, PU and TU concepts are in use except when needed for CUs that have a size too large for the maximum transform length.
  • a CU can have either a square or rectangular shape.
  • the decoder reconstructs the output video by applying prediction means similar to the encoder to form a predicted representation of the pixel blocks (using the motion or spatial information created by the encoder and stored in the compressed representation) and prediction error decoding (inverse operation of the prediction error coding recovering the quantized prediction error signal in spatial pixel domain). After applying prediction and prediction error decoding means the decoder sums up the prediction and prediction error signals (pixel values) to form the output video frame.
  • the decoder (and encoder) can also apply additional filtering means to improve the quality of the output video before passing it for display and/or storing it as prediction reference for the forthcoming frames in the video sequence.
  • the filtering may for example include one or more of the following: deblocking, sample adaptive offset (SAO), and/or adaptive loop filtering (ALF).
  • SAO sample adaptive offset
  • ALF adaptive loop filtering
  • the deblocking loop filter may include multiple filtering modes or strengths, which may be adaptively selected based on the features of the blocks adjacent to the boundary, such as the quantization parameter value, and/or signaling included by the encoder in the bitstream.
  • the deblocking loop filter may comprise a normal filtering mode and a strong filtering mode, which may differ in terms of the number of filter taps (i.e. number of samples being filtered on both sides of the boundary) and/or the filter tap values. For example, filtering of two samples along both sides of the boundary may be performed with a filter having the impulse response of (3 7 9 -3)/16, when omitting the potential impact of a clipping operation.
  • In video codecs, the motion information may be indicated with motion vectors associated with each motion compensated image block.
  • each of these motion vectors represents the displacement between the image block in the picture to be coded (in the encoder side) or decoded (in the decoder side) and the prediction source block in one of the previously coded or decoded pictures.
  • the predicted motion vectors may be created in a predefined way, for example calculating the median of the encoded or decoded motion vectors of the adjacent blocks.
  • Another way to create motion vector predictions is to generate a list of candidate predictions from adjacent blocks and/or co-located blocks in temporal reference pictures and signaling the chosen candidate as the motion vector predictor.
  • the reference index of previously coded/decoded picture can be predicted.
  • the reference index may be predicted from adjacent blocks and/or co-located blocks in temporal reference picture.
  • high efficiency video codecs may employ an additional motion information coding/decoding mechanism, often called merging/merge mode, where all the motion field information, which includes motion vector and corresponding reference picture index for each available reference picture list, is predicted and used without any modification/correction.
  • predicting the motion field information is carried out using the motion field information of adjacent blocks and/or co-located blocks in temporal reference pictures, and the used motion field information is signaled by an index into a motion field candidate list filled with the motion field information of available adjacent/co-located blocks.
  • Video codecs may support motion compensated prediction from one source image (uni-prediction) or from two sources (bi-prediction).
  • In uni-prediction a single motion vector is applied, whereas in the case of bi-prediction two motion vectors are signaled and the motion compensated predictions from the two sources are averaged to create the final sample prediction.
  • In weighted prediction, the relative weights of the two predictions can be adjusted, or a signaled offset can be added to the prediction signal.
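  • a sketch of bi-prediction averaging and weighted prediction as described above; the sample values, weights, offset and the fixed-point shift are illustrative assumptions:

      def bi_predict(p0, p1):
          # Average of the two motion-compensated predictions, with rounding.
          return [(a + b + 1) >> 1 for a, b in zip(p0, p1)]

      def weighted_predict(p0, p1, w0, w1, offset, shift=6):
          # Adjustable relative weights plus a signaled offset.
          rounding = 1 << (shift - 1)
          return [((w0 * a + w1 * b + rounding) >> shift) + offset
                  for a, b in zip(p0, p1)]

      p0, p1 = [100, 102, 98], [110, 108, 112]
      print(bi_predict(p0, p1))                   # plain bi-prediction
      print(weighted_predict(p0, p1, 48, 16, 2))  # weighted prediction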
  • In intra block copy, the displacement vector indicates where, within the same picture, a block of samples can be copied from to form a prediction of the block to be coded or decoded.
  • This kind of intra block copying method can improve the coding efficiency substantially in the presence of repeating structures within the frame, such as text or other graphics.
  • the prediction residual after motion compensation or intra prediction may be first transformed with a transform kernel (like DCT) and then coded.
  • Video encoders may utilize Lagrangian cost functions to find optimal coding modes, e.g. the desired macroblock mode and associated motion vectors.
  • This kind of cost function uses a weighting factor λ to tie together the (exact or estimated) image distortion due to lossy coding methods and the (exact or estimated) amount of information that is required to represent the pixel values in an image area:
  • C = D + λR (Eq. 1)
  • C the Lagrangian cost to be minimized
  • D the image distortion (e.g. Mean Squared Error) with the mode and motion vectors considered
  • R the number of bits needed to represent the required data to reconstruct the image block in the decoder (including the amount of data to represent the candidate motion vectors).
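  • a sketch of mode decision with this cost function; the candidate modes, distortion and rate figures, and the value of λ are illustrative:

      candidates = [
          # (mode name, distortion D, rate R in bits)
          ("intra DC",  1500.0, 40),
          ("inter 1MV",  900.0, 95),
          ("merge",     1100.0, 20),
      ]
      lmbda = 10.0
      best = min(candidates, key=lambda m: m[1] + lmbda * m[2])  # C = D + λR
      print("chosen mode:", best[0], "cost:", best[1] + lmbda * best[2])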
  • Some codecs use a concept of picture order count (POC).
  • POC picture order count
  • a value of POC is derived for each picture and is non-decreasing with increasing picture position in output order. POC therefore indicates the output order of pictures.
  • POC may be used in the decoding process for example for implicit scaling of motion vectors and for reference picture list initialization. Furthermore, POC may be used in the verification of output order conformance.
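  • a simplified sketch of such POC-based motion vector scaling (codecs such as HEVC use a fixed-point variant with clipping; the rounding and the example POC values here are assumptions):

      def scale_mv(mv, poc_cur, poc_ref, poc_col, poc_col_ref):
          tb = poc_cur - poc_ref      # POC distance for the current block
          td = poc_col - poc_col_ref  # POC distance of the borrowed vector
          if td == 0:
              return mv
          return (round(mv[0] * tb / td), round(mv[1] * tb / td))

      # Current picture (POC 8) predicts from POC 4; the co-located vector
      # spanned POC 6 -> POC 2, i.e. the same distance, so it is unchanged.
      print(scale_mv((8, -4), poc_cur=8, poc_ref=4, poc_col=6, poc_col_ref=2))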
  • a compliant bitstream must be able to be decoded by a hypothetical reference decoder that may be conceptually connected to the output of an encoder and consists of at least a pre-decoder buffer, a decoder and an output/display unit.
  • This virtual decoder may be known as the hypothetical reference decoder (HRD) or the video buffering verifier (VBV).
  • HRD hypothetical reference decoder
  • VBV video buffering verifier
  • a stream is compliant if it can be decoded by the HRD without buffer overflow or, in some cases, underflow. Buffer overflow happens if more bits are to be placed into the buffer when it is full. Buffer underflow happens if some bits are not in the buffer when said bits are to be fetched from the buffer for decoding/playback.
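  • a toy check of these buffering rules, assuming constant-rate input to the pre-decoder buffer and instantaneous decoding; the bitrate, buffer size and per-picture bit counts are illustrative:

      def check_cpb(picture_bits, bitrate, buffer_size, interval=1.0):
          fullness = 0.0
          for i, bits in enumerate(picture_bits):
              fullness += bitrate * interval    # bits arriving during one interval
              if fullness > buffer_size:
                  return f"overflow before picture {i}"
              if bits > fullness:
                  return f"underflow at picture {i}"  # bits not yet in the buffer
              fullness -= bits                  # instantaneous decoding removes them
          return "compliant"

      print(check_cpb([4000, 6000, 2000, 3000], bitrate=5000, buffer_size=10000))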
  • One of the motivations for the HRD is to avoid so-called evil bitstreams, which would consume such a large quantity of resources that practical decoder implementations would not be able to handle.
  • HRD models may include instantaneous decoding, while the input bitrate to the coded picture buffer (CPB) of HRD may be regarded as a constraint for the encoder and the bitstream on decoding rate of coded data and a requirement for decoders for the processing rate.
  • An encoder may include a CPB as specified in the HRD for verifying and controlling that buffering constraints are obeyed in the encoding.
  • a decoder implementation may also have a CPB that may but does not necessarily operate similarly or identically to the CPB specified for HRD.
  • a Decoded Picture Buffer (DPB) may be used in the encoder and/or in the decoder. There may be two reasons to buffer decoded pictures: for references in inter prediction and for reordering decoded pictures into output order. As some coding formats, such as HEVC, provide a great deal of flexibility for both reference picture marking and output reordering, separate buffers for reference picture buffering and output picture buffering may waste memory resources. Hence, the DPB may include a unified decoded picture buffering process for reference pictures and output reordering. A decoded picture may be removed from the DPB when it is no longer used as a reference and is not needed for output.
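  • a sketch of that unified removal rule; the Picture fields stand in for the reference marking and output state and are assumptions:

      from dataclasses import dataclass

      @dataclass
      class Picture:
          poc: int
          used_for_reference: bool
          needed_for_output: bool

      def prune_dpb(dpb):
          # Keep a picture while it serves either purpose; drop it otherwise.
          return [p for p in dpb if p.used_for_reference or p.needed_for_output]

      dpb = [Picture(0, False, False), Picture(2, True, False), Picture(4, False, True)]
      print([p.poc for p in prune_dpb(dpb)])  # -> [2, 4], POC 0 is removed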
  • An HRD may also include a DPB. DPBs of an HRD and a decoder implementation may but do not need to operate identically.
  • Output order may be defined as the order in which the decoded pictures are output from the decoded picture buffer (for the decoded pictures that are to be output from the decoded picture buffer).
  • a decoder and/or an HRD may comprise a picture output process.
  • the output process may be considered to be a process in which the decoder provides decoded and cropped pictures as the output of the decoding process.
  • the output process may be a part of video coding standards, e.g. as a part of the hypothetical reference decoder specification.
  • lines and/or columns of samples may be removed from decoded pictures according to a cropping rectangle to form output pictures.
  • a cropped decoded picture may be defined as the result of cropping a decoded picture based on the conformance cropping window specified e.g. in the sequence parameter set that is referred to by the corresponding coded picture.
  • One or more syntax structures for (decoded) reference picture marking may exist in a video coding system.
  • An encoder generates an instance of a syntax structure e.g. in each coded picture, and a decoder decodes an instance of the syntax structure e.g. from each coded picture.
  • the decoding of the syntax structure may cause pictures to be adaptively marked as "used for reference" or "unused for reference".
  • a reference picture set (RPS) syntax structure of HEVC is an example of a syntax structure for reference picture marking.
  • a reference picture set valid or active for a picture includes all the reference pictures that may be used as reference for the picture and all the reference pictures that are kept marked as "used for reference" for any subsequent pictures in decoding order.
  • the reference pictures that are kept marked as "used for reference" for any subsequent pictures in decoding order but that are not used as reference pictures for the current picture or image segment may be considered inactive. For example, they might not be included in the initial reference picture list(s).
  • a reference picture for inter prediction may be indicated with an index to a reference picture list.
  • two reference picture lists (reference picture list 0 and reference picture list 1) are generated for each bi-predictive (B) slice, and one reference picture list (reference picture list 0) is formed for each inter-coded (P) slice.
  • a reference picture list such as the reference picture list 0 and the reference picture list 1, may be constructed in two steps: First, an initial reference picture list is generated.
  • the initial reference picture list may be generated using an algorithm pre-defined in a standard. Such an algorithm may use e.g. POC and/or temporal sub-layer, as the basis.
  • the algorithm may process reference pictures with particular marking(s), such as "used for reference", and omit other reference pictures, i.e. avoid inserting other reference pictures into the initial reference picture list.
  • An example of such other reference picture is a reference picture marked as "unused for reference" but still residing in the decoded picture buffer waiting to be output from the decoder.
  • the initial reference picture list may be reordered through a specific syntax structure, such as the reference picture list reordering (RPLR) commands of H.264/AVC or the reference picture list modification syntax structure of HEVC, or the like.
  • RPLR reference picture list reordering
  • the number of active reference pictures may be indicated for each list, and the use of the pictures beyond the active ones in the list as reference for inter prediction is disabled.
  • One or both of the reference picture list initialization and reference picture list modification may process only active reference pictures among those reference pictures that are marked as "used for reference" or alike.
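  • a sketch of this two-step list construction, assuming a simplified POC-distance ordering for the initial list and an optional explicit reordering in the spirit of RPLR commands:

      def build_ref_list(dpb, cur_poc, num_active, reorder=None):
          # Step 1: initial list from pictures marked "used for reference",
          # here ordered by POC distance to the current picture.
          initial = sorted((p for p in dpb if p["marked"] == "used for reference"),
                           key=lambda p: abs(cur_poc - p["poc"]))
          # Step 2: optional explicit reordering, then truncate to the
          # signaled number of active reference pictures.
          if reorder is not None:
              initial = [initial[i] for i in reorder]
          return initial[:num_active]

      dpb = [{"poc": 4, "marked": "used for reference"},
             {"poc": 8, "marked": "used for reference"},
             {"poc": 2, "marked": "unused for reference"},
             {"poc": 6, "marked": "used for reference"}]
      print(build_ref_list(dpb, cur_poc=7, num_active=2))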
  • Scalable video coding refers to coding structure where one bitstream can contain multiple representations of the content at different bitrates, resolutions or frame rates.
  • the receiver can extract the desired representation depending on its characteristics (e.g. resolution that matches best the display device).
  • a server or a network element can extract the portions of the bitstream to be transmitted to the receiver depending on e.g. the network characteristics or processing capabilities of the receiver.
  • a scalable bitstream may include a "base layer" providing the lowest quality video available and one or more enhancement layers that enhance the video quality when received and decoded together with the lower layers.
  • the coded representation of that layer may depend on the lower layers.
  • the motion and mode information of the enhancement layer can be predicted from lower layers.
  • the pixel data of the lower layers can be used to create prediction for the enhancement layer.
  • a scalable video codec for quality scalability (also known as Signal-to-Noise or SNR scalability) and/or spatial scalability may be implemented as follows.
  • for a base layer, a conventional non-scalable video encoder and decoder are used.
  • the reconstructed/decoded pictures of the base layer are included in the reference picture buffer for an enhancement layer.
  • the base layer decoded pictures may be inserted into reference picture list(s) for coding/decoding of an enhancement layer picture similarly to the decoded reference pictures of the enhancement layer. Consequently, the encoder may choose a base-layer reference picture as inter prediction reference and indicate its use, e.g. with a reference picture index.
  • the decoder decodes from the bitstream, for example from a reference picture index, that a base-layer picture is used as inter prediction reference for the enhancement layer.
  • when a base-layer picture is used as an inter prediction reference for an enhancement layer, it is referred to as an inter-layer reference picture.
  • Scalability modes or scalability dimensions may include but are not limited to the following:
  • Quality scalability Base layer pictures are coded at a lower quality than enhancement layer pictures, which may be achieved for example using a greater quantization parameter value (i.e., a greater quantization step size for transform coefficient quantization) in the base layer than in the enhancement layer.
  • Spatial scalability Base layer pictures are coded at a lower resolution (i.e. have fewer samples) than enhancement layer pictures. Spatial scalability and quality scalability may sometimes be considered the same type of scalability.
  • Bit-depth scalability Base layer pictures are coded at lower bit-depth (e.g. 8 bits) than enhancement layer pictures (e.g. 10 or 12 bits).
  • Dynamic range scalability Scalable layers represent a different dynamic range and/or images obtained using a different tone mapping function and/or a different optical transfer function.
  • Chroma format scalability Base layer pictures provide lower spatial resolution in chroma sample arrays (e.g. coded in 4:2:0 chroma format) than enhancement layer pictures (e.g. 4:4:4 format).
  • Color gamut scalability enhancement layer pictures have a richer/broader color representation range than that of the base layer pictures - for example the enhancement layer may have UHDTV (ITU-R BT.2020) color gamut and the base layer may have the ITU-R BT.709 color gamut.
  • ROI scalability An enhancement layer represents a spatial subset of the base layer. ROI scalability may be used together with other types of scalability, e.g. quality or spatial scalability, so that the enhancement layer provides higher subjective quality for the spatial subset.
  • View scalability The base layer represents a first view, whereas an enhancement layer represents a second view.
  • Depth scalability which may also be referred to as depth-enhanced coding.
  • a layer or some layers of a bitstream may represent texture view(s), while other layer or layers may represent depth view(s).
  • base layer information could be used to code the enhancement layer to minimize the additional bitrate overhead.
  • Scalability can be enabled in two basic ways: either by introducing new coding modes for performing prediction of pixel values or syntax from lower layers of the scalable representation, or by placing the lower layer pictures in the reference picture buffer (decoded picture buffer, DPB) of the higher layer.
  • the first approach is more flexible and thus can provide better coding efficiency in most cases.
  • the second approach, reference frame-based scalability, can be implemented very efficiently with minimal changes to single-layer codecs while still achieving the majority of the available coding efficiency gains.
  • a reference frame-based scalability codec can be implemented by utilizing the same hardware or software implementation for all the layers, taking care of the DPB management by external means.
  • NAL Network Abstraction Layer
  • NAL units consist of a header and payload.
  • In HEVC, a two-byte NAL unit header is used for all specified NAL unit types, while in other codecs the NAL unit header may be similar to that in HEVC.
  • the NAL unit header contains one reserved bit, a six-bit NAL unit type indication, a three-bit temporal_id_plus1 indication for temporal level or sub-layer (may be required to be greater than or equal to 1) and a six-bit nuh_layer_id syntax element.
  • the abbreviation TID may be used interchangeably with the TemporalId variable.
  • TemporalId equal to 0 corresponds to the lowest temporal level.
  • temporal_id_plus1 is required to be non-zero in order to avoid start code emulation involving the two NAL unit header bytes.
  • the bitstream created by excluding all VCL NAL units having a TemporalId greater than or equal to a selected value and including all other VCL NAL units remains conforming. Consequently, a picture having TemporalId equal to tid_value does not use any picture having a TemporalId greater than tid_value as an inter prediction reference.
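  • a sketch of parsing such a two-byte NAL unit header (HEVC-style bit layout) and of the temporal sub-bitstream extraction just described; the example byte values and the VCL type range are assumptions:

      def parse_nal_header(b0, b1):
          nal_unit_type = (b0 >> 1) & 0x3F               # six-bit NAL unit type
          nuh_layer_id = ((b0 & 0x01) << 5) | (b1 >> 3)  # six-bit layer id
          temporal_id = (b1 & 0x07) - 1                  # temporal_id_plus1 is non-zero
          return nal_unit_type, nuh_layer_id, temporal_id

      def extract_sub_bitstream(nal_units, max_tid, vcl_types=range(0, 32)):
          kept = []
          for data in nal_units:
              nut, _, tid = parse_nal_header(data[0], data[1])
              if nut in vcl_types and tid > max_tid:
                  continue                               # drop higher temporal sub-layers
              kept.append(data)
          return kept

      unit = bytes([1 << 1, 3])                          # type 1, layer 0, TemporalId 2
      print(parse_nal_header(unit[0], unit[1]))          # -> (1, 0, 2)
      print(len(extract_sub_bitstream([unit], max_tid=1)))  # -> 0, it is dropped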
  • a sub-layer or a temporal sub-layer may be defined to be a temporal scalable layer (or a temporal layer, TL) of a temporal scalable bitstream.
  • such a temporal scalable layer may comprise VCL NAL units with a particular value of the TemporalId variable and the associated non-VCL NAL units. nuh_layer_id can be understood as a scalability layer identifier.
  • NAL units can be categorized into Video Coding Layer (VCL) NAL units and non-VCL NAL units.
  • VCL NAL units may be coded slice NAL units.
  • VCL NAL units contain syntax elements representing one or more CU.
  • the NAL unit type within a certain range indicates a VCL NAL unit, and the VCL NAL unit type indicates a picture type.
  • Images can be split into independently codable and decodable image segments (e.g. slices or tiles or tile groups). Such image segments may enable parallel processing. "Slices" in this description may refer to image segments constructed of a certain number of basic coding units that are processed in default coding or decoding order, while "tiles" may refer to image segments that have been defined as rectangular image regions. A tile group may be defined as a group of one or more tiles. Image segments may be coded as separate units in the bitstream, such as VCL NAL units in H.264/AVC and HEVC. Coded image segments may comprise a header and a payload, wherein the header contains parameter values needed for decoding the payload.
  • a picture can be partitioned into tiles, which are rectangular and contain an integer number of CTUs.
  • the partitioning to tiles forms a grid that may be characterized by a list of tile column widths (in CTUs) and a list of tile row heights (in CTUs).
  • Tiles are ordered in the bitstream consecutively in the raster scan order of the tile grid.
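  • a sketch of mapping a CTU position to its tile index from such lists of tile column widths and row heights (in CTUs); the grid below is an assumption:

      import bisect

      def ctu_to_tile(ctu_x, ctu_y, col_widths, row_heights):
          col_bounds = [sum(col_widths[:i + 1]) for i in range(len(col_widths))]
          row_bounds = [sum(row_heights[:i + 1]) for i in range(len(row_heights))]
          tile_col = bisect.bisect_right(col_bounds, ctu_x)
          tile_row = bisect.bisect_right(row_bounds, ctu_y)
          return tile_row * len(col_widths) + tile_col  # tile raster-scan index

      # A 2x2 tile grid: columns of 3 and 2 CTUs, rows of 2 and 2 CTUs.
      print(ctu_to_tile(4, 1, [3, 2], [2, 2]))  # CTU (4,1) lies in tile 1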
  • a tile may contain an integer number of slices.
  • a slice consists of an integer number of CTUs.
  • the CTUs are scanned in the raster scan order of CTUs within tiles or within a picture, if tiles are not in use.
  • a slice may contain an integer number of tiles or a slice can be contained in a tile.
  • the CUs have a specific scan order.
  • a slice is defined to be an integer number of coding tree units contained in one independent slice segment and all subsequent dependent slice segments (if any) that precede the next independent slice segment (if any) within the same access unit.
  • a slice segment is defined to be an integer number of coding tree units ordered consecutively in the tile scan and contained in a single NAL (Network Abstraction Layer) unit. The division of each picture into slice segments is a partitioning.
  • an independent slice segment is defined to be a slice segment for which the values of the syntax elements of the slice segment header are not inferred from the values for a preceding slice segment
  • a dependent slice segment is defined to be a slice segment for which the values of some syntax elements of the slice segment header are inferred from the values for the preceding independent slice segment in decoding order.
  • a slice header is defined to be the slice segment header of the independent slice segment that is a current slice segment or is the independent slice segment that precedes a current dependent slice segment
  • a slice segment header is defined to be a part of a coded slice segment containing the data elements pertaining to the first or all coding tree units represented in the slice segment.
  • the CUs are scanned in the raster scan order of LCUs within tiles or within a picture, if tiles are not in use. Within an LCU, the CUs have a specific scan order.
  • In a draft version of H.266/VVC, pictures are partitioned into tiles along a tile grid (similarly to HEVC). Tiles are ordered in the bitstream in tile raster scan order within a picture, and CTUs are ordered in the bitstream in raster scan order within a tile. A tile group contains one or more entire tiles in bitstream order (i.e. tile raster scan order within a picture), and a VCL NAL unit contains one tile group. Slices have not been included in the draft version of H.266/VVC. It is noted that what was described in this paragraph might still evolve in later draft versions of H.266/VVC until the standard is finalized.
  • a motion-constrained tile set is such that the inter prediction process is constrained in encoding such that no sample value outside the motion-constrained tile set, and no sample value at a fractional sample position that is derived using one or more sample values outside the motion-constrained tile set, is used for inter prediction of any sample within the motion-constrained tile set. Additionally, the encoding of an MCTS is constrained in a manner that motion vector candidates are not derived from blocks outside the MCTS.
  • an MCTS may be defined to be a tile set that is independent of any sample values and coded data, such as motion vectors, that are outside the MCTS.
  • An MCTS sequence may be defined as a sequence of respective MCTSs in one or more coded video sequences or alike. In some cases, an MCTS may be required to form a rectangular area.
  • an MCTS may refer to the tile set within a picture or to the respective tile set in a sequence of pictures.
  • the respective tile set may be, but in general need not be, collocated in the sequence of pictures.
  • a motion-constrained tile set may be regarded as an independently coded tile set, since it may be decoded without the other tile sets.
  • sample locations used in inter prediction may be saturated so that a location that would be outside the picture otherwise is saturated to point to the corresponding boundary sample of the picture.
  • motion vectors may effectively cross that boundary or a motion vector may effectively cause fractional sample interpolation that would refer to a location outside that boundary, since the sample locations are saturated onto the boundary.
  • encoders may constrain the motion vectors on picture boundaries similarly to any MCTS boundaries.
  • the temporal motion-constrained tile sets SEI (Supplemental Enhancement Information) message of HEVC can be used to indicate the presence of motion-constrained tile sets in the bitstream.
  • each block row (such as CTU row in HEVC) of an image segment can be encoded and decoded in parallel.
  • WPP wavefront parallel processing
  • the state of the entropy codec at the beginning of a block row is obtained from the state of the entropy codec of the block row above after processing a certain block, such as the second block, of that row. Consequently, block rows can be processed in parallel with a delay of a certain number of blocks (e.g. 2 blocks) per each block row.
  • the processing of the current block row can be started when the processing of the block with certain index of the previous block row has been finished.
  • block rows can be processed in a parallel fashion.
  • it may be pre-defined e.g. in a coding standard which CTU is used for transferring the entropy (de)coding state of the previous row of CTUs or it may be determined and indicated in the bitstream by the encoder and/or decoded from the bitstream by the decoder.
  • Wavefront parallel processing with a delay less than 2 blocks may require constraining some prediction modes so that prediction from above and right side of the current block is avoided.
  • the per-block-row delay of wavefronts may be pre-defined, e.g. in a coding standard, and/or indicated by the encoder in or along the bitstream, and/or concluded by the decoder from or along the bitstream.
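  • a sketch of this wavefront scheduling, assuming a per-row delay of two blocks; it lists which CTUs of an illustrative grid may be processed in parallel at each step:

      def wavefront_order(rows, cols, delay=2):
          done, order = set(), []

          def ready(r, c):
              left_done = c == 0 or (r, c - 1) in done
              above_done = r == 0 or (r - 1, min(c + delay - 1, cols - 1)) in done
              return left_done and above_done

          while len(done) < rows * cols:
              step = [(r, c) for r in range(rows) for c in range(cols)
                      if (r, c) not in done and ready(r, c)]
              order.append(step)   # these CTUs can be processed in parallel
              done.update(step)
          return order

      for step, ctus in enumerate(wavefront_order(3, 5)):
          print(f"step {step}: {ctus}")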
  • WPP processes rows of coding tree units (CTU) in parallel while preserving all coding dependencies.
  • CTU coding tree units
  • entropy coding, predictive coding as well as in-loop filtering can be applied in a single processing step, which makes the implementations of WPP rather straightforward.
  • CTU rows or tiles may be byte-aligned in the bitstream and may be preceded by a start code.
  • entry points may be provided in the bitstream (e.g. in the slice header) and/or externally (e.g. in a container file).
  • An entry point is a byte pointer or a byte count or a similar straightforward reference mechanism to the start of a CTU row (for wavefront-enabled coded pictures) or a tile.
  • entry points may be specified using entry_point_offset_minus1[ i ] of the slice header.
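  • a sketch of turning entry_point_offset_minus1[ i ] values into absolute byte positions of each CTU row or tile within the slice payload; the offsets and the payload origin are illustrative:

      def entry_point_starts(offsets_minus1, payload_start=0):
          starts = [payload_start]
          for off_m1 in offsets_minus1:
              starts.append(starts[-1] + off_m1 + 1)  # offsets are minus1-coded
          return starts

      # Three substreams whose first two occupy 100 and 58 bytes.
      print(entry_point_starts([99, 57]))  # -> [0, 100, 158]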
  • a non-VCL NAL unit may be for example one of the following types: a sequence parameter set, a picture parameter set, a supplemental enhancement information (SEI) NAL unit, an access unit delimiter, an end of sequence NAL unit, an end of bitstream NAL unit, or a filler data NAL unit.
  • SEI Supplemental Enhancement Information
  • Parameter sets may be needed for the reconstruction of decoded pictures, whereas many of the other non-VCL NAL units are not necessary for the reconstruction of decoded sample values.
  • Some coding formats specify parameter sets that may carry parameter values needed for the decoding or reconstruction of decoded pictures.
  • Parameters that remain unchanged through a coded video sequence may be included in a sequence parameter set (SPS).
  • the sequence parameter set may optionally contain video usability information (VUI), which includes parameters that may be important for buffering, picture output timing, rendering, and resource reservation.
  • VUI video usability information
  • a picture parameter set (PPS) contains such parameters that are likely to be unchanged in several coded pictures.
  • a picture parameter set may include parameters that can be referred to by the coded image segments of one or more coded pictures.
  • a header parameter set (HPS) has been proposed to contain such parameters that may change on picture basis.
  • a parameter set may be activated when it is referenced e.g. through its identifier.
  • a header of an image segment such as a slice header, may contain an identifier of the PPS that is activated for decoding the coded picture containing the image segment.
  • a PPS may contain an identifier of the SPS that is activated, when the PPS is activated.
  • An activation of a parameter set of a particular type may cause the deactivation of the previously active parameter set of the same type.
  • video coding formats may include header syntax structures, such as a sequence header or a picture header.
  • a sequence header may precede any other data of the coded video sequence in the bitstream order.
  • a picture header may precede any coded video data for the picture in the bitstream order.
  • the phrase along the bitstream (e.g. indicating along the bitstream) or along a coded unit of a bitstream (e.g. indicating along a coded tile) may be used in claims and described embodiments to refer to transmission, signaling, or storage in a manner that the "out-of-band" data is associated with but not included within the bitstream or the coded unit, respectively.
  • the phrase decoding along the bitstream or along a coded unit of a bitstream or alike may refer to decoding the referred out-of-band data (which may be obtained from out-of-band transmission, signaling, or storage) that is associated with the bitstream or the coded unit, respectively.
  • the phrase along the bitstream may be used when the bitstream is contained in a container file, such as a file conforming to the ISO Base Media File Format, and certain file metadata is stored in the file in a manner that associates the metadata to the bitstream, such as boxes in the sample entry for a track containing the bitstream, a sample group for the track containing the bitstream, or a timed metadata track associated with the track containing the bitstream.
  • a coded picture is a coded representation of a picture.
  • a Random Access Point (RAP) picture, which may also be referred to as an intra random access point (IRAP) picture, may comprise only intra-coded image segments. Furthermore, a RAP picture may constrain subsequent pictures in output order to be such that they can be correctly decoded without performing the decoding process of any pictures that precede the RAP picture in decoding order.
  • RAP Random Access Point
  • IRAP intra random access point
  • An access unit may comprise coded video data for a single time instance and associated other data.
  • an access unit (AU) may be defined as a set of NAL units that are associated with each other according to a specified classification rule, are consecutive in decoding order, and contain at most one picture with any specific value of nuh_layer_id.
  • an access unit may also contain non-VCL NAL units. Said specified classification rule may for example associate pictures with the same output time or picture output count value into the same access unit.
  • coded pictures may appear in a certain order within an access unit. For example, a coded picture with nuh_layer_id equal to nuhLayerIdA may be required to precede, in decoding order, all coded pictures with nuh_layer_id greater than nuhLayerIdA in the same access unit.
  • a bitstream may be defined as a sequence of bits, which may in some coding formats or standards be in the form of a NAL unit stream or a byte stream, that forms the representation of coded pictures and associated data forming one or more coded video sequences.
  • a first bitstream may be followed by a second bitstream in the same logical channel, such as in the same file or in the same connection of a communication protocol.
  • An elementary stream (in the context of video coding) may be defined as a sequence of one or more bitstreams.
  • the end of the first bitstream may be indicated by a specific NAL unit, which may be referred to as the end of bitstream (EOB) NAL unit and which is the last NAL unit of the bitstream.
  • EOB end of bitstream
  • a coded video sequence may be defined as such a sequence of coded pictures in decoding order that is independently decodable and is followed by another coded video sequence or the end of the bitstream.
  • Bitstreams or coded video sequences can be encoded to be temporally scalable as follows. Each picture may be assigned to a particular temporal sub-layer. Temporal sub-layers may be enumerated e.g. from 0 upwards. The lowest temporal sub-layer, sub-layer 0, may be decoded independently. Pictures at temporal sub-layer 1 may be predicted from reconstructed pictures at temporal sub-layers 0 and 1. Pictures at temporal sub-layer 2 may be predicted from reconstructed pictures at temporal sub-layers 0, 1, and 2, and so on. In other words, a picture at temporal sub-layer N does not use any picture at temporal sub-layer greater than N as a reference for inter prediction. The bitstream created by excluding all pictures at temporal sub-layers greater than a selected value, and including the remaining pictures, remains conforming; a minimal sub-bitstream extraction is sketched below.
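  • The extraction implied by the last sentence can be sketched as a simple filter over pictures; the record type and attribute names below are illustrative only:

        from dataclasses import dataclass

        @dataclass
        class CodedPicture:
            poc: int            # picture order count (illustrative)
            temporal_id: int    # temporal sub-layer of the picture

        def extract_sublayers(pictures, max_tid):
            # Keep only pictures at temporal sub-layers 0..max_tid. Because a
            # picture at sub-layer N never references a sub-layer above N, the
            # result remains a decodable, conforming sequence.
            return [p for p in pictures if p.temporal_id <= max_tid]

        seq = [CodedPicture(0, 0), CodedPicture(1, 2), CodedPicture(2, 1), CodedPicture(3, 2)]
        print([p.poc for p in extract_sublayers(seq, 1)])  # [0, 2]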
  • a sub-layer access picture may be defined as a picture from which the decoding of a sub-layer can be started correctly, i.e. starting from which all pictures of the sub-layer can be correctly decoded.
  • TSA temporal sub-layer access
  • STSA step-wise temporal sub-layer access
  • the TSA picture type may impose restrictions on the TSA picture itself and all pictures in the same sub-layer that follow the TSA picture in decoding order. None of these pictures is allowed to use inter prediction from any picture in the same sub-layer that precedes the TSA picture in decoding order.
  • the TSA definition may further impose restrictions on the pictures in higher sub-layers that follow the TSA picture in decoding order. None of these pictures is allowed to refer to a picture that precedes the TSA picture in decoding order if that picture belongs to the same or a higher sub-layer as the TSA picture.
  • TSA pictures have TemporalId greater than 0.
  • the STSA picture is similar to the TSA picture but does not impose restrictions on the pictures in higher sub-layers that follow the STSA picture in decoding order, and hence enables up-switching only onto the sub-layer where the STSA picture resides.
  • Available media file format standards include ISO base media file format (ISO/IEC 14496-12, which may be abbreviated ISOBMFF), MPEG-4 file format (ISO/IEC 14496-14, also known as the MP4 format), file format for NAL unit structured video (ISO/IEC 14496-15) and 3GPP file format (3GPP TS 26.244, also known as the 3GP format).
  • ISOBMFF ISO base media file format
  • MPEG-4 file format ISO/IEC 14496-14, also known as the MP4 format
  • file format for NAL unit structured video ISO/IEC 14496-15
  • 3GPP file format 3GPP TS 26.244
  • Some concepts, structures, and specifications of ISOBMFF are described below as an example of a container file format, based on which the embodiments may be implemented.
  • the aspects of the invention are not limited to ISOBMFF, but rather the description is given for one possible basis on top of which the invention may be partly or fully realized.
  • a basic building block in the ISO base media file format is called a box.
  • Each box has a header and a payload.
  • the box header indicates the type of the box and the size of the box in terms of bytes.
  • a box may enclose other boxes, and the ISO file format specifies which box types are allowed within a box of a certain type. Furthermore, the presence of some boxes may be mandatory in each file, while the presence of other boxes may be optional. Additionally, for some box types, it may be allowable to have more than one box present in a file. Thus, the ISO base media file format may be considered to specify a hierarchical structure of boxes.
  • a file includes media data and metadata that are encapsulated into boxes. Each box is identified by a four character code (4CC) and starts with a header which informs about the type and size of the box.
  • 4CC four character code
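  • A minimal sketch of walking the box structure just described (it handles the common 32-bit size and the 64-bit 'largesize' escape where size equals 1; the size-equals-0 'to end of file' case and error handling are omitted):

        import io
        import struct

        def read_boxes(stream, end):
            # Yield (four-character code, payload bytes) for each box.
            while stream.tell() < end:
                size, fourcc = struct.unpack(">I4s", stream.read(8))
                if size == 1:  # 64-bit 'largesize' follows the type field
                    size = struct.unpack(">Q", stream.read(8))[0]
                    payload = stream.read(size - 16)
                else:
                    payload = stream.read(size - 8)
                yield fourcc.decode("ascii"), payload

        # A tiny synthetic file: an empty 8-byte 'free' box, then a 16-byte 'mdat' box.
        data = struct.pack(">I4s", 8, b"free") + struct.pack(">I4s", 16, b"mdat") + b"A" * 8
        for fourcc, payload in read_boxes(io.BytesIO(data), len(data)):
            print(fourcc, len(payload))  # free 0, then mdat 8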
  • the media data may be provided in a media data (‘mdat’) box and the movie (‘moov’) box may be used to enclose the metadata.
  • the movie ‘moov’ box may include one or more tracks, and each track may reside in one corresponding TrackBox (‘trak’).
  • a track may be one of the many types, including a media track that refers to samples formatted according to a media compression format (and its encapsulation to the ISO base media file format).
  • a track may be regarded as a logical channel.
  • Movie fragments may be used e.g. when recording content to ISO files e.g. in order to avoid losing data if a recording application crashes, runs out of memory space, or some other incident occurs. Without movie fragments, data loss may occur because the file format may require that all metadata, e.g., the movie box, be written in one contiguous area of the file. Furthermore, when recording a file, there may not be sufficient amount of memory space (e.g., random access memory RAM) to buffer a movie box for the size of the storage available, and re-computing the contents of a movie box when the movie is closed may be too slow. Moreover, movie fragments may enable simultaneous recording and playback of a file using a regular ISO file parser.
  • a smaller duration of initial buffering may be required for progressive downloading (e.g., simultaneous reception and playback of a file) when movie fragments are used and the initial movie box is smaller compared to a file with the same media content but structured without movie fragments.
  • the movie fragment feature may enable splitting the metadata that otherwise might reside in the movie box into multiple pieces. Each piece may correspond to a certain period of time of a track.
  • the movie fragment feature may enable interleaving file metadata and media data. Consequently, the size of the movie box may be limited, and the use cases mentioned above be realized.
  • the media samples for the movie fragments may reside in an mdat box, if they are in the same file as the moov box.
  • a moof box may be provided.
  • the moof box may include the information for a certain duration of playback time that would previously have been in the moov box.
  • the moov box may still represent a valid movie on its own, but in addition, it may include an mvex box indicating that movie fragments will follow in the same file.
  • the movie fragments may extend the presentation that is associated to the moov box in time.
  • within the movie fragment there may be a set of track fragments, including anywhere from zero to a plurality per track.
  • the track fragments may in turn include anywhere from zero to a plurality of track runs (a.k.a. track fragment runs), each of which documents a contiguous run of samples for that track.
  • the metadata that may be included in the moof box may be limited to a subset of the metadata that may be included in a moov box and may be coded differently in some cases. Details regarding the boxes that can be included in a moof box may be found from the ISO base media file format specification.
  • a self-contained movie fragment may be defined to consist of a moof box and an mdat box that are consecutive in the file order and where the mdat box contains the samples of the movie fragment (for which the moof box provides the metadata) and does not contain samples of any other movie fragment (i.e. any other moof box).
  • the track reference mechanism can be used to associate tracks with each other.
  • the TrackReferenceBox includes box(es), each of which provides a reference from the containing track to a set of other tracks. These references are labeled through the box type (i.e. the four-character code of the box) of the contained box(es).
  • TrackGroupBox which is contained in TrackBox, enables indication of groups of tracks where each group shares a particular characteristic or the tracks within a group have a particular relationship.
  • the box contains zero or more boxes, and the particular characteristic or the relationship is indicated by the box type of the contained boxes.
  • the contained boxes include an identifier, which can be used to conclude the tracks belonging to the same track group.
  • the tracks that contain the same type of a contained box within the TrackGroupBox and have the same identifier value within these contained boxes belong to the same track group.
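  • The grouping rule of the last two bullets can be expressed compactly: tracks that carry a contained box of the same type with the same identifier belong to the same track group. A toy sketch with made-up input records ('msrc' is just an example four-character code):

        from collections import defaultdict

        def track_groups(tracks):
            # tracks: list of (track_id, [(group_box_type, track_group_id), ...]).
            # Returns {(box_type, group_id): [track ids in that group]}.
            groups = defaultdict(list)
            for track_id, contained in tracks:
                for box_type, group_id in contained:
                    groups[(box_type, group_id)].append(track_id)
            return dict(groups)

        tracks = [(1, [("msrc", 10)]), (2, [("msrc", 10)]), (3, [("msrc", 11)])]
        print(track_groups(tracks))  # {('msrc', 10): [1, 2], ('msrc', 11): [3]}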
  • a coded video sequence comprises intra coded pictures (i.e. I pictures) and inter coded pictures (e.g. P and B pictures).
  • Intra coded pictures may use many more bits than inter coded pictures. The transmission time of such large intra coded pictures increases the encoder-to-decoder delay.
  • intra coded pictures are not suitable for (ultra) low delay applications because of the long encoder-to-decoder delay.
  • an intra coded picture is needed at a random access point. Therefore, for (ultra) low delay applications, it may be desirable that both intra coded pictures and inter coded pictures have a similar number of bits so that the encoder-to-decoder delay can be reduced to around one picture interval.
  • PIR Progressive intra refresh
  • Pictures within the refresh period, i.e. pictures from the random access point (inclusive) to the recovery point (exclusive), may be considered to have at least two regions, a refreshed region and a "dirty" region.
  • the refreshed region can be exactly or approximately correctly decoded when the decoding is started from the random access point, while the decoded "dirty" region might not be correct in content when the decoding is started from the random access point.
  • the refreshed region may only be inter-predicted from the refreshed region of the reference pictures within the same refresh period, i.e. sample values of the "dirty" region are not used in inter prediction of the refreshed region. Since the refreshed region in a picture may be larger than the refreshed region in the previous pictures, intra coding may be used for the coding block locations that are newly added in the refreshed region compared to the refreshed regions of earlier pictures in the same refresh period.
  • Figure 1 shows a basic concept of horizontal PIR, where intra coded regions are spread over several pictures and a clean area (i.e. area that comprises the coded regions (inter coded denoted with A and intra coded denoted with B)) is gradually expanded horizontally from the top to the bottom.
  • POC(n) Picture Order Count
  • the white area C represents a dirty area.
  • Figure 2 shows an example of vertical PIR, where intra coded regions B are spread over several pictures and a clean (inter denoted with A and intra denoted with B) area is gradually expanded vertically from the left to the right.
  • the white area C represents a dirty area.
  • Figure 3 shows an example of central-diffusion PIR, where intra coded regions B are spread over several pictures and a clean (inter denoted with A and intra denoted with B) area is gradually expanded from the picture center to the picture boundary.
  • the advantage of central-diffusion PIR is that the content around the center of a picture is often more important than the content around the picture edges.
  • the white area C represents a dirty area.
  • the present embodiments are related to an improved progressive intra refresh, called wavefront PIR.
  • Wavefront PIR according to the present embodiments spreads intra coded regions over several pictures and gradually expands a clean area diagonally from the top-left corner to the bottom-right corner.
  • a comparison with Figures 1 - 3 shows that there are at least two advantages with wavefront PIR.
  • all the neighbouring blocks for an intra block in the intra coded clean area are in either the inter or the intra coded clean area, and therefore there is no need to impose restrictions on intra prediction modes.
  • the wavefront PIR enables parallel processing.
  • Wavefront PIR spreads intra coded regions over several pictures and gradually expands a clean area (having both inter coded regions A and intra coded regions B) diagonally from the top-left corner to the bottom-right corner, as shown in Figure 4.
  • POC(n) is a random access point
  • POC(n+N-1) is the recovery point.
  • the top-left corner region (B) is coded in intra mode.
  • the intra coded region at POC(n) becomes the clean area (A), which is inter coded, and the wavefront area (B) next to the inter coded clean area is coded in intra mode.
  • the inter and intra coded clean areas at POC(n+2) become the inter coded clean area (A), and the wavefront area (B) next to the inter coded clean area is coded in intra mode.
  • the process continues, and the clean area is gradually expanded.
  • the inter and intra coded clean area at POC(n+N-2) becomes the inter coded clean area (A), and the bottom-right corner region (B) next to the inter coded clean area is coded in intra mode.
  • the numbers of intra coded blocks in intra coded regions in the N pictures should be similar, if not the same.
  • the area denoted with C is the dirty area.
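  • To make the diagonal geometry concrete, the following sketch classifies each block of a picture at refresh step k. It assumes a WPP-style diagonal index w = x + 2y, one possible shape that satisfies the neighbour property discussed in the next bullets (left, above, above-left and above-right neighbours all lie on strictly smaller diagonals); the actual wavefront shape and the per-picture block budget are encoder choices, not fixed by this sketch:

        def wavefront_index(x, y):
            # Diagonal index such that a block's left, above, above-left and
            # above-right neighbours all have strictly smaller indices.
            return x + 2 * y

        def classify_block(x, y, k):
            # Classify block (x, y) at refresh step k, assuming the wavefront
            # advances by one diagonal per picture.
            w = wavefront_index(x, y)
            if w < k:
                return "A"   # inter coded clean area
            if w == k:
                return "B"   # intra coded wavefront
            return "C"       # dirty area

        # A 6x3 picture at refresh step k = 4 prints:
        #   AAAABC
        #   AABCCC
        #   BCCCCC
        for y in range(3):
            print("".join(classify_block(x, y, 4) for x in range(6)))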
  • One advantage of wavefront PIR is that all the information required for decoding the intra coded blocks in the wavefront area is in the clean area.
  • Figure 5 shows an example of wavefront PIR, where the (light gray) blocks denoted with A are in the inter clean area, the (dark gray) blocks denoted with B are in the coded clean area, and the (white) blocks denoted with C are in the dirty area.
  • for blocks in the intra clean area, their left, above, above-left and above-right neighboring blocks are in either the inter clean area or the intra clean area.
  • for block 19 in the intra clean area, its neighboring blocks 13 (left), 7 (above), 2 (above-left) and 12 (above-right) are in the inter clean area.
  • for block 24 in the intra clean area, its neighboring blocks 12 (above) and 7 (above-left) are in the inter clean area, and its neighboring blocks 19 (left) and 18 (above-right) are in the intra clean area.
  • Another advantage of wavefront PIR is parallel processing.
  • the blocks on the wavefront can be processed in parallel because their left, above, above-left and above-right neighboring blocks are already coded and reconstructed.
  • blocks 1-16 make up the neighboring blocks for blocks 17-22.
  • blocks 1-16 should already have been coded and reconstructed.
  • blocks 17-22 can be coded in intra mode in parallel using the coding information from blocks 1-16.
  • blocks 23-28 can be intra coded in parallel using the coding information of blocks 6-22.
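  • In the same spirit as blocks 17-22 above, the sketch below codes all blocks of one wavefront concurrently. intra_code_block is a placeholder for an actual block coder, and the diagonal model w = x + 2y from the earlier sketch is reused so that every neighbour a block reads is already reconstructed:

        from concurrent.futures import ThreadPoolExecutor

        def neighbours(x, y):
            # Left, above, above-left, above-right: all lie on strictly earlier
            # wavefronts under the diagonal model w = x + 2y.
            return [(x - 1, y), (x, y - 1), (x - 1, y - 1), (x + 1, y - 1)]

        def code_wavefront(wavefront_blocks, reconstructed, intra_code_block):
            # Intra code all blocks on one wavefront in parallel; each block
            # reads only neighbours already in the reconstructed clean area.
            def work(pos):
                ctx = {n: reconstructed.get(n) for n in neighbours(*pos)}
                return pos, intra_code_block(pos, ctx)
            with ThreadPoolExecutor() as pool:
                results = list(pool.map(work, wavefront_blocks))
            for pos, rec in results:   # commit once the whole wavefront is done
                reconstructed[pos] = rec

        # Toy run on a 6x3 grid: the clean area covers diagonals w < 4, and the
        # current wavefront is the diagonal w == 4.
        recon = {(x, y): "clean" for y in range(3) for x in range(6) if x + 2 * y < 4}
        code_wavefront([(4, 0), (2, 1), (0, 2)], recon, lambda pos, ctx: "intra")
        print(sorted(p for p, v in recon.items() if v == "intra"))  # [(0, 2), (2, 1), (4, 0)]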
  • if the same “exact match” between the encoder and the decoder is required, some restrictions have to be imposed for PIR.
  • the coding information, such as reconstructed pixels, coding modes, etc., used by the blocks in the clean area has to come from the clean area.
  • the blocks in the clean area can only temporally refer to the clean area of the previously coded pictures.
  • the in-loop filtering (e.g. deblocking, SAO, ALF) may need to be constrained so that it does not cross the boundary between the clean area and the dirty area.
  • SAO sample adaptive offset
  • ALF adaptive loop filter
  • only intra prediction modes that do not use reference pixels in the dirty area can be employed for the blocks in the intra clean area.
  • the coded pictures should have a similar, if not the same, number of intra coded blocks for (ultra) low delay applications. For wavefront PIR, this will result in the last block in the intra coded clean area landing at any possible position in a picture.
  • the information about the position, or index, of the last block in the intra coded clean area within a current picture needs to be available to both the encoder and the decoder, so that both know the boundary between the clean area and the dirty area.
  • the information about the position, or index, of the last block of the intra coded clean area within a current picture can be passed to the decoder implicitly. For example, if the number of intra coded blocks is the same for all the pictures, the decoder will know the first and the last intra coded block in the intra coded clean area for each picture, and therefore the boundary between the clean area and the dirty area. Alternatively, if the number of intra coded blocks for a picture is set to a predefined number depending on the picture position within the window between the random access point and the recovery point, the decoder will also be able to derive the boundary between the clean area and the dirty area, as in the sketch below.
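  • A sketch of the implicit derivation just described: if the encoder and decoder agree on how the intra-block budget is split across the refresh period, both sides can derive the last-block index per picture without any signalling (the as-even-as-possible split below is an illustrative assumption):

        def refresh_boundaries(total_blocks, num_pictures):
            # For each picture of the refresh period, derive the index of the
            # last block of the intra coded clean area, splitting the blocks
            # as evenly as possible across the pictures.
            per_picture, extra = divmod(total_blocks, num_pictures)
            last, boundaries = -1, []
            for k in range(num_pictures):
                last += per_picture + (1 if k < extra else 0)
                boundaries.append(last)
            return boundaries

        # 28 blocks refreshed over 5 pictures:
        print(refresh_boundaries(28, 5))  # [5, 11, 17, 22, 27]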
  • the information about the position, or index, of the last block of the intra coded clean area within a current picture can also be explicitly signaled to the decoder.
  • the signal can be carried in the tile group header.
  • Figures 6 - 10 illustrate other possible PIR methods, where a clean area is gradually expanded from the center of picture.
  • Figure 6 shows a horizontal-diffusion PIR, where intra coded regions (B) are spread over several pictures from POC(n) to POC(n+N-1) and a clean (inter denoted with A, intra denoted with B) area is gradually expanded from the picture center to the picture top and bottom boundaries.
  • Figure 7 shows a vertical-diffusion PIR, where intra coded regions (B) are spread over several pictures from POC(n) to POC(n+N-1), and a clean (inter denoted with A, intra denoted with B) area is gradually expanded from the picture center to the picture left and right boundaries. Area denoted with C represents the dirty area.
  • Figure 8 shows a wavefront-diffusion PIR, where intra coded regions (B) are spread over several pictures from POC(n) to POC(n+N-1) and a clean (inter denoted with A, intra denoted with B) area is gradually expanded from the picture center to the top-left and the bottom-right of the picture. Area denoted with C represents the dirty area.
  • Figure 9 shows a subsample PIR, where subsampled intra coded blocks (green) are spread over several pictures from POC(n) to POC(n+N-1). At each POC, a low-resolution picture may be formed by using those subsampled intra and inter coded blocks, or a low-quality picture may be constructed by interpolating the subsampled intra and inter coded blocks. The picture resolution or the picture quality is gradually improved from POC(n) to POC(n+N-1). Area denoted with C represents the dirty area.
  • Figure 10 shows an example of wavefront PIR for two tile groups. For each tile, intra coded regions are spread over several pictures from POC(n) to POC(n+N-1), and a clean (inter denoted with A, intra denoted with B) area is gradually expanded from the top-left to the bottom-right of the two tiles. Area denoted with C represents the dirty area.
  • SPS or PPS may contain flags and signals relating to PIR, such as PIR_enable_flag. If PIR_enable_flag is on, PIR_type_flag may further indicate which type of PIR is used: horizontal PIR, vertical PIR, wavefront PIR, or another type of PIR. If wavefront PIR is indicated, additional flags, such as last_blk_idx_enable_flag, may follow. last_blk_idx_enable_flag indicates whether the tile group header will signal the information about the last block position, or index, of the intra coded clean area per picture or tile group.
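  • As a hedged illustration of how such flags might be read from an SPS or PPS, the sketch below assumes the element names from the previous bullet, single-bit flags, a 2-bit type code, and the type-value mapping given in the comments; none of this is normative syntax:

        class BitReader:
            def __init__(self, data):
                self.data, self.pos = data, 0

            def u(self, n):
                # Read n bits as an unsigned integer, most significant bit first.
                val = 0
                for _ in range(n):
                    byte = self.data[self.pos // 8]
                    val = (val << 1) | ((byte >> (7 - self.pos % 8)) & 1)
                    self.pos += 1
                return val

        def parse_pir_params(r):
            # Hypothetical PIR-related parameter-set syntax.
            params = {"PIR_enable_flag": r.u(1)}
            if params["PIR_enable_flag"]:
                params["PIR_type"] = r.u(2)   # assumed: 0 horizontal, 1 vertical, 2 wavefront
                if params["PIR_type"] == 2:   # wavefront PIR
                    params["last_blk_idx_enable_flag"] = r.u(1)
            return params

        # Bits 1, 10, 1: enabled, wavefront, last block index signalled.
        print(parse_pir_params(BitReader(bytes([0b11010000]))))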
  • one or more syntax elements controlling the loop filtering along a wavefront-based boundary may be present, wherein controlling may for example mean enabling or disabling.
  • the boundary is indicated in either of the following ways:
  • n is signaled e.g. in the tile group header.
  • the boundary is wavefront with index n until CTU with index m (within that wavefront).
  • the boundary is along the previous wavefront boundary.
  • the CTUs may be indexed e.g. starting from the top boundary of the picture towards the left boundary of the picture along a wavefront.
  • an encoder encodes one or more syntax elements (such as flags) controlling the loop filtering along a wavefront-based boundary and turns off loop filtering along at least one wavefront-based boundary when reconstructing a picture.
  • a decoder decodes one or more syntax elements (such as flags) controlling the loop filtering along a wavefront-based boundary and turns off loop filtering along at least one wavefront-based boundary when reconstructing a picture.
  • n equal to 5 can be indicated (the boundary to turn off loop filtering is along the 5th wavefront) and m is not present (the loop filtering is disabled for the entire wavefront boundary).
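  • A sketch of the edge-level decision this implies: given the indicated wavefront index n (and the optional CTU limit m within that wavefront), filtering of an edge is skipped when the edge crosses the boundary. The diagonal model w = x + 2y and the use of the block row as the along-wavefront CTU index are assumptions carried over from the earlier sketches:

        def wavefront_index(x, y):
            return x + 2 * y   # same diagonal model as in the earlier sketches

        def filter_edge_allowed(block_a, block_b, n, m=None):
            # Return False when loop filtering must be turned off for the edge
            # between block_a and block_b, i.e. the edge crosses wavefront n.
            wa, wb = wavefront_index(*block_a), wavefront_index(*block_b)
            crosses = (wa < n) != (wb < n)   # one side clean, the other not
            if not crosses:
                return True
            if m is not None:
                # The boundary applies only up to CTU index m along the wavefront;
                # here the CTUs are indexed from the top picture boundary downwards.
                boundary_ctu_index = min(block_a[1], block_b[1])
                return boundary_ctu_index > m
            return False   # whole wavefront boundary: no filtering across it

        print(filter_edge_allowed((3, 0), (4, 0), n=4))  # False: crosses wavefront 4
        print(filter_edge_allowed((0, 0), (1, 0), n=4))  # True: both blocks are clean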
  • FIG. 11 is a flowchart illustrating a method according to an embodiment.
  • a method comprises generating 1110 a bitstream of a coded video sequence comprising pictures in a picture order; wherein the generating comprises, for a first picture of the video sequence, coding 1120 a region on a corner of the picture in intra mode to generate a refresh region; repeating 1130 the following for each subsequent picture of the video sequence in a picture order: coding 1140 a region corresponding to the refresh region of a previous picture in inter mode, and coding a region that is diagonally next to the region corresponding to the refresh region of the previous picture in intra mode to expand the refresh region; indicating 1150 in a bitstream a use of a diagonal refresh; and transmitting 1160 the coded bitstream to a decoder.
  • An apparatus comprises means for generating a bitstream of a coded video sequence comprising pictures in a picture order; wherein the means for generating comprises means for coding, for a first picture of the video sequence, a region on a corner of the picture in intra mode to generate a refresh region; for each subsequent picture of the video sequence in a picture order, means for coding a region corresponding to the refresh region of a previous picture in inter mode, and means for coding a region that is diagonally next to the region corresponding to the refresh region of the previous picture in intra mode to expand the refresh region; means for indicating in a bitstream a use of a diagonal refresh; and means for transmitting the coded bitstream to a decoder.
  • the means comprises at least one processor, and a memory including a computer program code, wherein the processor may further comprise processor circuitry.
  • the memory and the computer program code are configured to, with the at least one processor, cause the apparatus to perform the method of the flowchart in Figure 11 according to various embodiments.
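  • Tying the steps of Figure 11 together, a skeletal encoder loop for one refresh period could look like the following; the write callback, the per-block coders and the one-diagonal-per-picture schedule are placeholder assumptions, not the normative encoding process:

        def encode_refresh_period(pictures, width, height, write):
            # Encode one wavefront PIR refresh period; 'write' emits coded
            # data to the bitstream.
            write("diagonal_refresh_flag", 1)        # indicate diagonal refresh
            max_w = (width - 1) + 2 * (height - 1)   # last diagonal index
            for k, _picture in enumerate(pictures):
                for y in range(height):
                    for x in range(width):
                        w = x + 2 * y
                        if w < k:
                            write("inter_block", (x, y))  # refreshed region, inter coded
                        elif w == k:
                            write("intra_block", (x, y))  # diagonal expansion, intra coded
                        else:
                            write("dirty_block", (x, y))  # not yet refreshed
                if k == max_w:
                    break                                 # recovery point reached

        log = []
        encode_refresh_period([None] * 9, 3, 2, lambda kind, v=None: log.append(kind))
        print(log.count("intra_block"))  # 6: every block intra coded exactly once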
  • FIG. 12 is a flowchart illustrating a method according to another embodiment.
  • a method comprises receiving a bitstream of a coded video sequence comprising pictures in a picture order; decoding from the bitstream an indication of a use of a diagonal refresh; for a first picture of the video sequence, decoding a region on a corner of the picture in intra mode to generate a refresh region; repeating the following for each subsequent picture of the video sequence in a picture order: decoding a region corresponding to the refresh region of a previous picture in inter mode, and decoding a region that is diagonally next to the region corresponding to the refresh region of the previous picture in intra mode to expand the refresh region; and generating a video sequence for rendering.
  • An apparatus comprises means for receiving a bitstream of a coded video sequence comprising pictures in a picture order; means for decoding from the bitstream an indication of a use of a diagonal refresh; for a first picture of the video sequence, means for decoding a region on a corner of the picture in intra mode to generate a refresh region; for each subsequent picture of the video sequence in a picture order: means for decoding a region corresponding to the refresh region of a previous picture in inter mode, and means for decoding a region that is diagonally next to the region corresponding to the refresh region of the previous picture in intra mode to expand the refresh region; and means for generating a video sequence for rendering.
  • the means comprises at least one processor, and a memory including a computer program code, wherein the processor may further comprise processor circuitry.
  • the memory and the computer program code are configured to, with the at least one processor, cause the apparatus to perform the method of the flowchart in Figure 12 according to various embodiments.
  • An example of a data processing system for an apparatus is illustrated in Figure 13. Several functionalities can be carried out with a single physical device, e.g. all calculation procedures can be performed in a single processor if desired.
  • the data processing system comprises a main processing unit 100, a memory 102, a storage device 104, an input device 106, an output device 108, and a graphics subsystem 110, which are all connected to each other via a data bus 112.
  • the main processing unit 100 is a conventional processing unit arranged to process data within the data processing system.
  • the main processing unit 100 may comprise or be implemented as one or more processors or processor circuitry.
  • the memory 102, the storage device 104, the input device 106, and the output device 108 may include conventional components as recognized by those skilled in the art.
  • the memory 102 and storage device 104 store data in the data processing system 100.
  • Computer program code resides in the memory 102 for implementing, for example a method as illustrated in a flowchart of Figure 11 or Figure 12 according to various embodiments.
  • the input device 106 inputs data into the system while the output device 108 receives data from the data processing system and forwards the data, for example to a display.
  • the data bus 112 is a conventional data bus and while shown as a single line it may be any combination of the following: a processor bus, a PCI bus, a graphical bus, an ISA bus. Accordingly, a skilled person readily recognizes that the apparatus may be any data processing device, such as a computer device, a personal computer, a server computer, a mobile phone, a smart phone or an Internet access device, for example an Internet tablet computer.
  • Figure 14 illustrates an example of a video encoder, where In: Image to be encoded; P’n: Predicted representation of an image block; Dn: Prediction error signal; D’n: Reconstructed prediction error signal; I’n: Preliminary reconstructed image; R’n: Final reconstructed image; T, T-1: Transform and inverse transform; Q, Q-1: Quantization and inverse quantization; E: Entropy encoding; RFM: Reference frame memory; Pinter: Inter prediction; Pintra: Intra prediction; MS: Mode selection; F: Filtering.
  • Figure 15 illustrates a block diagram of a video decoder
  • P’n Predicted representation of an image block
  • D’n Reconstructed prediction error signal
  • I’n Preliminary reconstructed image
  • R’n Final reconstructed image
  • T-1 Inverse transform
  • Q-1 Inverse quantization
  • E-1 Entropy decoding
  • RFM Reference frame memory
  • P Prediction (either inter or intra)
  • F Filtering.
  • An apparatus according to an embodiment may comprise only an encoder or a decoder, or both.
  • the various embodiments can be implemented with the help of computer program code that resides in a memory and causes the relevant apparatuses to carry out the method.
  • a device may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the device to carry out the features of an embodiment.
  • a network device like a server may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the network device to carry out the features of an embodiment.
  • the computer program code comprises one or more operational characteristics. Said operational characteristics are defined through configuration by said computer based on the type of said processor, wherein a system is connectable to said processor by a bus. The programmable operational characteristics of the system are for implementing a method according to Figure 11 or Figure 12 according to various embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to a method and an apparatus for video coding. A method may comprise generating a bitstream of a coded video sequence comprising pictures in a picture order; the generating comprising 1) for a first picture of the video sequence, coding a region on a corner of the picture in intra mode to generate a refresh region; and 2) repeating the following for each subsequent picture of the video sequence in a picture order: coding a region corresponding to the refresh region of a previous picture in inter mode, and coding a region that is diagonally adjacent to the region corresponding to the refresh region of the previous picture in intra mode to expand the refresh region; and indicating in a bitstream a use of a diagonal refresh and transmitting the coded bitstream to a decoder.
EP20772663.9A 2019-03-21 2020-03-19 Procédé, appareil et produit de programme informatique pour le codage et le décodage vidéo Withdrawn EP3942803A4 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962821597P 2019-03-21 2019-03-21
PCT/FI2020/050176 WO2020188149A1 (fr) 2019-03-21 2020-03-19 Procédé, appareil et produit de programme informatique pour le codage et le décodage vidéo

Publications (2)

Publication Number Publication Date
EP3942803A1 true EP3942803A1 (fr) 2022-01-26
EP3942803A4 EP3942803A4 (fr) 2022-12-21

Family

ID=72520599

Family Applications (1)

Application Number Title Priority Date Filing Date
EP20772663.9A Withdrawn EP3942803A4 (fr) 2019-03-21 2020-03-19 Procédé, appareil et produit de programme informatique pour le codage et le décodage vidéo

Country Status (2)

Country Link
EP (1) EP3942803A4 (fr)
WO (1) WO2020188149A1 (fr)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230362390A1 (en) * 2020-09-28 2023-11-09 Nokia Technologies Oy History-based motion vector prediction and mode selection for gradual decoding refresh
CN114630113B (zh) * 2021-02-23 2023-04-28 杭州海康威视数字技术股份有限公司 基于自适应帧内刷新机制的解码、编码方法及相关设备
CN114630122B (zh) * 2021-03-19 2023-04-28 杭州海康威视数字技术股份有限公司 基于自适应帧内刷新机制的解码、编码方法及相关设备
CN114630129A (zh) * 2022-02-07 2022-06-14 浙江智慧视频安防创新中心有限公司 一种基于智能数字视网膜的视频编解码方法和装置

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FI114679B (fi) * 2002-04-29 2004-11-30 Nokia Corp Satunnaisaloituspisteet videokoodauksessa
JP5481251B2 (ja) * 2010-03-31 2014-04-23 日立コンシューマエレクトロニクス株式会社 画像符号化装置
US20130094571A1 (en) * 2011-10-13 2013-04-18 Ati Technologies Ulc Low latency video compression
US10652572B2 (en) * 2016-04-29 2020-05-12 Ati Technologies Ulc Motion-adaptive intra-refresh for high-efficiency, low-delay video coding

Also Published As

Publication number Publication date
WO2020188149A1 (fr) 2020-09-24
EP3942803A4 (fr) 2022-12-21

Similar Documents

Publication Publication Date Title
US11818385B2 (en) Method and apparatus for video coding
US10136150B2 (en) Apparatus, a method and a computer program for video coding and decoding
CA2942730C (fr) Procede et appareil de codage et de decodage video
US10547867B2 (en) Method, an apparatus and a computer program product for video coding and decoding
US10154274B2 (en) Apparatus, a method and a computer program for video coding and decoding
KR102077900B1 (ko) 비디오 코딩 및 디코딩을 위한 장치, 방법 및 컴퓨터 프로그램
US20140003504A1 (en) Apparatus, a Method and a Computer Program for Video Coding and Decoding
US9900609B2 (en) Apparatus, a method and a computer program for video coding and decoding
US20150312580A1 (en) Apparatus, a method and a computer program for video coding and decoding
US20140254681A1 (en) Apparatus, a method and a computer program for video coding and decoding
US20140085415A1 (en) Method and apparatus for video coding
WO2020188149A1 (fr) Procédé, appareil et produit de programme informatique pour le codage et le décodage vidéo
US20170078703A1 (en) Apparatus, a method and a computer program for video coding and decoding
CA3137353A1 (fr) Procede pour le mode d'etablissement des couches de sortie dans un flux video multicouche
US20230217017A1 (en) Method, An Apparatus and a Computer Program Product for Implementing Gradual Decoding Refresh
US20220329787A1 (en) A method, an apparatus and a computer program product for video encoding and video decoding with wavefront-based gradual random access
EP4300957A1 (fr) Procédé, appareil et produit de programme informatique pour mettre en uvre un rafraîchissement de décodage progressif
WO2023131435A1 (fr) Rafraîchissement de décodage progressif
WO2019211514A1 (fr) Codage et décodage vidéo

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20211021

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
A4 Supplementary search report drawn up and despatched

Effective date: 20221121

RIC1 Information provided on ipc code assigned before grant

Ipc: G06T 9/00 20060101ALI20221115BHEP

Ipc: H04N 19/70 20140101ALI20221115BHEP

Ipc: H04N 19/174 20140101ALI20221115BHEP

Ipc: H04N 19/593 20140101ALI20221115BHEP

Ipc: H04N 19/503 20140101ALI20221115BHEP

Ipc: H04N 19/167 20140101ALI20221115BHEP

Ipc: H04N 19/159 20140101ALI20221115BHEP

Ipc: H04N 19/107 20140101AFI20221115BHEP

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20230620