WO2020008107A1 - Method, apparatus and computer program product for video coding and decoding - Google Patents

Method, apparatus and computer program product for video coding and decoding

Info

Publication number
WO2020008107A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
block
samples
prediction
prediction block
Prior art date
Application number
PCT/FI2019/050493
Other languages
English (en)
Inventor
Alireza Aminlou
Miska Hannuksela
Alireza ZARE
Original Assignee
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Publication of WO2020008107A1

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N 19/51 Motion estimation or motion compensation
    • H04N 19/55 Motion estimation with spatial constraints, e.g. at image or region borders
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/103 Selection of coding mode or of prediction mode
    • H04N 19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N 19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/182 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a pixel

Definitions

  • Various example embodiments relate to video coding and decoding, in particular, to prediction-constrained motion compensation.
  • a video coding system may comprise an encoder that transforms an input video into a compressed representation suited for storage/transmission, and a decoder that can uncompress the compressed video representation back into a viewable form.
  • the encoder may discard some information in the original video sequence in order to represent the video in a more compact form, for example, to enable the storage/transmission of the video information at a lower bitrate than otherwise might be needed.
  • Hybrid video codecs may encode the video information in two phases. Firstly, sample values (i.e. pixel values) in a certain picture area are predicted, e.g., by motion compensation means or by spatial means. Secondly, the prediction error, i.e. the difference between the prediction block of samples and the original block of samples, is coded.
  • the video decoder reconstructs the output video by applying prediction means similar to the encoder to form a prediction representation of the sample blocks and prediction error decoding. After applying prediction and prediction error decoding means, the decoder sums up the prediction and prediction error signals to form the output video frame.
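  • To make the prediction-plus-residual structure above concrete, the following minimal sketch (NumPy-based; all names are illustrative and not taken from the application) reconstructs a block as the clipped sum of a prediction block and a decoded residual block:

```python
import numpy as np

def reconstruct_block(prediction: np.ndarray, decoded_residual: np.ndarray,
                      bit_depth: int = 8) -> np.ndarray:
    """Hybrid-codec reconstruction: sample-wise sum of the prediction block
    and the decoded prediction-error (residual) block, clipped to the valid
    sample range for the given bit depth."""
    max_val = (1 << bit_depth) - 1
    return np.clip(prediction.astype(np.int32) + decoded_residual.astype(np.int32),
                   0, max_val).astype(np.uint8)

# Example: a flat 4x4 prediction plus a small residual.
pred = np.full((4, 4), 128, dtype=np.uint8)
res = np.array([[-3, 0, 2, 1]] * 4, dtype=np.int32)
print(reconstruct_block(pred, res))
```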
  • a method comprising obtaining at least one reference frame being partitioned into at least one region; determining at least one motion vector for the at least one reference frame for at least one sample in a block in a current frame, the at least one motion vector having a motion vector component for each direction of the block; determining at least one prediction block having prediction samples with values based on the at least one reference frame and the at least one motion vector; determining a corresponding region in the at least one reference frame corresponding to the at least one sample in the block; detecting contaminated samples in the at least one prediction block, wherein the contaminated samples have been at least partly generated based on the at least one reference frame samples originating from other regions than the corresponding region; detecting non-contaminated samples in the at least one prediction block, wherein the non-contaminated samples have been generated only based on the at least one reference frame samples originating from the corresponding region; and generating a modified prediction block, wherein each sample of the modified prediction block is generated using the non-contaminated samples of the at least one prediction block.
  • an apparatus comprising at least one processor, memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following: obtain at least one reference frame being partitioned into at least one region; determine at least one motion vector for the at least one reference frame for at least one sample in a block in a current frame, the at least one motion vector having a motion vector component for each direction of the block; determine at least one prediction block having prediction samples with values based on the at least one reference frame and the at least one motion vector; determine a corresponding region in the at least one reference frame corresponding to the at least one sample in the block; detect contaminated samples in the at least one prediction block, wherein the contaminated samples have been at least partly generated based on the at least one reference frame samples originating from other regions than the corresponding region; detect non-contaminated samples in the at least one prediction block, wherein the non-contaminated samples have been generated only based on the at least one reference frame samples originating from the corresponding region; and generate a modified prediction block, wherein each sample of the modified prediction block is generated using the non-contaminated samples of the at least one prediction block.
  • a computer program product comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to obtain at least one reference frame being partitioned into at least one region; to determine at least one motion vector for the at least one reference frame for at least one sample in a block in a current frame, the at least one motion vector having a motion vector component for each direction of the block; to determine at least one prediction block having prediction samples with values based on the at least one reference frame and the at least one motion vector; to determine a corresponding region in the at least one reference frame corresponding to the at least one sample in the block; to detect contaminated samples in the at least one prediction block, wherein the contaminated samples have been at least partly generated based on the at least one reference frame samples originating from other regions than the corresponding region; to detect non-contaminated samples in the at least one prediction block, wherein the non-contaminated samples have been generated only based on the at least one reference frame samples originating from the corresponding region; and to generate a modified prediction block, wherein each sample of the modified prediction block is generated using the non-contaminated samples of the at least one prediction block.
  • the apparatus is configured to generate each sample of the modified prediction block by a weighted average of the co-located non-contaminated samples from the at least one prediction block when at least two co-located non-contaminated samples are available.
  • the apparatus is configured to generate each sample of the modified prediction block by copying each sample of the modified prediction block from the co-located non-contaminated sample when only one co-located non-contaminated sample is available in the at least one prediction block.
  • the apparatus is configured to generate each sample of the modified prediction block by setting each sample to a value, wherein the value is calculated from the non-contaminated samples when no co-located non-contaminated sample is available in the at least one prediction block.
  • the apparatus is configured to calculate the value from the non-contaminated samples by setting the value to the value of the non-contaminated sample that is located closest to the current sample in the at least one prediction block.
  • the apparatus is configured to calculate the value from the non-contaminated samples of the at least one prediction block by setting the value to one of the following: an average of the non-contaminated samples of the at least one prediction block; an average of the non-contaminated samples located in a same row; an average of the non-contaminated samples located in a same column.
  • the apparatus is configured to generate each sample of the modified prediction block by calculating weights of the non-contaminated samples based on a location of each sample of the modified prediction block, coordinates of the current region, the at least one motion vector, and a filter tap of a motion compensation filter.
  • the apparatus is configured to calculate the value from the non-contaminated samples of the at least one prediction block by setting the value to a fixed value.
  • the apparatus is configured to indicate the fixed value in a high-level header.
  • the apparatus is configured to generate each sample of the modified prediction block by calculating each sample of at least one intermediate block from the non-contaminated samples of the at least one prediction block, and generating each sample of the modified prediction block using co-located samples in the at least one intermediate block by weighted averaging.
  • the apparatus is configured to determine the corresponding region by determining a location of the prediction sample of the at least one prediction block in the at least one reference frame according to a location of the sample in the current frame and a value of the motion vector, and determining the corresponding region according to the determined location.
  • the apparatus is configured to determine a location of a co-located sample in the reference frame as the location of the sample of the block in the current frame and determine the corresponding region according to the determined location.
  • the apparatus is configured to set a region that comprises the majority of the prediction samples of the block in the reference frame to be the corresponding region.
  • the apparatus is configured to determine a motion vector for all samples in the block using the motion vector of the block.
  • the apparatus is configured to determine a motion vector for all samples in each subblock of the block using motion information of the block. According to an embodiment, the apparatus is configured to determine a motion vector for a sample by determining a motion vector for each sample in the block using motion information of the block.
  • the apparatus is configured to indicate with a flag whether the modification of contaminated samples is applied.
  • the block has at least horizontal and vertical directions. According to an embodiment, different flags for different block sizes are used.
  • a region is one of the following: a tile, a tile set or a slice.
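  • The sample-wise rule described in the embodiments above can be sketched as follows for a single color component. This is a simplified illustration, not the claimed method itself: the equal-weight averaging, the nearest-sample fallback and all function/variable names are assumptions chosen for brevity.

```python
import numpy as np

def modified_prediction_block(pred_blocks, contaminated_masks):
    """pred_blocks: list of HxW prediction blocks (e.g. list-0 / list-1 predictions).
    contaminated_masks: list of HxW boolean masks, True where the prediction sample
    was (partly) generated from reference samples outside the corresponding region.

    Per sample of the modified block:
      - >= 2 co-located non-contaminated samples -> (equal-)weighted average,
      - exactly 1                                 -> copy that sample,
      - none                                      -> value derived from the
        non-contaminated samples (here: nearest non-contaminated sample of the
        first prediction block, one of the fallbacks described above)."""
    preds = np.stack([p.astype(np.float64) for p in pred_blocks])   # N x H x W
    clean = ~np.stack(contaminated_masks)                           # N x H x W
    h, w = preds.shape[1:]
    out = np.zeros((h, w))
    # Coordinates of non-contaminated samples of the first prediction block,
    # used for the "closest sample" fallback.
    clean_yx = np.argwhere(clean[0])
    for y in range(h):
        for x in range(w):
            vals = preds[clean[:, y, x], y, x]
            if len(vals) >= 1:
                out[y, x] = vals.mean()          # average, or a plain copy if one
            elif len(clean_yx) > 0:
                d = np.abs(clean_yx - [y, x]).sum(axis=1)
                cy, cx = clean_yx[d.argmin()]
                out[y, x] = preds[0, cy, cx]
            # else: no non-contaminated sample at all; a fixed value could be used.
    return np.rint(out)
```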
  • Fig. 1 shows an encoder according to an embodiment
  • Fig. 2 shows a decoder according to an embodiment
  • Figs. 3a, b show examples of motion vector candidate positions within a block
  • Fig. 4 shows an example of filtering for fractional motion compensation
  • Fig. 5 shows an example of motion compensation filter for fractional motion vector
  • Fig. 6 shows an example of four parameter affine motion compensation
  • Fig. 7 shows an example of an alternative method for four-parameter affine motion compensation
  • Fig. 8 shows an example of fractional motion compensation filtering near tile boundaries
  • Fig. 9 shows an example of using extractor tracks for tile-based omnidirectional video streaming
  • Fig. 10 shows an example of combining tiles from bitstreams of different resolution
  • Fig. 11 shows an example of encoding an input picture sequence of the equirectangular projection format
  • Fig. 12 shows an example of bitstreams of different resolutions for viewport-adaptive streaming
  • Fig. 13 shows an example of a method for reducing the number of extractor tracks in a resolution-adaptive viewport-dependent delivery
  • Fig. 14 shows an example for using tile tracks of the same resolution for tile-based omnidirectional video streaming
  • Fig. 15 shows an example of encoding several SHVC bitstreams for differing bitrates
  • Fig. 16 shows an example of constrained inter-layer prediction
  • Fig. 17 shows an example of spatially packed constrained inter-layer prediction
  • Fig. 18 shows an example of encoding two bitstreams with MCTS(s);
  • Fig. 19 shows an example of switching from enhanced quality tiles at first non-IRAP switching point
  • Fig. 20 shows an example of a file arrangement and a respective arrangement of
  • Fig. 21 shows an example of a motion compensation filtering near a tile boundary
  • Fig. 22 shows an example of merging uni-prediction blocks together to generate a merged bi-prediction block
  • Fig. 23 shows an example of region border processing
  • Fig. 24 shows another example of merging uni-prediction blocks together to generate a merged bi-prediction block
  • Fig. 25 shows yet another example of merging uni-prediction blocks together to generate a merged bi-prediction block
  • Fig. 26 shows an example of generating a block by using sample values of two neighboring blocks
  • Fig. 27 shows an example of affine motion compensation
  • Fig. 28 is a flowchart illustrating a method according to an embodiment.
  • Fig. 29 shows an apparatus according to an embodiment.
  • Example embodiments will be described in the context of one video coding arrangement. It is to be noted, however, that the invention is not limited to this particular arrangement.
  • the invention may be applicable to video coding systems such as streaming systems, DVD (Digital Versatile Disc) players, digital television receivers, personal video recorders, systems and computer programs on personal computers, handheld computers and communication devices, as well as network elements such as transcoders and cloud computing arrangements where video data is handled.
  • the Advanced Video Coding standard (which may be abbreviated AVC or H.264/AVC) was developed by the Joint Video Team (JVT) of the Video Coding Experts Group (VCEG) of the Telecommunications Standardization Sector of the International Telecommunication Union (ITU-T) and the Moving Picture Experts Group (MPEG) of the International Organization for Standardization (ISO) / International Electrotechnical Commission (IEC).
  • the H.264/AVC standard is published by both parent standardization organizations, and it is referred to as ITU-T Recommendation H.264 and ISO/IEC International Standard 14496-10, also known as MPEG-4 Part 10 Advanced Video Coding (AVC).
  • High Efficiency Video Coding standard (which may be abbreviated HEVC or H.265/HEVC) was developed by the Joint Collaborative Team - Video Coding (JCT-VC) of VCEG and MPEG.
  • the standard is published by both parent standardization organizations, and it is referred to as ITU-T Recommendation H.265 and ISO/IEC International Standard 23008-2, also known as MPEG-H Part 2 High Efficiency Video Coding (HEVC).
  • Extensions to H.265/HEVC include scalable, multiview, three-dimensional, and fidelity range extensions, which may be referred to as SHVC, MV-HEVC, 3D-HEVC, and REXT, respectively.
  • a syntax element may be defined as an element of data represented in the bitstream.
  • a syntax structure may be defined as zero or more syntax elements present together in the bitstream in a specified order.
  • bitstream syntax and semantics as well as the decoding process for error-free bitstreams are specified in H.264/AVC and HEVC.
  • the encoding process is not specified, but encoders must generate conforming bitstreams.
  • Bitstream and decoder conformance can be verified with the Hypothetical Reference Decoder (HRD).
  • the standards contain coding tools that help in coping with transmission errors and losses, but the use of the tools in encoding is optional and no decoding process has been specified for erroneous bitstreams.
  • the elementary unit for the input to an H.264/AVC or HEVC encoder and the output of an H.264/AVC or HEVC decoder, respectively, is a picture.
  • a picture given as an input to an encoder may also be referred to as a source picture, and a picture decoded by a decoder may be referred to as a decoded picture.
  • the source and decoded pictures may each be comprised of one or more sample arrays, such as one of the following sets of sample arrays, wherein each of the samples represents one color component:
  • Luma (Y) only (monochrome).
  • Luma and two chroma (YCbCr or YCgCo).
  • Green, Blue and Red (GBR, also known as RGB).
  • Arrays representing other unspecified monochrome or tri-stimulus color samplings (for example, YZX, also known as XYZ).
  • these arrays may be referred to as luma (or L or Y) and chroma, where the two chroma arrays may be referred to as Cb and Cr, regardless of the actual color representation method in use.
  • the actual color representation method in use may be indicated e.g. in a coded bitstream e.g. using the Video Usability Information (VUI) syntax of H.264/AVC and/or HEVC.
  • a component may be defined as an array or a single sample from one of the three sample arrays (luma and two chroma), or the array or a single sample of the array that composes a picture in monochrome format.
  • a picture may either be a frame or a field.
  • a frame comprises a matrix of luma samples and possibly the corresponding chroma samples.
  • a field is a set of alternate sample rows of a frame. Fields may be used as encoder input for example when the source signal is interlaced. Chroma sample arrays may be absent (and hence monochrome sampling may be in use) or may be subsampled when compared to luma sample arrays.
  • In 4:2:0 sampling, each of the two chroma arrays has half the height and half the width of the luma array.
  • In 4:2:2 sampling, each of the two chroma arrays has the same height and half the width of the luma array.
  • In 4:4:4 sampling, each of the two chroma arrays has the same height and width as the luma array.
  • In H.264/AVC and HEVC, it is possible to code sample arrays as separate color planes into the bitstream and respectively decode separately coded color planes from the bitstream.
  • each one of them is separately processed (by the encoder and/or the decoder) as a picture with monochrome sampling.
  • the location of chroma samples with respect to luma samples may be determined in the encoder side (e.g. as pre-processing step or as part of encoding).
  • the chroma sample positions with respect to luma sample positions may be pre-defined for example in a coding standard, such as H.264/AVC or HEVC, or may be indicated in the bitstream for example as part of VUI of H.264/AVC or HEVC.
  • the source video sequence(s) provided as input for encoding may either represent interlaced source content or progressive source content. Fields of opposite parity have been captured at different times for interlaced source content. Progressive source content contains captured frames.
  • An encoder may encode fields of interlaced source content in two ways: a pair of interlaced fields may be coded into a coded frame or a field may be coded as a coded field.
  • an encoder may encode frames of progressive source content in two ways: a frame of progressive source content may be coded into a coded frame or a pair of coded fields.
  • a field pair or a complementary field pair may be defined as two fields next to each other in decoding and/or output order, having opposite parity (i.e. one being a top field and the other being a bottom field).
  • Some video coding standards or schemes allow mixing of coded frames and coded fields in the same coded video sequence.
  • predicting a coded field from a field in a coded frame and/or predicting a coded frame for a complementary field pair may be enabled in encoding and/or decoding.
  • a partitioning may be defined as a division of a set into subsets such that each element of the set is in exactly one of the subsets.
  • a picture partitioning may be defined as a division of a picture into smaller non-overlapping units.
  • a block partitioning may be defined as a division of a block into smaller non-overlapping units, such as sub-blocks. In some cases, the term block partitioning may be considered to cover multiple levels of partitioning, for example partitioning of a picture into slices, and partitioning of each slice into smaller units, such as macroblocks of H.264/AVC. It is noted that the same unit, such as a picture, may have more than one partitioning. For example, a coding unit of HEVC may be partitioned into prediction units and separately by another quadtree into transform units.
  • Motion compensation mechanisms (which may also be referred to as temporal prediction or motion-compensated temporal prediction or motion-compensated prediction or MCP), which involve finding and indicating an area in one of the previously encoded video frames that corresponds closely to the block being coded.
  • intra prediction, where sample values can be predicted by spatial mechanisms which involve finding and indicating a spatial region relationship.
  • syntax prediction which may also be referred to as parameter prediction
  • syntax elements and/or syntax element values and/or variables derived from syntax elements are predicted from syntax elements (de)coded earlier and/or variables derived earlier.
  • Non-limiting examples of syntax prediction are provided below:
  • motion vectors, e.g. for inter and/or inter-view prediction, may be coded differentially with respect to a block-specific predicted motion vector. In many video codecs, the predicted motion vectors are created in a predefined way, for example by calculating the median of the encoded or decoded motion vectors of the adjacent blocks.
  • Another way to create motion vector predictions is to generate a list of candidate predictions from adjacent blocks and/or co-located blocks in temporal reference frames and signaling the chosen candidate as the motion vector predictor.
  • the reference index of a previously coded/decoded picture can be predicted.
  • the reference index may be predicted from adjacent blocks and/or co-located blocks in a temporal reference frame. Differential coding of motion vectors may be disabled across slice boundaries.
  • the block partitioning e.g. from coding tree unit (CTU) to coding units (CU) and down to prediction units (PUs) may be predicted.
  • filtering parameters e.g. for sample adaptive offset may be predicted.
  • Prediction approaches using image information from a previously coded image can also be called inter prediction methods, which may also be referred to as temporal prediction and motion compensation.
  • Prediction approaches using image information within the same image can also be called intra prediction methods.
  • the second phase is coding the error between the prediction block of samples and the original block of samples. This may be accomplished by transforming the difference in sample values using a specified transform. This transform may be e.g. a Discrete Cosine Transform (DCT) or a variant thereof. After transforming the difference, the transformed difference is quantized, and entropy coded.
  • the encoder can control the balance between the accuracy of the sample representation (i.e. the visual quality of the picture) and the size of the resulting encoded video representation (i.e. the file size or transmission bit rate).
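  • The prediction-error coding steps mentioned above (transform and quantization, followed by entropy coding of the resulting levels) can be sketched as below. The 2-D DCT comes from SciPy, and the flat quantization step `qstep` is an illustrative simplification of real codec quantizers:

```python
import numpy as np
from scipy.fftpack import dct, idct

def encode_residual(residual: np.ndarray, qstep: float) -> np.ndarray:
    """Forward 2-D DCT of a residual block followed by uniform quantization."""
    coeff = dct(dct(residual.astype(np.float64), axis=0, norm='ortho'),
                axis=1, norm='ortho')
    return np.rint(coeff / qstep).astype(np.int32)   # levels to be entropy-coded

def decode_residual(levels: np.ndarray, qstep: float) -> np.ndarray:
    """Inverse quantization and inverse 2-D DCT (the decoder-side operations)."""
    coeff = levels.astype(np.float64) * qstep
    return idct(idct(coeff, axis=1, norm='ortho'), axis=0, norm='ortho')

residual = np.random.default_rng(0).integers(-10, 10, size=(8, 8))
levels = encode_residual(residual, qstep=4.0)
# Reconstruction error stays small and grows with the quantization step.
print(np.abs(decode_residual(levels, 4.0) - residual).max())
```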
  • the decoder reconstructs the output video by applying a prediction mechanism similar to that used by the encoder in order to form a prediction representation of the sample blocks (using the motion or spatial information created by the encoder and included in the compressed representation of the image) and prediction error decoding (the inverse operation of the prediction error coding to recover the quantized error signal in the spatial domain).
  • After applying sample prediction and error decoding processes, the decoder combines the prediction and the prediction error signals (the sample values) to form the output video frame.
  • Fig. 1 illustrates an image to be encoded (In); a prediction representation of an image block (P'n); a prediction error signal (Dn); a reconstructed prediction error signal (D'n); a preliminary reconstructed image (I'n); a final reconstructed image (R'n); a transform (T) and inverse transform (T⁻¹); a quantization (Q) and inverse quantization (Q⁻¹); entropy encoding (E); a reference frame memory (RFM); inter prediction (Pinter); intra prediction (Pintra); mode selection (MS) and filtering (F).
  • An example of a decoding process is illustrated in Fig. 2.
  • Fig. 2 illustrates a prediction representation of an image block (P'n); a reconstructed prediction error signal (D'n); a preliminary reconstructed image (I'n); a final reconstructed image (R'n); an inverse transform (T⁻¹); an inverse quantization (Q⁻¹); an entropy decoding (E⁻¹); a reference frame memory (RFM); a prediction (either inter or intra) (P); and filtering (F).
  • the decoder may also apply additional filtering processes in order to improve the quality of the output video before passing it for display and/or storing as a prediction reference for the forthcoming pictures in the video sequence.
  • motion information is indicated by motion vectors associated with each motion compensated image block, and an index which refers to one of the reference frames in RFM.
  • Each of these motion vectors represents the displacement of the image block in the picture to be coded (in the encoder) or decoded (at the decoder) and the prediction source block in one of the previously coded or decoded images (or pictures), called the reference frame.
  • H.264/AVC and HEVC as many other video compression standards, divide a picture into a mesh of rectangles, for each of which a similar block in one of the reference frames is indicated for inter prediction. The location of the prediction block is coded as a motion vector that indicates the position of the prediction block relative to the block being coded.
  • H.264/AVC and HEVC include a concept of picture order count (POC).
  • a value of POC is derived for each picture and is non-decreasing with increasing picture position in output order. POC therefore indicates the output order of pictures.
  • POC may be used in the decoding process for example for implicit scaling of motion vectors in the temporal direct mode of bi-predictive slices, for implicitly derived weights in weighted prediction, and for reference frame list initialization. Furthermore, POC may be used in the verification of output order conformance.
  • Inter prediction process may comprise one or more of the following features:
  • motion vectors may be of quarter-sample accuracy
  • sample values in fractional-sample positions may be obtained using a finite impulse response (FIR) filter.
  • Inter prediction modes and/or motion vector prediction modes, such as affine motion compensation (e.g. as specified in the JEM exploratory codec of the Joint Video Exploration Team), overlapped block motion compensation (OBMC, e.g. as specified in JEM or H.263) and merge mode (e.g. as specified in HEVC).
  • reference frames for inter prediction.
  • the sources of inter prediction are previously decoded pictures.
  • Many coding standards including H.264/AVC and HEVC, enable storage of multiple reference frames for inter prediction and selection of the used reference frame on a block basis. For example, reference frames may be selected on macroblock or macroblock partition basis in H.264/AVC and on PU or CU basis in HEVC.
  • Many coding standards, such as H.264/AVC and HEVC, include syntax structures in the bitstream that enable decoders to create one or more reference frame lists. A reference frame index to a reference frame list may be used to indicate which one of the multiple reference frames is used for inter prediction for a particular block.
  • a reference frame index may be coded by an encoder into the bitstream in some inter coding modes or it may be derived (by an encoder and a decoder) for example using neighboring blocks in some other inter coding modes, for example merge mode, OBMC, etc.
  • motion vectors may be coded differentially with respect to a block-specific predicted motion vector.
  • the predicted motion vectors are created in a predefined way, for example, by calculating the median of the encoded or decoded motion vectors of the adjacent blocks.
  • Another way to create motion vector predictions is to generate a list of candidate predictions from adjacent blocks and/or co-located blocks in temporal reference frames and signaling the chosen candidate as the motion vector predictor.
  • the reference index of a previously coded/decoded picture can be predicted. The reference index may be predicted, e.g., from adjacent blocks and/or co-located blocks in a temporal reference frame. Differential coding of motion vectors may be disabled across slice boundaries.
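  • As a concrete illustration of the median-based motion vector prediction and differential coding described above (a minimal sketch; the neighbour selection in actual codecs is more involved, and all names are illustrative):

```python
def median_mv_predictor(mv_left, mv_above, mv_above_right):
    """Component-wise median of three neighbouring motion vectors,
    as used for differential motion vector coding."""
    def median3(a, b, c):
        return sorted((a, b, c))[1]
    return (median3(mv_left[0], mv_above[0], mv_above_right[0]),
            median3(mv_left[1], mv_above[1], mv_above_right[1]))

# The encoder transmits only the difference to the predictor:
mv = (5, -2)
pred = median_mv_predictor((4, -1), (6, -3), (2, 0))
mvd = (mv[0] - pred[0], mv[1] - pred[1])
print(pred, mvd)   # (4, -1) (1, -1)
```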
  • High efficiency video codecs may employ an additional motion information coding/decoding mechanism, often called merging/merge mode, where all the motion field information, which includes motion vector and corresponding reference frame index for each available reference frame list, is predicted and used without any modification/correction.
  • predicting the motion field information may be carried out using the motion field information of adjacent blocks and/or co-located blocks in temporal reference frames, and the used motion field information is signaled among a candidate list filled with the motion field information of available adjacent/co-located blocks.
  • H.264/AVC and HEVC enable the use of a single prediction block in P slices (herein referred to as uni-predictive slices) or a linear combination of two motion-compensated prediction blocks for bi-predictive slices, which are also referred to as B slices, or a weighted prediction of uni-predictive and bi-predictive blocks where the weights may be signaled at slice or frame level. Individual blocks in B slices may be bi-predicted, uni-predicted, or intra-predicted, and individual blocks in P slices may be uni-predicted or intra-predicted.
  • the reference frames for a bi-predictive picture may not be limited to the subsequent picture and the previous picture in output order; rather, any reference frames may be used. In many coding standards, such as H.264/AVC and HEVC, one reference frame list, referred to as reference frame list 0, is constructed for P slices, and two reference frame lists, list 0 and list 1, are constructed for B slices.
  • prediction in forward direction may refer to prediction from a reference frame in reference frame list 0
  • prediction in backward direction may refer to prediction from a reference frame in reference frame list 1, even though the reference frames for prediction may have any decoding or output order relation to each other or to the current picture.
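  • Bi-prediction as described above forms the final prediction as a (possibly weighted) linear combination of one prediction block from each reference frame list; a minimal sketch, with the 0.5/0.5 default weights and 8-bit clipping chosen for illustration:

```python
import numpy as np

def bi_prediction(pred_l0: np.ndarray, pred_l1: np.ndarray,
                  w0: float = 0.5, w1: float = 0.5) -> np.ndarray:
    """Linear combination of two motion-compensated prediction blocks,
    one from reference frame list 0 and one from list 1. With w0 = w1 = 0.5
    this is the default bi-prediction average; other weights correspond to
    explicitly signaled weighted prediction."""
    blended = w0 * pred_l0.astype(np.float64) + w1 * pred_l1.astype(np.float64)
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)

p0 = np.full((4, 4), 100, dtype=np.uint8)
p1 = np.full((4, 4), 120, dtype=np.uint8)
print(bi_prediction(p0, p1)[0, 0])   # 110
```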
  • H.264/AVC allows weighted prediction for both P and B slices. In implicit weighted prediction, the weights are proportional to picture order counts (POC), while in explicit weighted prediction, prediction weights are explicitly indicated.
  • a final prediction block is derived as a sample-wise weighted sum of a number of constituent prediction blocks, where the number may for example be equal to 3.
  • One of the constituent prediction blocks is derived using the motion vector(s) of the current block.
  • Other constituent prediction blocks may be derived using motion vectors of adjacent blocks (e.g. two constituent prediction blocks may be derived, one using the motion vector of the block above or below, and another using the motion vector from the left or right side of the current block). For each prediction block or sample position, the adjacent motion vectors of the blocks at the two nearest block borders are used.
  • the sample values are weighted sample-wise in a manner that the constituent prediction block derived with the motion vector of the current block has the greatest weights among all constituent prediction blocks, and its weights decrease towards the block edges (considered to be furthest away from the location of the motion vector of the current block).
  • the weights of the other constituent prediction blocks increase towards the block where their motion vector originates from.
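  • A simplified sketch of the overlapped blending described above, mixing a prediction obtained with the above-neighbour's motion vector into the top rows of the current prediction. The 4-row ramp and its weights are illustrative choices, not the normative JEM or H.263 values:

```python
import numpy as np

def obmc_top_boundary(pred_cur: np.ndarray, pred_above_mv: np.ndarray,
                      rows: int = 4) -> np.ndarray:
    """Sample-wise weighted sum near the top block border: the constituent
    prediction made with the current block's MV keeps the largest weight, and
    the weight of the constituent prediction made with the above neighbour's
    MV decreases with the distance from the shared border."""
    out = pred_cur.astype(np.float64).copy()
    neigh_w = np.linspace(0.25, 0.0625, rows)   # illustrative neighbour weights
    for r, w in enumerate(neigh_w):
        out[r, :] = (1.0 - w) * pred_cur[r, :] + w * pred_above_mv[r, :]
    return np.rint(out)

cur = np.full((8, 8), 100.0)
above = np.full((8, 8), 140.0)
print(obmc_top_boundary(cur, above)[:4, 0])   # [110. 108. 105. 102.]
```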
  • the motion vector prediction process may involve spatially adjacent motion vectors and/or motion vectors from other pictures (temporal, inter-layer, or inter-view reference frames).
  • the motion vector candidate positions for blocks are shown as an example in Figs. 3a and 3b, where black dots 301 indicate sample positions directly adjacent to block X, defining positions of possible motion vector predictors (MVPs).
  • Fig. 3a illustrates spatial MVP positions A0, A1, B0, B1, B2.
  • Fig. 3b illustrates temporal MVP positions C0, C1, where Y is the collocated block of X in a reference frame (does not necessarily match with a PB in this reference frame).
  • Positions C0 and C1 are candidates for the TMVP (temporal motion vector predictor).
  • a partitioning may be defined as a division of a set into subsets such that each element of the set is in exactly one of the subsets.
  • a macroblock is a 16x16 block of luma samples and the corresponding blocks of chroma samples. For example, in the 4:2:0 sampling pattern, a macroblock contains one 8x8 block of chroma samples per each chroma component.
  • a picture is partitioned to one or more slice groups, and a slice group contains one or more slices.
  • a slice consists of an integer number of macroblocks ordered consecutively in the raster scan within a particular slice group.
  • a coding block may be defined as an NxN block of samples for some value of N such that the division of a coding tree block into coding blocks is a partitioning.
  • a coding tree block may be defined as an NxN block of samples for some value of N such that the division of a component into coding tree blocks is a partitioning.
  • a coding tree unit may be defined as a coding tree block of luma samples, two corresponding coding tree blocks of chroma samples of a picture that has three sample arrays, or a coding tree block of samples of a monochrome picture or a picture that is coded using three separate color planes and syntax structures used to code the samples.
  • a coding unit may be defined as a coding block of luma samples, two corresponding coding blocks of chroma samples of a picture that has three sample arrays, or a coding block of samples of a monochrome picture or a picture that is coded using three separate color planes and syntax structures used to code the samples.
  • video pictures are divided into coding units (CU) covering the area of the picture.
  • a CU consists of one or more prediction units (PU) defining the prediction process for the samples within the CU and one or more transform units (TU) defining the prediction error coding process for the samples in said CU.
  • a CU may consist of a square block of samples with a size selectable from a predefined set of possible CU sizes.
  • a CU with the maximum allowed size may be named the largest coding unit (LCU) or coding tree unit (CTU), and the video picture is divided into non-overlapping LCUs.
  • An LCU can be further split into a combination of smaller CUs, e.g. by recursively splitting the LCU and the resulting CUs.
  • Each resulting CU may have at least one PU and at least one TU associated with it.
  • Each PU and TU can be further split into smaller PUs and TUs in order to increase granularity of the prediction and prediction error coding processes, respectively.
  • Each PU has prediction information associated with it defining what kind of a prediction is to be applied for the samples within that PU (e.g. motion vector information for inter predicted PUs and intra prediction directionality information for intra predicted PUs).
  • Each TU can be associated with information describing the prediction error decoding process for the samples within said TU (including e.g. DCT coefficient information). It may be signaled at CU level whether prediction error coding is applied or not for each CU. In the case there is no prediction error residual associated with the CU, it can be considered that there are no TUs for said CU.
  • the division of the image into CUs, and division of CUs into PUs and TUs may be signaled in the bitstream allowing the decoder to reproduce the intended structure of these units.
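  • The recursive CTU-to-CU splitting described above can be sketched as a quadtree. The `should_split` callback below is a stand-in for the encoder's rate-distortion based split decision, and the sizes are only an example:

```python
def quadtree_partition(x, y, size, min_cu, should_split):
    """Recursively split a square block (a CTU/LCU at the top level) into
    four equal sub-blocks until `should_split` says stop or the minimum CU
    size is reached. Returns the leaf CUs as (x, y, size) tuples."""
    if size > min_cu and should_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += quadtree_partition(x + dx, y + dy, half, min_cu, should_split)
        return leaves
    return [(x, y, size)]

# Example: split a 64x64 CTU everywhere down to 32x32, and the top-left 32x32
# once more down to 16x16 (a stand-in for an RD-based decision).
cus = quadtree_partition(0, 0, 64, 8,
                         lambda x, y, s: s == 64 or (s == 32 and x == 0 and y == 0))
print(cus)
```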
  • a picture can be partitioned into tiles, which are rectangular and contain an integer number of CTUs.
  • the partitioning to tiles forms a grid that may be characterized by a list of tile column widths (in CTUs) and a list of tile row heights (in CTUs).
  • Tiles are ordered in the bitstream consecutively in the raster scan order of the tile grid.
  • a tile may contain an integer number of slices.
  • a slice consists of an integer number of CTUs. The CTUs are scanned in the raster scan order of CTUs within tiles or within a picture, if tiles are not in use.
  • a slice may contain an integer number of tiles or a slice can be contained in a tile.
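  • As a small illustration of how a tile grid characterised by lists of tile column widths and tile row heights (in CTUs) maps a CTU position to its tile; function and variable names are illustrative:

```python
from bisect import bisect_right
from itertools import accumulate

def tile_index(ctu_x, ctu_y, tile_col_widths, tile_row_heights):
    """Return (tile_column, tile_row) of the CTU at (ctu_x, ctu_y), for a grid
    given as lists of tile column widths / tile row heights in CTUs."""
    col_bounds = list(accumulate(tile_col_widths))   # cumulative right edges
    row_bounds = list(accumulate(tile_row_heights))  # cumulative bottom edges
    return bisect_right(col_bounds, ctu_x), bisect_right(row_bounds, ctu_y)

# 3x2 tile grid of a picture that is 10 CTUs wide and 6 CTUs tall.
print(tile_index(4, 5, [4, 3, 3], [3, 3]))  # -> (1, 1): second column, second row
```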
  • a slice is defined to be an integer number of coding tree units contained in one independent slice segment and all subsequent dependent slice segments (if any) that precede the next independent slice segment (if any) within the same access unit.
  • a slice segment is defined to be an integer number of coding tree units ordered consecutively in the tile scan and contained in a single NAL (Network Abstraction Layer) unit. The division of each picture into slice segments is a partitioning.
  • an independent slice segment is defined to be a slice segment for which the values of the syntax elements of the slice segment header are not inferred from the values for a preceding slice segment
  • a dependent slice segment is defined to be a slice segment for which the values of some syntax elements of the slice segment header are inferred from the values for the preceding independent slice segment in decoding order.
  • a slice header is defined to be the slice segment header of the independent slice segment that is current slice segment or is the independent slice segment that precedes a current dependent slice segment
  • a slice segment header is defined to be part of a coded slice segment containing the data elements pertaining to the first or all coding tree units represented in the slice segment.
  • the CUs are scanned in the raster scan order of LCUs within tiles or within a picture, if tiles are not in use. Within an LCU, the CUs have a specific scan order.
  • the filtering for fractional motion compensation may be done by a finite impulse response filter as shown in Fig. 4 illustrating an embodiment in one dimension, where Ci’s are the filter coefficients (in this example the filter tap is 8).
  • Fractional motion compensation may be performed in two consecutive steps, called horizontal and vertical filtering. An example of this is shown in Fig. 5, where the fractional motion vector is (0.5, 0.5).
  • Horizontal filtering: the input samples 501 (samples of the reference frame at integer sample positions, shown as squares) are filtered in the horizontal direction according to the horizontal component of the fractional motion vector, and intermediate samples 502 (shown as triangles) are generated as output.
  • the filtering in this example is 8-tap, so for each intermediate sample 502 (i.e., triangle), four samples on the left and right side of that sample are needed.
  • Vertical filtering: next, the intermediate samples 502 (i.e., triangles) are filtered in the vertical direction according to the vertical component of the fractional motion vector, and the final samples 503 (shown as circles) are generated.
  • the vertical filtering in this example is 8-tap, so for each final sample 503 (i.e., circle), four samples above and below that sample are needed.
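  • The two-stage (horizontal, then vertical) fractional-sample filtering of Figs. 4 and 5 can be sketched as below. The coefficients are the 8-tap half-sample luma filter of HEVC, and the 'valid'-mode filtering without any padding is an illustrative simplification:

```python
import numpy as np

# HEVC 8-tap luma interpolation filter for the half-sample position.
HALF_PEL_TAPS = np.array([-1, 4, -11, 40, 40, -11, 4, -1], dtype=np.float64) / 64.0

def interpolate_half_half(ref: np.ndarray) -> np.ndarray:
    """Separable fractional-sample interpolation for a (0.5, 0.5) motion vector:
    horizontal 8-tap filtering of the integer reference samples first, then
    vertical 8-tap filtering of the intermediate samples. Each output sample
    needs four integer samples on each side horizontally and vertically, which
    is why motion compensation near a region border may reach into other regions."""
    def filt(rows):
        # Filter each row; 'valid' keeps only positions with full 8-tap support.
        return np.stack([np.convolve(r, HALF_PEL_TAPS[::-1], mode='valid') for r in rows])
    intermediate = filt(ref.astype(np.float64))   # horizontal pass
    return filt(intermediate.T).T                 # vertical pass

ref = np.arange(16 * 16, dtype=np.float64).reshape(16, 16)
print(interpolate_half_half(ref).shape)           # (9, 9) from a 16x16 patch
```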
  • Motion compensation can be done using several complex motion models which need more motion information, for example a six-parameter affine model, a motion model with four motion vectors for the four corners of the block, or elastic motion compensation, which represents the motion information of the samples in the block based on cosine functions. There are other motion models, including perspective or polynomial models.
  • Affine motion compensation can be done in various ways.
  • limited affine motion compensation, supporting only zooming and rotation (and their combination), needs two motion vectors for the top-left and top-right corners of the block. Then the motion vectors for all the samples inside the block are calculated based on these two motion vectors. In this case, as the (fractional) motion vector for each sample inside the block can be different, each sample should be calculated separately. This requires a huge amount of calculation for a block. This case is shown in Figure 6.
  • the alternative method is to divide the block into smaller subblocks (e.g.
  • each subblock has a (fractional) motion vector, and the number of calculations for filtering is reduced significantly.
  • a more optimal way is to select the size of the subblocks based on the two motion vectors. In particular, when the difference between the two motion vectors is small, larger subblocks can be used, and when the difference is large, smaller subblocks should be used. This case is shown in Figure 7.
  • the abovementioned techniques can be used with other complex motion compensation models.
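  • A sketch of the subblock-based two-control-point (four-parameter) affine derivation described above. The subblock size, the evaluation at subblock centres and the exact model form are simplified assumptions relative to the JEM derivation:

```python
def affine_subblock_mvs(mv0, mv1, block_w, block_h, sub=4):
    """Derive one motion vector per sub x sub subblock from the two control-point
    motion vectors mv0 (top-left) and mv1 (top-right) of a four-parameter
    (zoom + rotation) affine model:

        mv(x, y) = (mv0x + a*x - b*y,  mv0y + b*x + a*y)
        with a = (mv1x - mv0x)/W and b = (mv1y - mv0y)/W,

    evaluated at each subblock centre."""
    a = (mv1[0] - mv0[0]) / block_w     # zoom component
    b = (mv1[1] - mv0[1]) / block_w     # rotation component
    mvs = {}
    for y0 in range(0, block_h, sub):
        for x0 in range(0, block_w, sub):
            cx, cy = x0 + sub / 2.0, y0 + sub / 2.0   # subblock centre
            mvs[(x0, y0)] = (mv0[0] + a * cx - b * cy,
                             mv0[1] + b * cx + a * cy)
    return mvs

# 16x16 block, top-left MV (2, 0), top-right MV (2, 1): a slight rotation.
print(affine_subblock_mvs((2.0, 0.0), (2.0, 1.0), 16, 16)[(12, 0)])  # (1.875, 0.875)
```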
  • the sources of prediction are previously decoded pictures (a.k.a. reference pictures).
  • in intra block copy (IBC; a.k.a. intra-block-copy prediction or IntraBlockCopy),
  • prediction is applied similarly to temporal prediction, but the reference picture is the current picture and only previously decoded samples can be referred to in the prediction process.
  • the required integer samples in the reference frame may come from the other tiles.
  • in this example, motion compensation of a block happens close to the top and left boundary 800 of the current tile, so some of the samples (shown with reference number 801) are needed from other tiles.
  • Motion compensation of a block may be limited to use the samples of a specific region in the reference frame.
  • This region can be a tile, tile set, slice, or a combination of slices.
  • “tile” is used as an example of “region”. However, it should be appreciated that whenever a/the “tile” is mentioned, any type of region (a tile set, a slice, or a combination of slices) can be meant.
  • a motion-constrained tile set refers to a solution, where the inter prediction process is constrained in encoding such that no sample value outside the motion-constrained tile set, and no sample value at a fractional sample position that is derived using one or more sample values outside the motion-constrained tile set, is used for inter prediction of any sample within the motion-constrained tile set.
  • sample locations used in inter prediction are saturated so that a location that would be outside the picture otherwise is saturated to point to the corresponding boundary sample of the picture.
  • motion vectors may effectively cross that boundary or a motion vector may effectively cause fractional sample interpolation that would refer to a location outside that boundary, since the sample locations are saturated onto the boundary. In some applications, it may be desired not to let the motion vector cross the boundary of the tile which is located in picture boundary.
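  • One way to realise such a constraint is to saturate reference sample locations to the region boundary, analogous to the picture-boundary saturation described above; a minimal sketch in which the region tuple, the function names and the span computation for an 8-tap filter are illustrative assumptions:

```python
import math

def clamp_ref_sample(x, y, region):
    """Saturate an integer reference-sample location into a region given as
    (x0, y0, x1, y1), inclusive, so that motion compensation never reads
    samples outside the region (mirroring picture-boundary saturation)."""
    x0, y0, x1, y1 = region
    return min(max(x, x0), x1), min(max(y, y0), y1)

def mc_integer_span(pos, mv_frac, taps=8):
    """First and last integer sample position a `taps`-tap fractional
    interpolation filter needs in one dimension (four samples on each side
    of the fractional position for an 8-tap filter, as in Fig. 5)."""
    base = pos + math.floor(mv_frac)
    return base - (taps // 2 - 1), base + taps // 2

# A block starting at x = 64 inside a tile spanning columns 64..127:
left, right = mc_integer_span(64, -1.5)                     # (59, 73)
clamped = [clamp_ref_sample(x, 0, (64, 0, 127, 63)) for x in range(left, 64)]
print(clamped)   # columns 59..63 are saturated to the tile's left edge (x = 64)
```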
  • the temporal motion-constrained tile sets SEI (supplemental enhancement information) message of HEVC can be used to indicate the presence of motion-constrained tile sets in the bitstream.
  • One of the trends in streaming for reducing the streaming bitrate of video (especially virtual reality (VR) content) is known as viewport-dependent delivery and can be explained as follows: a subset of the video content (e.g. 360-degree video content) covering the primary viewport (i.e., the current view orientation) is transmitted at the best quality/resolution, while the remainder of the 360-degree video is transmitted at a lower quality/resolution.
  • There are generally two approaches for viewport-adaptive streaming:
  • Viewport-specific encoding and streaming a.k.a. viewport-dependent encoding and streaming, a.k.a. asymmetric projection, a.k.a. packed VR video.
  • 360-degree image content is packed into the same frame with an emphasis (e.g. greater spatial area) on the primary viewport.
  • the packed VR frames are encoded into a single bitstream.
  • the front face of a cube map may be sampled with a higher resolution compared to other cube faces and the cube faces may be mapped to the same packed VR frame, where the front cube face is sampled with twice the resolution compared to the other cube faces.
  • VR viewport video a.k.a. tile-based encoding and streaming.
  • 360-degree content is encoded and made available in a manner that enables selective streaming of viewports from different encodings.
  • An approach of tile-based encoding and streaming, which may be referred to as tile rectangle-based encoding and streaming or sub-picture-based encoding and streaming, may be used with any video codec, even if tiles similar to HEVC were not available in the codec or even if motion-constrained tile sets or alike were not implemented in an encoder.
  • In tile rectangle-based encoding, the source content is split into tile rectangle sequences (a.k.a. sub-picture sequences) before encoding.
  • Each tile rectangle sequence covers a subset of the spatial area of the source content, such as full panorama content, which may e.g. be of equirectangular projection format.
  • Each tile rectangle sequence is then encoded independently from each other as a single-layer bitstream.
  • bitstreams may be encoded from the same tile rectangle sequence, e.g. for different bitrates.
  • Each tile rectangle bitstream may be encapsulated in a file as its own track (or alike) and made available for streaming.
  • the client may receive tracks covering the entire omnidirectional content. Better quality or higher resolution tracks may be received for the current viewport compared to the quality or resolution covering the remaining, currently non- visible viewports.
  • each track may be decoded with a separate decoder instance.
  • each cube face may be separately encoded and encapsulated in its own track (and Representation). More than one encoded bitstream for each cube face may be provided, e.g. each with different spatial resolution.
  • Players can choose tracks (or Representations) to be decoded and played based on the current viewing orientation. High-resolution tracks (or Representations) may be selected for the cube faces used for rendering for the present viewing orientation, while the remaining cube faces may be obtained from their low-resolution tracks (or Representations).
  • bitstream comprises motion-constrained tile sets.
  • bitstreams of the same source content are encoded using motion-constrained tile sets.
  • one or more motion-constrained tile set sequences are extracted from a bitstream, and each extracted motion-constrained tile set sequence is stored as a tile set track (e.g. an HEVC tile track or a full-picture-compliant tile set track) in a file.
  • a tile base track e.g. an HEVC tile base track or a full picture track comprising extractors to extract data from the tile set tracks
  • the tile base track represents the bitstream by implicitly collecting motion-constrained tile sets from the tile set tracks or by explicitly extracting (e.g. by HEVC extractors) motion-constrained tile sets from the tile set tracks.
  • Tile set tracks and the tile base track of each bitstream may be encapsulated in an own file, and the same track identifiers may be used in all files.
  • the tile set tracks to be streamed may be selected based on the viewing orientation.
  • the client may receive tile set tracks covering the entire omnidirectional content. Better quality or higher resolution tile set tracks may be received for the current viewport compared to the quality or resolution covering the remaining, currently non- visible viewports.
  • equirectangular panorama content is encoded using motion-constrained tile sets. More than one encoded bitstream may be provided, e.g. with different spatial resolution and/or picture quality. Each motion-constrained tile set is made available in its own track (and Representation).
  • Players can choose tracks (or Representations) to be decoded and played based on the current viewing orientation.
  • High-resolution or high- quality tracks (or Representations) may be selected for tile sets covering the present primary viewport, while the remaining area of the 360-degree content may be obtained from low-resolution or low-quality tracks (or Representations).
  • It is also possible to combine the above approaches 1 (viewport-specific encoding and streaming) and 2 (tile-based encoding and streaming). It needs to be understood that tile-based encoding and streaming may be realized by splitting a source picture into tile rectangle sequences that are partly overlapping. Alternatively, or additionally, bitstreams with motion-constrained tile sets may be generated from the same source content with different tile grids or tile set grids.
  • the 360-degree space is divided into a discrete set of viewports, each separated by a given distance (e.g., expressed in degrees), so that the omnidirectional space can be imagined as a map of overlapping viewports, and the primary viewport is switched discretely as the user changes his/her orientation while watching content with an HMD (head-mounted display).
  • when the overlapping between viewports is reduced to zero, the viewports can be imagined as adjacent non-overlapping tiles within the 360-degree space.
  • HEVC bitstreams of the same omnidirectional source content can be encoded at different resolutions using motion-constrained tile sets.
  • tile tracks are formed from each motion-constrained tile set sequence.
  • Clients that are capable of decoding HEVC tile streams can receive and decode tile tracks independently.
  • 'hvc2'/'hev2' tracks containing extractors can be formed for each expected viewing orientation.
  • An extractor track corresponds to a dependent Representation in the DASH MPD, with @dependencyld including the Representation identifiers of the tile tracks from which the tile data is extracted.
  • Clients that are not capable of decoding HEVC tile streams but only fully compliant HEVC bitstreams can receive and decode the extractor tracks.
  • Fig. 9 illustrates an example of using extractor tracks for tile-based omnidirectional video streaming.
  • a 4x2 tile grid has been used in forming of the motion-constrained tile sets 901, 902. In many viewing orientations 2x2 tiles out of the 4x2 tile grid are needed to cover a typical field of view of a head-mounted display.
  • the presented extractor track for high-resolution MCTSs 1, 2, 5 and 6 covers certain viewing orientations, while the extractor track for low-resolution MCTSs 3, 4, 7, and 8 includes a region assumed to be non-visible for these viewing orientations.
  • Two HEVC decoders are used in this example, one for the high-resolution extractor track and another for the low-resolution extractor track. While Fig. 9 refers to tile tracks, it should be understood that sub-picture tracks can be similarly formed.
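  • As an illustration of choosing which MCTSs of the 4x2 grid of Fig. 9 are taken from the high-resolution bitstream for a given viewing orientation; the yaw-to-tile-column mapping below is a simplified assumption for an equirectangular 4x2 grid:

```python
def high_res_tiles_for_yaw(yaw_deg, grid_cols=4, grid_rows=2, cols_needed=2):
    """Return the MCTS indices (1-based, row-major as in Fig. 9) of the
    cols_needed adjacent tile columns, over all rows, starting at the column
    containing the given yaw; the remaining tiles would be taken from the
    low-resolution bitstream."""
    col_width = 360.0 / grid_cols
    first_col = int((yaw_deg % 360.0) // col_width) % grid_cols
    cols = [(first_col + i) % grid_cols for i in range(cols_needed)]
    return sorted(r * grid_cols + c + 1 for r in range(grid_rows) for c in cols)

print(high_res_tiles_for_yaw(100.0))   # [2, 3, 6, 7]
```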
  • Tile merging in a coded domain is beneficial for the following purposes:
  • By selecting the vertical or horizontal tile grid to be aligned in bitstreams of different resolution, it is possible to combine tiles from bitstreams of different resolution and use a single decoder for decoding the resulting bitstream. This is illustrated as an example in Fig. 10, where black boundaries 1001 indicate motion-constrained tile sets and light gray boundaries 1002 indicate tile boundaries without motion constraints.
  • four tiles of the high-resolution version are selected.
  • Four tiles of the 4x2 tile grid provide a high-resolution viewport with a 90° field-of-view (FOV) in all viewing orientations (at 98% coverage of the viewport) and in a vast majority of viewing orientations (at 100% coverage).
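  • As an illustrative, non-normative sketch (the yaw convention, function name and parameters are assumptions, not taken from the specification), the selection of such a 2x2 set of tiles from a 4x2 tile grid for a given viewing yaw could be expressed as follows, taking the two adjacent tile columns whose combined span is centred closest to the yaw and both tile rows:

```python
# Non-normative sketch: choose a 2x2 tile set from a 4x2 equirectangular tile
# grid based on the viewing yaw. Assumption: yaw (in degrees, 0..360) is
# measured along the same axis and origin as the picture's horizontal axis.
def select_tiles_4x2(yaw_deg, cols=4, rows=2):
    col_width = 360.0 / cols                      # 90 degrees per tile column
    def dist_to_pair_centre(c):
        centre = (c + 1.0) * col_width            # centre of columns c and c+1
        return abs((yaw_deg - centre + 540.0) % 360.0 - 180.0)
    c0 = min(range(cols), key=dist_to_pair_centre)
    selected_cols = [c0, (c0 + 1) % cols]
    return [(r, c) for r in range(rows) for c in selected_cols]

# Example: a viewing yaw of 100 degrees selects tile columns 0 and 1 in both rows.
print(select_tiles_4x2(100.0))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```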
  • Tile boundaries break in-picture prediction. For example, intra prediction and spatial motion prediction are not applied across tile boundaries, and entropy coding state is not carried over a tile boundary. Since the high-resolution bitstream has a tile grid that is twice as fine as the motion-constrained tile set grid, the rate-distortion performance of the high-resolution bitstream is compromised.
  • Codecs that do not have tiles but use slices for realizing multi-resolution sub-picture merging may be limited in the number of slices that can be supported. For example, H.264/AVC would not be able to handle 12 vertically arranged slices that are required to realize the above scenario.
  • Fig. 11 shows an example of encoding an input picture sequence that is of the equirectangular projection format.
  • two versions of the low-resolution content are encoded.
  • the versions have a horizontal offset equivalent to half of the low-resolution tile width, or a yaw angle of half of the yaw range of the low-resolution tile. Note that as the content covers 360 degrees horizontally, the horizontal offset may be understood as moving a vertical slice of a picture from one side to the other, e.g. from the left side to the right side.
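  • A minimal sketch of the horizontal offset described above, assuming an equirectangular picture stored as a 2-D array and, purely as an example, a 4-column low-resolution tile grid so that half a tile width equals 1/8 of the picture width (the fraction, array layout and names are illustrative assumptions):

```python
import numpy as np

# Illustrative sketch: apply a horizontal (yaw) offset to an equirectangular
# picture by moving a vertical slice from the left side to the right side.
def shift_erp_horizontally(picture, offset_fraction=1.0 / 8.0):
    width = picture.shape[1]
    offset = int(round(width * offset_fraction))
    return np.roll(picture, -offset, axis=1)      # wrap-around shift

erp = np.arange(8 * 16).reshape(8, 16)            # toy 16x8 "picture"
shifted = shift_erp_horizontally(erp)             # columns 2..15 followed by 0..1
```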
  • Fig. 12 illustrates an example for improving the rate-distortion (RD) performance and/or lowering the viewport quality update delay related to viewport changes of MCTS-based viewport-adaptive streaming.
  • SAPs: stream access points
  • MCTSs whose resolution does not change in response to the viewing orientation change (i.e., tile rectangles 1, 3, 5, and 7) are located in the same position within the merged frame as earlier, and are taken from the bitstreams with a long SAP interval.
  • Tile rectangles whose resolution changes in response to the viewing orientation change are taken from the bitstreams with a short SAP interval and are located in available positions within the merged frame.
  • Indications of region-wise packing are included in or along the merged bitstream.
  • the number of extractor tracks in resolution-adaptive viewport-dependent delivery or playback can be reduced, as shown in the example of Fig. 13.
  • the encoded bitstreams are stored as tile or sub-picture tracks in the file.
  • a group of tile or sub-picture tracks that are alternatives for extraction is indicated.
  • the tiles or sub-picture tracks need not represent the same packed region but are of the same size in terms of width and height in pixels.
  • the track group identifier is required to differ from all track identifiers.
  • two alternative-for-extraction groups are generated, a first one comprising 1280x1280 tiles from the high-resolution bitstream, and a second one comprising 1280x640 tiles from the two low-resolution bitstreams.
  • Extractors are set to refer to an alternative-for-extraction track group rather than individual tile or sub-picture tracks.
  • a sample in this example comprises six extractors a, b, c, d, e, f.
  • Extractors a to d extract from the alternative-for-extraction track group comprising 1280x1280 tile or sub-picture tracks.
  • Extractors e and f extract from the alternative-for-extraction track group comprising 1280x640 tile or sub-picture tracks.
  • the region-wise packing information is split into two pieces, where a first piece excludes the packed region location and is stored in the tile or sub-picture tracks, and a second piece includes the packed region location and is stored in the extractor track.
  • Omnidirectional video preselections are indicated, each defining a combination of tile or sub-picture tracks. Each preselection indicates from which individual sub-picture or tile track(s) data is extracted. Characteristics of a preselection may be indicated, e.g. comprising the sphere region of a higher resolution (than other regions) and its effective resolution in terms of width and height of the projected picture that it originates from.
  • An extractor track may be needed for each distinct combination to select high-resolution or high-quality tiles.
  • a single extractor track is needed, where each distinct combination to select high-resolution or high-quality tiles only requires an omnidirectional preselection box, thus the complexity of managing a large number of extractor tracks may be avoided.
  • HEVC bitstreams of the same omnidirectional source content may be encoded at the same resolution but different qualities and bitrates using motion-constrained tile sets.
  • the MCTS grid in all bitstreams is identical. In order to enable the client to use the same tile base track for reconstructing a bitstream from MCTSs received from different original bitstreams, each bitstream is encapsulated in its own file, and the same track identifier is used for each tile track of the same tile grid position in all these files.
  • HEVC tile tracks are formed from each motion-constrained tile set sequence, and a tile base track is additionally formed. The client parses the tile base track to implicitly reconstruct a bitstream from the tile tracks.
  • the reconstructed bitstream can be decoded with a conforming HEVC decoder.
  • Clients can choose which version of each MCTS is received.
  • the same tile base track suffices for combining MCTSs from different bitstreams, since the same track identifiers are used in the respective tile tracks.
  • Fig. 14 presents an example of using tile tracks of the same resolution for tile-based omnidirectional video streaming.
  • a 4x2 tile grid has been used in forming the motion-constrained tile sets.
  • Two HEVC bitstreams originating from the same source content are encoded at different picture qualities and bitrates.
  • Each bitstream is encapsulated in its own file wherein each motion-constrained tile set sequence is included in one tile track and a tile base track is also included.
  • the client chooses the quality at which each tile track is received based on the viewing orientation. In this example the client receives tile tracks 1, 2, 5, and 6 at a particular quality and tile tracks 3, 4, 7, and 8 at another quality.
  • the tile base track is used to order the received tile track data into a bitstream that can be decoded with an HEVC decoder.
  • a base layer may be coded conventionally.
  • region-of-interest (ROI) enhancement layers are encoded with the SHVC Scalable Main profile. For example, several layers can be coded per tile position, each for a different bitrate or resolution.
  • the ROI enhancement layers may be spatial or quality scalability layers.
  • SHVC bitstreams can be encoded for significantly differing bitrates, since it can be assumed that bitrate adaptation can be handled to a great extent with enhancement layers only. This encoding approach is illustrated in Fig. 15.
  • the base layer is always received and decoded.
  • enhancement layers (EL) selected on the basis of the current viewing orientation are received and decoded.
  • Stream access points (SAPs) for the enhancement layers are inter-layer predicted from the base layer and are hence more compact than similar SAPs realized with intra-coded pictures. Since the base layer is consistently received and decoded, the SAP interval for the base layer can be longer than that for enhancement layers (EL).
  • the input picture sequence 1601 is encoded 1602 into two or more bitstreams, each representing the entire input picture sequence, i.e., the same input pictures, or a subset of the same input pictures (potentially with a reduced picture rate), are encoded in the bitstreams.
  • Certain input pictures are chosen to be encoded into two coded pictures in the same bitstream, the first referred to as a shared coded picture.
  • a shared coded picture is either intra coded or uses only other shared coded pictures (or the respective reconstructed pictures) as prediction references.
  • a shared coded picture in a first bitstream (of the encoded two or more bitstreams) is identical to the respective shared coded picture in a second bitstream (of the encoded two or more bitstreams), wherein "identical" may be defined to be identical coded representation, potentially excluding certain high-level syntax structures, such as SEI messages, and/or identical reconstructed picture. Any picture subsequent to a particular shared coded picture in decoding order is not predicted from any picture that precedes the particular shared coded picture and is not a shared coded picture.
  • a shared coded picture may be indicated to be a non-output picture. As a response to decoding a non-output picture indication, the decoder does not output the reconstructed shared coded picture.
  • The approach of Fig. 16 facilitates decoding a first bitstream up to a selected shared coded picture, exclusive, and decoding a second bitstream starting from the respective shared coded picture. No intra-coded picture is required to start the decoding of the second bitstream, and consequently compression efficiency may be improved.
  • the SHVC ROI approach significantly outperforms MCTS-based viewport-dependent delivery, and enabling inter-layer prediction provides a significant compression gain compared to using no inter-layer prediction.
  • the SHVC ROI approach has the disadvantage that inter-layer prediction is enabled only in codec extensions, such as the SHVC extension of HEVC. Such codec extensions might not be commonly supported in decoding, particularly when considering hardware decoder implementations.
  • Constrained Inter-Layer Prediction (CILP) is also known as shared coded picture coding.
  • CILP enables the use of HEVC Main profile encoder and decoder, and hence has better compatibility with implementations than the SHVC ROI approach.
  • CILP takes advantage of relatively low intra picture frequency (similarly to the SHVC ROI approach).
  • CILP suffers from the use of MCTSs for the base-quality tiles.
  • the streaming rate-distortion performance of CILP is close to that of SHVC-ROI in relatively coarse tile grids (up to 6x3).
  • CILP has inferior streaming rate-distortion performance compared to SHVC-ROI when finer tile grids are used, presumably due to the use of MCTSs for the base quality.
  • SP-CILP aims at:
  • Encoding with a single-layer encoder such as HEVC Main profile encoder.
  • Decoding with a single-layer decoder such as HEVC Main profile decoder.
  • An example of SP-CILP encoding is illustrated in Fig. 17.
  • a solid line 1710 indicates a picture boundary or such a tile boundary over which motion constraints identical or similar to those for MCTS apply.
  • a dashed line 1720 indicates a tile boundary where motion constraints need not be applied.
  • the picture area comprises two parts: a constituent picture area for the base-quality content and a tile area for the enhanced-quality tiles.
  • certain input pictures may be encoded as two coded pictures.
  • In the first coded picture of these two coded pictures, the tile area may, for example, be blank (e.g. have a constant color).
  • In the second coded picture of these two coded pictures, the tile area may be predicted from the base-quality constituent picture of the first coded picture.
  • the constituent picture area of the second coded picture may be blank (e.g. constant color) or may be coded with reference to the first coded picture with zero motion and without prediction error (referred to as "skip coded" here).
  • any conventional inter prediction hierarchy may be used. Motion constraints are applied so that the constituent picture area forms a MCTS, and the tile area comprises one or more MCTSs.
  • bitstreams are encoded, each with a different selection of enhanced-quality tiles, but with the same base-quality constituent pictures. For example, when the 4x2 tile grid is used and four tiles are selected to be coded at enhanced quality matching a viewing orientation, about 40 bitstreams may need to be coded for the different selections of enhanced-quality tiles.
  • the IRAP picture interval may be selected to be longer than the interval of coding an input picture as two coded pictures as described above.
  • Coding an input picture as two coded pictures, as described with reference to Fig. 18, forms a switching point that enables switching from one bitstream to another. Since the base-quality constituent picture is identical in the encoded bitstreams, the base-quality constituent picture at the switching point can be predicted from earlier picture(s).
  • Fig. 19 shows an example of switching from enhanced-quality tiles 1, 2, 5, 6 to 3, 4, 7, 8 at the first non-IRAP switching point.
  • MCTSs (comprising enhanced-quality tiles) may be encapsulated into a file as sub-picture tracks (e.g. 'hvc1' or 'hev1' tracks for HEVC).
  • a sequence of the base-quality constituent pictures may be encapsulated into a file as a sub-picture track (e.g. an 'hvc1' or 'hev1' track for HEVC).
  • One extractor track may be formed for each selection of enhanced quality tiles. The extractor track extracts the base-quality constituent pictures and the enhanced quality tiles from their respective tracks.
  • Fig. 20 illustrates a possible file arrangement and a respective arrangement of Representations for streaming.
  • Reference number 2010 stands for base-quality track/Representation
  • reference number 2020 stands for enhanced quality tile tracks/Representations, where there is one track/Representation for each pair of positions in the original picture and in the extractor track.
  • Reference number 2030 is for extractor tracks/Representations, where there is one track/Representation for each assignment of (a, b, c, d, ).
  • Motion compensation (and IntraBlockCopy) for a fractional motion vector uses an n-tap interpolation filter, so it needs more samples from each side (left/right/top/bottom) of the signal in the reference frame. In the case of MCTS, this n-tap filter increases the chance that data from outside of the current tile (or tile set) is used in the motion compensation process. So, to guarantee MCTS coding, these motion vectors should be avoided at the encoder side in motion estimation and merge candidate selection. As a result, either a suboptimal motion vector should be selected (which may increase residual values), or the block should be split into sub-blocks to have different motion vectors for different sub-blocks. Each of the above solutions degrades the RD performance.
  • Another solution is to modify the motion compensation (and IntraBlockCopy) filter for a block for which the calculation of the corresponding prediction block needs reference frame samples outside of the current tile.
  • the MC filter is modified in a way that no sample value from other tiles is used to calculate the prediction samples. In more detail, according to the motion vector of the current block and the block location, the number of samples falling outside of the tile is calculated.
  • some of the filter coefficients may be changed to zero, and their original values may be added to other coefficients of that filter. This modification may be done for the horizontal and vertical directions separately, but similarly.
  • the filter modification may be done at block level (as mentioned above) or at row/column/sample level to achieve higher RD performance, but with slightly more complexity.
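  • A minimal 1-D sketch of the filter-coefficient modification described above; the choice of moving each zeroed weight to the nearest remaining tap is one possible option (the text does not mandate it), and the tap weights below are the HEVC half-sample luma filter used purely as an example:

```python
# Illustrative sketch: zero the taps of a 1-D motion-compensation interpolation
# filter that would read reference samples outside the current tile, and add
# their weight to the nearest tap that stays inside, preserving the filter gain.
def modify_filter_for_tile(coeffs, ref_positions, tile_start, tile_end):
    """coeffs: filter tap weights; ref_positions: reference-sample x position
    read by each tap; tile_start/tile_end: inclusive extent of the current tile."""
    inside = [tile_start <= p <= tile_end for p in ref_positions]
    if not any(inside):
        raise ValueError("every tap falls outside the tile")
    modified = list(coeffs)
    for i, ok in enumerate(inside):
        if not ok:
            # nearest tap whose reference sample is inside the tile
            j = min((k for k, v in enumerate(inside) if v), key=lambda k: abs(k - i))
            modified[j] += modified[i]
            modified[i] = 0
    return modified

# Example: the two left-most taps of an 8-tap filter would read samples from Tile0.
taps = [-1, 4, -11, 40, 40, -11, 4, -1]      # HEVC half-sample luma filter (example)
positions = list(range(-3, 5))               # reference-sample offsets read by the taps
print(modify_filter_for_tile(taps, positions, tile_start=-1, tile_end=10))
# -> [0, 0, -8, 40, 40, -11, 4, -1]  (filter gain of 64 preserved)
```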
  • conditional motion compensation filter modification, which can be summarized as follows: a prediction block (as the output of the motion compensation or IntraBlockCopy process) is calculated;
  • "Contaminated sample" is a term that may be defined as a sample in a prediction block that is affected by tile(s) other than the current tile (for which the prediction block is intended). Embodiments could likewise be described by using another phrase with essentially the same definition.
  • the specific values can be, e.g., a fixed value, an average of the non-contaminated samples in the same row or column, or values from the non-contaminated samples of the prediction block.
  • the eliminated values can be, for example, copied from the neighboring row/column of the prediction block, similarly to a padding technique. Any padding technique which exists for reference frame padding may be used as well.
  • the prediction blocks may have different contaminated samples.
  • the bi-prediction block samples may be generated by combining the samples of prediction blocks.
  • Motion compensation filtering near a tile boundary is illustrated in Fig. 21 for the 1D case.
  • the various embodiments of the present solution are targeted at the situation of Fig. 21.
  • In Fig. 21, the squares (indicated by n) are samples of the reference frame, and the circles (indicated by p) are the prediction samples for a block with an associated motion vector.
  • Fig. 21 shows the case where the motion vector is fractional. So, each prediction sample is calculated using a motion compensation filter (an 8-tap filter in Fig. 21, with filter coefficients from c0 to c7).
  • the goal is to avoid using any reference sample from other tiles.
  • r_-2 and r_-1 should not have an effect on the generation of the prediction sample p0.
  • the various embodiments are targeted to detect contaminated samples, and replace their values with values that are independent of other tiles.
  • the contaminated samples are the ones that are affected by reference sample values from other tiles. These samples can be detected using the location of the tile boundaries, the location of the block, the motion vector of the block, and the filter taps of the motion compensation.
  • the calculation of sample p0 needs two reference samples (i.e. r_-2 and r_-1) from the other tile (i.e. Tile0).
  • the calculation of sample p1 needs one reference sample (i.e. r_-1) from the other tile (i.e. Tile0).
  • Thus, two of the samples (i.e. p0 and p1) are contaminated, while the other samples (i.e. p2, p3, p4, etc.) are non-contaminated.
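  • The detection logic above can be sketched for the 1-D case of Fig. 21 as follows; the tap reach (3 samples to the left of the integer reference position for an 8-tap filter) and the function signature are illustrative assumptions, and only the left tile boundary is checked for brevity:

```python
# Illustrative sketch of contaminated-sample detection (1-D case of Fig. 21):
# a prediction sample is contaminated if the n-tap interpolation filter used
# for it reads any reference sample to the left of the current tile.
def contaminated_mask(block_x, block_width, mv_int, mv_is_fractional,
                      tile_left, n_taps=8):
    """Returns one boolean per prediction sample in the row (True = contaminated)."""
    # An HEVC-style 8-tap filter at a fractional position reads
    # n_taps//2 - 1 samples to the left of the integer reference position.
    left_reach = n_taps // 2 - 1 if mv_is_fractional else 0
    return [x + mv_int - left_reach < tile_left
            for x in range(block_x, block_x + block_width)]

# Example matching Fig. 21: p0 needs r_-2 and r_-1, and p1 needs r_-1, from
# Tile0, so the first two prediction samples come out contaminated.
print(contaminated_mask(block_x=0, block_width=6, mv_int=1,
                        mv_is_fractional=True, tile_left=0))
# [True, True, False, False, False, False]
```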
  • the values of contaminated samples are discarded and replaced with a value which may be calculated based on non-contaminated samples.
  • any padding technique can be used to calculate a value for the contaminated samples using the non-contaminated samples.
  • the value of the contaminated samples in a block can be replaced in different ways. In the following, a few examples are given, which relate to uni-prediction, bi-prediction, affine prediction, weighted prediction, and overlapped block motion compensation (OBMC).
  • the value of the contaminated samples can be replaced with one of the following (a code sketch of a few of these options is given after the list):
  • a fixed value, e.g. zero, half of the dynamic range, or a value signaled using high-level syntax, for example in the slice header;
  • a row-wise average, e.g. an average of the non-contaminated samples in a row or in a column;
  • a block-wise average, e.g. an average of all the non-contaminated samples in the block;
  • a row-wise extrapolation, where the non-contaminated samples in each row (or column) may be modeled with a function based on their location in the row, and then the values of the contaminated samples can be calculated using the function and based on their location in that row;
  • a block-wise extrapolation, where the non-contaminated samples in the block may be modeled with a function based on their location in the block, and then the values of the contaminated samples can be calculated using the function and based on their location in that block;
  • an alternative method can be finding the optimum direction in which the texture of the non-contaminated samples is changing, and predicting the contaminated samples based on that direction.
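  • A hedged sketch of three of the replacement options listed above (fixed value, row-wise average, and nearest-sample padding); the block/mask layout, the 10-bit default value and all names are illustrative assumptions:

```python
import numpy as np

# Illustrative sketch of replacing contaminated prediction samples with values
# that are independent of other tiles. 'pred' is the prediction block, 'mask'
# marks contaminated samples with True.
def replace_contaminated(pred, mask, mode="pad", fixed_value=512):
    out = pred.copy()
    for r in range(pred.shape[0]):
        good = ~mask[r]
        if not good.any():
            out[r, mask[r]] = fixed_value        # no clean sample in this row
            continue
        if mode == "fixed":
            out[r, mask[r]] = fixed_value        # e.g. half of a 10-bit dynamic range
        elif mode == "row_average":
            out[r, mask[r]] = int(pred[r, good].mean())
        elif mode == "pad":
            # copy the nearest non-contaminated sample in the same row
            good_cols = np.flatnonzero(good)
            for c in np.flatnonzero(mask[r]):
                out[r, c] = pred[r, good_cols[np.argmin(np.abs(good_cols - c))]]
    return out

pred = np.arange(16).reshape(4, 4)
mask = np.zeros((4, 4), dtype=bool)
mask[:, :2] = True                               # two left columns contaminated
print(replace_contaminated(pred, mask, mode="pad"))
```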
  • each of the uni-prediction blocks from each reference list (L0, L1) may have a different area of contaminated samples. Therefore, they should be merged in an appropriate way to generate the merged bi-prediction block. This merging should generate a block with better continuity in sample values, and less noticeable edges at the region boundaries.
  • the general approach may be to average the sample values when both co-located samples in the uni-prediction blocks are non-contaminated, or to take the sample from one of them when only one of the co-located samples in the uni-prediction blocks is non-contaminated.
  • Example 1, bi-prediction: In Fig. 22, the contaminated samples 2201 are located on one side (e.g. the left) of the uni-prediction blocks 2202, but they have different sizes (e.g. widths).
  • the samples on the right side of the block 2203 are calculated by averaging the corresponding samples in P0 and P1.
  • the samples in the middle region come only from one uni-prediction block 2204 (i.e. P0 in this example), and the samples 2205 in the left region (i.e. "Pad from P0") may be calculated using one of the methods presented above for the uni-prediction case (e.g. padding).
  • Blending: the weight of the averaging in the corresponding region (the right region in this case) may be smoothly changed near the border, for example according to Fig. 23, where at the left side of the border the weight of P0 is 1.0, meaning that the samples are calculated only based on P0, while at the right side of the border the weight of P0 is decreased to 0.5 and the weight of P1 is increased to 0.5. This transition of the weights may happen over a few samples (e.g. 2-4 samples).
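  • A hedged sketch of the merging rule of Example 1: co-located samples are averaged where both P0 and P1 are non-contaminated, copied from the clean block where only one is, and padded from the clean part of P0 where both are contaminated; a small helper illustrates the weight ramp of Fig. 23. All names, the ramp width and the fallback value are assumptions, not taken from the specification:

```python
import numpy as np

# Illustrative merging of two uni-prediction blocks P0 and P1; masks mark
# contaminated samples with True.
def merge_biprediction(p0, p1, mask0, mask1):
    out = np.where(~mask0 & ~mask1, (p0 + p1) // 2, 0)   # both clean: average
    out = np.where(~mask0 & mask1, p0, out)               # only P0 clean: copy P0
    out = np.where(mask0 & ~mask1, p1, out)               # only P1 clean: copy P1
    both_bad = mask0 & mask1
    for r, c in zip(*np.nonzero(both_bad)):                # both contaminated: pad
        good = np.flatnonzero(~mask0[r])
        out[r, c] = p0[r, good[np.argmin(np.abs(good - c))]] if good.size else 512
    return out

# Optional smoothing near the border between the "P0 only" region and the
# averaged region: ramp the weight of P0 from 1.0 down to 0.5 over ~2 samples.
def p0_border_weight(distance_into_averaged_region, ramp=2):
    return float(np.clip(1.0 - 0.5 * distance_into_averaged_region / ramp, 0.5, 1.0))
```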
  • the contaminated samples may be on two different sides of the uni-prediction blocks (e.g. on the left side for P0 and on the right side for P1). In this case, the samples of the merged bi-prediction block are generated as presented below:
  • the middle region is generated as the average of P0 and P1;
  • samples at the regions' borders may be filtered or blended as described above.
  • the contaminated samples in the uni-prediction block may occur at two borders of the prediction blocks, as presented in Fig. 25. In this example, the merged bi-prediction block may be generated as presented below:
  • the common region 2503 is generated as the average of P0 and P1;
  • the two regions 2502 connected to the common region may be copied only from one of the uni-prediction regions (P0 or P1);
  • the regions 2504 which are connected only to the abovementioned regions may be generated by padding;
  • the regions that are connected to two neighboring regions may be generated using the sample values which exist on their two borders.
  • M0 may be generated using P0 and P1;
  • M1 may be generated using M0 and "Pad from P0";
  • M2 may be generated using M0 and "Pad from P1";
  • M3 may be generated using M1 and M2.
  • Mi block may be generated using the sample values of two neighboring blocks, as shown in Fig. 26.
  • MP0 and MP1 are the blocks (with the size of M) which are generated based on the sample values of the P0 and P1 blocks, respectively, using one of the methods presented for the uni-prediction example above. Then the sample values of M may be generated as presented below:
  • a weighted average based on the location of the sample in the M block. For example, the weight for P0 (i.e., wP0) and the weight for P1 (i.e., wP1) may be determined as shown below for a sample at location (x, y) inside M:
  • wP0 = 0.5 + (y - x)/A
  • wP1 = (x - y)/A
  • in the case of affine prediction, the uni-prediction block may be as shown in the example of Fig. 27.
  • the contaminated samples do not have a rectangular shape.
  • the contaminated samples can be detected based on the affine motion parameters.
  • the motion vector of each sample may be calculated, and based on it, it is determined whether the sample is contaminated or non-contaminated.
  • the contaminated samples may be replaced using the methods presented for uni-prediction and bi-prediction examples above.
  • in weighted prediction, the weight can be modified according to the contaminated samples. For example, for a contaminated sample, the weight can be set to zero and the offset can be set to half of the dynamic range.
  • weight and offset parameters may be modified as presented below:
  • (Weight, Offset) may be set to (1.0, 0) for the block in which the corresponding samples are non-contaminated, and to (0.0, 0) for the block in which the corresponding samples are contaminated; the weight can be set to 0.0 for both prediction blocks when the corresponding samples in both blocks are contaminated, in which case the offset may be set to half of the dynamic range.
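  • A hedged per-sample sketch of the weighted-prediction modification outlined above, assuming 0.5/0.5 as the default bi-prediction weights and a 10-bit signal (so half of the dynamic range is 512); the function name and parameters are illustrative:

```python
# Illustrative per-sample weighted bi-prediction with weights/offset adjusted
# according to whether the co-located samples s0 and s1 are contaminated.
def weighted_biprediction_sample(s0, s1, cont0, cont1, bit_depth=10):
    half_range = 1 << (bit_depth - 1)
    if not cont0 and not cont1:
        w0, w1, offset = 0.5, 0.5, 0        # default bi-prediction weights (assumed)
    elif cont0 and not cont1:
        w0, w1, offset = 0.0, 1.0, 0        # use only the non-contaminated block
    elif not cont0 and cont1:
        w0, w1, offset = 1.0, 0.0, 0
    else:                                   # both samples contaminated
        w0, w1, offset = 0.0, 0.0, half_range
    return int(w0 * s0 + w1 * s1 + offset)

print(weighted_biprediction_sample(400, 420, cont0=True, cont1=False))   # -> 420
```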
  • OBMC Overlapped block motion compensation
  • in OBMC, constituent prediction blocks are combined together with different weights.
  • the weights are sample-wise and are typically not identical for different constituent prediction blocks. Weights may differ for different sample positions in the constituent prediction block. These weight values may be modified similar to what has been described above for the weighted prediction example. Weights of a constituent prediction block may be represented by a weighting matrix.
  • the OBMC process is modified in a manner that the values of the weighting matrices are adaptively inferred in the encoder and in the decoder.
  • when a weighting matrix entry corresponds to a contaminated sample, it is set equal to 0, and the collocated entry or entries in the other weighting matrices are increased proportionally to the weight value of the zeroed matrix entry.
  • in JEM, weights {1/4, 1/8, 1/16, 1/32} are used for samples of a 4x4 prediction block (A) derived from a neighboring motion vector, and the weighting factors {3/4, 7/8, 15/16, 31/32} are used for samples of a prediction block (B) derived from the motion vector of the current sub-PU (which may be defined as a 4x4 block that is currently predicted and is within the current block being coded).
  • weights {0, 1/8, 1/16, 1/32} may be used for prediction block A, and weights {1, 7/8, 15/16, 31/32} may be used for prediction block B.
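  • A hedged sketch of the OBMC weight adaptation described above, applied to the per-row weights of the JEM example: weights of contaminated samples in block A are zeroed and handed over to block B so that each pair still sums to one (the function name and layout are illustrative assumptions):

```python
# Illustrative OBMC weight adaptation: zero weights of block A wherever its
# sample is contaminated and move the zeroed weight to block B.
def adapt_obmc_weights(w_a, w_b, contaminated_a):
    w_a_mod, w_b_mod = list(w_a), list(w_b)
    for i, bad in enumerate(contaminated_a):
        if bad:
            w_b_mod[i] += w_a_mod[i]    # give the zeroed weight to block B
            w_a_mod[i] = 0.0
    return w_a_mod, w_b_mod

w_a = [1/4, 1/8, 1/16, 1/32]            # JEM weights for block A (rows next to the edge)
w_b = [3/4, 7/8, 15/16, 31/32]          # JEM weights for block B
print(adapt_obmc_weights(w_a, w_b, contaminated_a=[True, False, False, False]))
# -> ([0.0, 0.125, 0.0625, 0.03125], [1.0, 0.875, 0.9375, 0.96875])
```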
  • any of the above-listed methods may be used for deriving a prediction sample value in this example.
  • a neighboring integer-position motion vector may be inferred when a motion vector component of the motion vector of the current sub-PU is fractional.
  • This neighboring integer-position motion vector may be used only for deriving a constituent prediction block in the described OBMC process. For example, the closest integer motion vector (relative to the motion vector of the current sub-PU) may be selected or an adjacent integer motion vector (relative to the motion vector of the current sub-PU) resulting into fewest contaminated samples may be selected.
  • the weight values may be inferred as explained above. Otherwise, the values for the weighting matrix of the only prediction block may be selected to be equal to 0 when the corresponding sample is contaminated, and to 1 otherwise.
  • the above-described embodiment may decrease the number of sample positions in the final prediction block that are created from zero non-contaminated samples.
  • a sample of a prediction block may be determined to be contaminated if it is affected by a slice of a reference picture that is not collocated with the slice containing the current block
  • an encoder indicates region(s), such as MCTS(s), on the boundaries of which the embodiments are applied in or along the bitstream, such as in a sequence parameter set.
  • a decoder decodes region(s), such as MCTS(s), on the boundaries of which the embodiments are applied from or along the bitstream, such as from a sequence parameter set.
  • Fig. 28 is a flowchart illustrating a method according to an embodiment.
  • a method comprises obtaining 2810 at least one reference frame being partitioned into at least one region; determining 2820 at least one motion vector for the at least one reference frame for at least one sample in a block in a current frame, the at least one motion vector having a motion vector component for each direction of the block; determining 2830 at least one prediction block having prediction samples with values based on the at least one reference frame and the at least one motion vector; determining 2840 a corresponding region in the at least one reference frame corresponding to the at least one sample in the block; detecting 2850 contaminated samples in the at least one prediction block, wherein the contaminated samples have been at least partly generated based on the at least one reference frame samples originating from other regions than the corresponding region; detecting 2860 non-contaminated samples in the at least one prediction block, wherein the non-contaminated samples have been generated only based on the at least one reference frame samples originating from the corresponding region; and generating 2870 a modified prediction block, wherein each sample of the modified prediction block is generated using the non-contaminated samples of the at least one prediction block.
  • An apparatus comprises means for obtaining at least one reference frame being partitioned into at least one region; means for determining at least one motion vector for the at least one reference frame for at least one sample in a block in a current frame, the at least one motion vector having a motion vector component for each direction of the block; means for determining at least one prediction block having prediction samples with values based on the at least one reference frame and the at least one motion vector; means for determining a corresponding region in the at least one reference frame corresponding to the at least one sample in the block; means for detecting contaminated samples in the at least one prediction block, wherein the contaminated samples have been at least partly generated based on the at least one reference frame samples originating from other regions than the corresponding region; means for detecting non-contaminated samples in the at least one prediction block, wherein the non-contaminated samples have been generated only based on the at least one reference frame samples originating from the corresponding region; and means for generating a modified prediction block, wherein each sample of the modified prediction block is generated using the non-contaminated samples of the at least one prediction block.
  • the means comprises at least one processor, and a memory including a computer program code, wherein the processor may further comprise processor circuitry.
  • the memory and the computer program code are configured to, with the at least one processor, cause the apparatus to perform the method of Fig. 28 according to various embodiments.
  • An apparatus according to an embodiment is illustrated in Figure 29.
  • An apparatus of this embodiment is a camera having multiple lenses and imaging sensors, but also other types of cameras may be used to capture wide view images and/or wide view video.
  • wide view image and wide view video mean an image and a video, respectively, which comprise visual information having a relatively large viewing angle, larger than 100 degrees.
  • a so-called 360 panorama image/video as well as images/videos captured by using a fisheye lens may also be called a wide view image/video in this specification.
  • the wide view image/video may mean an image/video in which some kind of projection distortion may occur when a direction of view changes between successive images or frames of the video so that a transform may be needed to find out co-located samples from a reference image or a reference frame. This will be described in more detail later in this specification.
  • the camera 2700 of Figure 29 comprises two or more camera units 2701 and is capable of capturing wide view images and/or wide view video.
  • Each camera unit 2701 is located at a different location in the multi-camera system and may have a different orientation with respect to the other camera units 2701.
  • the camera units 2701 may have an omnidirectional constellation so that they provide a 360-degree viewing angle in 3D space. In other words, such a camera 2700 may be able to see each direction of a scene so that each spot of the scene around the camera 2700 can be viewed by at least one camera unit 2701.
  • the camera 2700 of Figure 29 may also comprise a processor 2704 for controlling the operations of the camera 2700. There may also be a memory 2706 for storing data and computer code to be executed by the processor 2704, and a transceiver 2708 for communicating with, for example, a communication network and/or other devices in a wireless and/or wired manner.
  • the camera 2700 may further comprise a user interface (UI) 2710 for displaying information to the user, for generating audible signals and/or for receiving user input.
  • the camera 2700 need not comprise each feature mentioned above or may comprise other features as well. For example, there may be electric and/or mechanical elements for adjusting and/or controlling optics of the camera units 2701 (not shown).
  • Figure 29 also illustrates some operational elements which may be implemented, for example, as a computer code in the software of the processor, in a hardware, or both.
  • a focus control element 2714 may perform operations related to adjustment of the optical system of camera unit or units to obtain focus meeting target specifications or some other predetermined criteria.
  • An optics adjustment element 2716 may perform movements of the optical system or one or more parts of it according to instructions provided by the focus control element 2714. It should be noted here that the actual adjustment of the optical system need not be performed by the apparatus, but it may be performed manually, wherein the focus control element 2714 may provide information for the user interface 2710 to indicate to a user of the device how to adjust the optical system.
  • the various embodiments may provide advantages. For example, no algorithm needs to be implemented at the encoder side to restrict motion vectors in order to achieve an MCTS bitstream. In addition, the method according to various embodiments improves the rate-distortion performance compared to using MCTSs. In addition, the encoding and decoding time may be reduced, because there is no need to split the block at the tile border into smaller sub-blocks. Changes are small and quite local.
  • the various embodiments only require simple block-wise post-processing of the prediction block. This change is applied to a certain number of blocks which are at the tile border, so there is no significant extra computational complexity. In addition, there is no need to add a low-level (CU-level) syntax element, so no change is needed in parsing.
  • a device may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the device to carry out the features of an embodiment.
  • a network device like a server may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the network device to carry out the features of an embodiment.
  • the computer program code comprises one or more operational characteristics.
  • Said operational characteristics are defined through configuration by said computer based on the type of said processor, wherein a system is connectable to said processor by a bus, wherein a programmable operational characteristic of the system comprises obtaining at least one reference frame being partitioned into at least one region; determining at least one motion vector for the at least one reference frame for at least one sample in a block in a current frame, the at least one motion vector having a motion vector component for each direction of the block; determining at least one prediction block having prediction samples with values based on the at least one reference frame and the at least one motion vector; determining a corresponding region in the at least one reference frame corresponding to the at least one sample in the block; detecting contaminated samples in the at least one prediction block, wherein the contaminated samples have been at least partly generated based on the at least one reference frame samples originating from other regions than the corresponding region; detecting non-contaminated samples in the at least one prediction block, wherein the non-contaminated samples have been generated only based on the at least one reference frame samples originating from the corresponding region; and generating a modified prediction block, wherein each sample of the modified prediction block is generated using the non-contaminated samples of the at least one prediction block.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The embodiments relate to a method comprising: obtaining at least one reference frame partitioned into at least one region; and determining at least one motion vector for a reference frame, the motion vector having a motion vector component for each direction of the block. A prediction block having prediction samples with values is determined based on a reference frame and the motion vector. A corresponding region is determined in the at least one reference frame. Contaminated samples are detected in the prediction block, the contaminated samples having been generated based on reference frame samples originating from regions other than the corresponding region. Non-contaminated samples are detected in the prediction block, the non-contaminated samples having been generated only based on reference frame samples originating from the corresponding region. A merged prediction block is generated, each sample being generated using the non-contaminated samples of the prediction block.
PCT/FI2019/050493 2018-07-05 2019-06-25 Procédé, appareil et produit-programme informatique pour codage et décodage vidéo WO2020008107A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862694125P 2018-07-05 2018-07-05
US62/694,125 2018-07-05

Publications (1)

Publication Number Publication Date
WO2020008107A1 true WO2020008107A1 (fr) 2020-01-09

Family

ID=69059426

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2019/050493 WO2020008107A1 (fr) 2018-07-05 2019-06-25 Procédé, appareil et produit-programme informatique pour codage et décodage vidéo

Country Status (1)

Country Link
WO (1) WO2020008107A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11778171B2 (en) 2019-01-02 2023-10-03 Nokia Technologies Oy Apparatus, a method and a computer program for video coding and decoding

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150245059A1 (en) * 2014-02-21 2015-08-27 Panasonic Corporation Image decoding method, image encoding method, image decoding apparatus, and image encoding apparatus
US9374578B1 (en) * 2013-05-23 2016-06-21 Google Inc. Video coding using combined inter and intra predictors
WO2017094298A1 (fr) * 2015-12-04 2017-06-08 ソニー株式会社 Appareil de traitement d'image, procédé de traitement d'image et programme

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19831035

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19831035

Country of ref document: EP

Kind code of ref document: A1