WO2020178144A1 - Method and apparatus for encoding and decoding a video bitstream for merging regions of interest


Info

Publication number
WO2020178144A1
Authority
WO
WIPO (PCT)
Prior art keywords
tile
quantisation parameter
offset
group
subportion
Application number
PCT/EP2020/055184
Other languages
French (fr)
Inventor
Eric Nassor
Frédéric Maze
Naël OUEDRAOGO
Gérald Kergourlay
Original Assignee
Canon Kabushiki Kaisha
Canon Europe Limited
Application filed by Canon Kabushiki Kaisha and Canon Europe Limited
Publication of WO2020178144A1

Classifications

    • H ELECTRICITY; H04 ELECTRIC COMMUNICATION TECHNIQUE; H04N PICTORIAL COMMUNICATION, e.g. TELEVISION; H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70 characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/88 using pre-processing or post-processing specially adapted for video compression, involving rearrangement of data among different coding units, e.g. shuffling, interleaving, scrambling or permutation of pixel data or permutation of transform coefficient data among different blocks
    • H04N19/124 Quantisation
    • H04N19/167 Position within a video image, e.g. region of interest [ROI]
    • H04N19/174 the coding unit being an image region, e.g. an object, the region being a slice, e.g. a line of blocks or a group of blocks
    • H04N19/188 the coding unit being a video data packet, e.g. a network abstraction layer [NAL] unit

Definitions

  • the present disclosure concerns a method and a device for encoding and decoding a video bitstream that facilitates the merging of regions of interest. It concerns more particularly the encoding and decoding of a video bitstream resulting from the merging of regions coming from different video bitstreams. In addition, a corresponding method of generating such a bitstream resulting from the merging of different regions coming from different video bitstreams is proposed.
  • Figures 1a and 1b illustrate two different application examples for the combination of regions of interest.
  • Figure 1a illustrates an example where a frame (or picture) 100 from a first video bitstream and a frame 101 from a second video bitstream are merged into a frame 102 of the resulting bitstream.
  • Each frame is composed of four regions of interest numbered from 1 to 4.
  • the frame 100 has been encoded using encoding parameters resulting in a high quality encoding.
  • the frame 101 has been encoded using encoding parameters resulting in a low quality encoding.
  • the frame encoded with a low quality is associated with a lower bitrate than the frame encoded with a high quality.
  • the resulting frame 102 combines the regions of interest 1, 2 and 4 from the frame 101, thus encoded with a low quality, with the region of interest 3 from frame 100 encoded with a high quality.
  • the goal of such a combination is generally to get a region of interest, here the region 3, in high quality, while keeping the resulting bitrate reasonable by having regions 1, 2 and 4 encoded in low quality.
  • Such kind of scenario may happen in particular in the context of omnidirectional content allowing a higher quality for the content actually visible while the remaining parts have a lower quality.
  • Figure 1b illustrates a second example where four different videos A, B, C and D are merged to form a resulting video.
  • a frame 103 of video A is composed of regions of interest A1, A2, A3, and A4.
  • a frame 104 of video B is composed of regions of interest B1, B2, B3, and B4.
  • a frame 105 of video C is composed of regions of interest C1, C2, C3, and C4.
  • a frame 106 of video D is composed of regions of interest D1, D2, D3, and D4.
  • the frame 107 of the resulting video is composed of regions B4, A3, C3, and D1.
  • the resulting video is a mosaic video of different regions of interest of each original video stream. The regions of interest of the original video streams are rearranged and combined in a new location of the resulting video stream.
  • a video is composed of a sequence of frames or pictures or images or samples which may be displayed at several different times.
  • a frame may also be composed of different image components, for instance components encoding the luminance, the chrominance or depth information.
  • Figure 2a illustrates some partitioning used in encoding systems.
  • the frames 201 and 202 are divided into coding tree units (CTUs) illustrated by the dotted lines.
  • a CTU is the elementary unit of encoding and decoding.
  • the CTU can encode an area of 128 by 128 pixels.
  • a Coding Tree Unit could also be named a block, a macroblock or a coding unit. It can encode simultaneously the different image components or it can be limited to only one image component.
  • the frame can be partitioned according to a grid of tiles, illustrated by the thin solid lines.
  • the tiles are frame portions, i.e. rectangular regions of pixels that may be defined independently of the CTU partitioning.
  • the boundaries of tiles and the boundaries of the CTU may be different.
  • a tile may also correspond to a sequence of CTUs, as in the represented example, meaning that the boundaries of tiles and CTUs coincide.
  • The tile definition provides that tile boundaries break the spatial encoding dependencies. This means that the encoding of a CTU in a tile is not based on pixel data from another tile in the frame.
  • Some encoding systems like for example VVC, provide the notion of tile groups.
  • This mechanism allows the partitioning of the frame into one or several groups of tiles.
  • Each group of tiles is composed of one or several tiles.
  • Two different kinds of tile groups are provided as illustrated by frames 201 and 202.
  • a first kind of tile group is restricted to tile groups forming a rectangular area in the frame.
  • Frame 201 illustrates the partitioning of a frame into five different rectangular tile groups.
  • a second kind of tile group is restricted to successive tiles in raster scan order.
  • Frame 202 illustrates the partitioning of a frame into three different tile groups composed of successive tiles in raster scan order. Rectangular tile groups are the structure of choice for dealing with regions of interest in a video.
  • a tile group can be encoded in the bitstream as one or several NAL units.
  • a NAL unit, standing for Network Abstraction Layer unit, is a logical unit of data for the encapsulation of data in the encoded bitstream.
  • a tile group is encoded as a single NAL unit.
  • a sub-picture is a portion of a picture that represents a spatial subset of the original video content, which has been split into spatial subsets before video encoding at the content production side.
  • a sub picture is for example one or more Tile Groups.
  • Figure 2b illustrates an example of partitioning of a picture in sub pictures.
  • a sub picture represents a picture portion that covers a rectangular region of a picture.
  • Each sub picture may have different sizes and coding parameters. For instance, different tile grids and tile groups or slice partitioning may be defined for each sub picture.
  • the pictures are divided into frame portions corresponding to subpictures, and the frame portions are divided into subportions corresponding to slices or tile groups.
  • the picture 204 is subdivided into 24 sub pictures including the sub pictures 205 and 206. These two sub pictures further define a tile grid and a partitioning into tile groups similar to the pictures 201 and 202 of Figure 2a.
  • a picture is first decomposed into tiles and tile groups or slices. Then the subpictures are defined as sets of tile groups or slices with the constraints that each subpicture covers a rectangular area of the picture and that the subpictures form a partition of the picture.
  • a picture could be partitioned into several regions that may be independently coded as layers (e.g. VVC or HEVC layers). We may refer to such a layer as a “sub picture layer” or “region layer”. Each sub picture layer could be independently coded. When combined, the pictures of the sub picture layers may form a new picture of greater size, equal to the size of the combination of the sub picture layers.
  • a picture may be spatially divided into sub pictures, each sub picture defining a grid of tiles and being spatially divided into tile groups.
  • a picture may be divided into layers, each layer defining a grid of tiles and being spatially divided into tile groups. Tiles and tile groups may be defined at the picture level, at the sub picture level, or at the layer level. The invention will apply to all these configurations.
  • Figure 3 illustrates the organisation of the bitstream in the exemplary coding system VVC.
  • a bitstream 300 according to the VVC coding system is composed of an ordered sequence of syntax elements and coded data.
  • the syntax elements and coded data are placed into NAL units 301-305.
  • There are different NAL unit types.
  • the network abstraction layer provides the ability to encapsulate the bitstream into different protocols, like RTP/IP, standing for Real Time Protocol / Internet Protocol, ISO Base Media File Format, etc.
  • the network abstraction layer also provides a framework for packet loss resilience.
  • NAL units are divided into VCL NAL units and non-VCL NAL units, VCL standing for Video Coding Layer.
  • the VCL NAL units contain the actual encoded video data.
  • the non-VCL NAL units contain additional information. This additional information may be parameters needed for the decoding of the encoded video data or supplemental data that may enhance usability of the decoded video data.
  • NAL units 305 correspond to tile groups and constitute the VCL NAL units of the bitstream. Different NAL units 301-304 correspond to different parameter sets, these NAL units are non-VCL NAL units.
  • the VPS NAL unit 301, VPS standing for Video Parameter Set, contains parameters defined for the whole video, and thus the whole bitstream. The naming of VPS may change and, for instance, becomes DPS in VVC.
  • the SPS NAL unit 302, SPS standing for Sequence Parameter Set, contains parameters defined for a video sequence.
  • the PPS NAL unit 303, PPS standing for Picture Parameter Set, contains parameters defined for a picture or a group of pictures.
  • the APS NAL unit 304, APS standing for Adaptive Loop Filter (ALF) Parameter Set, contains parameters for the ALF that are defined at the tile group level.
  • the bitstream may also contain SEI, standing for Supplemental Enhancement Information, NAL units. The periodicity of occurrence of these parameter sets in the bitstream is variable.
  • a VPS that is defined for the whole bitstream needs to occur only once in the bitstream.
  • an APS that is defined for a tile group may occur once for each tile group in each picture. Actually, different tile groups may rely on the same APS and thus there are generally fewer APS than tile groups in each picture.
  • the VCL NAL units 305 each contain a tile group.
  • a tile group may correspond to the whole picture, a single tile or a plurality of tiles.
  • a tile group is composed of a tile group header 310 and a raw byte sequence payload, RBSP, 311 that contains the tiles.
  • the tile group index is the index of the tile group in the frame in raster scan order.
  • the number in a circle represents the tile group index for each tile group.
  • Tile group 203 has a tile group index of 0.
  • the tile group identifier is a value, meaning an integer or any bit sequence, which is associated to a tile group.
  • the PPS contains the association for each tile group between the tile group index and the tile group identifier for one or several pictures.
  • the tile group 203 with tile group index 0 can have a tile group identifier of '345'.
  • the tile group address is a syntax element present in the header of the tile group NAL unit.
  • the tile group address may refer to the tile group index, to the tile group identifier or even to the tile index. In the latter case, it will be the index of the first tile in the tile group.
  • the semantics of the tile group address are defined by several flags present in one of the Parameter Set NAL units. In the example of tile group 203 in Figure 2, the tile group address may be the tile group index 0, the tile group identifier 345 or the tile index 0.
  • the tile group index, identifier and address are used to define the partitioning of the frame into tile groups.
  • the tile group index is related to the location of the tile group in the frame.
  • the decoder parses the tile group address in the tile group NAL unit header and uses it to locate the tile group in the frame and determine the location of the first sample in the NAL unit.
  • the decoder uses the association indicated by the PPS to retrieve the tile group index associated with the tile group identifier and thus determine the location of the tile group and of the first sample in the NAL unit.
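  • a minimal sketch of this lookup (illustrative Python with made-up identifier values; it mirrors the tileGroupQpIdx derivation given later for the PPS association table):

```python
def tile_group_index_from_address(tile_group_address, tile_group_id):
    """Return the tile group index whose identifier, as listed in the PPS
    association table tile_group_id[], matches the address signalled in the
    tile group NAL unit header."""
    idx = 0
    while tile_group_address != tile_group_id[idx]:
        idx += 1
    return idx

# Tile group 203 of Figure 2: identifier 345 is mapped back to index 0.
# The other identifiers (12, 27) are made up for the example.
assert tile_group_index_from_address(345, [345, 12, 27]) == 0
```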
  • the descriptor column gives the encoding of a syntax element.
  • u(1) means that the syntax element is encoded using one bit.
  • ue(v) means that the syntax element is encoded as an unsigned integer 0-th order Exp-Golomb-coded syntax element with the left bit first, which is a variable length encoding.
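  • a minimal sketch of these descriptors (plain Python, using only the definitions above) shows why values close to 0 are cheap to encode, which matters for the quantisation parameter deltas discussed later:

```python
def ue(v: int) -> str:
    """ue(v): unsigned integer 0-th order Exp-Golomb code, left bit first."""
    code = v + 1
    prefix_len = code.bit_length() - 1          # number of leading zero bits
    return "0" * prefix_len + format(code, "b")

def se(v: int) -> str:
    """se(v): signed Exp-Golomb code; 0, 1, -1, 2, -2, ... map to 0, 1, 2, 3, 4, ..."""
    return ue(2 * v - 1 if v > 0 else -2 * v)

print(len(ue(0)), len(ue(3)))   # 1 bit for 0, 5 bits for 3
print(len(se(0)), len(se(20)))  # 1 bit for 0, 11 bits for 20
```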
  • the syntax elements num_tile_columns_minus1 and num_tile_rows_minus1 respectively indicate the number of tile columns and rows in the frame.
  • the syntax elements tile_column_width_minus1[] and tile_row_height_minus1[] specify the widths and heights of each column and row of the tile grid.
  • the tile group partitioning is expressed with the following syntax elements:
  • the syntax element single_tile_in_pic_flag states whether the frame contains a single tile. In other words, there is only one tile and one tile group in the frame when this flag is true.
  • single_tile_per_tile_group_flag states whether each tile group contains a single tile. In other words, each tile of the frame belongs to a different tile group when this flag is true.
  • the syntax element rect_tile_group_flag indicates that the tile groups of the frame form rectangular areas, as represented in frame 201.
  • the syntax element num_tile_groups_in_pic_minus1 is equal to the number of rectangular tile groups in the frame minus one.
  • top_left_tile_idx[] and bottom_right_tile_idx[] are arrays that respectively specify the first (top left) tile and the last (bottom right) tile in a rectangular tile group (see the sketch after this list). These arrays are indexed by tile group index.
  • the tile group identifiers are specified when the signalled_tile_group_id_flag is equal to 1.
  • the signalled_tile_group_id_length_minus1 syntax element indicates the number of bits used to code each tile group identifier value.
  • the tile_group_id[] association table is indexed by tile group index and contains the identifier of the tile group. When signalled_tile_group_id_flag is equal to 0, tile_group_id[] is indexed by tile group index and contains the tile group index of the tile group.
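  • a minimal sketch (illustrative Python; the grid size is a made-up example) of how the tiles covered by a rectangular tile group can be recovered from top_left_tile_idx[] and bottom_right_tile_idx[]:

```python
def tiles_in_rect_tile_group(top_left_tile_idx, bottom_right_tile_idx, num_tile_columns):
    """Tile indices (raster scan order) covered by a rectangular tile group."""
    top, left = divmod(top_left_tile_idx, num_tile_columns)
    bottom, right = divmod(bottom_right_tile_idx, num_tile_columns)
    return [r * num_tile_columns + c
            for r in range(top, bottom + 1)
            for c in range(left, right + 1)]

# Example: a 4-column tile grid; a group whose corner tiles are 1 and 6
# covers tiles 1, 2 on the first tile row and 5, 6 on the second.
assert tiles_in_rect_tile_group(1, 6, 4) == [1, 2, 5, 6]
```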
  • the tile group header comprises the tile group address according to the following syntax in the current VVC version:
  • the tile group header indicates the number of tiles in the tile group NAL unit with the help of the num_tiles_in_tile_group_minus1 syntax element.
  • Each tile 320 may comprise a tile segment header 330 and tile segment data 331.
  • the tile segment data 331 comprises the encoded coding units 340.
  • in some embodiments, the tile segment header is not present and the tile segment data contains the coding unit data 340.
  • Figure 4 illustrates the process of generating a video bitstream composed of different regions of interest from one or several original bitstreams.
  • in a step 400 the regions to be extracted from the original bitstreams are selected.
  • the regions may correspond for instance to a specific region of interest or a specific viewing direction in an omnidirectional content.
  • the tile groups comprising encoded samples present in the selected set of regions are selected in the original bitstreams.
  • the identifier of each tile group in the original bitstreams is determined. For example, the identifiers of the tile groups 1, 2, and 4 of frame 101 and of the tile group 3 of frame 100 in Figure 1 are determined.
  • a new arrangement for the selected tile groups in the resulting video is determined. This consists in determining the size and location of each selected tile group in the resulting video. For instance, the new arrangement conforms to a predetermined ROI composition. Alternatively, a user defines a new arrangement.
  • in a step 402 the tile partitioning of the resulting video needs to be determined.
  • the same tile partitioning is kept for the resulting video.
  • the number of rows and columns of the tile grid with the width and height of the tiles is determined and, advantageously stored in memory.
  • the location of a tile group in the resulting video may change with respect to its location in the original video.
  • the new locations of the tile groups are determined.
  • the tile group partitioning of the resulting video is determined.
  • the locations of the tile groups are determined with reference to the new tile grid as determined in step 402.
  • new parameters sets are generated for the resulting bitstream.
  • new PPS NAL units are generated.
  • These new PPS contain syntax elements to encode the tile grid partitioning, the tile group partitioning and positioning, and the association between the tile group identifier and the tile group index.
  • the tile group identifier is extracted from each tile group and associated with the tile group index depending on the new decoding location of the tile group. It is recalled that each tile group, in the exemplary embodiment, is identified by an identifier in the tile group header and that each tile group identifier is associated with an index corresponding to the tile group index of the tile group in the picture in raster scan order. This association is stored in a PPS NAL unit.
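  • a minimal sketch of this re-association (illustrative Python with made-up identifier values, loosely following the Figure 1a example): the kept tile group identifiers are simply listed in the order of their new tile group indices:

```python
# Identifiers of the tile groups kept for the resulting frame, keyed by their
# new tile group index (raster scan order in the resulting picture).
# The identifier values are purely illustrative.
kept_tile_groups = {0: 101, 1: 102, 2: 345, 3: 104}

# tile_group_id[] as written in the new PPS: index -> identifier.
tile_group_id = [kept_tile_groups[i] for i in sorted(kept_tile_groups)]

# The tile group NAL units themselves are copied unchanged: the decoder
# locates them through their identifier and this PPS association table.
assert tile_group_id[2] == 345
```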
  • the VCL NAL units, namely the tile groups, are extracted from the original bitstreams to be inserted in the resulting bitstream. It may happen that these VCL NAL units need to be amended. In particular, some parameters in the tile group header may not be compatible with the resulting bitstream and need to be amended. It would be advantageous to avoid this amending step, as decoding, amending and recoding the tile group header is resource consuming.
  • the quantisation parameter is an important parameter when encoding a coding unit as it determines the compression bitrate and thus the quality of the encoding.
  • Encoding a coding unit using a high quantisation parameter leads to a high compression ratio, thus a low bitrate and a low quality of the compressed image.
  • Using a low quantisation parameter leads to a low compression ratio, thus a high bitrate and a high quality of the compressed image.
  • encoding systems generally use a variable quantisation parameter that changes from coding unit to coding unit and between frames in order to adapt to the complexity of the content of the coding unit and to the structure of the compressed video when using successive temporally predicted frames.
  • by using a variable quantisation parameter, it is possible to obtain a uniform perceived quality of the decompressed image independently of the content of the different coding units.
  • the global quality targeted for a video sequence is determined by a default initial value of the quantisation parameter that is fixed for the sequence. This default initial quantisation parameter value is modified at different levels by applying some modifying offsets to this default initial value.
  • a quantisation parameter delta is defined at the level of a tile group and stored in the tile group header.
  • the encoder uses a quantisation parameter that corresponds to the default initial quantisation value corrected by the addition of the quantisation parameter delta defined for the tile group. Then the encoding process will modify the quantisation parameter for each coding unit of the tile group based on this first value of the quantisation parameter.
  • when merging tile groups from different bitstreams, each original bitstream defining its own default initial quantisation parameter, the quantisation parameter deltas encoded in each tile group need to be adapted to the default initial quantisation parameter value defined in the resulting bitstream. This implies that the tile group headers need to be decoded, amended and re-encoded in order to fix this quantisation parameter delta issue.
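  • a minimal numeric sketch of the issue (plain Python; the QP values 20 and 40 are those of the example discussed with Figures 5 and 6):

```python
def tile_group_qp(init_qp_minus26, tile_group_qp_delta):
    # TileGroupQpY as derived from the PPS default and the tile group delta
    return 26 + init_qp_minus26 + tile_group_qp_delta

# Low-quality original bitstream: PPS default QP 40, delta 0 -> QP 40.
assert tile_group_qp(40 - 26, 0) == 40
# Same tile group copied into a resulting bitstream whose PPS default QP is
# 20: the unchanged delta of 0 now yields QP 20 instead of 40, so the tile
# group header would have to be rewritten (delta = 20) to keep the right QP.
assert tile_group_qp(20 - 26, 0) == 20
```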
  • the present invention has been devised to address one or more of the foregoing concerns. It concerns an encoding and decoding method for a bitstream that allows solving the quantisation parameter delta issue when merging tile groups from different bitstreams without amending the tile group encoded data.
  • a method of encoding video data comprising frames into a bitstream of logical units, frames being spatially divided into frame portions, frame portions being grouped into groups of frame portions, the method comprising:
  • the quantisation parameter information is encoded as a quantisation parameter offset determined based on a global default initial quantisation parameter, the global default initial quantisation parameter being associated to groups of frame portions.
  • the quantisation parameter information is encoded for each group of frame portions in the second logical unit as an associated default initial quantisation parameter.
  • a quantization parameter offset is associated to each group in a subset of the groups of frame portions
  • the quantisation parameter information is determined based on a global default initial quantisation parameter, and based on the existence of an association between the index of the group and a quantization parameter offset.
  • the second logical unit is a PPS.
  • a method for decoding a bitstream of logical units of video data comprising frames, frames being spatially divided into frame portions, frame portions being grouped into groups of frame portions, the method comprising: - parsing a first logical unit comprising a group of frame portions to determine a quantisation parameter delta associated with the group of frame portions;
  • the quantisation parameter information is encoded as a quantisation parameter offset determined based on a global default initial quantisation parameter, the global default initial quantisation parameter being associated to groups of frame portions.
  • the quantisation parameter information is encoded for each group of frame portions in the second logical unit as an associated default initial quantisation parameter.
  • a quantization parameter offset is associated to each group in a subset of the groups of frame portions
  • the quantisation parameter information is determined based on a global default initial quantisation parameter, and based on the existence of an association between the index of the group and a quantization parameter offset.
  • the second logical unit is a PPS.
  • bitstreams being composed of logical units comprising frames, frames being divided into tiles, tiles being grouped into groups of frame portions, the method comprising:
  • the quantisation parameter information is encoded as a quantisation parameter offset determined based on a global default initial quantisation parameter, the global default initial quantisation parameter being associated to groups of frame portions.
  • the quantisation parameter information is encoded for each group of frame portions in the logical unit as an associated default initial quantisation parameter.
  • a quantisation parameter offset is associated to each group in a subset of the groups of frame portions
  • the quantisation parameter information is determined based on a global default initial quantisation parameter, and based on the existence of an association between the index of the group and a quantization parameter offset.
  • the logical unit comprising the quantisation parameter information is a PPS.
  • a computer program product for a programmable apparatus, the computer program product comprising a sequence of instructions for implementing a method according to the invention, when loaded into and executed by the programmable apparatus.
  • a computer- readable storage medium storing instructions of a computer program for implementing a method according to the invention.
  • the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, microcode, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit”, “module” or “system”.
  • the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
  • a tangible, non-transitory carrier medium may comprise a storage medium such as a floppy disk, a CD-ROM, a hard disk drive, a magnetic tape device or a solid-state memory device and the like.
  • a transient carrier medium may include a signal such as an electrical signal, an electronic signal, an optical signal, an acoustic signal, a magnetic signal or an electromagnetic signal, e.g. a microwave or RF signal.
  • Figures 1a and 1b illustrate two different application examples for the combination of regions of interest
  • Figures 2a and 2b illustrate some partitioning in encoding systems
  • Figure 3 illustrates the organisation of the bitstream in the exemplary coding system VVC
  • Figure 4 illustrates the process of generating a video bitstream composed of different regions of interest from one or several original bitstreams
  • Figure 5 illustrates the quantisation parameter issue when merging different tile groups from different bitstreams
  • Figure 6 illustrates the embodiment of the invention where a quantisation parameter offset is associated with a tile group in a non-VCL NAL unit of the bitstream
  • Figure 7 illustrates the main steps of an encoding process according to an embodiment of the invention
  • Figure 8 illustrates the main steps of a decoding process according to an embodiment of the invention
  • Figure 9 illustrates the extraction and merge operation of two bitstreams stored in a file to form a resulting bitstream stored in a resulting file in an embodiment of the invention
  • Figure 10 illustrates the main steps of the extraction and merge process at file format level in an embodiment of the invention
  • Figure 11 is a schematic block diagram of a computing device for implementation of one or more embodiments of the invention.
  • Figure 5 illustrates the quantisation parameter issue when merging different tile groups from different bitstreams.
  • the quantisation parameter is an important parameter for determining the quality of each encoded coding unit. A higher QP creates more losses during encoding and decreases the quality of the decoded image.
  • in previous coding systems such as HEVC, the QP range is 0 to 51; in VVC, the range is 0 to 63.
  • the QP value used in a block encoding can be in the range -QpBdOffsetY to +63, where QpBdOffsetY is a value depending on the bit depth of the luma component of the video.
  • the PPS defines a default initial QP value used as reference by all the tile groups referencing the PPS.
  • the value is named init_qp_minus26, and it defines the initial QP value decremented by 26.
  • This value is encoded with a variable length (se(v) means signed integer 0-th order Exp-Golomb-coded). 26 is estimated as the default value for QP, so that init_qp_minus26 is typically 0 and thus has a 1-bit length when encoded.
  • the syntax of the PPS for signalling the default initial quantisation parameter is typically as follows:
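  • a minimal parsing sketch consistent with this description (illustrative Python, assuming a hypothetical bitstream reader r exposing an se() method):

```python
def parse_pps_default_qp(r):
    # init_qp_minus26: se(v), default initial QP decremented by 26, used as
    # reference by all tile groups referencing this PPS.
    init_qp_minus26 = r.se()
    return 26 + init_qp_minus26
```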
  • the tile group header contains a tile_group_qp_delta syntax element which can be used to modify the default initial quantisation parameter value for all the CTUs inside the tile group. Typically, this value changes in each picture. This value is encoded in variable length and thus a value closer to 0 will use a lower number of bits.
  • the quantisation parameter used for encoding or decoding the first coding unit inside the tile group is the sum of 26 with init_qp_minus26 and tile_group_qp_delta.
  • TileGroupQpY = 26 + init_qp_minus26 + tile_group_qp_delta
  • for example, if the default initial quantisation parameter of the merged bitstream corresponds to a QP of 20 while a tile group was originally encoded with a QP of 40, its header would have to be rewritten with a tile_group_qp_delta of 20.
  • the first operation, consisting in defining the default initial quantisation parameter value, is not too complex because the non-VCL NAL unit PPS is relatively small and not frequent. Moreover, it is encoded with mostly fixed length values.
  • the second operation, consisting in modifying all the tile group headers from one or several videos, is very complex and time consuming because it requires reading all the video bitstreams, decoding all the tile group headers, which contain many variable length fields, and rewriting the complete bitstream.
  • the value of tile_group_qp_delta for at least one of the tile groups will be far from 0 and will thus use a large number of bits in its variable length encoding.
  • an optional quantisation parameter offset is associated to each tile group and stored in a non-VCL NAL unit of the bitstream. Accordingly, the tile group structure of each tile group may be kept unchanged. The actual quantisation parameter used for a given tile group will be calculated based on the default initial quantisation parameter of the bitstream, modified with the quantisation parameter offset associated to the tile group and summed with the quantisation parameter delta signalled in the tile group.
  • Figure 6 illustrates the embodiment of the invention where a quantisation parameter offset is associated with a tile group in a non-VCL NAL unit of the bitstream.
  • the quantisation parameter offset is signalled in the PPS NAL unit, for example using the following syntax:
  • the flag signalled_qp_offset_flag indicates that a quantisation parameter offset is provided associated with each tile group.
  • the association table tile_group_qp_offset stores an offset encoded in signed variable length coding associated with each tile group.
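  • a minimal parsing sketch consistent with the above description (illustrative Python, assuming a hypothetical bitstream reader r exposing u() and se() methods):

```python
def parse_tile_group_qp_offsets(r, num_tile_groups_in_pic_minus1):
    """Read the optional per-tile-group QP offsets signalled in the PPS."""
    tile_group_qp_offset = [0] * (num_tile_groups_in_pic_minus1 + 1)
    signalled_qp_offset_flag = r.u(1)
    if signalled_qp_offset_flag:
        for i in range(num_tile_groups_in_pic_minus1 + 1):
            tile_group_qp_offset[i] = r.se()   # signed variable length, se(v)
    return tile_group_qp_offset
```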
  • the decoding quantisation parameter of the first coding unit inside the tile group is the sum of 26 with init_qp_minus26, tile_group_qp_delta and tile_group_qp_offset:
  • TileGroupQpY = 26 + init_qp_minus26 + tile_group_qp_offset + tile_group_qp_delta
  • semantics associated with the syntax element of the PPS are the following:
  • signalled_qp_offset_flag equal to 1 specifies the presence of tile_group_qp_offset[ i ] in the PPS.
  • signalled_qp_offset_flag equal to 0 specifies the absence of tile_group_qp_offset[ i ].
  • tile_group_qp_offset[ i ], when present, specifies the offset value that applies to the initial value of TileGroupQpY for the i-th tile group.
  • the value of tile_group_qp_offset[ i ] shall be in the range of -QpBdOffsetY to +63, inclusive.
  • QpBdOffsetY is an offset computed as a function of the bit depth of the luma samples. For instance, QpBdOffsetY is equal to 6 times the bit depth of the luma samples.
  • the value of tile_group_qp_offset[ i] shall be in the range of -rangeQp to rangeQp where rangeQp is an integer value.
  • when not present, tile_group_qp_offset[ i ] is inferred to be equal to 0 for each value of i in the range of 0 to num_tile_groups_in_pic_minus1, inclusive.
  • the syntax elements of the tile group header remain identical but the semantics of tile_group_qp_delta are for example the following:
  • tile_group_qp_delta specifies the initial value of QpY to be used for the coding blocks in the tile group until modified by the value of CuQpDeltaVal in the coding unit layer.
  • the initial value of the QpY quantization parameter for the tile group, TileGroupQpY, is derived as follows:
  • TileGroupQpY = 26 + init_qp_minus26 + tile_group_qp_offset[ tileGroupQpIdx ] + tile_group_qp_delta
  • TileGroupQpY shall be in the range of -QpBdOffsetY to +63, inclusive.
  • the variable tileGroupQpIdx, which specifies the index of the tile group, is derived as follows:
  • tileGroupQpIdx = 0
  • while( tile_group_address != tile_group_id[ tileGroupQpIdx ] ) tileGroupQpIdx++
  • a first bitstream 600 comprises a tile group 3.
  • the bitstream 600 contains a PPS 610 comprising a default initial quantisation parameter with the value '20' associated with a high quality encoding.
  • the tile group 3 comprises a quantisation parameter delta with a value 0. It means that the first coding unit of tile group 3 is encoded with a quantisation parameter of 20, and that this value of quantisation parameters must be used when decoding this coding unit.
  • a second bitstream 601 comprises a tile group 4.
  • the bitstream 601 contains a PPS 611 comprising a default initial quantisation parameter with the value '40' associated with a low quality encoding.
  • the tile group 4 comprises a quantisation parameter delta with a value 0. It means that the first coding unit of tile group 4 is encoded with a quantisation parameter of 40, and that this value of quantisation parameters must be used when decoding this coding unit.
  • after merging, the bitstream 602 comprises both the tile group 3 from bitstream 600 and the tile group 4 from bitstream 601. Both tile groups still comprise a quantisation parameter delta with the value 0.
  • Bitstream 602 comprises a PPS 620 with a default initial quantisation parameter value of 20 identical to the one in bitstream 600.
  • the PPS 620 also comprises an association table associating each tile group with a quantisation parameter offset.
  • the quantisation parameter offset associated with tile group 4 has a value '20'.
  • the quantisation parameter used for the first coding unit of tile group 4 has the right value '40', corresponding to the sum of the default initial quantisation parameter '20', the quantisation parameter offset '20' associated with tile group 4, and the quantisation parameter delta '0' in tile group 4.
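  • a minimal numeric sketch of this derivation with the Figure 6 values (plain Python):

```python
def tile_group_qp_y(init_qp_minus26, tile_group_qp_offset, tile_group_qp_delta):
    # TileGroupQpY = 26 + init_qp_minus26 + tile_group_qp_offset + tile_group_qp_delta
    return 26 + init_qp_minus26 + tile_group_qp_offset + tile_group_qp_delta

# Tile group 3 (from the high-quality bitstream 600): offset 0, delta 0 -> QP 20.
assert tile_group_qp_y(20 - 26, 0, 0) == 20
# Tile group 4 (from the low-quality bitstream 601): offset 20, delta 0 -> QP 40.
assert tile_group_qp_y(20 - 26, 20, 0) == 40
```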
  • in an alternative embodiment, the PPS no longer contains a global default initial quantisation parameter, but instead an association table associating a default initial quantisation parameter value with each tile group.
  • These default initial quantisation parameter values may be encoded minus 26 as was done for the global default initial quantisation parameter according, for example, to the following syntax:
  • the decoding quantisation parameter of the first coding unit inside the tile group is the sum of 26 with the tile_group_init_qp_minus26 associated with the tile group of index i, and the tile_group_qp_delta of the tile group.
  • TileGroupQpY = 26 + tile_group_init_qp_minus26[ i ] + tile_group_qp_delta
  • semantics associated with the syntax element of the PPS are the following:
  • tile_group_init_qp_minus26[ i ] plus 26 specifies the initial value of TileGroupQpY for the i-th tile group.
  • the initial value of TileGroupQpY is modified at the tile group layer when a non-zero value of tile_group_qp_delta is decoded.
  • the value of tile_group_init_qp_minus26[ i ] shall be in the range of -( 26 + QpBdOffsetY ) to +37, inclusive.
  • QpBdOffsetY is an offset computed as a function of the bit depth of the luma samples. For instance, QpBdOffsetY is equal to 6 times the bit depth of the luma samples.
  • the value of tile_group_init_qp_minus26[ i ] shall be in the range of -rangeQp to rangeQp where rangeQp is an integer value.
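  • a minimal numeric sketch of this variant (plain Python; the values 20 and 40 are those of the Figure 6 example, restated with per-tile-group initial values and both deltas kept at 0):

```python
def tile_group_qp_y(tile_group_init_qp_minus26, idx, tile_group_qp_delta):
    # TileGroupQpY = 26 + tile_group_init_qp_minus26[i] + tile_group_qp_delta
    return 26 + tile_group_init_qp_minus26[idx] + tile_group_qp_delta

tile_group_init_qp_minus26 = [20 - 26, 40 - 26]   # one entry per tile group
assert tile_group_qp_y(tile_group_init_qp_minus26, 0, 0) == 20
assert tile_group_qp_y(tile_group_init_qp_minus26, 1, 0) == 40
```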
  • the syntax elements of the tile group header remain identical but the semantics of tile_group_qp_delta are for example the following:
  • tile_group_qp_delta specifies the initial value of QpY to be used for the coding blocks in the tile group until modified by the value of CuQpDeltaVal in the coding unit layer.
  • the initial value of the QpY quantization parameter for the tile group, TileGroupQpY, is derived as follows:
  • TileGroupQpY = 26 + tile_group_init_qp_minus26[ tileGroupQpIdx ] + tile_group_qp_delta
  • TileGroupQpY shall be in the range of -QpBdOffsetY to +63, inclusive.
  • the variable tileGroupQpIdx, which specifies the index of the tile group, is derived as follows:
  • tileGroupQpIdx = 0
  • while( tile_group_address != tile_group_id[ tileGroupQpIdx ] ) tileGroupQpIdx++
  • Another advantage of the new syntax in a different usage is now explained.
  • the encoder may know that a particular region in an image of the video is more important for the viewer than the other parts of the image: this is a region of interest.
  • a region of interest may be at a fixed position in the image.
  • the installer of the camera can position the regions of interest to visualize an interesting part of the scene, for example the doors used to enter a room.
  • the center part of the image may also be the region of interest. It is interesting to encode the video with a higher quality in the regions of interest compared to the other parts of the image.
  • the tile groups at the spatial position of the regions of interest will thus have a different (lower) quantisation parameter than the other tile groups.
  • the encoder can use the list of tile_group_qp_offset in the PPS to set a different offset value to the tile groups of the regions of interest.
  • the tile_group_qp_delta value in the tile group headers will then be closer to 0 and thus will be encoded with a lower number of bits because it is encoded in variable length.
  • the PPS is written only once for a large number of frames while the tile groups are written many times in each frame, so it is useful to decrease the size of the fields in the tile group header to obtain a better compression ratio for the video.
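  • a minimal sketch of this usage (illustrative Python; the tile group indices and QP values are made up): the quality difference of the regions of interest is carried once by the PPS offsets so that the per-frame deltas stay close to 0:

```python
roi_tile_groups = {3}          # illustrative: tile group 3 covers the ROI
base_qp, roi_qp = 40, 28       # illustrative target QPs

# Written once in the PPS: a negative offset for the ROI tile groups only.
tile_group_qp_offset = [(roi_qp - base_qp) if i in roi_tile_groups else 0
                        for i in range(5)]

# Written in every tile group header of every frame: deltas stay near 0,
# which keeps their variable length encoding short.
tile_group_qp_delta = [0, 0, 0, 0, 0]
```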
  • the tile_group_qp_offset may also be specified only for a subset of the tile groups of the frame using the tile group identifier. It advantageously reduces the signalling overhead when the number of non-null offsets is low.
  • the syntax of the PPS is the following:
  • semantics associated with the syntax element of the PPS are the following:
  • num_signaled_qp_offset_minus1 plus 1 specifies the number of tile_group_qp_address[ i ] and tile_group_qp_offset[ i ] specified in the PPS; num_signaled_qp_offset_minus1 shall be in the range of 0 to num_tile_group_in_pic_minus1, inclusive.
  • tile_group_qp_address [ i ] when present specifies the tile group address of each tile group that has a QP offset signalled in the PPS.
  • alternatively, tile_group_qp_address[ i ] specifies the tile group index of each tile group that has a QP offset signalled in the APS.
  • the value of tile_group_qp_address[ i ] shall be in the range of 0 to num_tile_group_in_pic_minus1, inclusive.
  • tile_group_qp_offset[ i ] specifies the offset value that applies to the initial value of TileGroupQpY for the tile group with tile_group_address equal to tile_group_qp_address[ i ].
  • the value of tile_group_qp_offset[ i ] shall be in the range of -QpBdOffsetY to +63, inclusive.
  • QpBdOffsetY is an offset computed as a function of the bit depth of the luma samples. For instance, QpBdOffsetY is equal to 6 times the bit depth of the luma samples.
  • the value of tile_group_qp_offset[ i] shall be in the range of -rangeQp to rangeQp where rangeQp is an integer value.
  • the variable qpOffsetTGIdx, which specifies the index of the tile group, is derived as follows:
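  • a sketch consistent with the surrounding semantics (illustrative Python; the offset defaults to 0 when the tile group address is not listed in the PPS):

```python
def qp_offset_for_tile_group(tile_group_address, tile_group_qp_address, tile_group_qp_offset):
    """Offset applying to a tile group when offsets are signalled only for a
    subset of the tile groups; 0 when its address is not listed in the PPS."""
    for i, addr in enumerate(tile_group_qp_address):
        if addr == tile_group_address:
            return tile_group_qp_offset[i]
    return 0
```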
  • the syntax elements of the tile group header remain identical but the semantics of tile_group_qp_delta are for example the following:
  • tile_group_qp_delta specifies the initial value of QpY to be used for the coding blocks in the tile group until modified by the value of CuQpDeltaVal in the coding unit layer.
  • the initial value of the QpY quantization parameter for the tile group, TileGroupQpY, is derived as follows:
  • TileGroupQpY = 26 + init_qp_minus26 + qpOffsetTGIdx[ tileGroupQpIdx ] + tile_group_qp_delta
  • TileGroupQpY shall be in the range of -QpBdOffsetY to +63, inclusive.
  • the variable tileGroupQpIdx, which specifies the index of the tile group, is derived as follows:
  • tileGroupQpIdx = 0
  • while( tile_group_address != tile_group_id[ tileGroupQpIdx ] ) tileGroupQpIdx++
  • Figure 7 illustrates the main steps of an encoding process according to an embodiment of the invention.
  • the described encoding process concerns the encoding of a single bitstream according to an embodiment of the invention.
  • the obtained encoded bitstream may be used in a merging operation as described above as an original bitstream or as the resulting bitstream.
  • a tile partitioning of the frames is determined.
  • the encoder defines the number of columns and rows so that each region of interest of the video is covered by at least one tile.
  • for instance, the encoder may be encoding an omnidirectional video where each tile corresponds to a predetermined field of view in the video.
  • the tile partitioning of the frame according to a tile grid is typically represented in a parameter set NAL unit, for example a PPS according to the syntax presented in reference to Figure 3.
  • a set of tile groups is defined, each tile group comprising one or more tiles.
  • a tile group is defined for each tile of the frame.
  • a tile group identifier is defined for each tile group in the bitstream. The tile group identifiers are determined in order to be unique for each tile group. The uniqueness of the tile group identifiers may be defined at the level of a set of bitstreams comprising the bitstream currently encoded.
  • the number of bits used to encode the tile group identifier is determined as a function of the number of tile groups in the encoded bitstream or as a function of a number of tile groups in a set of different bitstreams comprising the bitstream being currently encoded.
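  • a minimal sketch of such a computation (illustrative Python; the figures are made up): the identifier length grows with the total number of tile groups across the bitstreams to be merged:

```python
def tile_group_id_length_bits(total_tile_groups: int) -> int:
    """Smallest number of bits able to give a unique identifier to every
    tile group of the set of bitstreams (ceil(log2) of the total count)."""
    return max(1, (total_tile_groups - 1).bit_length())

# e.g. 4 bitstreams of 24 tile groups each -> 96 identifiers -> 7 bits
assert tile_group_id_length_bits(4 * 24) == 7
```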
  • the length of the tile group identifier and the association of each tile group with an identifier are typically specified in a parameter set NAL unit such as the PPS.
  • each tile group is associated with quantisation parameter information.
  • This quantisation parameter information is then encoded into a parameter set.
  • the parameter set is a PPS.
  • the quantisation parameter information may be encoded as a quantisation parameter offset based on a default initial quantisation parameter and taking into account the quantisation parameter delta encoded into the tile group header.
  • the quantisation parameter information is encoded in the parameter set as a dedicated default initial quantisation parameter associated to the tile group.
  • the parameter set can take the form of the PPS described in reference to Figure 6.
  • in a step 703 the samples of each tile group are encoded according to the parameters defined in the different parameter sets.
  • the encoding will be based on the quantisation parameter information associated with the tile group.
  • a complete bitstream is generated comprising both the non-VCL NAL units corresponding to the different parameter sets and the VCL NAL units corresponding to the encoded data of the different tile groups.
  • Figure 8 illustrates the main steps of a decoding process according to an embodiment of the invention.
  • the decoder parses the bitstream in order to determine the tile partitioning of the frames. This information is obtained from a parameter set, typically from the PPS NAL unit. The syntax elements of the PPS are parsed and decoded to determine the grid of tiles.
  • the decoder determines the tile group partitioning of the frame and in particular obtains the number of tile groups together with identification information for each tile group. This information is valid for at least one frame, but generally stays valid for many frames. It may take the form of the tile group identifier that may be obtained from a parameter set such as the PPS NAL unit as described in Figure 6.
  • the decoder parses the bitstream to determine the quantisation parameter information that is associated with each tile group. This is typically done by extracting a quantisation parameter delta from the tile group header and by combining this information with a quantisation parameter information associated with the tile group in a parameter set, typically in a PPS.
  • the quantisation parameter information associated to the tile group may be a quantisation parameter offset based on a default initial quantisation parameter.
  • the quantisation parameter information associated to the tile group is a dedicated default initial quantisation parameter.
  • the decoder decodes the VCL NAL units corresponding to the tile groups according to the parameters determined in the previous steps. In particular, the decoding uses the quantisation parameter associated with the tile group.
  • Figure 9 illustrates the extraction and merge operation of two bitstreams stored in a file to form a resulting bitstream stored in a resulting file in an embodiment of the invention.
  • Figure 10 illustrates the main step of the extraction and merge process at file format level in an embodiment of the invention.
  • Figure 9 illustrates the merge of two ISO BMFF streams 900 and 901 resulting in a new ISO BMFF bitstream 902 according to the method of Figure 10.
  • the encapsulation of the VVC streams consists in this embodiment in defining one tile track for each tile group of the stream and one tile base track for the NAL units common to the tile groups.
  • the stream 900 contains two tile groups, one with the identifier '1.1' and another one with the identifier '1.2'.
  • the samples corresponding to each tile group '1.1' and '1.2' are described respectively in one tile track, similarly to tile tracks in ISO/IEC 14496-15.
  • the VVC tile groups could be encapsulated in tile tracks. Such a VVC tile track could be differentiated from an HEVC tile track by defining a new sample entry, for instance 'vvt1' instead of 'hvt1'.
  • the merging method consists in determining in step 1000 the set of tile tracks from the two streams to be merged into a single bitstream. For instance, it corresponds to the tile tracks of the tile group with the identifier '2.1' of the file 901 and of the tile group with the identifier '1.2' of the file 900.
  • the advantage of this method is that combining two streams mainly consists in generating a new tile base track, updating the track reference boxes and copying as-is the tile track samples corresponding to the selected tile groups.
  • the processing is simplified since the rewriting of the tile track samples required in the prior art is avoided.
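  • a minimal structural sketch of the resulting file (illustrative Python dictionaries, not an actual ISO BMFF writer; track names follow the Figure 9 example):

```python
selected_tile_tracks = ["tile_track_2.1_of_901", "tile_track_1.2_of_900"]

resulting_file = {
    # Only the tile base track is newly generated: it carries the rewritten
    # parameter sets (e.g. the PPS with the tile_group_qp_offset table) and
    # references the kept tile tracks.
    "tile_base_track": {"parameter_sets": "new PPS/SPS",
                        "track_references": selected_tile_tracks},
    # The tile track samples are copied unchanged from the source files.
    "tile_tracks": {name: "samples copied as-is" for name in selected_tile_tracks},
}
```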
  • in some embodiments, sub-pictures are divided into slices instead of tile groups.
  • slices encompass the notion of tile groups, with the addition that slices may also correspond to a sub-part of a tile, namely a number of lines of CTUs within a tile. Everything that has been described in relation to tile groups and tile group headers is relevant for slices and slice headers.
  • the quantization parameter offset may be defined at the sub-picture level to apply to all slices of the sub-picture.
  • some quantization parameter offsets may be defined for the chrominance Cr and Cb components independently of the quantization parameter offset defined for the luminance component.
  • a possible syntax of the PPS for defining the quantization parameter offsets associated with a sub-picture may be:
  • pps_subpic_qp_offset_present_flag equal to 1 specifies the presence of pps_subpic_qp_offset[ i ], pps_subpic_cb_qp_offset[ i ], pps_subpic_cr_qp_offset[ i ] and pps_subpic_joint_cbcr_qp_offset_value[ i ] in the PPS.
  • pps_subpic_qp_offset_present_flag equal to 0 specifies the absence of pps_subpic_qp_offset[ i ], pps_subpic_cb_qp_offset[ i ], pps_subpic_cr_qp_offset[ i ] and pps_subpic_joint_cbcr_qp_offset_value[ i ].
  • pps_subpic_qp_offset[ i ], when present, specifies the offset value that applies to the initial value of SliceQpY for the i-th subpicture.
  • the value of 26 + init_qp_minus26 + pps_subpic_qp_offset[ i ] shall be in the range of -QpBdOffsetY to +63, inclusive.
  • when not present, the value of pps_subpic_qp_offset[ i ] is inferred to be equal to 0 for each value of i in the range of 0 to pps_num_subpic_minus1, inclusive.
  • pps_subpic_cb_qp_offset[ i ] and pps_subpic_cr_qp_offset[ i ] specify differences to be added to the values of pps_cb_qp_offset and pps_cr_qp_offset when determining the values of the quantization parameters Qp'Cb and Qp'Cr respectively for the i-th subpicture.
  • the values of pps_subpic_cb_qp_offset[ i ] and pps_subpic_cr_qp_offset[ i ] shall be in the range of -12 to +12, inclusive.
  • when not present, pps_subpic_cb_qp_offset[ i ] and pps_subpic_cr_qp_offset[ i ] are inferred to be equal to 0.
  • the values of pps_cb_qp_offset + pps_subpic_cb_qp_offset[ i ] and pps_cr_qp_offset + pps_subpic_cr_qp_offset[ i ] shall be in the range of -12 to +12, inclusive.
  • pps_subpic_joint_cbcr_qp_offset_value[ i ] specifies the difference to be added to the value of pps_joint_cbcr_qp_offset_value when determining the value of the quantization parameter Qp'CbCr for the i-th subpicture.
  • the value of pps_subpic_joint_cbcr_qp_offset_value[ i ] shall be in the range of -12 to +12, inclusive.
  • when not present, the value of pps_subpic_joint_cbcr_qp_offset_value[ i ] is inferred to be equal to 0.
  • the value of pps_joint_cbcr_qp_offset_value + pps_subpic_joint_cbcr_qp_offset_value[ i ] shall be in the range of -12 to +12, inclusive.
  • the syntax elements of the slice header remain identical but the semantics of slice_qp_delta, slice_cb_qp_offset, slice_cr_qp_offset and slice_joint_cbcr_qp_offset are the following:
  • slice_qp_delta specifies the initial value of Qpy to be used for the coding blocks in the slice until modified by the value of CuQpDeltaVal in the coding unit layer.
  • the initial value of the QpY quantization parameter for the slice, SliceQpY, is derived as follows:
  • SliceQpY = 26 + init_qp_minus26 + pps_subpic_qp_offset[ SubPicIdx ] + slice_qp_delta
  • slice_cb_qp_offset specifies a difference to be added to the value of pps_cb_qp_offset when determining the value of the Qp'Cb quantization parameter.
  • the value of slice_cb_qp_offset shall be in the range of -12 to +12, inclusive.
  • the value of pps_cb_qp_offset + pps_subpic_cb_qp_offset[ SubPicIdx ] + slice_cb_qp_offset shall be in the range of -12 to +12, inclusive.
  • slice_cr_qp_offset specifies a difference to be added to the value of pps_cr_qp_offset when determining the value of the Qp'Cr quantization parameter.
  • the value of slice_cr_qp_offset shall be in the range of -12 to +12, inclusive. When slice_cr_qp_offset is not present, it is inferred to be equal to 0.
  • the value of pps_cr_qp_offset + pps_subpic_cr_qp_offset[ SubPicIdx ] + slice_cr_qp_offset shall be in the range of -12 to +12, inclusive.
  • slice_joint_cbcr_qp_offset specifies a difference to be added to the value of pps_joint_cbcr_qp_offset_value when determining the value of Qp'CbCr.
  • the value of slice_joint_cbcr_qp_offset shall be in the range of -12 to +12, inclusive. When slice_joint_cbcr_qp_offset is not present, it is inferred to be equal to 0.
  • the value of pps_joint_cbcr_qp_offset_value + pps_subpic_joint_cbcr_qp_offset_value[ SubPicIdx ] + slice_joint_cbcr_qp_offset shall be in the range of -12 to +12, inclusive.
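  • a minimal sketch of these derivations and range checks (plain Python, mirroring the formulas above):

```python
def slice_qp_y(init_qp_minus26, pps_subpic_qp_offset, slice_qp_delta):
    # SliceQpY = 26 + init_qp_minus26 + pps_subpic_qp_offset[SubPicIdx] + slice_qp_delta
    return 26 + init_qp_minus26 + pps_subpic_qp_offset + slice_qp_delta

def cb_offset(pps_cb_qp_offset, pps_subpic_cb_qp_offset, slice_cb_qp_offset):
    # Total chroma Cb offset, constrained to the -12..+12 range stated above.
    total = pps_cb_qp_offset + pps_subpic_cb_qp_offset + slice_cb_qp_offset
    assert -12 <= total <= 12
    return total
```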
  • a similar syntax may be proposed in the SPS (Sequence Parameter Set) or in the Picture Header to define the quantization parameter offsets associated with sub-pictures.
  • by defining quantization parameter offsets at the sub-picture level, it is possible to define at once a quantization parameter offset that applies to all the slices of a given sub-picture.
  • when defined in the SPS or in the PPS, the quantization parameter offsets defined for index i are applied to all slices of all sub-pictures of index i in all pictures using the SPS or the PPS respectively.
  • when defined in the picture header, the quantization parameter offsets defined for index i are applied to all slices of the subpicture of index i in the pictures using the picture header.
  • an ID may be given to each subpicture in the PPS using the pps_subpic_id.
  • pps_subpic_id[ i ] specifies the subpicture ID of the i-th subpicture. In this case, only the subpicture ID is specified in the slice header and not the subpicture index.
  • the order of the subpictures may be changed in each Picture Header using a second table ph_subpic_id.
  • ph_subpic_id[ i ] specifies the subpicture ID of the i-th subpicture.
  • in order to find the index SubPicIdx giving the value to use in the offset tables defined in the PPS (pps_subpic_qp_offset, pps_subpic_cb_qp_offset, pps_subpic_cr_qp_offset and pps_subpic_joint_cbcr_qp_offset_value), the decoder must use the value SubPicIdx such that pps_subpic_id[ SubPicIdx ] is equal to the ID defined in the slice header, and not the values of the table ph_subpic_id, which indicates the position in the picture where the subpicture is decoded.
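  • a minimal sketch of that lookup (illustrative Python):

```python
def subpic_idx_from_slice_header(slice_subpic_id, pps_subpic_id):
    """Resolve SubPicIdx from the subpicture ID carried in the slice header,
    using the PPS table pps_subpic_id[] (and not the picture header
    reordering table ph_subpic_id[])."""
    return pps_subpic_id.index(slice_subpic_id)
```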
  • a list of quantization parameter offsets may be defined for the chrominance components.
  • Each Coding Unit CU can use a different value in the list by indicating the index in the list, to adjust its chrominance QP value.
  • the list of chroma quantization parameter offsets may take for example the following syntax in the PPS (the pic_parameter_set_rbsp( ) syntax table is reproduced as an image in the original document):
  • pps_num_qp_offset_lists_minus1 plus 1 specifies the number of chroma_qp_offset_list_len_minus1[ i ] syntax elements that are present in the PPS RBSP syntax structure. In other words, it defines the number of chroma QP offset tables defined in the PPS.
  • the value of pps_num_qp_offset_lists_minus1 shall be in the range of 0 to 5, inclusive.
  • chroma_qp_offset_list_len_minus1[ i ] plus 1 specifies the number of cb_qp_offset_list[ i ], cr_qp_offset_list[ i ] and joint_cbcr_qp_offset_list[ i ] syntax elements that are present in the PPS RBSP syntax structure. In other words, it defines the number of offset values in the i-th cb, cr and joint_cbcr quantization parameter offset tables.
  • the value of the sum of the chroma_qp_offset_list_len_minus1[ i ] for i in the range 0 to pps_num_qp_offset_lists_minus1 shall be in the range of 0 to 5, inclusive. In other words, the number of offset values defined in all the chroma offset tables should be limited to limit the complexity of the decoder hardware.
  • cb_qp_offset_list[ i ][ j ], cr_qp_offset_list[ i ][ j ] and joint_cbcr_qp_offset_list[ i ][ j ] specify offsets used in the derivation of Qp'Cb, Qp'Cr, and Qp'CbCr, respectively.
  • the values of cb_qp_offset_list[ i ][ j ], cr_qp_offset_list[ i ][ j ], and joint_cbcr_qp_offset_list[ i ][ j ] shall be in the range of -12 to +12, inclusive.
  • when not present, cb_qp_offset_list[ i ][ j ], cr_qp_offset_list[ i ][ j ] and joint_cbcr_qp_offset_list[ i ][ j ] are inferred to be equal to 0.
  • subpic_chroma_qp_offset_list_index[ i ] specifies the index of the chroma QP offset tables used by the i-th subpicture.
  • subpic_chroma_qp_offset_list_index[ i ] shall be in the range 0 to pps_num_qp_offset_lists_minus1, inclusive.
  • the syntax elements of the transform unit containing a CU remain identical but the semantics of cu_chroma_qp_offset_idx are the following:
  • cu_chroma_qp_offset_idx, when present, specifies the index into the cb_qp_offset_list[ ][ ], cr_qp_offset_list[ ][ ], and joint_cbcr_qp_offset_list[ ][ ] that is used to determine the value of CuQpOffsetCb, CuQpOffsetCr, and CuQpOffsetCbCr.
  • the value of cu_chroma_qp_offset_idx shall be in the range of 0 to chroma_qp_offset_list_len_minus1[ subpic_chroma_qp_offset_list_index[ SubPicIdx ] ], inclusive.
  • the value of cu_chroma_qp_offset_idx is inferred to be equal to 0.
  • CuQpOffsetCb, CuQpOffsetCr, and CuQpOffsetCbCr are all set equal to 0.
  • Similar syntax may be proposed in the SPS (Sequence Parameter Set) or in the Picture Header to define the chroma quantization parameter offset tables associated with sub-pictures.
  • the syntax of the chroma quantization parameter offset tables may be simplified to directly define a chroma quantization parameter offset table for each subpicture.
  • This syntax has the advantage of avoiding the indirection through the subpic_chroma_qp_offset_list_index table to obtain the tables associated with each subpicture: each chroma quantization parameter offset table is indexed directly by the subpicture index. But it has the disadvantage of requiring more offset values to be defined and thus it may increase the complexity of the decoder.
  • Figure 11 is a schematic block diagram of a computing device 1100 for implementation of one or more embodiments of the invention.
  • the computing device 1100 may be a device such as a microcomputer, a workstation or a light portable device.
  • the computing device 1100 comprises a communication bus connected to:
  • a central processing unit 1101, such as a microprocessor, denoted CPU;
  • a random access memory 1102, denoted RAM, for storing the executable code of the method of embodiments of the invention as well as the registers adapted to record variables and parameters necessary for implementing the method according to embodiments of the invention; the memory capacity thereof can be expanded by an optional RAM connected to an expansion port, for example;
  • a read only memory 1103, denoted ROM;
  • a network interface 1104 is typically connected to a communication network over which digital data to be processed are transmitted or received.
  • the network interface 1104 can be a single network interface, or composed of a set of different network interfaces (for instance wired and wireless interfaces, or different kinds of wired or wireless interfaces). Data packets are written to the network interface for transmission or are read from the network interface for reception under the control of the software application running in the CPU 1101 ;
  • a user interface 1105 may be used for receiving inputs from a user or to display information to a user;
  • a hard disk 1106, denoted HD, may be provided as a mass storage device;
  • an I/O module 1107 may be used for receiving/sending data from/to external devices such as a video source or display.
  • the executable code may be stored either in read only memory 1103, on the hard disk 1106 or on a removable digital medium such as for example a disk.
  • the executable code of the programs can be received by means of a communication network, via the network interface 1104, in order to be stored in one of the storage means of the communication device 1100, such as the hard disk 1106, before being executed.
  • the central processing unit 1101 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to embodiments of the invention, which instructions are stored in one of the aforementioned storage means. After powering on, the CPU 1101 is capable of executing instructions from main RAM memory 1102 relating to a software application after those instructions have been loaded from the program ROM 1103 or the hard disk (HD) 1106 for example. Such a software application, when executed by the CPU 1101 , causes the steps of the flowcharts of the invention to be performed.
  • Any step of the algorithms of the invention may be implemented in software by execution of a set of instructions or program by a programmable computing machine, such as a PC (“Personal Computer”), a DSP (“Digital Signal Processor”) or a microcontroller; or else implemented in hardware by a machine or a dedicated component, such as an FPGA (“Field-Programmable Gate Array”) or an ASIC (“Application-Specific Integrated Circuit”).
  • a Tile Group could also be a slice, a tile set, a motion constrained tile set (MCTS), a region of interest or a sub picture.
  • the information coded in the Picture Parameter Set PPS could also be encoded in other non-VCL units like a Video Parameter Set VPS, a Sequence Parameter Set SPS or the DPS, or in new units like a Layer Parameter Set or a Tile Group Parameter Set.
  • These units define parameters valid for several frames and thus they are at a higher hierarchical level than the tile group units or the APS units in the video bitstream.
  • the tile group units are valid only inside one frame.
  • the APS units can be valid for some frames but their usage changes rapidly from one frame to another.
  • the word "comprising" does not exclude other elements or steps
  • the indefinite article "a" or "an" does not exclude a plurality.
  • the mere fact that different features are recited in mutually different dependent claims does not indicate that a combination of these features cannot be advantageously used.

Abstract

The present invention concerns an encoding and decoding method for a bitstream that allows solving the quantisation parameter delta issue when merging tile groups from different bitstreams, without amending the tile group NAL unit, by introducing a quantisation parameter information associated with each tile group in a parameter set.

Description

METHOD AND APPARATUS FOR ENCODING AND DECODING A VIDEO BITSTREAM FOR MERGING REGIONS OF INTEREST
FIELD OF THE INVENTION
The present disclosure concerns a method and a device for encoding and decoding a video bitstream that facilitates the merge of regions of interest. It concerns more particularly the encoding and decoding of a video bitstream resulting of the merging of regions coming from different video bitstreams. In addition, it is proposed a corresponding method of generating such bitstream resulting from the merge of different regions coming from different video bitstreams.
BACKGROUND OF INVENTION
Figures 1a and 1 b illustrate two different application examples for the combination of regions of interest.
For instance, Figure 1a illustrates an example where a frame (or picture) 100 from a first video bitstream and a frame 101 from a second video bitstream are merged into a frame 102 of the resulting bitstream. Each frame is composed of four regions of interest numbered from 1 to 4. The frame 100 has been encoded using encoding parameters resulting in a high quality encoding. The frame 101 has been encoded using encoding parameters resulting in a low quality encoding. As well known, the frame encoded with a low quality is associated with a lower bitrate than the frame encoded with a high quality. The resulting frame 102 combines the regions of interest 1 , 2 and 4 from the frame 101 , thus encoded with a low quality, with the region of interest 3 from frame 100 encoded with a high quality. The goal of such combination is generally to get a region of interest, here the region 3, in high quality, while keeping the resulting bitrate reasonable by having regions 1 , 2 and 4 encoded in low quality. Such kind of scenario may happen in particular in the context of omnidirectional content allowing a higher quality for the content actually visible while the remaining parts have a lower quality.
Figure 1 b illustrates a second example where four different videos A, B, C and D are merged to form a resulting video. A frame 103 of video A is composed of regions of interest A1 , A2, A3, and A4. A frame 104 of video B is composed of regions of interest
B1 , B2, B3, and B4. A frame 105 of video C is composed of regions of interest C1 , C2, C3, and C4. A frame 106 of video D is composed of regions of interest D1 , D2, D3, and D4. The frame 107 of the resulting video is composed by regions B4, A3, C3, and D1. In this example, the resulting video is a mosaic video of different regions of interest of each original video stream. The regions of interest of the original video streams are rearranged and combined in a new location of the resulting video stream.
The compression of video relies on block-based video coding in most coding systems like HEVC, standing for High Efficiency Video Coding, or the emerging VVC, standing for Versatile Video Coding, standard. In these encoding systems, a video is composed of a sequence of frames or pictures or images or samples which may be displayed at several different times. In the case of multi-layered video (for example scalable, stereo, 3D videos), several frames may be decoded to compose the resulting image to display at one instant. A frame can also be composed of different image components, for instance for encoding the luminance, the chrominances or depth information.
The compression of a video sequence relies on several partitioning techniques for each frame. Figure 2 illustrates some partitioning in encoding systems. The frames 201 and 202 are divided in coded tree units (CTU) illustrated by the dotted lines. A CTU is the elementary unit of encoding and decoding. For example, the CTU can encode an area of 128 by 128 pixels.
A Coding Tree Unit (CTU) could also be named a block, a macroblock or a coding unit. It can encode simultaneously the different image components or it can be limited to only one image component.
As illustrated by Figure 2a, the frame can be partitioned according to a grid of tiles, illustrated by the thin solid lines. The tiles are frame portions, thus rectangular regions of pixels that may be defined independently of the CTU partitioning. The boundaries of tiles and the boundaries of the CTU may be different. A tile may also correspond to a sequence of CTUs, as in the represented example, meaning that the boundaries of tiles and CTUs coincide.
The tile definition provides that tile boundaries break the spatial encoding dependencies. This means that the encoding of a CTU in a tile is not based on pixel data from another tile in the frame.
Some encoding systems, like for example VVC, provide the notion of tile groups.
This mechanism allows the partitioning of the frame into one or several groups of tiles. Each group of tiles is composed of one or several tiles. Two different kinds of tile groups are provided as illustrated by frames 201 and 202. A first kind of tile group is restricted to tile groups forming a rectangular area in the frame. Frame 201 illustrates the partitioning of a frame into five different rectangular tile groups. A second kind of tile group is restricted to successive tiles in raster scan order. Frame 202 illustrates the partitioning of a frame into three different tile groups composed of successive tiles in raster scan order. Rectangular tile groups are a structure of choice for dealing with regions of interest in a video. A tile group can be encoded in the bitstream as one or several NAL units. A NAL unit, standing for a Network Abstraction Layer unit, is a logical unit of data for the encapsulation of data in the encoded bitstream. In the example of the VVC encoding system, a tile group is encoded as a single NAL unit.
In OMAF v2 ISO/IEC 23090-2, a sub-picture is a portion of a picture that represents a spatial subset of the original video content, which has been split into spatial subsets before video encoding at the content production side. A sub picture is for example one or more Tile Groups.
Figure 2b illustrates an example of partitioning of a picture in sub pictures. A sub picture represents a picture portion that covers a rectangular region of a picture. Each sub picture may have different sizes and coding parameters. For instance, different tile grids and tile groups or slice partitioning may be defined for each sub picture. The pictures are divided into frame portions corresponding to subpictures, and the frame portions are divided into subportions corresponding to slices or tile groups. In Figure 2b, the picture 204 is subdivided into 24 sub pictures including the sub pictures 205 and 206. These two sub pictures further describe a tile grid and a partitioning into tile groups similar to the pictures 201 and 202 of Figure 2.
In a variant, a picture is first decomposed into tiles and tile groups or slices. Then the subpictures are defined as sets of tile groups or slices with the constraints that each subpicture covers a rectangular area of a picture and the subpictures create a partition of the picture.
In a variant, rather than considering sub pictures, a picture could be partitioned into several regions that may be independently coded as layers (e.g. VVC or HEVC layers). We may refer to such a layer as "sub picture layer" or "region layer". Each sub picture layer could be independently coded. When combined, the pictures of the sub picture layers may form a new picture of greater size equal to the size of the combination of the sub picture layers. In other words, on the one hand, a picture may be spatially divided into sub pictures, each sub picture defining a grid of tiles and being spatially divided into tile groups. On the other hand, a picture may be divided into layers, each layer defining a grid of tiles and being spatially divided into tile groups. Tiles and tile groups may be defined at the picture level, at the sub picture level, or at the layer level. The invention will apply to all these configurations.
Figure 3 illustrates the organisation of the bitstream in the exemplary coding system VVC.
A bitstream 300 according to the VVC coding system is composed of an ordered sequence of syntax elements and coded data. The syntax element and coded data are placed into NAL unit 301-305. There are different NAL unit types. The network abstraction layer provides the ability to encapsulate the bitstream into different protocols, like RTP/IP, standing for Real Time Protocol / Internet Protocol, ISO Base Media File Format, etc. The network abstraction layer also provides a framework for packet loss resilience.
NAL units are divided into VCL NAL units and non-VCL NAL units, VCL standing for Video Coding Layer. The VCL NAL units contain the actual encoded video data. The non-VCL NAL units contain additional information. This additional information may be parameters needed for the decoding of the encoded video data or supplemental data that may enhance usability of the decoded video data. NAL units 305 correspond to tile groups and constitute the VCL NAL units of the bitstream. Different NAL units 301-304 correspond to different parameter sets; these NAL units are non-VCL NAL units. The VPS NAL unit 301, VPS standing for Video Parameter Set, contains parameters defined for the whole video, and thus the whole bitstream. The naming of VPS may change and for instance becomes DPS in VVC. The SPS NAL unit 302, SPS standing for Sequence Parameter Set, contains parameters defined for a video sequence. The PPS NAL unit 303, PPS standing for Picture Parameter Set, contains parameters defined for a picture or a group of pictures. The APS NAL unit 304, APS standing for Adaptive Loop Filter (ALF) Parameter Set, contains parameters for the ALF that are defined at the tile group level. The bitstream may also contain SEI, standing for Supplemental Enhancement Information, NAL units. The periodicity of occurrence of these parameter sets in the bitstream is variable. A VPS that is defined for the whole bitstream needs to occur only once in the bitstream. In contrast, an APS that is defined for a tile group may occur once for each tile group in each picture. Actually, different tile groups may rely on the same APS and thus there are generally fewer APS than tile groups in each picture.
The VCL NAL units 305 each contain a tile group. A tile group may correspond to the whole picture, a single tile or a plurality of tiles. A tile group is composed of a tile group header 310 and a raw byte sequence payload, RBSP, 311 that contains the tiles.
The tile group index is the index of the tile group in the frame in raster scan order. For example, in Figure 2, the number in a circle represents the tile group index for each tile group. Tile group 203 has a tile group index of 0. The tile group identifier is a value, meaning an integer or any bit sequence, which is associated with a tile group. Typically, the PPS contains the association for each tile group between the tile group index and the tile group identifier for one or several pictures. For example, in Figure 2, the tile group 203 with tile group index 0 can have a tile group identifier of '345'.
The tile group address is a syntax element present in the header of the tile group NAL unit. The tile group address may refer to the tile group index, to the tile group identifier or even to the tile index. In the latter case, it will be the index of the first tile in the tile group. The semantics of the tile group address are defined by several flags present in one of the Parameter Set NAL units. In the example of tile group 203 in Figure 2, the tile group address may be the tile group index 0, the tile group identifier 345 or the tile index 0.
The tile group index, identifier and address are used to define the partitioning of the frame into tile groups. The tile group index is related with the location of the tile group in the frame. The decoder parses the tile group address in the tile group NAL unit header and uses it to locate the tile group in the frame and determine the location of the first sample in the NAL unit. When the tile group address refers to the tile group identifier, the decoder uses the association indicated by the PPS to retrieve the tile group index associated with the tile group identifier and thus determine the location of the tile group and of the first sample in the NAL unit.
The syntax of the PPS as proposed in the current version of VVC is as follows:
[PPS syntax table, reproduced as images in the original document]
The descriptor column gives the encoding of a syntax element: u(1) means that the syntax element is encoded using one bit; ue(v) means that the syntax element is encoded as an unsigned integer 0-th order Exp-Golomb-coded syntax element with the left bit first, which is a variable-length encoding. The syntax elements num_tile_columns_minus1 and num_tile_rows_minus1 respectively indicate the number of tile columns and rows in the frame. When the tile grid is not uniform (uniform_tile_spacing_flag equal to 0) the syntax elements tile_column_width_minus1[] and tile_row_height_minus1[] specify the widths and heights of each column and row of the tile grid.
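As an illustration of this variable-length coding, the following minimal Python sketch (not part of the specification text; the function names are illustrative) builds the ue(v) and se(v) Exp-Golomb codes and shows why values close to 0 cost the fewest bits:

```python
def ue_bits(value: int) -> str:
    """Unsigned 0-th order Exp-Golomb: write (value + 1) in binary,
    preceded by as many zeros as there are bits after the leading one."""
    binary = bin(value + 1)[2:]              # e.g. value=3 -> '100'
    return "0" * (len(binary) - 1) + binary

def se_bits(value: int) -> str:
    """Signed Exp-Golomb: map k to a code number (k>0 -> 2k-1, k<=0 -> -2k)
    and reuse the unsigned coding."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return ue_bits(code_num)

# ue(v): 0 -> '1' (1 bit), 1 -> '010', 2 -> '011'
print(ue_bits(0), ue_bits(1), ue_bits(2))
# se(v): 0 -> '1' (1 bit), which is why deltas near 0 are cheapest to code
print(se_bits(0), se_bits(1), se_bits(-1))
```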
The tile group partitioning is expressed with the following syntax elements:
The syntax element single_tile_in_pic_flag states whether the frame contains a single tile. In other words, there is only one tile and one tile group in the frame when this flag is true.
single_tile_per_tile_group_flag states whether each tile group contains a single tile. In other words, all the tiles of the frame belong to a different tile group when this flag is true.
The syntax element rect_tile_group_flag indicates that tile groups of the frames form a rectangular shape as represented in the frame 201. When present, the syntax element num_tile_groups_in_pic_minus1 is equal to the number of rectangular tile groups in the frame minus one.
Syntax elements top_left_tile_idx[] and bottom_right_tile_idx[] are arrays that respectively specify the first (top left) tile and the last (bottom right) tile in a rectangular tile group. These arrays are indexed by tile group index.
The tile group identifiers are specified when the signalled_tile_group_id_flag is equal to 1. In this case, the signalled_tile_group_id_length_minus1 syntax element indicates the number of bits used to code each tile group identifier value. The tile_group_id[] association table is indexed by tile group index and contains the identifier of the tile group. When the signalled_tile_group_id_flag is equal to 0, tile_group_id is indexed by tile group index and contains the tile group index of the tile group.
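A minimal sketch of how a decoder may exploit the tile_group_id[] table, assuming the PPS fields have already been parsed into plain Python values (the helper name is illustrative):

```python
def build_id_to_index(tile_group_id: list[int]) -> dict[int, int]:
    """tile_group_id[] is indexed by tile group index; invert it so the
    decoder can recover the index from the identifier found in a header."""
    return {tg_id: idx for idx, tg_id in enumerate(tile_group_id)}

# Example: the tile group with index 0 carries the identifier 345 (Figure 2)
tile_group_id = [345, 346, 347, 348, 349]
id_to_index = build_id_to_index(tile_group_id)
assert id_to_index[345] == 0
```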
The tile group header comprises the tile group address according to the following syntax in the current VVC version:
[Tile group header syntax table, reproduced as an image in the original document]
When the tile group is not rectangular, the tile group header indicates the number of tiles in the tile group NAL unit with the help of the num_tiles_in_tile_group_minus1 syntax element.
Each tile 320 may comprise a tile segment header 330 and tile segment data 331. The tile segment data 331 comprises the encoded coding units 340. In the current version of the VVC standard, the tile segment header is not present and the tile segment data contains the coding unit data 340.
Figure 4 illustrates the process of generating a video bitstream composed of different regions of interest from one or several original bitstreams.
In a step 400, the regions to be extracted from the original bitstreams are selected. The regions may correspond for instance to a specific region of interest or a specific viewing direction in an omnidirectional content. The tile groups comprising encoded samples present in the selected set of regions are selected in the original bitstreams. At the end of this step, the identifier of each tile group in the original bitstreams, which will be merged in the resulting bitstreams, is determined. For example, the identifiers of the tile groups 1 , 2, and 4 of frame 101 and of the tile group 3 of frame 100 in Figure 1 are determined.
In a step 401 , a new arrangement for the selected tile groups in the resulting video is determined. This consists in determining the size and location of each selected tile group in the resulting video. For instance, the new arrangement conforms to a predetermined ROI composition. Alternatively, a user defines a new arrangement.
In a step 402, the tile partitioning of the resulting video needs to be determined. When the tile partitioning of the original bitstreams is identical, the same tile partitioning is kept for the resulting video. At the end of this step, the number of rows and columns of the tile grid, with the width and height of the tiles, is determined and, advantageously, stored in memory.
When determining the new arrangement, determined in step 401, of the tile groups in the resulting video, the location of a tile group in the video may change regarding its location in the original video. In a step 403, the new locations of the tile groups are determined. In particular, the tile group partitioning of the resulting video is determined. The locations of the tile groups are determined with reference to the new tile grid as determined in step 402.
In a step 404, new parameter sets are generated for the resulting bitstream. In particular, new PPS NAL units are generated. These new PPSs contain syntax elements to encode the tile grid partitioning, the tile group partitioning and positioning and the association of the tile group identifier and the tile group index. To do so, the tile group identifier is extracted from each tile group and associated with the tile group index depending on the new decoding location of the tile group. It is reminded that each tile group, in the exemplary embodiment, is identified by an identifier in the tile group header and that each tile group identifier is associated with an index corresponding to the tile group index of the tile group in the picture in raster scan order. This association is stored in a PPS NAL unit. Assuming that there is no collision in the identifiers of the tile groups, when changing the position of a tile group in a picture, and thus changing the tile group index, there is no need to change the tile group identifiers and thus to amend the tile group structure. Only PPS NAL units need to be amended.
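The following sketch illustrates step 404 under the assumption that each selected tile group is represented by its identifier and its new tile group index in the resulting picture; only the PPS association table is rebuilt, the tile group NAL units themselves are left untouched (names are illustrative):

```python
def build_merged_pps_association(placements: dict[int, int]) -> list[int]:
    """placements maps a kept tile group identifier to its new tile group
    index in the resulting picture (raster scan order). The returned
    tile_group_id[] table is what the new PPS would signal."""
    table = [0] * len(placements)
    for identifier, new_index in placements.items():
        table[new_index] = identifier
    return table

# Four tile groups are kept; only their decoding positions change
new_tile_group_id = build_merged_pps_association({1: 0, 2: 1, 3: 2, 4: 3})
assert new_tile_group_id == [1, 2, 3, 4]
```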
In a step 405, the VCL NAL unit, namely the tile groups, are extracted from the original bitstreams to be inserted in the resulting bitstream. It may happen that these VCL NAL units need to be amended. In particular, some parameters in the tile group header may not be compatible with the resulting bitstream and need to be amended. It would be advantageous to avoid this amending step, as decoding, amending and recoding the tile group header is resource consuming.
In particular, there is an issue concerning the signalling of the quantisation parameter (QP) in the bitstream. The quantisation parameter is an important parameter when encoding a coding unit as it determines the compression bitrate and thus the quality of the encoding. Encoding a coding unit using a high quantisation parameter leads to a high compression ratio, thus a low bitrate and a low quality of the compressed image. Using a low quantisation parameter leads to a low compression ratio, thus a high bitrate and a high quality of the compressed image.
Most compression systems use a variable quantisation parameter that changes from coding unit to coding unit and between frames in order to adapt to the complexity of the content of the coding unit and to the structure of the compressed video when using successive temporally predicted frames. By using a variable quantisation parameter, it is possible to obtain a uniform perceived quality of the decompressed image independently of the content of the different coding units. The global quality targeted for a video sequence is determined by a default initial value of the quantisation parameter that is fixed for the sequence. This default initial quantisation parameter value is modified at different levels by applying some modifying offsets to this default initial value. In particular, a quantisation parameter delta is defined at the level of a tile group and stored in the tile group header. When encoding the first coding unit of the tile group, the encoder uses a quantisation parameter that corresponds to the default initial quantisation value corrected by the addition of the quantisation parameter delta defined for the tile group. Then the encoding process will modify the quantisation parameter for each coding unit of the tile group based on this first value of the quantisation parameter.
When merging tile groups from different original bitstreams, each original bitstream defining its own default initial quantisation parameter, the quantisation parameter deltas encoded in each tile group need to be adapted to the default initial quantisation parameter value defined in the resulting bitstream. This implies that the tile group headers need to be decoded, amended and re-encoded in order to fix this quantisation parameter delta issue.
SUMMARY OF INVENTION
The present invention has been devised to address one or more of the foregoing concerns. It concerns an encoding and decoding method for a bitstream that allow solving the quantisation parameter delta issue when merging tile groups from different bitstreams without amending the tile group encoded data.
According to a first aspect of the invention there is provided a method of encoding video data comprising frames into a bitstream of logical units, frames being spatially divided into frame portions, frame portions being grouped into groups of frame portions, the method comprising:
- determining a quantisation parameter information associated with a group of frame portions;
- encoding the group of frame portions into a first logical unit;
- encoding the association between an index of the group of frame portions and the quantisation parameter information into a second logical unit.
In an embodiment:
- the quantisation parameter information is encoded as a quantisation parameter offset determined based on a global default initial quantisation parameter, the global default initial quantisation parameter being associated to groups of frame portions.
In an embodiment:
- the quantisation parameter information is encoded for each group of frame portions in the second logical unit as an associated default initial quantisation parameter.
In an embodiment:
- a quantization parameter offset is associated to each group in a subset of the groups of frame portions; and
- the quantisation parameter information is determined based on a global default initial quantisation parameter, and based on the existence of an association between the index of the group and a quantization parameter offset.
In an embodiment, the second logical unit is a PPS.
According to another aspect of the invention, there is provided a method for decoding a bitstream of logical units of video data comprising frames, frames being spatially divided into frame portions, frame portions being grouped into groups of frame portions, the method comprising: - parsing a first logical unit comprising a group of frame portions to determine a quantisation parameter delta associated with the group of frame portions;
- parsing a second logical unit comprising the association between an index of the group of frame portions and a quantisation parameter information associated with the group of frame portions;
- decoding the group of frame portions comprised in the first logical unit using a quantisation parameter calculated based on the quantisation parameter delta and based on the quantisation parameter information. In an embodiment:
- the quantisation parameter information is encoded as a quantisation parameter offset determined based on a global default initial quantisation parameter, the global default initial quantisation parameter being associated to groups of frame portions.
In an embodiment:
- the quantisation parameter information is encoded for each group of frame portions in the second logical unit as an associated default initial quantisation parameter.
In an embodiment:
- a quantization parameter offset is associated to each group in a subset of the groups of frame portions; and
- the quantisation parameter information is determined based on a global default initial quantisation parameter, and based on the existence of an association between the index of the group and a quantization parameter offset.
In an embodiment, the second logical unit is a PPS.
According to another aspect of the invention, there is provided a method for merging a selection of groups of frame portions from a plurality of original bitstreams into a resulting bitstreams, bitstreams being composed of logical units comprising frames, frames being divided into tiles, tiles being grouped into groups of frame portions, the method comprising:
- encoding a logical unit comprising the association of an index of a group of frame portions for each selected group of frame portions with a quantisation parameter information; - generating the resulting bitstream comprising the logical units comprising the groups of frame portions, the encoded logical unit comprising the association of an index of the group of frame portions with a quantisation parameter information.
In an embodiment:
- the quantisation parameter information is encoded as a quantisation parameter offset determined based on a global default initial quantisation parameter, the global default initial quantisation parameter being associated to groups of frame portions.
In an embodiment:
- the quantisation parameter information is encoded for each group of frame portions in the logical unit as an associated default initial quantisation parameter.
In an embodiment:
- a quantisation parameter offset is associated to each group in a subset of the groups of frame portions; and,
- the quantisation parameter information is determined based on a global default initial quantisation parameter, and based on the existence of an association between the index of the group and a quantization parameter offset.
In an embodiment, the logical unit comprising the quantisation parameter information is a PPS.
According to another aspect of the invention, there is provided a computer program product for a programmable apparatus, the computer program product comprising a sequence of instructions for implementing a method according to the invention, when loaded into and executed by the programmable apparatus.
According to another aspect of the invention, there is provided a computer- readable storage medium storing instructions of a computer program for implementing a method according to the invention.
According to another aspect of the invention, there is provided a computer program which upon execution causes the method of the invention to be performed.
At least parts of the methods according to the invention may be computer implemented. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, microcode, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit", "module" or "system". Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
Since the present invention can be implemented in software, the present invention can be embodied as computer-readable code for provision to a programmable apparatus on any suitable carrier medium. A tangible, non-transitory carrier medium may comprise a storage medium such as a floppy disk, a CD-ROM, a hard disk drive, a magnetic tape device or a solid-state memory device and the like. A transient carrier medium may include a signal such as an electrical signal, an electronic signal, an optical signal, an acoustic signal, a magnetic signal or an electromagnetic signal, e.g. a microwave or RF signal.
BRIEF DESCRIPTION OF DRAWINGS
Embodiments of the invention will now be described, by way of example only, and with reference to the following drawings in which:
Figure 1a and 1b illustrate two different application examples for the combination of regions of interest;
Figure 2a and 2b illustrates some partitioning in encoding systems;
Figure 3 illustrates the organisation of the bitstream in the exemplary coding system VVC;
Figure 4 illustrates the process of generating a video bitstream composed of different regions of interest from one or several original bitstreams;
Figure 5 illustrates the quantisation parameter issue when merging different tile groups from different bitstreams;
Figure 6 illustrates the embodiment of the invention where a quantisation parameter offset is associated with a tile group in a non-VCL NAL unit of the bitstream;
Figure 7 illustrates the main steps of an encoding process according to an embodiment of the invention;
Figure 8 illustrates the main steps of a decoding process according to an embodiment of the invention;
Figure 9 illustrates the extraction and merge operation of two bitstreams stored in a file to form a resulting bitstream stored in a resulting file in an embodiment of the invention;
Figure 10 illustrates the main step of the extraction and merge process at file format level in an embodiment of the invention;
Figure 11 is a schematic block diagram of a computing device for implementation of one or more embodiments of the invention.
DETAILED DESCRIPTION OF THE INVENTION
Figure 5 illustrates the quantisation parameter issue when merging different tile groups from different bitstreams.
In video coding, the quantisation parameter (QP) is an important parameter for determining the quality of each encoded coding unit. A higher QP creates more losses during encoding and decreases the quality of the decoded image. In H.264, the QP range is 0 to 51, while in VP9 the range is 0 to 63.
In VVC the QP value used in a block encoding can be in the range -QpBdOffsetY to +63, where QpBdOffsetY is a value depending on the bit depth of the luma component of the video. The PPS defines a default initial QP value used as reference by all the tile groups referencing the PPS. The value is named init_qp_minus26, and it defines the initial QP value decremented by 26. This value is encoded with a variable length (se(v) means signed integer 0-th order Exp-Golomb-coded). 26 is assumed to be the default value for QP, thus the syntax element will have a 1-bit length when encoded in that case. The syntax of the PPS for signalling the default initial quantisation parameter is typically as follows:
[PPS syntax table for the default initial quantisation parameter, reproduced as an image in the original document]
Each tile group header in each picture defines another value, tile_group_qp_delta, which can be used to modify the default initial quantisation parameter value for all the CTUs inside the tile group. Typically, this value changes in each picture. This value is encoded in variable length and thus a value closer to 0 will use a lower number of bits. The corresponding tile_group_header( ) syntax is the following:
[Tile group header syntax table, reproduced as an image in the original document]
The quantisation parameter used for encoding or decoding the first coding unit inside the tile group is the sum of 26 with init_qp_minus26 and tile_group_qp_delta.
TileGroupQpY = 26 + init_qp_minus26 + tile_group_qp_delta;
This design creates an issue when merging tile groups from different videos with different default initial quantisation parameter values. Figure 5 illustrates the merging of tile group #3 from video 500 with a high quality and tile group #4 from video 501 with a low quality in a new video 502 with variable quality as a function of the region in the image. Because the video 500 has a high quality, its default initial quantisation parameter init_qp_minus26 would have a low value, for example 20 (init_qp_minus26 = -6) giving a high quality. The video 501 with a low quality will have a high value for its default initial quantisation parameter, for example 40 (init_qp_minus26 = 14). Tile group #3 from video 500 can then use a qp_delta of 0 (tile_group_qp_delta = 0) to have high quality and tile group #4 from video 501 can use the same value 0 (tile_group_qp_delta = 0) to have a low quality.
When merging the two tile groups in the video 511, it is not possible to keep the same value tile_group_qp_delta for both tile groups because the quantisation parameter value used to encode the blocks inside the two tile groups must be different. It will be necessary to generate a new PPS 520 with a new default initial quantisation parameter value, using for example init_qp_minus26 = -6, and also to parse all the tile group headers and modify the value tile_group_qp_delta of one of the tile groups inside each picture of the video. For example, the tile group #4 511 is modified by adopting a quantisation parameter delta value of 20 (tile_group_qp_delta = 20). The first operation, consisting in defining the default initial quantisation parameter value, is not too complex because the non-VCL NAL unit PPS is relatively small and not frequent. Moreover, it is encoded with mostly fixed length values. The second operation, consisting in modifying all the tile group headers from one or several videos, is very complex and time consuming because it requires reading all the video bitstreams (500 and 501), decoding all the tile group headers, which contain many variable length fields, and rewriting the complete bitstream. Moreover, it must be noted that the value tile_group_qp_delta for at least one of the tile groups will have a value far different from 0 and thus will use a large number of bits in its variable length encoding.
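A small numeric sketch of this derivation with the values of Figure 5, assuming the simplified formula TileGroupQpY = 26 + init_qp_minus26 + tile_group_qp_delta, shows why the merge forces a header rewrite in the absence of a per-tile-group offset:

```python
def tile_group_qp(init_qp_minus26: int, tile_group_qp_delta: int) -> int:
    # QP of the first coding unit of the tile group (before per-CU deltas)
    return 26 + init_qp_minus26 + tile_group_qp_delta

# Original streams: same delta, different default initial QP
assert tile_group_qp(-6, 0) == 20   # high-quality video 500, tile group #3
assert tile_group_qp(14, 0) == 40   # low-quality video 501, tile group #4

# After the merge there is a single init_qp_minus26 (-6); keeping delta = 0
# for tile group #4 would wrongly decode it with QP 20 instead of 40, so its
# header delta has to be rewritten to 20.
assert tile_group_qp(-6, 20) == 40
```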
It is thus desirable to have a better encoding for the tile group quantisation parameter to allow an efficient merge of several videos allowing easy resolution of quantisation parameter conflicts without requiring modification of the VCL units and in particular the tile group headers.
According to an embodiment of the invention, an optional quantisation parameter offset is associated to each tile group and stored in a non-VCL NAL unit of the bitstream. Accordingly, the tile group structure of each tile group may be kept unchanged. The actual quantisation parameter used for a given tile group will be calculated based on the default initial quantisation parameter of the bitstream, modified with the quantisation parameter offset associated to the tile group and summed with the quantisation parameter delta signalled in the tile group.
Figure 6 illustrates the embodiment of the invention where a quantisation parameter offset is associated with a tile group in a non-VCL NAL unit of the bitstream.
In this embodiment, the quantisation parameter offset is signalled in the PPS NAL unit, for example using the following syntax:
[PPS syntax table with the tile_group_qp_offset signalling, reproduced as an image in the original document]
The flag signalled_qp_offset_flag indicates that a quantisation parameter offset is provided associated with each tile group. The association table tile_group_qp_offset stores an offset encoded in signed variable length coding associated with each tile group. With this new syntax, the decoding quantisation parameter of the first coding unit inside the tile group is the sum of 26 with init_qp_minus26, tile_group_qp_delta and tile_group_qp_offset:
TileGroupQpY = 26 + init_qp_minus26 + tile_group_qp_offset + tile_group_qp_delta;
For example, the semantics associated with the syntax element of the PPS are the following:
signalled_qp_offset_flag equal to 1 specifies the presence of tile_group_qp_offset[ i ] in the PPS. signalled_qp_offset_flag equal to 0 specifies the absence of the tile_group_qp_offset[ i ].
tile_group_qp_offset[ i ], when present, specifies the offset value that applies to the initial value of TileGroupQpY for the i-th tile group. The value of tile_group_qp_offset[ i ] shall be in the range of -QpBdOffsetY to +63, inclusive. QpBdOffsetY is an offset computed as a function of the bit depth of the luma samples. For instance, QpBdOffsetY is equal to 6 times the bit depth of the luma samples. In a variant, the value of tile_group_qp_offset[ i ] shall be in the range of -rangeQp to rangeQp where rangeQp is an integer value. When not present, the value of tile_group_qp_offset[ i ] is inferred to be equal to 0 for each value of i in the range of 0 to num_tile_groups_in_pic_minus1, inclusive. The syntax elements of the tile group header remain identical but the semantics of tile_group_qp_delta are for example the following:
tile_group_qp_delta specifies the initial value of QpY to be used for the coding blocks in the tile group until modified by the value of CuQpDeltaVal in the coding unit layer. The initial value of the QpY quantization parameter for the tile group, TileGroupQpY, is derived as follows:
TileGroupQpY = 26 + init_qp_minus26 + tile_group_qp_offset[ tileGroupQpIdx ] + tile_group_qp_delta
The value of TileGroupQpY shall be in the range of -QpBdOffsetY to +63, inclusive.
The variable tileGroupQpIdx, which specifies the index of the tile group, is derived as follows:
tileGroupQpIdx = 0
while( tile_group_address != tile_group_id[ tileGroupQpIdx ] )
tileGroupQpIdx++
A first bitstream 600 comprises a tile group 3. The bitstream 600 contains a PPS 610 comprising a default initial quantisation parameter with the value '20' associated with a high quality encoding. The tile group 3 comprises a quantisation parameter delta with a value 0. It means that the first coding unit of tile group 3 is encoded with a quantisation parameter of 20, and that this value of quantisation parameter must be used when decoding this coding unit. A second bitstream 601 comprises a tile group 4. The bitstream 601 contains a PPS 611 comprising a default initial quantisation parameter with the value '40' associated with a low quality encoding. The tile group 4 comprises a quantisation parameter delta with a value 0. It means that the first coding unit of tile group 4 is encoded with a quantisation parameter of 40, and that this value of quantisation parameter must be used when decoding this coding unit.
After merging, the bitstream 602 comprises both the tile group 3 from bitstream 600 and the tile group 4 from bitstream 601. Both tile groups still comprise a quantisation parameter delta with the value 0. Bitstream 602 comprises a PPS 620 with a default initial quantisation parameter value of 20 identical to the one in bitstream 600. To avoid decoding the tile group 4 with a wrong quantisation parameter value of '20', the PPS 620 also comprises an association table associating each tile group with a quantisation parameter offset. The quantisation parameter offset associated with tile group 4 has a value '20'. Accordingly, when decoding the tile group 4, the quantisation parameter used for the first coding unit of tile group 4 has the right value '40' corresponding to the sum of the default initial quantisation parameter '20', the quantisation parameter offset '20' associated with tile group 4, and the quantisation parameter delta '0' in tile group 4.
It can be seen that this solution allows solving the quantisation parameter issue in the merge while keeping unchanged the tile group structure.
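A minimal sketch of the derivation with the proposed tile_group_qp_offset, using the Figure 6 values, confirms that the tile group deltas can stay at 0 after the merge (the function name is illustrative):

```python
def tile_group_qp(init_qp_minus26, tile_group_qp_offset, tile_group_qp_delta):
    # TileGroupQpY = 26 + init_qp_minus26 + tile_group_qp_offset + tile_group_qp_delta
    return 26 + init_qp_minus26 + tile_group_qp_offset + tile_group_qp_delta

# Merged bitstream 602: init_qp_minus26 = -6 (default initial QP 20),
# the tile group deltas stay equal to 0, only the PPS offset table differs.
assert tile_group_qp(-6, 0, 0) == 20    # tile group 3, offset 0
assert tile_group_qp(-6, 20, 0) == 40   # tile group 4, offset 20
```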
In an alternate embodiment, instead of a quantisation parameter offset, a different default initial quantisation parameter is directly associated with each tile group in the PPS. It means that the PPS no longer contains a global default initial quantisation parameter, but instead an association table associating a default initial quantisation parameter value with each tile group. These default initial quantisation parameter values may be encoded minus 26, as was done for the global default initial quantisation parameter, according, for example, to the following syntax:
[PPS syntax table with per-tile-group default initial quantisation parameters, reproduced as images in the original document]
With this new syntax, the decoding quantisation parameter of the first coding unit inside the tile group is the sum of 26 with the tile_group_init_qp_minus26 associated with the tile group of index i, and the tile_group_qp_delta of the tile group.
TileGroupQpY = 26 + tile_group_init_qp_minus26[ i ] + tile_group_qp_delta;
For example, the semantics associated with the syntax element of the PPS are the following:
tile_group_init_qp_minus26[ i ] plus 26 specifies the initial value of TileGroupQpY for the i-th tile group. The initial value of TileGroupQpY is modified at the tile group layer when a non-zero value of tile_group_qp_delta is decoded. The value of tile_group_init_qp_minus26[ i ] shall be in the range of -( 26 + QpBdOffsetY ) to +37, inclusive. QpBdOffsetY is an offset computed as a function of the bit depth of the luma samples. For instance, QpBdOffsetY is equal to 6 times the bit depth of the luma samples. In a variant, the value of tile_group_init_qp_minus26[ i ] shall be in the range of -rangeQp to rangeQp where rangeQp is an integer value.
The syntax elements of the tile group header remain identical but the semantics of tile_group_qp_delta are for example the following:
tile_group_qp_delta specifies the initial value of QpY to be used for the coding blocks in the tile group until modified by the value of CuQpDeltaVal in the coding unit layer. The initial value of the QpY quantization parameter for the tile group, TileGroupQpY, is derived as follows:
TileGroupQpY = 26 + tile_group_init_qp_minus26[ tileGroupQpIdx ] + tile_group_qp_delta
The value of TileGroupQpY shall be in the range of -QpBdOffsetY to +63, inclusive.
The variable tileGroupQpIdx, which specifies the index of the tile group, is derived as follows:
tileGroupQpIdx = 0
while( tile_group_address != tile_group_id[ tileGroupQpIdx ] )
tileGroupQpIdx++
Another advantage of the new syntax in a different usage is now explained. When encoding a video, the encoder may know that a particular region in an image of the video is more important for the viewer than the other parts of the images: this is a region of interest. There may be several regions of interest in each image. A region of interest may be at a fixed position in the image. For example, in the case of a video surveillance camera, the installer of the camera can position the regions of interest to visualize an interesting part of the scene, for example the doors to enter a room. In another case it can be estimated that the center part of the image is the region of interest. It is interesting to encode the video with a higher quality in the regions of interest compared to the other parts of the image. The tile groups at the spatial position of the regions of interest will thus have a different (lower) quantisation parameter than the other tile groups. In this case, the encoder can use the list of tile_group_qp_offset in the PPS to set a different offset value for the tile groups of the regions of interest. In this way the tile_group_qp_delta value in the tile group headers will be closer to 0 and thus will be encoded with a lower number of bits because it is encoded in variable length. As noted before, the PPS is written only once for a large number of frames while the tile groups are written many times in each frame, so it is useful to decrease the size of the fields in the tile group header to obtain a better compression ratio of the video.
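This bit-saving argument can be illustrated with a short sketch computing the se(v) code length (same mapping as in the earlier sketch); the QP values chosen here are merely an example, the actual savings depend on the encoder:

```python
def se_bit_length(value: int) -> int:
    """Number of bits of the se(v) Exp-Golomb code for `value`."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return 2 * (code_num + 1).bit_length() - 1

# Without a PPS offset, an ROI tile group that is 8 QP below the default
# carries tile_group_qp_delta = -8 in every picture.
per_header_without_offset = se_bit_length(-8)    # 9 bits per tile group header
# With tile_group_qp_offset[roi] = -8 signalled once in the PPS,
# the per-picture delta can stay at 0.
per_header_with_offset = se_bit_length(0)        # 1 bit per tile group header
print(per_header_without_offset, per_header_with_offset)
```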
In another alternate embodiment, the tile_group_qp_offset is specified only for a subset of the tile groups of the frame, using the tile group identifier. It advantageously reduces the signalling overhead when the number of non-null offsets is low. For instance, the syntax of the PPS is the following:
[PPS syntax table with offsets signalled for a subset of the tile groups, reproduced as images in the original document]
For example, the semantics associated with the syntax element of the PPS are the following:
num_signaled_qp_offset_minus1 plus 1 specifies the number of tile_group_qp_address[ i ] and tile_group_qp_offset[ i ] syntax elements specified in the PPS; num_signaled_qp_offset_minus1 shall be in the range of 0 to num_tile_groups_in_pic_minus1, inclusive.
tile_group_qp_address[ i ], when present, specifies the tile group address of each tile group that has a QP offset signalled in the PPS. In a variant, tile_group_qp_address[ i ] specifies the tile group index of each tile group that has a QP signalled in the APS. The value of tile_group_qp_address[ i ] shall be in the range of 0 to num_tile_groups_in_pic_minus1, inclusive.
tile_group_qp_offset[ i ] specifies the offset value that applies to the initial value of TileGroupQpY for the tile group with tile_group_address equal to tile_group_qp_address[ i ]. The value of tile_group_qp_offset[ i ] shall be in the range of -QpBdOffsetY to +63, inclusive. QpBdOffsetY is an offset computed as a function of the bit depth of the luma samples. For instance, QpBdOffsetY is equal to 6 times the bit depth of the luma samples. In a variant, the value of tile_group_qp_offset[ i ] shall be in the range of -rangeQp to rangeQp where rangeQp is an integer value.
The variable qpOffsetTGIdx, which gives the quantisation parameter offset applying to each tile group index, is derived as follows:
for( i = 0; i <= num_tile_groups_in_pic_minus1; i++ ) {
qpOffsetTGIdx[ i ] = 0
for( j = 0; j <= num_signaled_qp_offset_minus1; j++ ) {
if( tile_group_id[ i ] == tile_group_qp_address[ j ] )
qpOffsetTGIdx[ i ] = tile_group_qp_offset[ j ]
}
}
The syntax elements of the tile group header remain identical but the semantics of tile_group_qp_delta are for example the following:
tile_group_qp_delta specifies the initial value of QpY to be used for the coding blocks in the tile group until modified by the value of CuQpDeltaVal in the coding unit layer. The initial value of the QpY quantization parameter for the tile group, TileGroupQpY, is derived as follows:
TileGroupQpY = 26 + init_qp_minus26 + qpOffsetTGIdx[ tileGroupQpIdx ] + tile_group_qp_delta;
The value of TileGroupQpY shall be in the range of -QpBdOffsetY to +63, inclusive.
The variable tileGroupQpIdx, which specifies the index of the tile group, is derived as follows:
tileGroupQpIdx = 0
while( tile_group_address != tile_group_id[ tileGroupQpIdx ] )
tileGroupQpIdx++
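A sketch of this subset-signalled variant, assuming the PPS arrays are already parsed into Python lists; the search over the signalled addresses mirrors the derivation of qpOffsetTGIdx above (names are illustrative):

```python
def derive_qp_offsets(tile_group_id, tile_group_qp_address, tile_group_qp_offset):
    """qpOffsetTGIdx[i] is the PPS offset for tile group index i, or 0 when
    no offset was signalled for that tile group's address/identifier."""
    addr_to_offset = dict(zip(tile_group_qp_address, tile_group_qp_offset))
    return [addr_to_offset.get(tg_id, 0) for tg_id in tile_group_id]

# Offsets signalled for two of the four tile groups only
qp_offsets = derive_qp_offsets(
    tile_group_id=[345, 346, 347, 348],
    tile_group_qp_address=[346, 348],
    tile_group_qp_offset=[20, -4],
)
assert qp_offsets == [0, 20, 0, -4]
```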
Figure 7 illustrates the main steps of an encoding process according to an embodiment of the invention.
The described encoding process concerns the encoding according to an embodiment of the invention of a single bitstream. The obtained encoded bitstream may be used in a merging operation as described above as an original bitstream or as the resulting bitstream.
In a step 700, a tile partitioning of the frames is determined. For instance, the encoder defines the number of columns and rows so that each region of interest of the video is covered by at least one tile. In another example, the encoder is encoding an omnidirectional video where each tile corresponds to a predetermined field of view in the video. The tile partitioning of the frame according to a tile grid is typically represented in a parameter set NAL unit, for example a PPS according to the syntax presented in reference to Figure 3.
In a step 701, a set of tile groups is defined, each tile group comprising one or more tiles. In a particular embodiment, a tile group is defined for each tile of the frame. Advantageously, in order to avoid some VCL NAL unit rewriting in the merge operation, a tile group identifier is defined for each tile group in the bitstream. The tile group identifiers are determined so as to be unique for each tile group. The uniqueness of the tile group identifiers may be defined at the level of a set of bitstreams comprising the bitstream currently encoded.
The number of bits used to encode the tile group identifier, corresponding to the length of the tile group identifier, is determined as a function of the number of tile groups in the encoded bitstream or as a function of a number of tile groups in a set of different bitstreams comprising the bitstream being currently encoded.
The length of the tile group identifier and the association of each tile group with an identifier are typically specified in a parameter set NAL unit such as the PPS.
In a step 702, each tile group is associated to a quantisation parameter information. This quantisation parameter information is then encoded into a parameter set. Typically, the parameter set is a PPS. According to embodiments, the quantisation parameter information may be encoded as a quantisation parameter offset based on a default initial quantisation parameter and taking into account the quantisation parameter delta encoded into the tile group header. Alternatively, the quantisation parameter information is encoded in the parameter set as a dedicated default initial quantisation parameter associated to the tile group. For example, the parameter set can take the form of the PPS described in reference to Figure 6.
In a step 703, the samples of each tile group are encoded according to the parameters defined in the different parameter sets. In particular, the encoding will be based on the quantisation parameter information associated with the tile group. A complete bitstream is generated comprising both the non-VCL NAL units corresponding to the different parameter sets and the VCL NAL units corresponding to the encoded data of the different tile groups.
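The overall flow of steps 700 to 703 may be summarised by the following Python sketch; the encoder object and its methods (choose_tile_partitioning, define_tile_groups, qp_offset_for, write_pps, encode_tile_group) are hypothetical placeholders used only to mirror the steps of Figure 7:

    def encode_single_bitstream(frames, encoder):
        tile_grid = encoder.choose_tile_partitioning(frames)            # step 700
        tile_groups = encoder.define_tile_groups(tile_grid)             # step 701
        qp_info = {tg.group_id: encoder.qp_offset_for(tg)               # step 702
                   for tg in tile_groups}
        nal_units = [encoder.write_pps(tile_grid, tile_groups, qp_info)]
        for frame in frames:                                            # step 703
            for tg in tile_groups:
                # Each tile group is encoded with its own quantisation
                # parameter information and produces one VCL NAL unit.
                nal_units.append(
                    encoder.encode_tile_group(frame, tg, qp_info[tg.group_id]))
        return nal_units
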
Figure 8 illustrates the main steps of a decoding process according to an embodiment of the invention.
In a step 800, the decoder parses the bitstream in order to determine the tile partitioning of the frames. This information is obtained from a parameter set, typically from the PPS NAL unit. The syntax elements of the PPS are parsed and decoded to determine the grid of tiles.
In a step 801, the decoder determines the tile group partitioning of the frame and in particular obtains the number of tile groups together with identification information for each tile group. This information is valid for at least one frame, but generally remains valid for many frames. It may take the form of the tile group identifier that may be obtained from a parameter set such as the PPS NAL unit as described with reference to Figure 6.
In a step 802, the decoder parses the bitstream to determine the quantisation parameter information that is associated with each tile group. This is typically done by extracting a quantisation parameter delta from the tile group header and by combining this information with the quantisation parameter information associated with the tile group in a parameter set, typically a PPS. Based on both the quantisation parameter delta and the quantisation parameter information associated with the tile group, an actual quantisation parameter is determined that allows the decoding of the tile group. According to embodiments, the quantisation parameter information associated with the tile group may be a quantisation parameter offset based on a default initial quantisation parameter. Alternatively, the quantisation parameter information associated with the tile group is a dedicated default initial quantisation parameter.
In a step 803, the decoder decodes the VCL NAL units corresponding to the tile groups according to the parameters determined in the previous steps. In particular, the decoding uses the quantisation parameter associated with the tile group.
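Steps 800 to 803 may likewise be summarised by the sketch below; the decoder object and its parsing helpers are hypothetical and only mirror the steps of Figure 8:

    def decode_bitstream(bitstream, decoder):
        # Step 800: parse the PPS to obtain the tile grid.
        # Step 801: the same PPS gives the tile group identifiers and QP offsets.
        pps = decoder.parse_pps(bitstream)
        decoded_frames = []
        for vcl_unit in decoder.vcl_nal_units(bitstream):
            # Step 802: extract the QP delta from the tile group header and combine
            # it with the QP information associated with the tile group in the PPS.
            header = decoder.parse_tile_group_header(vcl_unit)
            qp = (26 + pps.init_qp_minus26
                  + pps.qp_offset[header.tile_group_id]
                  + header.tile_group_qp_delta)
            # Step 803: decode the VCL NAL unit using the reconstructed QP.
            decoded_frames.append(decoder.decode_tile_group(vcl_unit, qp))
        return decoded_frames
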
Figure 9 illustrates the extraction and merge operation of two bitstreams stored in a file to form a resulting bitstream stored in a resulting file in an embodiment of the invention.
Figure 10 illustrates the main steps of the extraction and merge process at file format level in an embodiment of the invention.
Figure 9 illustrates the merge of two ISO BMFF streams 900 and 901 resulting in a new ISO BMFF bitstream 902 according to the method of Figure 10.
The encapsulation of the VVC streams consists in this embodiment in defining one tile track for each tile group of the stream and one tile base track for the NAL units common to the tile groups. For example, the stream 900 contains two tile groups, one with the identifier '1.1' and another one with the identifier '1.2'. The samples corresponding to each tile group '1.1' and '1.2' are described respectively in one tile track, similarly to the tile tracks of ISO/IEC 14496-15. While initially designed for HEVC, the VVC tile groups could be encapsulated in tile tracks. This VVC tile track could be differentiated from the HEVC tile track by defining a new sample entry, for instance 'vvt1' instead of 'hvt1'. Similarly, the tile base track defined for HEVC is extended to support the VVC format. This VVC tile base track could be differentiated from the HEVC tile base track by defining a different sample entry. The VVC tile base track describes the NAL units common to the two tile groups. Typically, it mainly contains non-VCL NAL units such as the Parameter Sets and the SEI NAL units.
First, the merging method consists in determining in step 1000 the set of tile tracks from the two streams to be merged in a single bitstream. For instance, it corresponds to the tile track of the tile group with the identifier '2.1' of the file 901 and of the tile group with the identifier '1.2' of the file 900.
Then the method generates in a step 1001 new Parameter Set NAL units (i.e. PPS) to describe the new decoding locations of the tile groups in the resulting stream according to the embodiments described above. Since all the modifications consist in modifying only the non-VCL NAL units, it is equivalent to generating in a step 1003 a new Tile Base Track. The samples of the original tile tracks corresponding to the extracted tile groups remain identical. The tile tracks of the file 902 reference the tile base track with a track reference type set to 'tbas'. The tile base track also references the tile tracks with a track reference type set to 'sbat'.
The advantage of this method is that combining two streams mainly consists in generating a new tile base track, updating the track reference boxes and copying as-is the tile track samples corresponding to the selected tile groups. The processing is simplified since the rewriting of the tile track samples is avoided compared to the prior art.
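The track-level operations of Figure 10 can be summarised by the following sketch; the file and track objects and the helper methods (tile_tracks, build_pps, new_tile_base_track, copy_track_as_is, add_references, finalize) are placeholders, not an existing ISO BMFF API:

    def merge_tile_tracks(source_files, selected_ids, writer):
        # Step 1000: select the tile tracks carrying the chosen tile groups.
        selected = [track for f in source_files for track in f.tile_tracks
                    if track.tile_group_id in selected_ids]
        # Steps 1001 and 1003: generate new parameter sets describing the new
        # decoding locations and place them in a new tile base track.
        new_pps = writer.build_pps(selected)
        base_track = writer.new_tile_base_track(new_pps)
        for track in selected:
            # Samples are copied unchanged; only track references are updated
            # ('tbas' from tile track to base track, 'sbat' from base track back).
            writer.copy_track_as_is(track, references={'tbas': base_track})
        base_track.add_references('sbat', selected)
        return writer.finalize()
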
In some embodiments, sub-pictures are divided into slices instead of tile groups. Slices encompass the notion of tile groups, with the addition that a slice may also correspond to a subpart of a tile, namely a number of CTU rows within a tile. Everything that has been described in relation to tile groups and tile group headers applies equally to slices and slice headers.
In some embodiments, the quantization parameter offset may be defined at the sub-picture level to apply to all slices of the sub-picture. Optionally, some quantization parameter offsets may be defined for the chrominance Cr and Cb components independently of the quantization parameter offset defined for the luminance component.
A possible syntax of the PPS for defining the quantization parameter offsets associated with a sub-picture may be:
[PPS syntax table, reproduced as images in the original publication, listing the syntax elements pps_subpic_qp_offset_present_flag, pps_subpic_qp_offset[ i ], pps_subpic_cb_qp_offset[ i ], pps_subpic_cr_qp_offset[ i ] and pps_subpic_joint_cbcr_qp_offset_value[ i ].]
Where the introduced new syntax elements may have the following semantics: pps_subpic_qp_offset_present_flag equal to 1 specifies the presence of pps_subpic_qp_offset[ i ], pps_subpic_cb_qp_offset[ i ], pps_subpic_cr_qp_offset[ i ] and pps_subpic_joint_cbcr_qp_offset_value[ i ] in the PPS. pps_subpic_qp_offset_present_flag equal to 0 specifies the absence of pps_subpic_qp_offset[ i ], pps_subpic_cb_qp_offset[ i ], pps_subpic_cr_qp_offset[ i ] and pps_subpic_joint_cbcr_qp_offset_value[ i ].
pps_subpic_qp_offset[ i ], when present, specifies the offset value that applies to the initial value of SliceQpY for the i-th subpicture. The value of 26 + init_qp_minus26 + pps_subpic_qp_offset[ i ] shall be in the range of -QpBdOffsetY to +63, inclusive. When not present, the value of pps_subpic_qp_offset[ i ] is inferred to be equal to 0 for each value of i in the range of 0 to pps_num_subpic_minus1, inclusive.
pps_subpic_cb_qp_offset[ i ] and pps_subpic_cr_qp_offset[ i ] specify differences to be added to the values of pps_cb_qp_offset and pps_cr_qp_offset when determining the value of the quantization parameters Qp'Cb and Qp'Cr respectively for the i-th subpicture. The values of pps_subpic_cb_qp_offset[ i ] and pps_subpic_cr_qp_offset[ i ] shall be in the range of -12 to +12, inclusive. When not present, the values of pps_subpic_cb_qp_offset[ i ] and pps_subpic_cr_qp_offset[ i ] are inferred to be equal to 0. The values of pps_cb_qp_offset + pps_subpic_cb_qp_offset[ i ] and pps_cr_qp_offset + pps_subpic_cr_qp_offset[ i ] shall be in the range of -12 to +12, inclusive.
pps_subpic_joint_cbcr_qp_offset_value[ i ] specifies the difference to be added to the value of pps_joint_cbcr_qp_offset_value when determining the value of the quantization parameter Qp'CbCr for the i-th subpicture. The value of pps_subpic_joint_cbcr_qp_offset_value[ i ] shall be in the range of -12 to +12, inclusive. When not present, the value of pps_subpic_joint_cbcr_qp_offset_value[ i ] is inferred to be equal to 0. The value of pps_joint_cbcr_qp_offset_value + pps_subpic_joint_cbcr_qp_offset_value[ i ] shall be in the range of -12 to +12, inclusive.
The syntax elements of the slice header remain identical but the semantics of slice_qp_delta, slice_cb_qp_offset, slice_cr_qp_offset and slice_joint_cbcr_qp_offset are the following:
slice_qp_delta specifies the initial value of QpY to be used for the coding blocks in the slice until modified by the value of CuQpDeltaVal in the coding unit layer. The initial value of the QpY quantization parameter for the slice, SliceQpY, is derived as follows:
SliceQpY = 26 + init_qp_minus26 + pps_subpic_qp_offset[ SubPicIdx ] + slice_qp_delta;
The value of SliceQpY shall be in the range of -QpBdOffsetY to +63, inclusive. slice_cb_qp_offset specifies a difference to be added to the value of pps_cb_qp_offset when determining the value of the Qp'Cb quantization parameter. The value of slice_cb_qp_offset shall be in the range of -12 to +12, inclusive. When slice_cb_qp_offset is not present, it is inferred to be equal to 0. The value of pps_cb_qp_offset + pps_subpic_cb_qp_offset[ SubPicIdx ] + slice_cb_qp_offset shall be in the range of -12 to +12, inclusive.
slice_cr_qp_offset specifies a difference to be added to the value of pps_cr_qp_offset when determining the value of the Qp'Cr quantization parameter. The value of slice_cr_qp_offset shall be in the range of -12 to +12, inclusive. When slice_cr_qp_offset is not present, it is inferred to be equal to 0. The value of pps_cr_qp_offset + pps_subpic_cr_qp_offset[ SubPicIdx ] + slice_cr_qp_offset shall be in the range of -12 to +12, inclusive.
slice_joint_cbcr_qp_offset specifies a difference to be added to the value of pps_joint_cbcr_qp_offset_value when determining the value of Qp'CbCr. The value of slice_joint_cbcr_qp_offset shall be in the range of -12 to +12, inclusive. When slice_joint_cbcr_qp_offset is not present, it is inferred to be equal to 0. The value of pps_joint_cbcr_qp_offset_value + pps_subpic_joint_cbcr_qp_offset_value[ SubPicIdx ] + slice_joint_cbcr_qp_offset shall be in the range of -12 to +12, inclusive.
In some embodiments, similar syntax may be proposed in the SPS (Sequence
Parameter Set) or in the Picture Header to define the quantization parameter offsets associated with sub-pictures.
Accordingly, by defining quantization parameter offsets at the sub-picture level it is possible to define at once a quantization parameter offset that applies to all the slices of a given sub-picture. When defined in the SPS or the PPS, the quantization parameter offsets defined for index i are applied to all slices of all sub-pictures of index i in all pictures using the SPS or the PPS respectively. When defined in a Picture Header, the quantization parameter offsets defined for index i are applied to all slices of the subpicture of index i using the picture header.
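As an illustration of how these offsets combine, the slice-level derivation may be written as the Python sketch below; the pps and slice_hdr containers are assumptions, and the assertions simply mirror the range constraints stated above:

    def derive_slice_qps(pps, slice_hdr, subpic_idx, qp_bd_offset_y):
        # Luma: default initial QP, per-subpicture offset, then slice delta.
        slice_qp_y = (26 + pps.init_qp_minus26
                      + pps.subpic_qp_offset[subpic_idx]
                      + slice_hdr.slice_qp_delta)
        assert -qp_bd_offset_y <= slice_qp_y <= 63
        # Chroma: PPS offset, per-subpicture offset, then slice-level offset.
        cb = (pps.cb_qp_offset + pps.subpic_cb_qp_offset[subpic_idx]
              + slice_hdr.slice_cb_qp_offset)
        cr = (pps.cr_qp_offset + pps.subpic_cr_qp_offset[subpic_idx]
              + slice_hdr.slice_cr_qp_offset)
        joint = (pps.joint_cbcr_qp_offset_value
                 + pps.subpic_joint_cbcr_qp_offset_value[subpic_idx]
                 + slice_hdr.slice_joint_cbcr_qp_offset)
        for offset in (cb, cr, joint):
            assert -12 <= offset <= 12
        return slice_qp_y, cb, cr, joint
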
In some embodiments, an ID may be given to each subpicture in the PPS using the pps_subpic_id. pps_subpic_id[ i ] specifies the subpicture ID of the i-th subpicture. In this case, only the subpicture ID is specified in the slice header and not the subpicture index. The order of the subpictures may be changed in each Picture Header using a second table ph_subpic_id. ph_subpic_id[ i ] specifies the subpicture ID of the i-th subpicture. In order to determine the index SubPicIdx giving the value to use in the offset tables defined in the PPS (pps_subpic_qp_offset, pps_subpic_cb_qp_offset, pps_subpic_cr_qp_offset and pps_subpic_joint_cbcr_qp_offset_value), the decoder must use the value SubPicIdx such that pps_subpic_id[ SubPicIdx ] is equal to the ID defined in the slice header, and not the values of the table ph_subpic_id, which indicates the position in the picture where the subpicture is decoded.
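A minimal sketch of that look-up (the function name and the list-based representation of the ID table are illustrative):

    def subpic_idx_for_qp_offsets(slice_header_subpic_id, pps_subpic_id):
        # The index into the PPS offset tables is the position of the slice's
        # subpicture ID in pps_subpic_id, not its position in ph_subpic_id.
        return pps_subpic_id.index(slice_header_subpic_id)
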
In some embodiments, a list of quantization parameter offsets may be defined for the chrominance components. Each Coding Unit (CU) can use a different value in the list, by indicating its index in the list, to adjust its chrominance QP value. In order to allow CUs in different subpictures to use the same index value to refer to different offsets, it would be useful to have several chroma quantization parameter offset tables, so that each subpicture can use a different table.
The list of chroma quantization parameter offsets may take for example the following syntax in the PPS:
[pic_parameter_set_rbsp( ) syntax table, reproduced as images in the original publication, introducing pps_num_qp_offset_lists_minus1, chroma_qp_offset_list_len_minus1[ i ], cb_qp_offset_list[ i ][ j ], cr_qp_offset_list[ i ][ j ], joint_cbcr_qp_offset_list[ i ][ j ] and subpic_chroma_qp_offset_list_index[ i ], with their descriptors.]
Where the introduced new syntax elements may have the following semantics: pps_num_qp_offset_lists_minus1 plus 1 specifies the number of chroma_qp_offset_list_len_minus1[ i ] syntax elements that are present in the PPS RBSP syntax structure. In other words, it defines the number of chroma QP offset tables defined in the PPS. The value of pps_num_qp_offset_lists_minus1 shall be in the range of 0 to 5, inclusive.
chroma_qp_offset_list_len_minus1[ i ] plus 1 specifies the number of cb_qp_offset_list[ i ], cr_qp_offset_list[ i ] and joint_cbcr_qp_offset_list[ i ] syntax elements that are present in the PPS RBSP syntax structure. In other words, it defines the number of offset values in the i-th Cb, Cr and joint CbCr quantization parameter offset tables. The value of the sum of the chroma_qp_offset_list_len_minus1[ i ] for i in the range 0 to pps_num_qp_offset_lists_minus1 shall be in the range of 0 to 5, inclusive. In other words, the total number of offset values defined in all the chroma offset tables is limited in order to limit the complexity of the decoder hardware.
cb_qp_offset_list[ i ][ j ], cr_qp_offset_list[ i ][ j ] and joint_cbcr_qp_offset_list[ i ][ j ] specify offsets used in the derivation of Qp'Cb, Qp'Cr, and Qp'CbCr, respectively. The values of cb_qp_offset_list[ i ][ j ], cr_qp_offset_list[ i ][ j ], and joint_cbcr_qp_offset_list[ i ][ j ] shall be in the range of -12 to +12, inclusive. When not present, cb_qp_offset_list[ i ][ j ], cr_qp_offset_list[ i ][ j ] and joint_cbcr_qp_offset_list[ i ][ j ] are inferred to be equal to 0.
subpic_chroma_qp_offset_list_index[ i ] specifies the index of the chroma QP offset tables used by the i-th subpicture. subpic_chroma_qp_offset_list_index[ i ] shall be in the range 0 to pps_num_qp_offset_lists_minus1, inclusive.
The syntax elements of the transform unit containing a CU remain identical but the semantics of cu_chroma_qp_offset_idx is the following:
cu_chroma_qp_offset_idx, when present, specifies the index into the cb_qp_offset_list[ ][ ], cr_qp_offset_list[ ][ ], and joint_cbcr_qp_offset_list[ ][ ] that is used to determine the value of CuQpOffsetCb, CuQpOffsetCr, and CuQpOffsetCbCr. When present, the value of cu_chroma_qp_offset_idx shall be in the range of 0 to chroma_qp_offset_list_len_minus1[ subpic_chroma_qp_offset_list_index[ SubPicIdx ] ], inclusive. When not present, the value of cu_chroma_qp_offset_idx is inferred to be equal to 0.
When cu_chroma_qp_offset_flag is present, the following applies:
- The variable IsCuChromaQpOffsetCoded is set equal to 1.
- The variables CuQpOffsetCb, CuQpOffsetCr, and CuQpOffsetCbCr are derived as follows:
  - If cu_chroma_qp_offset_flag is equal to 1, the following applies:
    CuQpOffsetCb = cb_qp_offset_list[ subpic_chroma_qp_offset_list_index[ SubPicIdx ] ][ cu_chroma_qp_offset_idx ]
    CuQpOffsetCr = cr_qp_offset_list[ subpic_chroma_qp_offset_list_index[ SubPicIdx ] ][ cu_chroma_qp_offset_idx ]
    CuQpOffsetCbCr = joint_cbcr_qp_offset_list[ subpic_chroma_qp_offset_list_index[ SubPicIdx ] ][ cu_chroma_qp_offset_idx ]
  - Otherwise (cu_chroma_qp_offset_flag is equal to 0), CuQpOffsetCb, CuQpOffsetCr, and CuQpOffsetCbCr are all set equal to 0.
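The derivation above may be expressed as the following sketch; the pps container and the function name are assumptions, while the indirection through subpic_chroma_qp_offset_list_index follows the semantics given above:

    def derive_cu_chroma_qp_offsets(pps, subpic_idx, cu_chroma_qp_offset_flag,
                                    cu_chroma_qp_offset_idx):
        if not cu_chroma_qp_offset_flag:
            # cu_chroma_qp_offset_flag equal to 0: all CU chroma offsets are 0.
            return 0, 0, 0
        # Select the chroma QP offset tables assigned to the current subpicture.
        list_idx = pps.subpic_chroma_qp_offset_list_index[subpic_idx]
        return (pps.cb_qp_offset_list[list_idx][cu_chroma_qp_offset_idx],
                pps.cr_qp_offset_list[list_idx][cu_chroma_qp_offset_idx],
                pps.joint_cbcr_qp_offset_list[list_idx][cu_chroma_qp_offset_idx])
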
In some embodiments, similar syntax may be proposed in the SPS (Sequence Parameter Set) or in the Picture Header to define the chroma quantization parameter offset tables associated with sub-pictures.
In some embodiments, the syntax of the chroma quantization parameter offset tables may be simplified to directly define a chroma quantization parameter offset table for each subpicture. This syntax has the advantage of avoiding the indirection through the subpic_chroma_qp_offset_list_index table to obtain the tables associated with each subpicture: each chroma quantization parameter offset table is indexed directly by the subpicture index. However, it has the disadvantage of requiring more offset values to be defined and may thus increase the complexity of the decoder.
Figure 11 is a schematic block diagram of a computing device 1100 for implementation of one or more embodiments of the invention. The computing device 1100 may be a device such as a microcomputer, a workstation or a light portable device. The computing device 1100 comprises a communication bus connected to:
- a central processing unit 1101 , such as a microprocessor, denoted CPU;
- a random access memory 1102, denoted RAM, for storing the executable code of the method of embodiments of the invention as well as the registers adapted to record variables and parameters necessary for implementing the method according to embodiments of the invention, the memory capacity thereof can be expanded by an optional RAM connected to an expansion port, for example;
- a read only memory 1103, denoted ROM, for storing computer programs for implementing embodiments of the invention;
- a network interface 1104 is typically connected to a communication network over which digital data to be processed are transmitted or received. The network interface 1104 can be a single network interface, or composed of a set of different network interfaces (for instance wired and wireless interfaces, or different kinds of wired or wireless interfaces). Data packets are written to the network interface for transmission or are read from the network interface for reception under the control of the software application running in the CPU 1101 ;
- a user interface 1105 may be used for receiving inputs from a user or to display information to a user;
- a hard disk 1106 denoted HD may be provided as a mass storage device;
- an I/O module 1107 may be used for receiving/sending data from/to external devices such as a video source or display.
The executable code may be stored either in read only memory 1103, on the hard disk 1106 or on a removable digital medium such as for example a disk. According to a variant, the executable code of the programs can be received by means of a communication network, via the network interface 1104, in order to be stored in one of the storage means of the communication device 1100, such as the hard disk 1106, before being executed.
The central processing unit 1101 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to embodiments of the invention, which instructions are stored in one of the aforementioned storage means. After powering on, the CPU 1101 is capable of executing instructions from main RAM memory 1102 relating to a software application after those instructions have been loaded from the program ROM 1103 or the hard disk (HD) 1106 for example. Such a software application, when executed by the CPU 1101 , causes the steps of the flowcharts of the invention to be performed.
Any step of the algorithms of the invention may be implemented in software by execution of a set of instructions or program by a programmable computing machine, such as a PC (“Personal Computer”), a DSP (“Digital Signal Processor”) or a microcontroller; or else implemented in hardware by a machine or a dedicated component, such as an FPGA (“Field-Programmable Gate Array”) or an ASIC (“Application-Specific Integrated Circuit”).
Although the present invention has been described herein above with reference to specific embodiments, the present invention is not limited to the specific embodiments, and modifications which lie within the scope of the present invention will be apparent to a person skilled in the art.
Many further modifications and variations will suggest themselves to those versed in the art upon making reference to the foregoing illustrative embodiments, which are given by way of example only and which are not intended to limit the scope of the invention, that being determined solely by the appended claims. In particular the different features from different embodiments may be interchanged, where appropriate.
Each of the embodiments of the invention described above can be implemented solely or as a combination of a plurality of the embodiments. Also, features from different embodiments can be combined where necessary or where the combination of elements or features from individual embodiments in a single embodiment is beneficial.
Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.
For example, a Tile Group could also be a slice, a tile set, a motion constrained tile set (MCTS), a region of interest or a sub picture.
The information coded in the Picture Parameter Set (PPS) could also be encoded in other non-VCL units like a Video Parameter Set (VPS), a Sequence Parameter Set (SPS) or the DPS, or in new units like a Layer Parameter Set or a Tile Group Parameter Set. These units define parameters valid for several frames and thus they are at a higher hierarchical level than the tile group units or the APS units in the video bitstream. The tile group units are valid only inside one frame. The APS units can be valid for some frames but their usage changes rapidly from one frame to another.
In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. The mere fact that different features are recited in mutually different dependent claims does not indicate that a combination of these features cannot be advantageously used.

Claims

1. A method of encoding video data comprising frames into a bitstream of logical units, frames being spatially divided into frame portions, frame portions being divided into subportions, the method comprising:
- determining a quantisation parameter information associated with a subportion;
- encoding the subportion into a first logical unit;
- encoding the association between an index of the subportion and the quantisation parameter information into a second logical unit.
2. The method of claim 1 , wherein:
- the quantisation parameter information is encoded as a quantisation parameter offset determined based on a global default initial quantisation parameter, the global default initial quantisation parameter being associated to subportions.
3. The method of claim 1 , wherein:
- the quantisation parameter information is encoded for each subportion in the second logical unit as an associated default initial quantisation parameter.
4. The method of claim 1 , wherein:
- a quantization parameter offset is associated to each subportion in a subset of the subportions; and
- the quantisation parameter information is determined based on a global default initial quantisation parameter, and based on the existence of an association between the index of the subportion and a quantization parameter offset.
5. The method of any one of claims 1 to 4, wherein the second logical unit is a PPS.
6. A method for decoding a bitstream of logical units of video data comprising frames, frames being spatially divided into frame portions, frame portions being divided into subportions, the method comprising:
- parsing a first logical unit comprising a subportion to determine a quantisation parameter delta associated with the subportion;
- parsing a second logical unit comprising the association between an index of the subportion and a quantisation parameter information associated with the subportion;
- decoding the subportion comprised in the first logical unit using a quantisation parameter calculated based on the quantisation parameter delta and based on the quantisation parameter information.
7. The method of claim 6, wherein:
- the quantisation parameter information is encoded as a quantisation parameter offset determined based on a global default initial quantisation parameter, the global default initial quantisation parameter being associated to subportions.
8. The method of claim 6, wherein:
- the quantisation parameter information is encoded for each subportion in the second logical unit as an associated default initial quantisation parameter.
9. The method of claim 6, wherein:
- a quantization parameter offset is associated to each subportion in a subset of the subportions; and
- the quantisation parameter information is determined based on a global default initial quantisation parameter, and based on the existence of an association between the index of the subportion and a quantization parameter offset.
10. The method of any one of claims 6 to 9, wherein the second logical unit is a PPS.
11. A method for merging a selection of subportions from a plurality of original bitstreams into a resulting bitstream, bitstreams being composed of logical units comprising frames, frames being divided into frame portions, frame portions being divided into subportions, the method comprising:
- encoding a logical unit comprising the association of an index of a subportion for each selected subportion with a quantisation parameter information;
- generating the resulting bitstream comprising the logical units comprising the subportions, the encoded logical unit comprising the association of an index of the subportion with a quantisation parameter information.
12. The method of claim 11 , wherein:
- the quantisation parameter information is encoded as a quantisation parameter offset determined based on a global default initial quantisation parameter, the global default initial quantisation parameter being associated to subportions.
13. The method of claim 11 , wherein:
- the quantisation parameter information is encoded for each subportion in the logical unit as an associated default initial quantisation parameter.
14. The method of claim 11 , wherein:
- a quantisation parameter offset is associated to each subportion in a subset of the subportions; and,
- the quantisation parameter information is determined based on a global default initial quantisation parameter, and based on the existence of an association between the index of the subportion and a quantization parameter offset.
15. A method of generating a file comprising a bitstream of logical units of encoded video data comprising frames, frames being spatially divided into frame portions, frame portions being divided into subportions, the method comprising:
- encoding the bitstream according to any one of claims 1 to 5;
- generating a first track comprising the logical units containing the association between the indexes of the subportions and the quantisation parameter information;
- generating for a subportion, a track containing the logical unit containing the subportion; and,
- generating the file comprising the generated tracks.
16. A bitstream of logical units, the bitstream comprising encoded video data comprising frames, frames being spatially divided into frame portions, frame portions being divided into subportions, the bitstream comprising:
- a first logical unit comprising a subportion; and
- a second logical unit comprising the association between an index of the subportion and a quantization information.
17. The method of any one of claims 11 to 14, wherein the logical unit comprising the quantisation parameter information is a PPS.
18. A computer program product for a programmable apparatus, the computer program product comprising a sequence of instructions for implementing a method according to any one of claims 1 to 15, when loaded into and executed by the programmable apparatus.
19. A computer-readable storage medium storing instructions of a computer program for implementing a method according to any one of claims 1 to 15.
20. A computer program which upon execution causes the method of any one of claims 1 to 15 to be performed.
PCT/EP2020/055184 2019-03-01 2020-02-27 Method and apparatus for encoding and decoding a video bitstream for merging regions of interest WO2020178144A1 (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
GBGB1902831.5A GB201902831D0 (en) 2019-03-01 2019-03-01 Method and apparatus for encoding and decoding a video bitstream for merging regions of interest
GB1902831.5 2019-03-01
GB1903383.6A GB2581853A (en) 2019-03-01 2019-03-12 Method and apparatus for encoding and decoding a video bitstream for merging regions of interest
GB1903383.6 2019-03-12
GB1918656.8 2019-12-17
GB1918656.8A GB2581869B (en) 2019-03-01 2019-12-17 Method and apparatus for encoding and decoding a video bitstream for merging regions of interest

Publications (1)

Publication Number Publication Date
WO2020178144A1 true WO2020178144A1 (en) 2020-09-10

Family

ID=66377309

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2020/055184 WO2020178144A1 (en) 2019-03-01 2020-02-27 Method and apparatus for encoding and decoding a video bitstream for merging regions of interest

Country Status (2)

Country Link
GB (3) GB201902831D0 (en)
WO (1) WO2020178144A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115314722A (en) * 2022-06-17 2022-11-08 百果园技术(新加坡)有限公司 Video code rate distribution method, system, equipment and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11303897B2 (en) * 2020-02-25 2022-04-12 Tencent America LLC Method and apparatus for signaling of chroma quantization parameters

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150030068A1 (en) * 2012-03-15 2015-01-29 Sony Corporation Image processing device and method
WO2020070120A1 (en) * 2018-10-02 2020-04-09 Telefonaktiebolaget Lm Ericsson (Publ) Picture tile attributes signaled using loop(s) over tiles

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8737464B1 (en) * 2011-07-21 2014-05-27 Cisco Technology, Inc. Adaptive quantization for perceptual video coding
US9414054B2 (en) * 2012-07-02 2016-08-09 Microsoft Technology Licensing, Llc Control and use of chroma quantization parameter values
WO2019009776A1 (en) * 2017-07-05 2019-01-10 Telefonaktiebolaget Lm Ericsson (Publ) Decoding a block of video samples

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150030068A1 (en) * 2012-03-15 2015-01-29 Sony Corporation Image processing device and method
WO2020070120A1 (en) * 2018-10-02 2020-04-09 Telefonaktiebolaget Lm Ericsson (Publ) Picture tile attributes signaled using loop(s) over tiles

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"WD of ISO/IEC 23090-2 2nd edition OMAF", no. n18227, 15 February 2019 (2019-02-15), XP030212758, Retrieved from the Internet <URL:http://phenix.int-evry.fr/mpeg/doc_end_user/documents/125_Marrakech/wg11/w18227.zip w18227-v1.docx> [retrieved on 20190215] *
OUEDRAOGO N ET AL: "[AHG17/AHG12] Bitstream extraction and merging with variable initial Qp", no. m46850, 12 March 2019 (2019-03-12), XP030209627, Retrieved from the Internet <URL:http://phenix.int-evry.fr/mpeg/doc_end_user/documents/126_Geneva/wg11/m46850-JVET-N0192-v1-JVET-N0192.zip JVET-N0192.docx> [retrieved on 20190312] *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115314722A (en) * 2022-06-17 2022-11-08 百果园技术(新加坡)有限公司 Video code rate distribution method, system, equipment and storage medium
CN115314722B (en) * 2022-06-17 2023-12-08 百果园技术(新加坡)有限公司 Video code rate distribution method, system, equipment and storage medium
WO2023241376A1 (en) * 2022-06-17 2023-12-21 广州市百果园信息技术有限公司 Video bitrate allocation method, system and device, and storage medium

Also Published As

Publication number Publication date
GB201918656D0 (en) 2020-01-29
GB2581869A (en) 2020-09-02
GB201903383D0 (en) 2019-04-24
GB2581853A (en) 2020-09-02
GB201902831D0 (en) 2019-04-17
GB2581869B (en) 2023-02-08

Similar Documents

Publication Publication Date Title
US20220217355A1 (en) Method and apparatus for encoding and decoding a video bitstream for merging regions of interest
US20220329792A1 (en) Method and apparatus for encoding and decoding a video stream with subpictures
TWI680673B (en) Method and apparatus for video image coding and decoding
EP3407605B1 (en) Method and device for encoding and decoding parameter sets at slice level
WO2020178065A1 (en) Method and apparatus for encoding and decoding a video bitstream for merging regions of interest
US9313515B2 (en) Methods and apparatus for the use of slice groups in encoding multi-view video coding (MVC) information
EP2624570B1 (en) Image signal encoding method and apparatus
CN113519162B (en) Parameter set signaling in digital video
JP7472292B2 (en) Method, apparatus, and computer program product for video encoding and video decoding - Patents.com
US11589047B2 (en) Video encoding and decoding methods and apparatus
US20230060709A1 (en) Video coding supporting subpictures, slices and tiles
US20210092359A1 (en) Method, device, and computer program for coding and decoding a picture
WO2020178144A1 (en) Method and apparatus for encoding and decoding a video bitstream for merging regions of interest
KR20130116815A (en) Extension of hevc nal unit syntax structure
CN115550719A (en) Purpose of signaling preselection
GB2584723A (en) Method, device, and computer program for coding and decoding a picture
TW202310626A (en) Independent subpicture film grain

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20707264

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20707264

Country of ref document: EP

Kind code of ref document: A1