GB2581853A - Method and apparatus for encoding and decoding a video bitstream for merging regions of interest - Google Patents
- Publication number
- GB2581853A (application GB1903383.6)
- Authority
- GB
- United Kingdom
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
All under H04N19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals:
- H04N19/88 — using pre-processing or post-processing specially adapted for video compression involving rearrangement of data among different coding units, e.g. shuffling, interleaving, scrambling or permutation of pixel data or permutation of transform coefficient data among different blocks
- H04N19/70 — characterised by syntax aspects related to video coding, e.g. related to compression standards
- H04N19/124 — Quantisation
- H04N19/167 — Position within a video image, e.g. region of interest [ROI]
- H04N19/174 — using adaptive coding characterised by the coding unit, the unit being an image region, the region being a slice, e.g. a line of blocks or a group of blocks
- H04N19/188 — using adaptive coding characterised by the coding unit, the unit being a video data packet, e.g. a network abstraction layer [NAL] unit
Abstract
Encoding video data comprising frames into a bitstream of logical units, where the frames are spatially divided into frame portions and the frame portions are grouped. Quantisation parameter (QP) information associated with a group of frame portions is determined. The group of frame portions is encoded into a first logical unit, and the association between an index of the group and the corresponding QP information is encoded into a second logical unit. The grouping of frame portions may be based on regions of interest, and the grouping may include merging tile groups from different bitstreams. A merged bitstream 602 may include a Picture Parameter Set (PPS) 620 which may include a default initial QP value (init_qp_minus26) and an association table comprising QP offset values (tile_group_qp_offset). In an embodiment, instead of the offset, a different default initial QP value may be associated with each tile group. In another embodiment, the tile_group_qp_offset parameter is set only for a subset of tile groups. The present invention solves the quantisation parameter delta problem when merging tile groups from different bitstreams, without amending the tile group NAL units, by introducing QP information associated with each tile group in a parameter set.
Description
METHOD AND APPARATUS FOR ENCODING AND DECODING A VIDEO BITSTREAM FOR MERGING REGIONS OF INTEREST
FIELD OF THE INVENTION
The present disclosure concerns a method and a device for encoding and decoding a video bitstream that facilitate the merging of regions of interest. It concerns more particularly the encoding and decoding of a video bitstream resulting from the merging of regions coming from different video bitstreams. In addition, a corresponding method is proposed for generating such a bitstream resulting from the merging of different regions coming from different video bitstreams.
BACKGROUND OF INVENTION
Figures 1a and 1b illustrate two different application examples for the combination of regions of interest.
For instance, Figure 1a illustrates an example where a frame (or picture) 100 from a first video bitstream and a frame 101 from a second video bitstream are merged into a frame 102 of the resulting bitstream. Each frame is composed of four regions of interest numbered from 1 to 4. The frame 100 has been encoded using encoding parameters resulting in a high quality encoding; the frame 101 has been encoded using encoding parameters resulting in a low quality encoding. As is well known, the frame encoded with a low quality is associated with a lower bitrate than the frame encoded with a high quality. The resulting frame 102 combines the regions of interest 1, 2 and 4 from the frame 101, thus encoded with a low quality, with the region of interest 3 from frame 100, encoded with a high quality. The goal of such a combination is generally to get a region of interest, here the region 3, in high quality, while keeping the resulting bitrate reasonable by having regions 1, 2 and 4 encoded in low quality. This kind of scenario may happen in particular in the context of omnidirectional content, where the content actually visible is given a higher quality while the remaining parts have a lower quality.
Figure 1b illustrates a second example where four different videos A, B, C and D are merged to form a resulting video. A frame 103 of video A is composed of regions of interest A1, A2, A3, and A4. A frame 104 of video B is composed of regions of interest B1, B2, B3, and B4. A frame 105 of video C is composed of regions of interest C1, C2, C3, and C4. A frame 106 of video D is composed of regions of interest D1, D2, D3, and D4. The frame 107 of the resulting video is composed of regions B4, A3, C3, and D1. In this example, the resulting video is a mosaic video of different regions of interest of each original video stream. The regions of interest of the original video streams are rearranged and combined in new locations in the resulting video stream.
The compression of video relies on block-based video coding in most coding systems, like HEVC, standing for High Efficiency Video Coding, or the emerging VVC, standing for Versatile Video Coding. In these encoding systems, a video is composed of a sequence of frames (also called pictures, images or samples) which may be displayed at several different times. In the case of multi-layered video (for example scalable, stereo or 3D videos), several frames may be decoded to compose the resulting image to display at one instant. A frame can also be composed of different image components, for instance for encoding the luminance, the chrominances or depth information.
The compression of a video sequence relies on several partitioning techniques for each frame. Figure 2 illustrates some partitionings used in encoding systems. The frames 201 and 202 are divided into coding tree units (CTU) illustrated by the dotted lines. A CTU is the elementary unit of encoding and decoding. For example, a CTU can encode an area of 128 by 128 pixels.
A Coding Tree Unit (CTU) may also be named a block, macroblock or coding unit. It can encode the different image components simultaneously, or it can be limited to only one image component.
As illustrated by Figure 2, the frame can be partitioned according to a grid of tiles, illustrated by the thin solid lines. Tiles are frame portions, i.e. rectangular regions of pixels, that may be defined independently of the CTU partitioning; the boundaries of the tiles and the boundaries of the CTUs may therefore differ. A tile may also correspond to a sequence of CTUs, as in the represented example, meaning that the boundaries of tiles and CTUs coincide.
By definition, tile boundaries break the spatial encoding dependencies: the encoding of a CTU in a tile is not based on pixel data from another tile in the frame.
Some encoding systems, like for example VVC, provide the notion of tile groups.
This mechanism allows the partitioning of the frame into one or several groups of tiles, each group being composed of one or several tiles. Two different kinds of tile groups are provided, as illustrated by frames 201 and 202. A first kind is restricted to tile groups forming a rectangular area in the frame; frame 201 illustrates the partitioning of a frame into five different rectangular tile groups. A second kind is restricted to successive tiles in raster scan order; frame 202 illustrates the partitioning of a frame into three different tile groups composed of successive tiles in raster scan order. Rectangular tile groups are a structure of choice for dealing with regions of interest in a video. A tile group can be encoded in the bitstream as one or several NAL units. A NAL unit, standing for a Network Abstraction Layer unit, is a logical unit of data for the encapsulation of data in the encoded bitstream. In the VVC encoding system, a tile group is encoded as a single NAL unit.
In OMAF v2 (ISO/IEC 23090-2), a sub-picture is a portion of a picture that represents a spatial subset of the original video content, which has been split into spatial subsets before video encoding at the content production side. A sub-picture is, for example, one or more tile groups.
Figure 3 illustrates the organisation of the bitstream in the exemplary coding system VVC.
A bitstream 300 according to the VVC coding system is composed of an ordered sequence of syntax elements and coded data. The syntax elements and coded data are placed into NAL units 301-305. There are different NAL unit types. The network abstraction layer provides the ability to encapsulate the bitstream into different protocols, like RTP/IP, standing for Real Time Protocol / Internet Protocol, the ISO Base Media File Format, etc. The network abstraction layer also provides a framework for packet loss resilience.
NAL units are divided into VCL NAL units and non-VCL NAL units, VCL standing for Video Coding Layer. The VCL NAL units contain the actual encoded video data. The non-VCL NAL units contain additional information. This additional information may be parameters needed for the decoding of the encoded video data or supplemental data that may enhance usability of the decoded video data. NAL units 305 correspond to tile groups and constitute the VCL NAL units of the bitstream. Different NAL units 301-304 correspond to different parameter sets; these NAL units are non-VCL NAL units. The VPS NAL unit 301, VPS standing for Video Parameter Set, contains parameters defined for the whole video, and thus the whole bitstream. The naming of the VPS may change, for instance becoming DPS in VVC. The SPS NAL unit 302, SPS standing for Sequence Parameter Set, contains parameters defined for a video sequence. The PPS NAL unit 303, PPS standing for Picture Parameter Set, contains parameters defined for a picture or a group of pictures. The APS NAL unit 304, APS standing for Adaptation Parameter Set, contains parameters for the Adaptive Loop Filter (ALF) that are defined at the tile group level. The bitstream may also contain SEI, standing for Supplemental Enhancement Information, NAL units. The periodicity of occurrence of these parameter sets in the bitstream is variable. A VPS, defined for the whole bitstream, needs to occur only once in the bitstream. In contrast, an APS, defined for a tile group, may occur once for each tile group in each picture. In practice, different tile groups may rely on the same APS, and thus there are generally fewer APS than tile groups in each picture.
The VCL NAL units 305 each contain a tile group. A tile group may correspond to the whole picture, a single tile or a plurality of tiles. A tile group is composed of a tile group header 310 and a raw byte sequence payload, RBSP, 311 that contains the tiles.
The tile group index is the index of the tile group in the frame in raster scan order.
For example, in Figure 2, the number in a circle represents the tile group index for each tile group. Tile group 203 has a tile group index of 0.
The tile group identifier is a value, meaning an integer or any bit sequence, which is associated to a tile group. Typically, the PPS contains the association for each tile group between the tile group index and the tile group identifier for one or several pictures.
For example, in Figure 2, the tile group 203 with tile group index 0 can have a tile group identifier of '345'.
The tile group address is a syntax element present in the header of the tile group NAL unit. The tile group address may refer to the tile group index, to the tile group identifier or even to the tile index; in the latter case, it is the index of the first tile in the tile group. The semantics of the tile group address is defined by several flags present in one of the Parameter Set NAL units. In the example of tile group 203 in Figure 2, the tile group address may be the tile group index 0, the tile group identifier 345 or the tile index 0.
The tile group index, identifier and address are used to define the partitioning of the frame into tile groups. The tile group index is related with the location of the tile group in the frame. The decoder parses the tile group address in the tile group NAL unit header and uses it to locate the tile group in the frame and determine the location of the first sample in the NAL unit. When the tile group address refers to the tile group identifier, the decoder uses the association indicated by the PPS to retrieve the tile group index associated with the tile group identifier and thus determine the location of the tile group and of the first sample in the NAL unit.
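The address resolution described above can be sketched as follows. This is an illustrative model only, assuming the PPS-signalled identifier table is available as a simple list; the function and data structures are hypothetical, not a real decoder API.

```python
# Hypothetical sketch: a decoder resolving a tile group address (here, a tile
# group identifier) to the tile group index via the PPS association table.

def resolve_tile_group_index(tile_group_address, pps_tile_group_id):
    """pps_tile_group_id: list indexed by tile group index, holding the tile
    group identifier signalled in the PPS for each group (tile_group_id[])."""
    for index, identifier in enumerate(pps_tile_group_id):
        if identifier == tile_group_address:
            return index  # index gives the group's location in raster scan order
    raise ValueError("unknown tile group address")

# Example: tile group 203 of Figure 2, identifier 345, sits at index 0.
# The other identifiers in the table are invented for illustration.
pps_tile_group_id = [345, 421, 77, 12, 980]
assert resolve_tile_group_index(345, pps_tile_group_id) == 0
```

From the returned index, the decoder can then derive the location of the tile group in the frame and the location of the first sample in the NAL unit.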
The syntax of the PPS as proposed in the current version of VVC is as follows:

pic_parameter_set_rbsp( ) {                                        Descriptor
  pps_pic_parameter_set_id                                         ue(v)
  pps_seq_parameter_set_id                                         ue(v)
  transform_skip_enabled_flag                                      u(1)
  single_tile_in_pic_flag                                          u(1)
  if( !single_tile_in_pic_flag ) {
    num_tile_columns_minus1                                        ue(v)
    num_tile_rows_minus1                                           ue(v)
    uniform_tile_spacing_flag                                      u(1)
    if( !uniform_tile_spacing_flag ) {
      for( i = 0; i < num_tile_columns_minus1; i++ )
        tile_column_width_minus1[ i ]                              ue(v)
      for( i = 0; i < num_tile_rows_minus1; i++ )
        tile_row_height_minus1[ i ]                                ue(v)
    }
    single_tile_per_tile_group_flag                                u(1)
    if( !single_tile_per_tile_group_flag )
      rect_tile_group_flag                                         u(1)
    if( rect_tile_group_flag && !single_tile_per_tile_group_flag ) {
      num_tile_groups_in_pic_minus1                                ue(v)
      for( i = 0; i <= num_tile_groups_in_pic_minus1; i++ ) {
        if( i > 0 )
          top_left_tile_idx[ i ]                                   u(v)
        bottom_right_tile_idx[ i ]                                 u(v)
      }
    }
    loop_filter_across_tiles_enabled_flag                          u(1)
  }
  if( rect_tile_group_flag ) {
    signalled_tile_group_id_flag                                   u(1)
    if( signalled_tile_group_id_flag ) {
      signalled_tile_group_id_length_minus1                        ue(v)
      for( i = 0; i <= num_tile_groups_in_pic_minus1; i++ )
        tile_group_id[ i ]                                         u(v)
    }
  }
  /* Additional syntax elements not represented */
  rbsp_trailing_bits( )
}

The descriptor column gives the encoding of a syntax element: u(1) means that the syntax element is encoded using one bit; ue(v) means that the syntax element is encoded as an unsigned integer 0-th order Exp-Golomb-coded syntax element with the left bit first, that is, a variable length encoding. The syntax elements num_tile_columns_minus1 and num_tile_rows_minus1 respectively indicate the number of tile columns and rows in the frame. When the tile grid is not uniform (uniform_tile_spacing_flag equal to 0), the syntax elements tile_column_width_minus1[] and tile_row_height_minus1[] specify the widths and heights of each column and row of the tile grid.
The tile group partitioning is expressed with the following syntax elements.
The syntax element single_tile_in_pic_flag states whether the frame contains a single tile. In other words, there is only one tile and one tile group in the frame when this flag is true.
The syntax element single_tile_per_tile_group_flag states whether each tile group contains a single tile. In other words, each tile of the frame belongs to a different tile group when this flag is true.
The syntax element rect_tile_group_flag indicates that the tile groups of the frame form rectangular shapes, as represented in the frame 201.
When present, the syntax element num_tile_groups_in_pic_minus1 is equal to the number of rectangular tile groups in the frame minus one.
The syntax elements top_left_tile_idx[] and bottom_right_tile_idx[] are arrays that respectively specify the first (top-left) tile and the last (bottom-right) tile in a rectangular tile group. These arrays are indexed by tile group index.
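The pair of tile indices fully determines the set of tiles covered by a rectangular tile group, given the number of tile columns in the grid. The following sketch illustrates this derivation; it is an informal reading of the syntax above, not the normative decoding process.

```python
# Illustrative computation of the tile indices covered by a rectangular tile
# group from top_left_tile_idx and bottom_right_tile_idx, with tiles numbered
# in raster scan order over a grid of num_tile_columns columns.

def tiles_in_rect_group(top_left_idx, bottom_right_idx, num_tile_columns):
    top, left = divmod(top_left_idx, num_tile_columns)
    bottom, right = divmod(bottom_right_idx, num_tile_columns)
    return [row * num_tile_columns + col
            for row in range(top, bottom + 1)
            for col in range(left, right + 1)]

# In a grid with 4 tile columns, the group spanning tile 1 (top-left) to
# tile 6 (bottom-right) covers a 2x2 block of tiles.
assert tiles_in_rect_group(1, 6, 4) == [1, 2, 5, 6]
```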
The tile group identifiers are specified when signalled_tile_group_id_flag is equal to 1. In this case, the signalled_tile_group_id_length_minus1 syntax element indicates the number of bits used to code each tile group identifier value. The tile_group_id[] association table is indexed by tile group index and contains the identifier of the tile group. When signalled_tile_group_id_flag is equal to 0, tile_group_id[] is indexed by tile group index and contains the tile group index of the tile group.
The tile group header comprises the tile group address according to the following syntax in the current VVC version:

tile_group_header( ) {                                             Descriptor
  tile_group_pic_parameter_set_id                                  ue(v)
  if( rect_tile_group_flag  ||  NumTilesInPic > 1 )
    tile_group_address                                             u(v)
  if( !rect_tile_group_flag  &&  !single_tile_per_tile_group_flag )
    num_tiles_in_tile_group_minus1                                 ue(v)
  ...
}
When the tile group is not rectangular, the tile group header indicates the number of tiles in the tile group NAL unit with help of num_tiles_in_tile_group_minus1 syntax element.
Each tile 320 may comprise a tile segment header 330 and tile segment data 331. The tile segment data 331 comprises the encoded coding units 340. In the current version of the VVC standard, the tile segment header is not present and the tile segment data contains the coding unit data 340.
Figure 4 illustrates the process of generating a video bitstream composed of different regions of interest from one or several original bitstreams.
In a step 400, the regions to be extracted from the original bitstreams are selected. The regions may correspond, for instance, to a specific region of interest or a specific viewing direction in an omnidirectional content. The tile groups comprising encoded samples present in the selected set of regions are selected in the original bitstreams. At the end of this step, the identifier of each tile group in the original bitstreams that will be merged into the resulting bitstream is determined. For example, the identifiers of the tile groups 1, 2 and 4 of frame 101 and of the tile group 3 of frame 100 in Figure 1 are determined.
In a step 401, a new arrangement for the selected tile groups in the resulting video is determined. This consists in determining the size and location of each selected tile group in the resulting video. For instance, the new arrangement conforms to a predetermined ROI composition; alternatively, a user defines a new arrangement. In a step 402, the tile partitioning of the resulting video is determined. When the tile partitioning of the original bitstreams is identical, the same tile partitioning is kept for the resulting video. At the end of this step, the number of rows and columns of the tile grid, with the width and height of the tiles, is determined and advantageously stored in memory.
With the new arrangement of the tile groups determined in step 401, the location of a tile group in the resulting video may change with respect to its location in the original video. In a step 403, the new locations of the tile groups are determined; in particular, the tile group partitioning of the resulting video is determined. The locations of the tile groups are determined with reference to the new tile grid as determined in step 402.
In a step 404, new parameter sets are generated for the resulting bitstream. In particular, new PPS NAL units are generated. These new PPS contain syntax elements to encode the tile grid partitioning, the tile group partitioning and positioning, and the association of the tile group identifier and the tile group index. To do so, the tile group identifier is extracted from each tile group and associated with the tile group index depending on the new decoding location of the tile group. Recall that each tile group, in the exemplary embodiment, is identified by an identifier in the tile group header, and that each tile group identifier is associated with an index corresponding to the tile group index of the tile group in the picture in raster scan order. This association is stored in a PPS NAL unit. Assuming that there is no collision in the identifiers of the tile groups, when changing the position of a tile group in a picture, and thus changing the tile group index, there is no need to change the tile group identifiers and thus to amend the tile group structure; only the PPS NAL units need to be amended.
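The rebuilding of the identifier/index association in step 404 can be sketched as follows. The placement mapping and identifier values below are illustrative assumptions; only the association they express corresponds to the tile_group_id[] table of the PPS described earlier.

```python
# Sketch of step 404: rebuilding the PPS tile_group_id[] table after the tile
# groups have been rearranged. 'placements' maps each kept tile group
# identifier to its new tile group index (raster scan order) in the merged
# frame. The identifiers themselves are left untouched, so the tile group
# NAL units need no amendment.

def build_merged_pps_association(placements):
    """placements: dict {tile_group_identifier: new_tile_group_index}.
    Returns tile_group_id[], indexed by the new tile group index."""
    table = [None] * len(placements)
    for identifier, new_index in placements.items():
        table[new_index] = identifier
    return table

# Hypothetical identifiers for regions 1, 2, 4 of frame 101 and region 3 of
# frame 100; only their indices in the new PPS reflect the new layout.
assert build_merged_pps_association({101: 0, 102: 1, 100: 2, 104: 3}) == [101, 102, 100, 104]
```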
In a step 405, the VCL NAL units, namely the tile groups, are extracted from the original bitstreams to be inserted in the resulting bitstream. It may happen that these VCL NAL units need to be amended; in particular, some parameters in the tile group header may not be compatible with the resulting bitstream. It would be advantageous to avoid this amending step, as decoding, amending and re-encoding the tile group header is resource consuming.
In particular, there is an issue concerning the signalling of the quantisation parameter (QP) in the bitstream. The quantisation parameter is an important parameter when encoding a coding unit as it determines the compression bitrate and thus the quality of the encoding. Encoding a coding unit using a high quantisation parameter leads to a high compression ratio, thus a low bitrate and a low quality of the compressed image. Using a low quantisation parameter leads to a low compression ratio, thus a high bitrate and a high quality of the compressed image.
Most compression systems use a variable quantisation parameter that changes from coding unit to coding unit and between frames, in order to adapt to the complexity of the content of the coding unit and to the structure of the compressed video when using successive temporally predicted frames. By using a variable quantisation parameter, it is possible to obtain a uniform perceived quality of the decompressed image independently of the content of the different coding units. The global quality targeted for a video sequence is determined by a default initial value of the quantisation parameter that is fixed for the sequence. This default initial quantisation parameter value is modified at different levels by applying some modifying offsets to this default initial value. In particular, a quantisation parameter delta is defined at the level of a tile group and stored in the tile group header. When encoding the first coding unit of the tile group, the encoder uses a quantisation parameter that corresponds to the default initial quantisation parameter value corrected by the addition of the quantisation parameter delta defined for the tile group. Then the encoding process modifies the quantisation parameter for each coding unit of the tile group based on this first value of the quantisation parameter.
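The arithmetic for the first coding unit of a tile group can be written out explicitly. In HEVC and VVC the default initial QP is itself carried in the PPS as init_qp_minus26; the names below mirror that convention, but the function is a sketch of the relationship described above, not the normative derivation.

```python
# First-coding-unit QP of a tile group: the sequence-level default initial QP
# (signalled as init_qp_minus26 in the PPS) corrected by the tile group's
# QP delta from the tile group header.

def first_cu_qp(init_qp_minus26, tile_group_qp_delta):
    default_initial_qp = 26 + init_qp_minus26
    return default_initial_qp + tile_group_qp_delta

# A stream with default initial QP 30 (init_qp_minus26 = 4) and a tile group
# delta of -4 starts the tile group at QP 26.
assert first_cu_qp(4, -4) == 26
```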
When merging tile groups from different original bitstreams, each original bitstream defining its own default initial quantisation parameter, the quantisation parameter deltas encoded in each tile group need to be adapted to the default initial quantisation parameter value defined in the resulting bitstream. This implies that the tile group headers need to be decoded, amended and re-encoded in order to fix this quantisation parameter delta issue.
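The rewriting that the merger would otherwise have to perform on every tile group header follows directly from the QP model above. This is an illustration of the problem, under the simplifying assumption that the effective QP is just the sum of the initial QP and the delta.

```python
# The quantisation parameter delta problem: to preserve a merged tile group's
# effective QP, its delta must be rewritten against the merged stream's
# default initial QP, which forces decoding, amending and re-encoding of
# every tile group header.

def rewritten_delta(src_init_qp, src_delta, dst_init_qp):
    effective_qp = src_init_qp + src_delta   # QP intended by the source stream
    return effective_qp - dst_init_qp        # delta required in the merged stream

# A tile group from a stream with initial QP 24 and delta +2 (effective QP 26),
# merged into a stream whose initial QP is 30, needs its delta rewritten to -4.
assert rewritten_delta(24, 2, 30) == -4
```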
SUMMARY OF INVENTION
The present invention has been devised to address one or more of the foregoing concerns. It concerns an encoding and decoding method for a bitstream that allows solving the quantisation parameter delta issue when merging tile groups from different bitstreams without amending the tile group encoded data.
According to a first aspect of the invention there is provided a method of encoding video data comprising frames into a bitstream of logical units, frames being spatially divided into frame portions, frame portions being grouped into groups of frame portions, the method comprising:
- determining a quantisation parameter information associated with a group of frame portions;
- encoding the group of frame portions into a first logical unit;
- encoding the association between an index of the group of frame portions and the quantisation parameter information into a second logical unit.
In an embodiment: - the quantisation parameter information is encoded as a quantisation parameter offset determined based on a global default initial quantisation parameter, the global default initial quantisation parameter being associated to groups of frame portions.
In an embodiment: - the quantisation parameter information is encoded for each group of frame portions in the second logical unit as an associated default initial quantisation parameter.
In an embodiment: - a quantisation parameter offset is associated to each group in a subset of the groups of frame portions; and the quantisation parameter information is determined based on a global default initial quantisation parameter, and based on the existence of an association between the index of the group and a quantisation parameter offset.
In an embodiment, the second logical unit is a PPS.
According to another aspect of the invention, there is provided a method for decoding a bitstream of logical units of video data comprising frames, frames being spatially divided into frame portions, frame portions being grouped into groups of frame portions, the method comprising:
- parsing a first logical unit comprising a group of frame portions to determine a quantisation parameter delta associated with the group of frame portions;
- parsing a second logical unit comprising the association between an index of the group of frame portions and a quantisation parameter information associated with the group of frame portions;
- decoding the group of frame portions comprised in the first logical unit using a quantisation parameter calculated based on the quantisation parameter delta and based on the quantisation parameter information.
In an embodiment: - the quantisation parameter information is encoded as a quantisation parameter offset determined based on a global default initial quantisation parameter, the global default initial quantisation parameter being associated to groups of frame portions.
In an embodiment: the quantisation parameter information is encoded for each group of frame portions in the second logical unit as an associated default initial quantisation parameter.
In an embodiment: a quantization parameter offset is associated to each group in a subset of the groups of frame portions; and the quantisation parameter information is determined based on a global default initial quantisation parameter, and based on the existence of an association between the index of the group and a quantization parameter offset.
In an embodiment, the second logical unit is a PPS.
According to another aspect of the invention, there is provided a method for merging a selection of groups of frame portions from a plurality of original bitstreams into a resulting bitstream, bitstreams being composed of logical units comprising frames, frames being divided into tiles, tiles being grouped into groups of frame portions, the method comprising: encoding a logical unit comprising the association of an index of a group of frame portions for each selected group of frame portions with a quantisation parameter information; generating the resulting bitstream comprising the logical units comprising the groups of frame portions, the encoded logical unit comprising the association of an index of the group of frame portions with a quantisation parameter information.
In an embodiment: - the quantisation parameter information is encoded as a quantisation parameter offset determined based on a global default initial quantisation parameter, the global default initial quantisation parameter being associated to groups of frame portions.
In an embodiment: - the quantisation parameter information is encoded for each group of frame portions in the logical unit as an associated default initial quantisation parameter.
In an embodiment: a quantisation parameter offset is associated to each group in a subset of the groups of frame portions; and, the quantisation parameter information is determined based on a global default initial quantisation parameter, and based on the existence of an association between the index of the group and a quantization parameter offset.
In an embodiment, the logical unit comprising the quantisation parameter information is a PPS.
According to another aspect of the invention, there is provided a computer program product for a programmable apparatus, the computer program product comprising a sequence of instructions for implementing a method according to the invention, when loaded into and executed by the programmable apparatus.
According to another aspect of the invention, there is provided a computer-readable storage medium storing instructions of a computer program for implementing a method according to the invention.
According to another aspect of the invention, there is provided a computer program which upon execution causes the method of the invention to be performed.
At least parts of the methods according to the invention may be computer implemented. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, microcode, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit", "module" or "system". Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
Since the present invention can be implemented in software, the present invention can be embodied as computer-readable code for provision to a programmable apparatus on any suitable carrier medium. A tangible, non-transitory carrier medium may comprise a storage medium such as a floppy disk, a CD-ROM, a hard disk drive, a magnetic tape device or a solid-state memory device and the like. A transient carrier medium may include a signal such as an electrical signal, an electronic signal, an optical signal, an acoustic signal, a magnetic signal or an electromagnetic signal, e.g. a microwave or RF signal.
BRIEF DESCRIPTION OF DRAWINGS
Embodiments of the invention will now be described, by way of example only, and with reference to the following drawings in which: Figures 1a and 1b illustrate two different application examples for the combination of regions of interest; Figure 2 illustrates some partitioning in encoding systems; Figure 3 illustrates the organisation of the bitstream in the exemplary coding system VVC; Figure 4 illustrates the process of generating a video bitstream composed of different regions of interest from one or several original bitstreams; Figure 5 illustrates the quantisation parameter issue when merging different tile groups from different bitstreams; Figure 6 illustrates the embodiment of the invention where a quantisation parameter offset is associated with a tile group in a non-VCL NAL unit of the bitstream; Figure 7 illustrates the main steps of an encoding process according to an embodiment of the invention; Figure 8 illustrates the main steps of a decoding process according to an embodiment of the invention; Figure 9 illustrates the extraction and merge operation of two bitstreams stored in a file to form a resulting bitstream stored in a resulting file in an embodiment of the invention; Figure 10 illustrates the main steps of the extraction and merge process at file format level in an embodiment of the invention; Figure 11 is a schematic block diagram of a computing device for implementation of one or more embodiments of the invention.
DETAILED DESCRIPTION OF THE INVENTION
Figure 5 illustrates the quantisation parameter issue when merging different tile groups from different bitstreams.
In video coding, the quantisation parameter (QP) is an important parameter for determining the quality of each encoded coding unit. A higher QP creates more losses during encoding and decreases the quality of the decoded image. In H.264, the QP range is 0 to 51, while in VP9 the range is 0 to 63.
In VVC the QP value used in a block encoding can be in the range -QpBdOffsetY to +63, where QpBdOffsetY is a value depending on the bit depth of the luma component of the video. The PPS defines a default initial QP value used as reference by all the tile groups referencing the PPS. The value is named init_qp_minus26, and it defines the initial QP value decremented by 26. This value is encoded with a variable length (se(v) means signed integer 0-th order Exp-Golomb-coded). 26 is estimated as the default value for QP, thus it will have a 1-bit length when encoded. The syntax of the PPS for signalling the default initial quantisation parameter is typically as follows:

pic_parameter_set_rbsp( ) {                                 Descriptor
    pps_pic_parameter_set_id                                ue(v)
    pps_seq_parameter_set_id                                ue(v)
    init_qp_minus26                                         se(v)
    ...
}

Each tile group header in each picture defines another value, tile_group_qp_delta, which can be used to modify the default initial quantisation parameter value for all the CTUs inside the tile group. Typically, this value changes in each picture. This value is encoded in variable length and thus a value closer to 0 will use a lower number of bits.
tile_group_header( ) {                                      Descriptor
    tile_group_pic_parameter_set_id                         ue(v)
    ...
    tile_group_qp_delta                                     se(v)
    ...
}

The quantisation parameter used for encoding or decoding the first coding unit inside the tile group is the sum of 26, init_qp_minus26 and tile_group_qp_delta:

TileGroupQpY = 26 + init_qp_minus26 + tile_group_qp_delta

This design creates an issue when merging tile groups from different videos with different default initial quantisation parameter values. Figure 5 illustrates the merging of tile group #3 from video 500 with a high quality and tile group #4 from video 501 with a low quality into a new video 502 with a quality varying as a function of the region in the image. Because the video 500 has a high quality, its default initial quantisation parameter would have a low value, for example 20 (init_qp_minus26 = -6) giving a high quality. The video 501 with a low quality will have a high value for its default initial quantisation parameter, for example 40 (init_qp_minus26 = 14). Tile group #3 from video 500 can then use a qp_delta of 0 (tile_group_qp_delta = 0) to have a high quality and tile group #4 from video 501 can use the same value 0 (tile_group_qp_delta = 0) to have a low quality.
When merging the two tile groups in the video 511, it is not possible to keep the same value tile_group_qp_delta for both tile groups because the quantisation parameter values used to encode the blocks inside the two tile groups must be different. It will be necessary to generate a new PPS 520 with a new default initial quantisation parameter value, using for example init_qp_minus26 = -6, and also to parse all the tile group headers and modify the value tile_group_qp_delta of one of the tile groups inside each picture of the video. For example, tile group #4 of the video 511 is modified by adopting a quantisation parameter delta value of 20 (tile_group_qp_delta = 20). The first operation, consisting in defining the default initial quantisation parameter value, is not too complex because the non-VCL NAL unit PPS is relatively small and not frequent. Moreover, it is encoded with mostly fixed length values. The second operation, consisting in modifying all the tile group headers from one or several videos, is very complex and time consuming because it requires reading all the video bitstreams (500 and 501), decoding all the tile group headers, which contain many variable length fields, and rewriting the complete bitstream. Moreover, it must be noted that the value tile_group_qp_delta for at least one of the tile groups will be far from 0 and thus will use a large number of bits in its variable length encoding.
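The derivation above, and the conflict it creates when merging, can be sketched as follows (a minimal illustration in Python using the example values of Figure 5; the helper name is ours, not part of the VVC specification):

```python
# Baseline VVC derivation: the initial QP of a tile group combines the
# PPS-level default with the per-tile-group delta (hypothetical helper).
def tile_group_qp(init_qp_minus26, tile_group_qp_delta):
    return 26 + init_qp_minus26 + tile_group_qp_delta

# Video 500 (high quality): init_qp_minus26 = -6, delta = 0 -> QP 20.
assert tile_group_qp(-6, 0) == 20
# Video 501 (low quality): init_qp_minus26 = 14, delta = 0 -> QP 40.
assert tile_group_qp(14, 0) == 40

# The merged bitstream has a single PPS, here with init_qp_minus26 = -6.
# Keeping delta = 0 for tile group #4 would decode it at QP 20, not 40 ...
assert tile_group_qp(-6, 0) != 40
# ... so every tile group header of tile group #4 must be rewritten with
# delta = 20, an operation repeated in each picture of the video.
assert tile_group_qp(-6, 20) == 40
```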
It is thus desirable to have a better encoding for the tile group quantisation parameter to allow an efficient merge of several videos allowing easy resolution of quantisation parameter conflicts without requiring modification of the VCL units and in particular the tile group headers.
According to an embodiment of the invention, an optional quantisation parameter offset is associated to each tile group and stored in a non-VCL NAL unit of the bitstream. Accordingly, the tile group structure of each tile group may be kept unchanged. The actual quantisation parameter used for a given tile group will be calculated based on the default initial quantisation parameter of the bitstream, modified with the quantisation parameter offset associated to the tile group and summed with the quantisation parameter delta signalled in the tile group.
Figure 6 illustrates the embodiment of the invention where a quantisation parameter offset is associated with a tile group in a non-VCL NAL unit of the bitstream.
In this embodiment, the quantisation parameter offset is signalled in the PPS NAL unit, for example using the following syntax:

pic_parameter_set_rbsp( ) {                                 Descriptor
    ...
    if( rect_tile_group_flag ) {
        signalled_tile_group_id_flag                        u(1)
        if( signalled_tile_group_id_flag ) {
            signalled_tile_group_id_length_minus1           ue(v)
            for( i = 0; i <= num_tile_groups_in_pic_minus1; i++ )
                tile_group_id[ i ]                          u(v)
        }
    }
    if( rect_tile_group_flag ) {
        signalled_qp_offset_flag                            u(1)
        if( signalled_qp_offset_flag ) {
            for( i = 0; i <= num_tile_groups_in_pic_minus1; i++ )
                tile_group_qp_offset[ i ]                   se(v)
        }
    }
    ...
}

The flag signalled_qp_offset_flag indicates that a quantisation parameter offset associated with each tile group is provided. The association table tile_group_qp_offset stores, for each tile group, an offset encoded in signed variable length coding.
With this new syntax, the quantisation parameter for decoding the first coding unit inside the tile group is the sum of 26 with init_qp_minus26, tile_group_qp_delta and tile_group_qp_offset:

TileGroupQpY = 26 + init_qp_minus26 + tile_group_qp_offset + tile_group_qp_delta

For example, the semantics associated with the syntax elements of the PPS are the following: signalled_qp_offset_flag equal to 1 specifies the presence of tile_group_qp_offset[ i ] in the PPS; signalled_qp_offset_flag equal to 0 specifies the absence of tile_group_qp_offset[ i ].
tile_group_qp_offset[ i ], when present, specifies the offset value that applies to the initial value of TileGroupQpY for the i-th tile group. The value of tile_group_qp_offset[ i ] shall be in the range of -QpBdOffsetY to +63, inclusive. QpBdOffsetY is an offset computed as a function of the bit depth of the luma samples. For instance QpBdOffsetY is equal to 6 times the bit depth of the luma samples. In a variant, the value of tile_group_qp_offset[ i ] shall be in the range of -rangeQp to rangeQp where rangeQp is an integer value. When not present, the value of tile_group_qp_offset[ i ] is inferred to be equal to 0 for each value of i in the range of 0 to num_tile_groups_in_pic_minus1, inclusive.
The syntax elements of the tile group header remain identical but the semantics of tile_group_qp_delta are for example the following: tile_group_qp_delta specifies the initial value of QpY to be used for the coding blocks in the tile group until modified by the value of CuQpDeltaVal in the coding unit layer. The initial value of the QpY quantization parameter for the tile group, TileGroupQpY, is derived as follows:

TileGroupQpY = 26 + init_qp_minus26 + tile_group_qp_offset[ tileGroupQpIdx ] + tile_group_qp_delta

The value of TileGroupQpY shall be in the range of -QpBdOffsetY to +63, inclusive.
The variable tileGroupQpIdx, which specifies the index of the tile group, is derived as follows:

tileGroupQpIdx = 0
while( tile_group_address != tile_group_id[ tileGroupQpIdx ] )
    tileGroupQpIdx++

A first bitstream 600 comprises a tile group 3. The bitstream 600 contains a PPS 610 comprising a default initial quantisation parameter with the value '20' associated with a high quality encoding. The tile group 3 comprises a quantisation parameter delta with a value 0. It means that the first coding unit of tile group 3 is encoded with a quantisation parameter of 20, and that this value of the quantisation parameter must be used when decoding this coding unit. A second bitstream 601 comprises a tile group 4. The bitstream 601 contains a PPS 611 comprising a default initial quantisation parameter with the value '40' associated with a low quality encoding. The tile group 4 comprises a quantisation parameter delta with a value 0. It means that the first coding unit of tile group 4 is encoded with a quantisation parameter of 40, and that this value of the quantisation parameter must be used when decoding this coding unit.
After merging, the bitstream 602 comprises both the tile group 3 from bitstream 600 and the tile group 4 from bitstream 601. Both tile groups still comprise a quantisation parameter delta with the value 0. Bitstream 602 comprises a PPS 620 with a default initial quantisation parameter value of 20 identical to the one in bitstream 600. To avoid decoding the tile group 4 with a wrong quantisation parameter value of '20', the PPS 620 also comprises an association table associating each tile group with a quantisation parameter offset. The quantisation parameter offset associated with tile group 4 has a value '20'. Accordingly, when decoding the tile group 4, the quantisation parameter used for the first coding unit of tile group 4 has the right value '40' corresponding to the sum of the default initial quantisation parameter '20', the quantisation parameter offset '20' associated with tile group 4, and the quantisation parameter delta '0' in tile group 4.
It can be seen that this solution allows solving the quantisation parameter issue in the merge while keeping unchanged the tile group structure.
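The resolution illustrated in Figure 6 can be sketched as follows (a minimal Python illustration with hypothetical names; the offset table mirrors the PPS 620):

```python
# With the proposed syntax, the PPS carries an association table of
# per-tile-group QP offsets; an absent entry is inferred to be 0.
def tile_group_qp(init_qp_minus26, qp_offsets, tile_group_id, qp_delta):
    offset = qp_offsets.get(tile_group_id, 0)  # inferred 0 when not signalled
    return 26 + init_qp_minus26 + offset + qp_delta

# Merged PPS 620: default initial QP 20 (init_qp_minus26 = -6) and a QP
# offset of '20' associated with tile group 4.
qp_offsets = {4: 20}

# Tile group 3 keeps its unchanged header (delta = 0) and decodes at QP 20 ...
assert tile_group_qp(-6, qp_offsets, 3, 0) == 20
# ... and tile group 4, also with an unchanged header (delta = 0), decodes
# at the correct QP 40 thanks to the PPS-level offset.
assert tile_group_qp(-6, qp_offsets, 4, 0) == 40
```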
In an alternate embodiment, instead of a quantisation parameter offset, a different default initial quantisation parameter is directly associated to each tile group in the PPS. It means that the PPS no longer contains a global default initial quantisation parameter, but instead an association table associating to each tile group a default initial quantisation parameter value. These default initial quantisation parameter values may be encoded minus 26 as was done for the global default initial quantisation parameter according, for example, to the following syntax:

pic_parameter_set_rbsp( ) {                                 Descriptor
    ...
    for( i = 0; i <= num_tile_groups_in_pic_minus1; i++ )
        tile_group_init_qp_minus26[ i ]                     se(v)
    ...
}

With this new syntax, the quantisation parameter for decoding the first coding unit inside the tile group is the sum of 26 with the tile_group_init_qp_minus26 associated with the tile group of index i and the tile_group_qp_delta of the tile group.
TileGroupQpY = 26 + tile_group_init_qp_minus26[ i ] + tile_group_qp_delta

For example, the semantics associated with the syntax element of the PPS are the following: tile_group_init_qp_minus26[ i ] plus 26 specifies the initial value of TileGroupQpY for the i-th tile group. The initial value of TileGroupQpY is modified at the tile group layer when a non-zero value of tile_group_qp_delta is decoded. The value of tile_group_init_qp_minus26[ i ] shall be in the range of -( 26 + QpBdOffsetY ) to +37, inclusive. QpBdOffsetY is an offset computed as a function of the bit depth of the luma samples. For instance QpBdOffsetY is equal to 6 times the bit depth of the luma samples. In a variant, the value of tile_group_init_qp_minus26[ i ] shall be in the range of -rangeQp to rangeQp where rangeQp is an integer value.
The syntax elements of the tile group header remain identical but the semantics of tile_group_qp_delta are for example the following: tile_group_qp_delta specifies the initial value of QpY to be used for the coding blocks in the tile group until modified by the value of CuQpDeltaVal in the coding unit layer. The initial value of the QpY quantization parameter for the tile group, TileGroupQpY, is derived as follows:

TileGroupQpY = 26 + tile_group_init_qp_minus26[ tileGroupQpIdx ] + tile_group_qp_delta

The value of TileGroupQpY shall be in the range of -QpBdOffsetY to +63, inclusive.
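This variant can be sketched in the same way (a minimal Python illustration with a hypothetical helper; the PPS stores one default initial QP per tile group instead of a global value plus offsets):

```python
# The PPS stores a list of per-tile-group default initial QPs, encoded
# minus 26 (tile_group_init_qp_minus26 in the proposed syntax).
def tile_group_qp(tile_group_init_qp_minus26, tile_group_qp_idx, qp_delta):
    return 26 + tile_group_init_qp_minus26[tile_group_qp_idx] + qp_delta

# Merged PPS: index 0 holds the high-quality group (QP 20), index 1 the
# low-quality group (QP 40); both tile group headers keep delta = 0.
init_qps = [-6, 14]
assert tile_group_qp(init_qps, 0, 0) == 20
assert tile_group_qp(init_qps, 1, 0) == 40
```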
The variable tileGroupQpIdx, which specifies the index of the tile group, is derived as follows:

tileGroupQpIdx = 0
while( tile_group_address != tile_group_id[ tileGroupQpIdx ] )
    tileGroupQpIdx++

Another advantage of the new syntax in a different usage is now explained.
When encoding a video, the encoder may know that a particular region in an image of the video is more important for the viewer than the other parts of the image: this is a region of interest. There may be several regions of interest in each image. A region of interest may be at a fixed position in the image. For example, in the case of a video surveillance camera, the installer of the camera can position the regions of interest to visualize an interesting part of the scene, for example the doors to enter a room. In another case it can be estimated that the center part of the image is the region of interest. It is interesting to encode the video with a higher quality in the regions of interest compared to the other parts of the image. The tile groups at the spatial position of the regions of interest will thus have a different (lower) quantisation parameter than the other tile groups. In this case, the encoder can use the list of tile_group_qp_offset in the PPS to set a different offset value for the tile groups of the regions of interest. In this way the tile_group_qp_delta value in the tile group headers will be closer to 0 and thus will be encoded with a lower number of bits because it is encoded in variable length. As noted before, the PPS is written only once for a large number of frames while the tile group headers are written many times in each frame, so it is useful to decrease the size of the fields in the tile group header to obtain a better compression ratio of the video.
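The bit saving can be made concrete with the signed Exp-Golomb mapping used for se(v) elements (our own helper, following the standard rule: a value v maps to code number 2v-1 if v > 0, else -2v):

```python
def sev_bits(v):
    # Signed Exp-Golomb: map the value to its code number, then the
    # ue(v) code length is 2 * floor(log2(codeNum + 1)) + 1 bits.
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * ((code_num + 1).bit_length() - 1) + 1

# A tile_group_qp_delta of 0 costs a single bit in every tile group header ...
assert sev_bits(0) == 1
# ... while a delta of 20 (as needed after a naive merge) costs 11 bits,
# paid again in every picture of the video.
assert sev_bits(20) == 11
```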
In another alternate embodiment, the tile_group_qp_offset is specified only for a subset of the tile groups of the frame using the tile group identifier. It advantageously reduces the signalling overhead when the number of non-null offsets is low. For instance the syntax of the PPS is the following:

pic_parameter_set_rbsp( ) {                                 Descriptor
    ...
    if( rect_tile_group_flag ) {
        signalled_tile_group_id_flag                        u(1)
        if( signalled_tile_group_id_flag ) {
            signalled_tile_group_id_length_minus1           ue(v)
            for( i = 0; i <= num_tile_groups_in_pic_minus1; i++ )
                tile_group_id[ i ]                          u(v)
        }
    }
    if( rect_tile_group_flag ) {
        signalled_qp_offset_flag                            u(1)
        if( signalled_qp_offset_flag ) {
            num_signaled_qp_offset_minus1                   ue(v)
            for( i = 0; i <= num_signaled_qp_offset_minus1; i++ ) {
                tile_group_qp_address[ i ]                  ue(v)
                tile_group_qp_offset[ i ]                   se(v)
            }
        }
    }
    ...
}

For example, the semantics associated with the syntax elements of the PPS are the following: num_signaled_qp_offset_minus1 plus 1 specifies the number of tile_group_qp_address[ i ] and tile_group_qp_offset[ i ] specified in the PPS; num_signaled_qp_offset_minus1 shall be in the range of 0 to num_tile_groups_in_pic_minus1, inclusive.
tile_group_qp_address[ i ], when present, specifies the tile group address of each tile group that has a QP offset signalled in the PPS. In a variant, tile_group_qp_address[ i ] specifies the tile group index of each tile group that has a QP offset signalled in the APS. The value of tile_group_qp_address[ i ] shall be in the range of 0 to num_tile_groups_in_pic_minus1, inclusive.
tile_group_qp_offset[ i ] specifies the offset value that applies to the initial value of TileGroupQpY for the tile group with tile_group_address equal to tile_group_qp_address[ i ]. The value of tile_group_qp_offset[ i ] shall be in the range of -QpBdOffsetY to +63, inclusive. QpBdOffsetY is an offset computed as a function of the bit depth of the luma samples. For instance, QpBdOffsetY is equal to 6 times the bit depth of the luma samples. In a variant, the value of tile_group_qp_offset[ i ] shall be in the range of -rangeQp to rangeQp where rangeQp is an integer value.
The variable qpOffsetTGIdx, which specifies the offset associated with each tile group, is derived as follows:

for( i = 0; i <= num_tile_groups_in_pic_minus1; i++ ) {
    if( tile_group_id[ i ] == tile_group_qp_address[ i ] ) {
        qpOffsetTGIdx[ i ] = tile_group_qp_offset[ i ]
    } else {
        qpOffsetTGIdx[ i ] = 0
    }
}

The syntax elements of the tile group header remain identical but the semantics of tile_group_qp_delta are for example the following: tile_group_qp_delta specifies the initial value of QpY to be used for the coding blocks in the tile group until modified by the value of CuQpDeltaVal in the coding unit layer. The initial value of the QpY quantization parameter for the tile group, TileGroupQpY, is derived as follows:

TileGroupQpY = 26 + init_qp_minus26 + qpOffsetTGIdx[ tileGroupQpIdx ] + tile_group_qp_delta

The value of TileGroupQpY shall be in the range of -QpBdOffsetY to +63, inclusive.
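The subset signalling can be sketched as follows (a minimal Python illustration with a hypothetical helper; only tile group 4 carries a non-zero offset, the others default to 0):

```python
# Build the per-tile-group offset table: groups whose address appears in
# tile_group_qp_address get the signalled offset, the others get 0.
def qp_offset_table(tile_group_ids, qp_addresses, qp_offsets):
    table = {tg_id: 0 for tg_id in tile_group_ids}
    for address, offset in zip(qp_addresses, qp_offsets):
        table[address] = offset
    return table

# Four tile groups, a single signalled (address, offset) pair for group 4.
table = qp_offset_table([1, 2, 3, 4], [4], [20])
assert table == {1: 0, 2: 0, 3: 0, 4: 20}
# Derivation for tile group 4: 26 + init_qp_minus26 + offset + delta = 40.
assert 26 + (-6) + table[4] + 0 == 40
```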
The variable tileGroupQpIdx, which specifies the index of the tile group, is derived as follows:

tileGroupQpIdx = 0
while( tile_group_address != tile_group_id[ tileGroupQpIdx ] )
    tileGroupQpIdx++

Figure 7 illustrates the main steps of an encoding process according to an embodiment of the invention.
The described encoding process concerns the encoding of a single bitstream according to an embodiment of the invention. The obtained encoded bitstream may be used in a merging operation as described above, either as an original bitstream or as the resulting bitstream.
In a step 700, a tile partitioning of the frames is determined. For instance, the encoder defines the number of columns and rows so that each region of interest of the video is covered by at least one tile. In another example, the encoder is encoding an omnidirectional video where each tile corresponds to a predetermined field of view in the video. The tile partitioning of the frame according to a tile grid is typically represented in a parameter set NAL unit, for example a PPS according to the syntax presented in reference to Figure 3.
In a step 701, a set of tile groups is defined, each tile group comprising one or more tiles. In a particular embodiment, a tile group is defined for each tile of the frame. Advantageously, in order to avoid some VCL NAL unit rewriting in the merge operation, a tile group identifier is defined for each tile group in the bitstream. The tile group identifiers are determined so as to be unique for each tile group. The uniqueness of the tile group identifiers may be defined at the level of a set of bitstreams comprising the bitstream currently encoded.
The number of bits used to encode the tile group identifier, corresponding to the length of the tile group identifier, is determined as a function of the number of tile groups in the encoded bitstream or as a function of a number of tile groups in a set of different bitstreams comprising the bitstream being currently encoded.
The length of the tile group identifier and the association of each tile group with an identifier are typically specified in a parameter set NAL unit such as the PPS.
In a step 702, each tile group is associated to a quantisation parameter information. This quantisation parameter information is then encoded into a parameter set. Typically, the parameter set is a PPS. According to embodiments, the quantisation parameter information may be encoded as a quantisation parameter offset based on a default initial quantisation parameter and taking into account the quantisation parameter delta encoded into the tile group header. Alternatively, the quantisation parameter information is encoded in the parameter set as a dedicated default initial quantisation parameter associated to the tile group. For example, the parameter set can take the form of the PPS described in reference to Figure 6.
In a step 703, the samples of each tile group are encoded according to the parameters defined in the different parameter sets. In particular, the encoding will be based on the quantisation parameter information associated with the tile group. A complete bitstream is generated comprising both the non-VCL NAL units corresponding to the different parameter sets and the VCL NAL units corresponding to the encoded data of the different tile groups.
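The parameter-set side of steps 700 to 703 can be sketched as follows (a schematic Python outline; the dictionary layout and helper name are illustrative, not a real encoder API):

```python
# Steps 700-702: the tile grid and unique tile group ids are assumed already
# chosen; a QP offset is associated with each tile group so that region-of-
# interest groups get a lower (better-quality) QP while every tile group
# header keeps tile_group_qp_delta close to 0.
def build_pps(tile_group_ids, roi_ids, default_qp=32, roi_qp=22):
    offsets = {tg: (roi_qp - default_qp) if tg in roi_ids else 0
               for tg in tile_group_ids}
    return {"init_qp_minus26": default_qp - 26,
            "tile_group_qp_offset": offsets}

pps = build_pps(tile_group_ids=[1, 2, 3, 4], roi_ids={2})
# Step 703: each tile group is encoded with 26 + init_qp_minus26 + offset
# + delta; with delta = 0 the ROI group uses QP 22 and the others QP 32.
assert 26 + pps["init_qp_minus26"] + pps["tile_group_qp_offset"][2] == 22
assert 26 + pps["init_qp_minus26"] + pps["tile_group_qp_offset"][1] == 32
```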
Figure 8 illustrates the main steps of a decoding process according to an embodiment of the invention.
In a step 800, the decoder parses the bitstream in order to determine the tile partitioning of the frames. This information is obtained from a parameter set, typically from the PPS NAL unit. The syntax elements of the PPS are parsed and decoded to determine the grid of tiles.
In a step 801, the decoder determines the tile group partitioning of the frame and in particular obtains the number of tile groups together with an identification information for each tile group. This information is valid for at least one frame, but generally stays valid for many frames. It may take the form of the tile group identifier that may be obtained from a parameter set such as the PPS NAL unit as described in Figure 6.
In a step 802, the decoder parses the bitstream to determine the quantisation parameter information that is associated with each tile group. This is typically done by extracting a quantisation parameter delta from the tile group header and by combining this information with a quantisation parameter information associated with the tile group in a parameter set, typically in a PPS. Based on both the quantisation parameter delta and the quantisation parameter information associated with the tile group, an actual quantisation parameter is determined that allows the decoding of the tile group. According to embodiments, the quantisation parameter information associated to the tile group may be a quantisation parameter offset based on a default initial quantisation parameter. Alternatively, the quantisation parameter information associated to the tile group is a dedicated default initial quantisation parameter.
In a step 803, the decoder decodes the VCL NAL units corresponding to the tile groups according to the parameters determined in the previous steps. In particular, the decoding uses the quantisation parameter associated with the tile group.
Figure 9 illustrates the extraction and merge operation of two bitstreams stored in a file to form a resulting bitstream stored in a resulting file in an embodiment of the invention.
Figure 10 illustrates the main steps of the extraction and merge process at file format level in an embodiment of the invention.
Figure 9 illustrates the merge of two ISO BMFF streams 900 and 901 resulting in a new ISO BMFF bitstream 902 according to the method of Figure 10.
The encapsulation of the VVC streams consists, in this embodiment, in defining one tile track for each tile group of the stream and one tile base track for the NAL units common to the tile groups. For example, the stream 900 contains two tile groups, one with the identifier '1.1' and another one with the identifier '1.2'. The samples corresponding to each tile group '1.1' and '1.2' are described respectively in one tile track, similarly to the tile tracks of ISO/IEC 14496-15. While initially designed for HEVC, tile tracks could also encapsulate VVC tile groups. Such a VVC tile track could be differentiated from an HEVC tile track by defining a new sample entry, for instance 'vvt1' instead of 'hvt1'.
Similarly, the tile base track defined for HEVC is extended to support the VVC format. This VVC tile base track could be differentiated from the HEVC tile base track by defining a different sample entry. The VVC tile base track describes the NAL units common to the two tile groups. Typically, it contains mainly non-VCL NAL units such as the parameter sets and the SEI NAL units.
First, the merging method consists in determining, in a step 1000, the set of tile tracks from the two streams to be merged into a single bitstream. For instance, it corresponds to the tile tracks of the tile group with the identifier '2.1' of the file 901 and of the tile group with the identifier '1.2' of the file 900.
Then the method generates, in a step 1001, new parameter set NAL units (i.e. PPS) to describe the new decoding locations of the tile groups in the resulting stream according to the embodiments described above. Since all the modifications consist in modifying only the non-VCL NAL units, it is equivalent to generating, in a step 1003, a new tile base track. The samples of the original tile tracks corresponding to the extracted tile groups remain identical. The tile tracks of the file 902 reference the tile base track with a track reference of type 'tbas'. The tile base track references as well the tile tracks with a track reference of type 'sabt'.
The advantage of this method is that combining two streams consists mainly in generating a new tile base track, updating the track reference boxes and copying as-is the tile track samples corresponding to the selected tile groups. The processing is simplified since the rewriting of the tile track samples required in the prior art is avoided.
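The merge at file format level can be sketched as follows (a schematic Python outline with illustrative dictionaries, not an ISOBMFF library; 'tbas' and 'sabt' are the track reference types of ISO/IEC 14496-15):

```python
# Steps 1000-1003: select tile tracks, regenerate only the tile base track
# (non-VCL NAL units), and carry the tile track samples over unchanged.
def merge(files, selection):
    # Step 1000: pick the tile tracks to keep, e.g. tile group '2.1' from
    # file 901 and tile group '1.2' from file 900.
    tile_tracks = {tg: dict(files[name]["tile_tracks"][tg])
                   for name, tg in selection}
    # Steps 1001 and 1003: a new tile base track whose PPS describes the
    # new decoding locations of the selected tile groups; it references
    # the tile tracks ('sabt') and they reference it back ('tbas').
    tile_base = {"pps": {"tile_group_ids": [tg for _, tg in selection]},
                 "track_refs": {"sabt": list(tile_tracks)}}
    for track in tile_tracks.values():
        track["track_refs"] = {"tbas": "tile_base"}
    return {"tile_base_track": tile_base, "tile_tracks": tile_tracks}

files = {
    "900": {"tile_tracks": {"1.1": {"samples": "A"}, "1.2": {"samples": "B"}}},
    "901": {"tile_tracks": {"2.1": {"samples": "C"}, "2.2": {"samples": "D"}}},
}
merged = merge(files, [("901", "2.1"), ("900", "1.2")])
# The selected samples are copied as-is; no VCL rewriting occurs.
assert merged["tile_tracks"]["1.2"]["samples"] == "B"
assert merged["tile_base_track"]["track_refs"]["sabt"] == ["2.1", "1.2"]
```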
Figure 11 is a schematic block diagram of a computing device 1100 for implementation of one or more embodiments of the invention. The computing device 1100 may be a device such as a microcomputer, a workstation or a light portable device.
The computing device 1100 comprises a communication bus connected to:
- a central processing unit 1101, such as a microprocessor, denoted CPU;
- a random access memory 1102, denoted RAM, for storing the executable code of the method of embodiments of the invention as well as the registers adapted to record variables and parameters necessary for implementing the method according to embodiments of the invention, the memory capacity thereof can be expanded by an optional RAM connected to an expansion port, for example;
- a read only memory 1103, denoted ROM, for storing computer programs for implementing embodiments of the invention;
- a network interface 1104, typically connected to a communication network over which digital data to be processed are transmitted or received. The network interface 1104 can be a single network interface, or composed of a set of different network interfaces (for instance wired and wireless interfaces, or different kinds of wired or wireless interfaces). Data packets are written to the network interface for transmission or are read from the network interface for reception under the control of the software application running in the CPU 1101;
- a user interface 1105 which may be used for receiving inputs from a user or to display information to a user;
- a hard disk 1106, denoted HD, which may be provided as a mass storage device;
- an I/O module 1107 which may be used for receiving/sending data from/to external devices such as a video source or display.
The executable code may be stored either in read only memory 1103, on the hard disk 1106 or on a removable digital medium such as, for example, a disk. According to a variant, the executable code of the programs can be received by means of a communication network, via the network interface 1104, in order to be stored in one of the storage means of the communication device 1100, such as the hard disk 1106, before being executed.
The central processing unit 1101 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to embodiments of the invention, which instructions are stored in one of the aforementioned storage means. After powering on, the CPU 1101 is capable of executing instructions from main RAM memory 1102 relating to a software application after those instructions have been loaded from the program ROM 1103 or the hard disk (HD) 1106 for example.
Such a software application, when executed by the CPU 1101, causes the steps of the flowcharts of the invention to be performed.
Any step of the algorithms of the invention may be implemented in software by execution of a set of instructions or program by a programmable computing machine, such as a PC ("Personal Computer"), a DSP ("Digital Signal Processor") or a microcontroller; or else implemented in hardware by a machine or a dedicated component, such as an FPGA ("Field-Programmable Gate Array") or an ASIC ("Application-Specific Integrated Circuit").
Although the present invention has been described herein above with reference to specific embodiments, the present invention is not limited to those specific embodiments, and modifications which lie within the scope of the present invention will be apparent to a person skilled in the art.
Many further modifications and variations will suggest themselves to those versed in the art upon making reference to the foregoing illustrative embodiments, which are given by way of example only and which are not intended to limit the scope of the invention, that being determined solely by the appended claims. In particular the different features from different embodiments may be interchanged, where appropriate.
Each of the embodiments of the invention described above can be implemented solely or as a combination of a plurality of the embodiments. Also, features from different embodiments can be combined where necessary or where the combination of elements or features from individual embodiments in a single embodiment is beneficial.
Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.
For example, a Tile Group could also be a slice, a tile set, a motion constrained tile set (MCTS), a region of interest or a sub picture.
The information coded in the Picture Parameter Set (PPS) could also be encoded in other non-VCL units like a Video Parameter Set (VPS), a Sequence Parameter Set (SPS) or the DPS, or in new units like a Layer Parameter Set or a Tile Group Parameter Set. These units define parameters valid for several frames and thus they are at a higher hierarchical level than the tile group units or the APS units in the video bitstream. The tile group units are valid only inside one frame. The APS units can be valid for some frames, but their usage changes rapidly from one frame to another. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. The mere fact that different features are recited in mutually different dependent claims does not indicate that a combination of these features cannot be advantageously used.
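For illustration, the per-tile-group quantisation signalling described above (a parameter-set-level table associating tile group indices with QP offsets, combined at decoding with a global default initial quantisation parameter and a per-tile-group QP delta) can be sketched as follows. The function and variable names are hypothetical, chosen for readability rather than taken from any standard:

```python
# Sketch of the decoder-side derivation: the parameter set carries a
# global default initial QP and an optional per-tile-group offset; the
# tile group unit itself carries a QP delta. Names are illustrative.
from typing import Dict

def derive_tile_group_qp(global_default_qp: int,
                         group_qp_offsets: Dict[int, int],
                         tile_group_index: int,
                         tile_group_qp_delta: int) -> int:
    # If the parameter set associates an offset with this tile group
    # index, apply it on top of the global default initial QP;
    # otherwise the global default is used unchanged.
    offset = group_qp_offsets.get(tile_group_index, 0)
    # The final QP combines the initial QP for the group with the delta
    # coded in the tile group unit.
    return global_default_qp + offset + tile_group_qp_delta
```

Because the index-to-offset table lives in a parameter set rather than in the tile group units, a merger can re-associate quantisation information with the selected tile groups by rewriting only that parameter set, leaving the tile group payloads untouched.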
Claims (20)
- CLAIMS
- 1. A method of encoding video data comprising frames into a bitstream of logical units, frames being spatially divided into frame portions, frame portions being grouped into groups of frame portions, the method comprising: determining a quantisation parameter information associated with a group of frame portions; encoding the group of frame portions into a first logical unit; encoding the association between an index of the group of frame portions and the quantisation parameter information into a second logical unit.
- 2. The method of claim 1, wherein: the quantisation parameter information is encoded as a quantisation parameter offset determined based on a global default initial quantisation parameter, the global default initial quantisation parameter being associated to groups of frame portions.
- 3. The method of claim 1, wherein: -the quantisation parameter information is encoded for each group of frame portions in the second logical unit as an associated default initial quantisation parameter.
- 4. The method of claim 1, wherein: a quantization parameter offset is associated to each group in a subset of the groups of frame portions; and the quantisation parameter information is determined based on a global default initial quantisation parameter, and based on the existence of an association between the index of the group and a quantization parameter offset.
- 5. The method of any of claims 1 to 4, wherein the second logical unit is a PPS.
- 6. A method for decoding a bitstream of logical units of video data comprising frames, frames being spatially divided into frame portions, frame portions being grouped into groups of frame portions, the method comprising: parsing a first logical unit comprising a group of frame portions to determine a quantisation parameter delta associated with the group of frame portions; parsing a second logical unit comprising the association between an index of the group of frame portions and a quantisation parameter information associated with the group of frame portions; decoding the group of frame portions comprised in the first logical unit using a quantisation parameter calculated based on the quantisation parameter delta and based on the quantisation parameter information.
- 7. The method of claim 6, wherein: the quantisation parameter information is encoded as a quantisation parameter offset determined based on a global default initial quantisation parameter, the global default initial quantisation parameter being associated to groups of frame portions.
- 8. The method of claim 6, wherein: the quantisation parameter information is encoded for each group of frame portions in the second logical unit as an associated default initial quantisation parameter.
- 9. The method of claim 6, wherein: a quantization parameter offset is associated to each group in a subset of the groups of frame portions; and the quantisation parameter information is determined based on a global default initial quantisation parameter, and based on the existence of an association between the index of the group and a quantization parameter offset.
- 10. The method of any one of claims 6 to 9, wherein the second logical unit is a PPS.
- 11. A method for merging a selection of groups of frame portions from a plurality of original bitstreams into a resulting bitstream, bitstreams being composed of logical units comprising frames, frames being divided into frame portions, frame portions being grouped into groups of frame portions, the method comprising: encoding a logical unit comprising the association of an index of a group of frame portions, for each selected group of frame portions, with a quantisation parameter information; generating the resulting bitstream comprising the logical units comprising the groups of frame portions and the encoded logical unit comprising the association of an index of the group of frame portions with a quantisation parameter information.
- 12. The method of claim 11, wherein: - the quantisation parameter information is encoded as a quantisation parameter offset determined based on a global default initial quantisation parameter, the global default initial quantisation parameter being associated to groups of frame portions.
- 13. The method of claim 11, wherein: the quantisation parameter information is encoded for each group of frame portions in the logical unit as an associated default initial quantisation parameter.
- 14. The method of claim 11, wherein: a quantisation parameter offset is associated to each group in a subset of the groups of frame portions; and, the quantisation parameter information is determined based on a global default initial quantisation parameter, and based on the existence of an association between the index of the group and a quantization parameter offset.
- 15. A method of generating a file comprising a bitstream of logical units of encoded video data comprising frames, frames being spatially divided into frame portions, frame portions being grouped into groups of frame portions, the method comprising: - encoding the bitstream according to any one of claim 1 to 5; generating a first track comprising the logical units containing the association between the indexes of the groups of frame portions and the quantisation parameter information; generating for a group of frame portions, a track containing the logical unit containing the group of frame portions; and, generating the file comprising the generated tracks.
- 16. A bitstream of logical units, the bitstream comprising encoded video data comprising frames, frames being spatially divided into frame portions, frame portions being grouped into groups of frame portions, the bitstream comprising: a first logical unit comprising a group of frame portions; and a second logical unit comprising the association between an index of the group of frame portions and a quantization information.
- 17. The method of any one of claims 11 to 14, wherein the logical unit comprising the quantisation parameter information is a PPS.
- 18. A computer program product for a programmable apparatus, the computer program product comprising a sequence of instructions for implementing a method according to any one of claims 1 to 15, when loaded into and executed by the programmable apparatus.
- 19. A computer-readable storage medium storing instructions of a computer program for implementing a method according to any one of claims 1 to 15.
- 20. A computer program which upon execution causes the method of any one of claims 1 to 15 to be performed.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB1918656.8A GB2581869B (en) | 2019-03-01 | 2019-12-17 | Method and apparatus for encoding and decoding a video bitstream for merging regions of interest |
PCT/EP2020/055184 WO2020178144A1 (en) | 2019-03-01 | 2020-02-27 | Method and apparatus for encoding and decoding a video bitstream for merging regions of interest |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GBGB1902831.5A GB201902831D0 (en) | 2019-03-01 | 2019-03-01 | Method and apparatus for encoding and decoding a video bitstream for merging regions of interest |
Publications (2)
Publication Number | Publication Date |
---|---|
GB201903383D0 GB201903383D0 (en) | 2019-04-24 |
GB2581853A true GB2581853A (en) | 2020-09-02 |
Family
ID=66377309
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GBGB1902831.5A Ceased GB201902831D0 (en) | 2019-03-01 | 2019-03-01 | Method and apparatus for encoding and decoding a video bitstream for merging regions of interest |
GB1903383.6A Withdrawn GB2581853A (en) | 2019-03-01 | 2019-03-12 | Method and apparatus for encoding and decoding a video bitstream for merging regions of interest |
GB1918656.8A Expired - Fee Related GB2581869B (en) | 2019-03-01 | 2019-12-17 | Method and apparatus for encoding and decoding a video bitstream for merging regions of interest |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GBGB1902831.5A Ceased GB201902831D0 (en) | 2019-03-01 | 2019-03-01 | Method and apparatus for encoding and decoding a video bitstream for merging regions of interest |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB1918656.8A Expired - Fee Related GB2581869B (en) | 2019-03-01 | 2019-12-17 | Method and apparatus for encoding and decoding a video bitstream for merging regions of interest |
Country Status (2)
Country | Link |
---|---|
GB (3) | GB201902831D0 (en) |
WO (1) | WO2020178144A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11303897B2 (en) * | 2020-02-25 | 2022-04-12 | Tencent America LLC | Method and apparatus for signaling of chroma quantization parameters |
CN115314722B (en) * | 2022-06-17 | 2023-12-08 | 百果园技术(新加坡)有限公司 | Video code rate distribution method, system, equipment and storage medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019009776A1 (en) * | 2017-07-05 | 2019-01-10 | Telefonaktiebolaget Lm Ericsson (Publ) | Decoding a block of video samples |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8737464B1 (en) * | 2011-07-21 | 2014-05-27 | Cisco Technology, Inc. | Adaptive quantization for perceptual video coding |
CN104170383A (en) * | 2012-03-15 | 2014-11-26 | 索尼公司 | Image processing device and method |
US9414054B2 (en) * | 2012-07-02 | 2016-08-09 | Microsoft Technology Licensing, Llc | Control and use of chroma quantization parameter values |
EP3861754A1 (en) * | 2018-10-02 | 2021-08-11 | Telefonaktiebolaget LM Ericsson (publ) | Picture tile attributes signaled using loop(s) over tiles |
- 2019
- 2019-03-01 GB GBGB1902831.5A patent/GB201902831D0/en not_active Ceased
- 2019-03-12 GB GB1903383.6A patent/GB2581853A/en not_active Withdrawn
- 2019-12-17 GB GB1918656.8A patent/GB2581869B/en not_active Expired - Fee Related
- 2020
- 2020-02-27 WO PCT/EP2020/055184 patent/WO2020178144A1/en active Application Filing
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019009776A1 (en) * | 2017-07-05 | 2019-01-10 | Telefonaktiebolaget Lm Ericsson (Publ) | Decoding a block of video samples |
Also Published As
Publication number | Publication date |
---|---|
GB2581869A (en) | 2020-09-02 |
WO2020178144A1 (en) | 2020-09-10 |
GB201918656D0 (en) | 2020-01-29 |
GB201903383D0 (en) | 2019-04-24 |
GB201902831D0 (en) | 2019-04-17 |
GB2581869B (en) | 2023-02-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20220217355A1 (en) | Method and apparatus for encoding and decoding a video bitstream for merging regions of interest | |
US12113971B2 (en) | Method and apparatus for encoding and decoding a video stream with subpictures | |
TWI680673B (en) | Method and apparatus for video image coding and decoding | |
GB2581852A (en) | Method and apparatus for encoding and decoding a video bitstream for merging regions of interest | |
TWI575936B (en) | Video parameter set for hevc and extensions | |
CA2805900C (en) | Image signal decoding apparatus, image signal decoding method, image signal encoding apparatus, image signal encoding method, and program | |
KR20210146439A (en) | Concept for picture/video data streams allowing efficient reducibility or efficient random access | |
US11589047B2 (en) | Video encoding and decoding methods and apparatus | |
US20230060709A1 (en) | Video coding supporting subpictures, slices and tiles | |
US20210092359A1 (en) | Method, device, and computer program for coding and decoding a picture | |
CN111989928A (en) | Method and apparatus for encoding or decoding video data having frame portion | |
US20240236328A1 (en) | Video coding in relation to subpictures | |
GB2584723A (en) | Method, device, and computer program for coding and decoding a picture | |
WO2020178144A1 (en) | Method and apparatus for encoding and decoding a video bitstream for merging regions of interest | |
CN113519162B (en) | Parameter set signaling in digital video | |
TW202310626A (en) | Independent subpicture film grain |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WAP | Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1) |