GB2581852A - Method and apparatus for encoding and decoding a video bitstream for merging regions of interest - Google Patents


Publication number
GB2581852A
Authority
GB
United Kingdom
Prior art keywords
group
tile
aps
frame portions
identification information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1903379.4A
Other versions
GB201903379D0
Inventor
Ouedraogo Naël
Nassor Eric
Taquet Jonathan
Kergourlay Gérald
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Publication of GB201903379D0
Priority to GBGB1918658.4A (GB201918658D0)
Priority to GB2000479.2A (GB2582206B)
Priority to PCT/EP2020/054831 (WO2020178065A1)
Publication of GB2581852A


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70: characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/114: Adapting the group of pictures [GOP] structure, e.g. number of B-frames between two anchor frames
    • H04N19/119: Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/167: Position within a video image, e.g. region of interest [ROI]
    • H04N19/174: the coding unit being an image region that is a slice, e.g. a line of blocks or a group of blocks
    • H04N19/188: the coding unit being a video data packet, e.g. a network abstraction layer [NAL] unit
    • H04N19/88: involving rearrangement of data among different coding units, e.g. shuffling, interleaving, scrambling or permutation of pixel data
    • H04N19/82: Details of filtering operations specially adapted for video compression involving filtering within a prediction loop

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Encoding and decoding video data comprising frames into a bitstream of logical units, where the frames are spatially divided into frame portions and the frame portions are grouped. A filter parameter set applying to the samples of a group of frame portions is determined, together with a first identification information of that filter parameter set and a second identification information associated with an index of the group of frame portions. A filter parameter set identifier is determined based on the first and second identification information. The bitstream has a first, a second, and an additional logical unit. The first logical unit contains the group of frame portions. The second logical unit contains the filter parameter set for the group of frame portions and the filter parameter set identifier associated with the index of the group of frame portions. The additional logical unit contains the association between the index of the group of frame portions and the second identification information. The present invention solves the APS identifier collision problem when merging tile groups from different bitstreams, without amending the tile group structure, by introducing a second identification information associated with each tile group in a parameter set of the bitstream.

Description

METHOD AND APPARATUS FOR ENCODING AND DECODING A VIDEO BITSTREAM FOR MERGING REGIONS OF INTEREST
FIELD OF THE INVENTION
The present disclosure concerns a method and a device for encoding and decoding a video bitstream that facilitates the merging of regions of interest. It concerns more particularly the encoding and decoding of a video bitstream resulting from the merging of regions coming from different video bitstreams. In addition, a corresponding method of generating such a bitstream resulting from the merging of different regions coming from different video bitstreams is proposed.
BACKGROUND OF INVENTION
Figures 1a and 1b illustrate two different application examples for the combination of regions of interest.
For instance, Figure 1a illustrates an example where a frame (or picture) 100 from a first video bitstream and a frame 101 from a second video bitstream are merged into a frame 102 of the resulting bitstream. Each frame is composed of four regions of interest numbered from 1 to 4. The frame 100 has been encoded using encoding parameters resulting in a high quality encoding. The frame 101 has been encoded using encoding parameters resulting in a low quality encoding. As is well known, the frame encoded with a low quality is associated with a lower bitrate than the frame encoded with a high quality. The resulting frame 102 combines the regions of interest 1, 2 and 4 from the frame 101, thus encoded with a low quality, with the region of interest 3 from frame 100 encoded with a high quality. The goal of such a combination is generally to get a region of interest, here the region 3, in high quality, while keeping the resulting bitrate reasonable by having regions 1, 2 and 4 encoded in low quality. This kind of scenario may happen in particular in the context of omnidirectional content, allowing a higher quality for the content actually visible while the remaining parts have a lower quality.
Figure 1b illustrates a second example where four different videos A, B, C and D are merged to form a resulting video. A frame 103 of video A is composed of regions of interest A1, A2, A3, and A4. A frame 104 of video B is composed of regions of interest B1, B2, B3, and B4. A frame 105 of video C is composed of regions of interest C1, C2, C3, and C4. A frame 106 of video D is composed of regions of interest D1, D2, D3, and D4. The frame 107 of the resulting video is composed of regions B4, A3, C3, and D1. In this example, the resulting video is a mosaic video of different regions of interest of each original video stream. The regions of interest of the original video streams are rearranged and combined in a new location of the resulting video stream.
The compression of video relies on block-based video coding in most coding systems, like the HEVC, standing for High Efficiency Video Coding, or the emerging VVC, standing for Versatile Video Coding, standard. In these encoding systems, a video is composed of a sequence of frames or pictures or images or samples which may be displayed at several different times. In the case of multi-layered video (for example scalable, stereo, or 3D videos), several frames may be decoded to compose the resulting image to display at one instant. A frame can also be composed of different image components, for instance for encoding the luminance, the chrominances or depth information.
The compression of a video sequence relies on several partitioning techniques for each frame. Figure 2 illustrates some partitioning in encoding systems. The frames 201 and 202 are divided into coding tree units (CTU) illustrated by the dotted lines. A CTU is the elementary unit of encoding and decoding. For example, a CTU can encode an area of 128 by 128 pixels.
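The CTU partitioning described above amounts to a ceiling division of the frame dimensions by the CTU size: partial CTUs at the right and bottom borders still count. A minimal Python sketch (the function name and the 128-pixel default are illustrative, not taken from any standard):

```python
import math

def ctu_grid(frame_width, frame_height, ctu_size=128):
    """Number of CTU columns and rows needed to cover a frame.

    Border CTUs that only partially overlap the frame still count,
    so both dimensions are rounded up (ceiling division).
    """
    cols = math.ceil(frame_width / ctu_size)
    rows = math.ceil(frame_height / ctu_size)
    return cols, rows

# A 1920x1080 frame with 128x128 CTUs needs a 15x9 CTU grid:
# 1080 is not a multiple of 128, so the last CTU row is partial.
print(ctu_grid(1920, 1080))  # (15, 9)
```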
A Coding Tree Unit (CTU) may also be called a block, macroblock, or coding unit. It can encode the different image components simultaneously, or it can be limited to only one image component.
As illustrated by Figure 2, the frame can be partitioned according to a grid of tiles, illustrated by the thin solid lines. The tiles are frame portions, thus rectangular regions of pixels that may be defined independently of the CTU partitioning. The boundaries of tiles and the boundaries of the CTU may be different. A tile may also correspond to a sequence of CTUs, as in the represented example, meaning that the boundaries of tiles and CTUs coincide.
By definition, tile boundaries break the spatial encoding dependencies. This means that the encoding of a CTU in a tile is not based on pixel data from another tile in the frame.
Some encoding systems, like for example VVC, provide the notion of tile groups.
This mechanism allows the partitioning of the frame into one or several groups of tiles. Each group of tiles is composed of one or several tiles. Two different kinds of tile groups are provided, as illustrated by frames 201 and 202. A first kind of tile group is restricted to tile groups forming a rectangular area in the frame. Frame 201 illustrates the partitioning of a frame into five different rectangular tile groups. A second kind of tile group is restricted to successive tiles in raster scan order. Frame 202 illustrates the partitioning of a frame into three different tile groups composed of successive tiles in raster scan order. Rectangular tile groups are the structure of choice for dealing with regions of interest in a video. A tile group can be encoded in the bitstream as one or several NAL units. A NAL unit, standing for Network Abstraction Layer unit, is a logical unit of data for the encapsulation of data in the encoded bitstream. In the VVC encoding system, a tile group is encoded as a single NAL unit. When a tile group is encoded in the bitstream as several NAL units, each NAL unit of the tile group is a tile group segment. A tile group segment includes a tile group segment header that contains the coding parameters of the tile group segment. The header of the first segment NAL unit of the tile group contains all the coding parameters of the tile group. The tile group segment headers of the subsequent NAL units of the tile group may contain fewer parameters than the first one. In such a case, the first tile group segment is an independent tile group segment and the subsequent segments are dependent tile group segments.
In OMAF v2 ISO/IEC 23090-2, a sub-picture is a portion of a picture that represents a spatial subset of the original video content, which has been split into spatial subsets before video encoding at the content production side. A sub-picture is for example one or more tile groups.
Figure 3 illustrates the organisation of the bitstream in the exemplary VVC coding system.
A bitstream 300 according to the VVC coding system is composed of an ordered sequence of syntax elements and coded data. The syntax elements and coded data are placed into NAL units 301-305. There are different NAL unit types. The network abstraction layer provides the ability to encapsulate the bitstream into different protocols, like RTP/IP, standing for Real Time Protocol / Internet Protocol, ISO Base Media File Format, etc. The network abstraction layer also provides a framework for packet loss resilience.
NAL units are divided into VCL NAL units and non-VCL NAL units, VCL standing for Video Coding Layer. The VCL NAL units contain the actual encoded video data. The non-VCL NAL units contain additional information. This additional information may be parameters needed for the decoding of the encoded video data or supplemental data that may enhance the usability of the decoded video data. NAL units 305 correspond to tile groups and constitute the VCL NAL units of the bitstream. Different NAL units 301-304 correspond to different parameter sets; these NAL units are non-VCL NAL units. The VPS NAL unit 301, VPS standing for Video Parameter Set, contains parameters defined for the whole video, and thus the whole bitstream. The naming of the VPS may change and, for instance, becomes DPS in VVC. The SPS NAL unit 302, SPS standing for Sequence Parameter Set, contains parameters defined for a video sequence. The PPS NAL unit 303, PPS standing for Picture Parameter Set, contains parameters defined for a picture or a group of pictures. The APS NAL unit 304, APS standing for Adaptation Parameter Set, contains parameters for the Adaptive Loop Filter (ALF) that are defined at the tile group level. The bitstream may also contain SEI NAL units, SEI standing for Supplemental Enhancement Information. The periodicity of occurrence of these parameter sets in the bitstream is variable. A VPS that is defined for the whole bitstream needs to occur only once in the bitstream. Conversely, an APS that is defined for a tile group may occur once for each tile group in each picture. In practice, different tile groups may rely on the same APS and thus there are generally fewer APS than tile groups in each picture. The VCL NAL units 305 each contain a tile group. A tile group may correspond to the whole picture, a single tile or a plurality of tiles. A tile group is composed of a tile group header 310 and a raw byte sequence payload, RBSP, 311 that contains the tiles.
The tile group index is the index of the tile group in the frame in raster scan order. For example, in Figure 2, the number in a circle represents the tile group index for each tile group. Tile group 203 has a tile group index of 0.
The tile group identifier is a value, meaning an integer or any bit sequence, which is associated with a tile group. Typically, the PPS contains, for each tile group, the association between the tile group index and the tile group identifier for one or several pictures. For example, in Figure 2, the tile group 203 with tile group index 0 can have a tile group identifier of '345'.
The tile group address is a syntax element present in the header of the tile group NAL unit. The tile group address may refer to the tile group index, to the tile group identifier or even to the tile index. In the latter case, it is the index of the first tile in the tile group. The semantics of the tile group address are defined by several flags present in one of the parameter set NAL units. In the example of tile group 203 in Figure 2, the tile group address may be the tile group index 0, the tile group identifier 345 or the tile index 0.
The tile group index, identifier and address are used to define the partitioning of the frame into tile groups. The tile group index is related to the location of the tile group in the frame. The decoder parses the tile group address in the tile group NAL unit header and uses it to locate the tile group in the frame and determine the location of the first sample in the NAL unit. When the tile group address refers to the tile group identifier, the decoder uses the association indicated by the PPS to retrieve the tile group index associated with the tile group identifier and thus determine the location of the tile group and of the first sample in the NAL unit.
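The identifier-to-index resolution described above can be sketched as a lookup in the PPS association table. This is an illustrative Python model, not the normative decoding process; all names are invented:

```python
def tile_group_index_from_address(address, pps_tile_group_id):
    """Resolve a tile group address to a tile group index.

    pps_tile_group_id models the association table signalled in the
    PPS, indexed by tile group index.  When the address carries a tile
    group identifier, the decoder scans the table for the matching
    entry; otherwise the address is treated as the index itself.
    """
    for index, identifier in enumerate(pps_tile_group_id):
        if identifier == address:
            return index
    # No signalled identifier matched: interpret the address as an index.
    return address

# The PPS associates tile group index 0 with identifier 345,
# as in the example of tile group 203 in Figure 2.
pps_table = [345, 17, 99]
print(tile_group_index_from_address(345, pps_table))  # 0
```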
The syntax of the PPS as proposed in the current version of VVC is as follows:

pic_parameter_set_rbsp( ) {                                          Descriptor
    pps_pic_parameter_set_id                                         ue(v)
    pps_seq_parameter_set_id                                         ue(v)
    transform_skip_enabled_flag                                      u(1)
    single_tile_in_pic_flag                                          u(1)
    if( !single_tile_in_pic_flag ) {
        num_tile_columns_minus1                                      ue(v)
        num_tile_rows_minus1                                         ue(v)
        uniform_tile_spacing_flag                                    u(1)
        if( !uniform_tile_spacing_flag ) {
            for( i = 0; i < num_tile_columns_minus1; i++ )
                tile_column_width_minus1[ i ]                        ue(v)
            for( i = 0; i < num_tile_rows_minus1; i++ )
                tile_row_height_minus1[ i ]                          ue(v)
        }
        single_tile_per_tile_group_flag                              u(1)
        if( !single_tile_per_tile_group_flag )
            rect_tile_group_flag                                     u(1)
        if( rect_tile_group_flag && !single_tile_per_tile_group_flag ) {
            num_tile_groups_in_pic_minus1                            ue(v)
            for( i = 0; i <= num_tile_groups_in_pic_minus1; i++ ) {
                if( i > 0 )
                    top_left_tile_idx[ i ]                           u(v)
                bottom_right_tile_idx[ i ]                           u(v)
            }
        }
        loop_filter_across_tiles_enabled_flag                        u(1)
        if( rect_tile_group_flag ) {
            signalled_tile_group_id_flag                             u(1)
            if( signalled_tile_group_id_flag ) {
                signalled_tile_group_id_length_minus1                ue(v)
                for( i = 0; i <= num_tile_groups_in_pic_minus1; i++ )
                    tile_group_id[ i ]                               u(v)
            }
        }
    }
    /* Additional syntax elements not represented */
    rbsp_trailing_bits( )
}

The descriptor column gives the encoding of a syntax element: u(1) means that the syntax element is encoded using one bit; ue(v) means that the syntax element is encoded as an unsigned integer 0-th order Exp-Golomb-coded syntax element with the left bit first, which is a variable-length encoding. The syntax elements num_tile_columns_minus1 and num_tile_rows_minus1 respectively indicate the number of tile columns and rows in the frame. When the tile grid is not uniform (uniform_tile_spacing_flag equal to 0), the syntax elements tile_column_width_minus1[] and tile_row_height_minus1[] specify the widths and heights of each column and row of the tile grid.
The tile group partitioning is expressed with the following syntax elements: The syntax element single_tile_in_pic_flag states whether the frame contains a single tile. In other words, there is only one tile and one tile group in the frame when this flag is true.
The syntax element single_tile_per_tile_group_flag states whether each tile group contains a single tile. In other words, each tile of the frame belongs to a distinct tile group when this flag is true.
The syntax element rect_tile_group_flag indicates that the tile groups of the frame form rectangular shapes, as represented in the frame 201.
When present, the syntax element num_tile_groups_in_pic_minus1 is equal to the number of rectangular tile groups in the frame minus one.
Syntax elements top_left_tile_idx[] and bottom_right_tile_idx[] are arrays that respectively specify the first (top-left) tile and the last (bottom-right) tile in a rectangular tile group. These arrays are indexed by tile group index.
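Given the top-left and bottom-right tile indices, the set of tiles covered by a rectangular tile group can be enumerated. A hedged Python sketch (the function name is illustrative; tile indices are assumed to run in raster scan order over the tile grid, as in the text):

```python
def tiles_in_rect_tile_group(top_left_idx, bottom_right_idx, num_tile_columns):
    """List the tile indices covered by a rectangular tile group.

    Tile indices are in raster scan order over the tile grid; the
    group is the rectangle spanned by its top-left and bottom-right
    tiles, as signalled by top_left_tile_idx[] and bottom_right_tile_idx[].
    """
    top, left = divmod(top_left_idx, num_tile_columns)
    bottom, right = divmod(bottom_right_idx, num_tile_columns)
    return [row * num_tile_columns + col
            for row in range(top, bottom + 1)
            for col in range(left, right + 1)]

# On a 4-column tile grid, the group spanning tile 1 (row 0, col 1) to
# tile 6 (row 1, col 2) covers a 2x2 rectangle of tiles.
print(tiles_in_rect_tile_group(1, 6, 4))  # [1, 2, 5, 6]
```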
The tile group identifiers are specified when signalled_tile_group_id_flag is equal to 1. In this case, the signalled_tile_group_id_length_minus1 syntax element indicates the number of bits used to code each tile group identifier value. The tile_group_id[] association table is indexed by tile group index and contains the identifier of each tile group. When signalled_tile_group_id_flag is equal to 0, tile_group_id[] is still indexed by tile group index but contains the tile group index of the tile group.
The tile group header comprises the tile group address according to the following syntax in the current VVC version:

tile_group_header( ) {                                               Descriptor
    tile_group_pic_parameter_set_id                                  ue(v)
    if( rect_tile_group_flag  | |  NumTilesInPic > 1 )
        tile_group_address                                           u(v)
    if( !rect_tile_group_flag && !single_tile_per_tile_group_flag )
        num_tiles_in_tile_group_minus1                               ue(v)
    [...]
}

When the tile group is not rectangular, the tile group header indicates the number of tiles in the tile group NAL unit with the help of the num_tiles_in_tile_group_minus1 syntax element.
Each tile 320 may comprise a tile segment header 330 and tile segment data 331. The tile segment data 331 comprises the encoded coding units 340. In the current version of the VVC standard, the tile segment header is not present and the tile segment data contains the coding unit data 340.
Figure 4 illustrates the process of generating a video bitstream composed of different regions of interest from one or several original bitstreams.
In a step 400, the regions to be extracted from the original bitstreams are selected. The regions may correspond for instance to a specific region of interest or a specific viewing direction in an omnidirectional content. The tile groups comprising encoded samples present in the selected set of regions are selected in the original bitstreams. At the end of this step, the identifier of each tile group in the original bitstreams that will be merged into the resulting bitstream is determined. For example, the identifiers of the tile groups 1, 2, and 4 of frame 101 and of the tile group 3 of frame 100 in Figure 1 are determined.
In a step 401, a new arrangement for the selected tile groups in the resulting video is determined. This consists of determining the size and location of each selected tile group in the resulting video. For instance, the new arrangement conforms to a predetermined ROI composition. Alternatively, a user defines a new arrangement. In a step 402, the tile partitioning of the resulting video needs to be determined. When the tile partitionings of the original bitstreams are identical, the same tile partitioning is kept for the resulting video. At the end of this step, the number of rows and columns of the tile grid, together with the width and height of the tiles, is determined and advantageously stored in memory.
When determining the new arrangement of the tile groups in the resulting video in step 401, the location of a tile group in the video may change with respect to its location in the original video. In a step 403, the new locations of the tile groups are determined. In particular, the tile group partitioning of the resulting video is determined. The locations of the tile groups are determined with reference to the new tile grid as determined in step 402.
In a step 404, new parameter sets are generated for the resulting bitstream. In particular, new PPS NAL units are generated. These new PPS contain syntax elements to encode the tile grid partitioning, the tile group partitioning and positioning, and the association of the tile group identifier and the tile group index. To do so, the tile group identifier is extracted from each tile group and associated with the tile group index depending on the new decoding location of the tile group. Recall that each tile group, in the exemplary embodiment, is identified by an identifier in the tile group header and that each tile group identifier is associated with an index corresponding to the tile group index of the tile group in the picture in raster scan order. This association is stored in a PPS NAL unit. Assuming that there is no collision in the identifiers of the tile groups, when changing the position of a tile group in a picture, and thus changing the tile group index, there is no need to change the tile group identifiers and thus to amend the tile group structure. Only PPS NAL units need to be amended.
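The rebuilding of the PPS association in step 404 can be sketched as follows. This is a hedged illustration of the bookkeeping only (the function name and the dict-based input are assumptions, not the patent's method): identifiers are kept unchanged, so only the index side of the association table moves.

```python
def rebuild_pps_association(placements):
    """Build the tile_group_id[] table of the new PPS after merging.

    placements maps each tile group identifier (kept unchanged from
    its original bitstream, so the tile group NAL units need no
    rewriting) to its new tile group index in the resulting frame.
    The returned table is indexed by tile group index, as in the PPS.
    """
    table = [None] * len(placements)
    for identifier, new_index in placements.items():
        table[new_index] = identifier
    return table

# Region 3 of the high-quality stream (identifier 345) is placed at
# tile group index 2 of the resulting frame; the other identifiers
# are hypothetical values from the low-quality stream.
placements = {100: 0, 101: 1, 345: 2, 103: 3}
print(rebuild_pps_association(placements))  # [100, 101, 345, 103]
```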
In a step 405, the VCL NAL units, namely the tile groups, are extracted from the original bitstreams to be inserted in the resulting bitstream. It may happen that these VCL NAL units need to be amended. In particular, some parameters in the tile group header may not be compatible with the resulting bitstream and need to be amended. It would be advantageous to avoid this amending step, as decoding, amending and re-encoding the tile group header is resource consuming.
In particular, APS NAL units may generate a need to amend tile group headers. Recall that the APS stores the parameters needed for the adaptive loop filtering of the picture. Each APS comprises an identifier to identify this APS. Each tile group header comprises a flag that indicates whether adaptive loop filtering is to be applied; if this flag is true, the identifier of the APS containing the parameters to be used for adaptive loop filtering is stored in the tile group header. In the current version of the standard, the APS identifier can take 32 values. Due to the low number of possible values, when merging tile groups from different bitstreams, there is a high risk of collision between these identifiers. Solving these collisions implies changing some APS identifiers and thus amending the APS identifier in some tile group headers.
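The collision risk described above can be made concrete with a small sketch. This is an illustrative Python check, not part of any merging tool: each input set models the APS identifiers used by one original bitstream, drawn from the same 32-value space.

```python
def find_aps_collisions(bitstream_aps_ids):
    """Detect APS identifier collisions between bitstreams to be merged.

    Each entry of bitstream_aps_ids is the set of APS identifiers used
    by one original bitstream.  Because only 32 identifier values exist,
    independently encoded streams are likely to reuse the same value
    for different ALF parameters, which is exactly the collision the
    invention avoids having to repair in the tile group headers.
    """
    seen, collisions = set(), set()
    for ids in bitstream_aps_ids:
        collisions |= seen & ids   # values already used by an earlier stream
        seen |= ids
    return collisions

# Both hypothetical streams happen to use APS identifier 3
# for different adaptive loop filter parameters.
print(find_aps_collisions([{0, 3, 7}, {3, 12}]))  # {3}
```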
SUMMARY OF INVENTION
The present invention has been devised to address one or more of the foregoing concerns. It concerns an encoding and decoding method for a bitstream that allows solving APS identifier collisions when merging tile groups from different bitstreams without amending the tile group encoded data.
According to a first aspect of the invention, there is provided a method of encoding video data comprising frames into a bitstream of logical units, frames being spatially divided into frame portions, frame portions being grouped into groups of frame portions, the method comprising: determining a filter parameter set applying to samples of a group of frame portions; determining a first identification information of the determined filter parameter set; determining a second identification information associated with an index of the group of frame portions; encoding the group of frame portions into a first logical unit comprising the first identification information; encoding the filter parameter set into a second logical unit comprising a filter parameter set identifier determined based on the first identification information and on the second identification information; and, encoding the association between the index of the group of frame portions and the second identification information into a logical unit.
In an embodiment: the second identification information is an extension identifier; and, the filter parameter set identifier comprises the first identification information and the extension identifier.
In an embodiment: the second identification information is an offset; and the filter parameter set identifier is the sum of the first identification information and the offset.
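The offset embodiment can be sketched as follows. This is a hedged illustration of the arithmetic only (the function name, the per-group offset table, and the example values are assumptions): the identifier carried in the tile group header stays untouched, and the per-group offset signalled elsewhere in the bitstream disambiguates colliding values.

```python
def filter_parameter_set_identifier(first_id, tile_group_index, offsets):
    """Compute the filter parameter set identifier in the offset embodiment.

    first_id is the first identification information carried in the tile
    group data; offsets maps each tile group index to its second
    identification information (an offset signalled in a parameter set).
    The final identifier is their sum, so merged tile groups keep their
    encoded data unchanged.
    """
    return first_id + offsets[tile_group_index]

# Two merged streams both used APS identifier 2; a per-group offset
# (here, 32 for the group taken from the second stream) separates them.
offsets = {0: 0, 1: 0, 2: 32, 3: 0}
print(filter_parameter_set_identifier(2, 0, offsets))  # 2
print(filter_parameter_set_identifier(2, 2, offsets))  # 34
```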
In an embodiment, the second identification information is an index of a filter parameter set. In an embodiment, the association between the index of the group of frame portions and the second identification information is encoded into a third logical unit.
In an embodiment, the second and third logical units are parameter set logical units applying at different levels of the bitstream.
In an embodiment, the association between the index of the group of frame portions and the second identification information is encoded into the second logical unit.
In an embodiment, a plurality of filter parameter sets are determined, the method further comprising: encoding the plurality of filter parameter sets into the second logical unit, each filter parameter set being associated with an index, the second logical unit comprising for each group of frame portions, the association of an index of the group of frame portions and the index of a filter parameter set.
According to another aspect of the invention, there is provided a method for decoding a bitstream of logical units of video data comprising frames, frames being spatially divided into frame portions, frame portions being grouped into groups of frame portions, the method comprising: parsing a first logical unit comprising a group of frame portions to determine a first identification information of a filter parameter set applying to samples of the group of frame portions; parsing a second logical unit comprising the association between an index of the group of frame portions and a second identification information; determining a filter parameter set identifier based on the first identification information and the second identification information; decoding a logical unit comprising the filter parameter set identified by the filter parameter set identifier; decoding the group of frame portions comprised in the first logical unit using the decoded filter parameter set.
In an embodiment: the second identification information is an extension identifier; and, the filter parameter set identifier comprises the first identification information and the extension identifier.
In an embodiment: the second identification information is an offset; and, the filter parameter set identifier is the addition of the first identification information and the offset.
In an embodiment, the second identification information is an index of a filter parameter set. In an embodiment, the logical unit comprising the filter parameter set is a third logical unit.
In an embodiment, the second and third logical units are parameter set logical units applying at different levels of the bitstream.
In an embodiment, the logical unit comprising the filter parameter set is the second logical unit.
In an embodiment, a plurality of filter parameter sets are determined, the method further comprising: decoding the plurality of filter parameter sets from the second logical unit, each filter parameter set being associated with an index, the second logical unit comprising for each group of frame portions, the association of an index of the group of frame portions and the index of a filter parameter set.
According to another aspect of the invention, there is provided a method for merging groups of frame portions from a plurality of original bitstreams of video data into a resulting bitstream, bitstreams being composed of logical units comprising frames, frames being divided into frame portions, frame portions being grouped into groups of frame portions, the method comprising: parsing the logical units comprising the groups of frame portions to determine a first identification information of a filter parameter set associated with each group of frame portions; extracting logical units comprising a filter parameter set applying to samples of a group of frame portions, the logical unit being identified by the first identification information; encoding a logical unit comprising the association of a group of frame portions index for each group of frame portions with a second identification information; encoding each extracted logical unit comprising a filter parameter set into a logical unit comprising the filter parameter set and a filter parameter set identifier determined based on the first identification information and the second identification information; generating the resulting bitstream comprising the logical units comprising the groups of frame portions, the encoded logical unit comprising the association of the groups of frame portions indexes with a second identification information and the encoded logical units comprising the filter parameter sets.
In an embodiment: the second identification information is an extension identifier; and the filter parameter set identifier comprises the first identification information and the extension identifier.
In an embodiment: the second identification information is an offset; and the filter parameter set identifier is the addition of the first identification information and of the offset.
In an embodiment, the second identification information is an index of a filter parameter set. According to another aspect of the invention, there is provided a method of generating a file comprising a bitstream of logical units of encoded video data comprising frames, frames being spatially divided into frame portions, frame portions being grouped into groups of frame portions, the method comprising: encoding the bitstream according to the invention; generating a first track comprising the logical units containing the filter parameter sets, and the logical unit containing the association between the indexes of the groups of frame portions and the second identification information; generating, for a group of frame portions, a track containing the logical unit containing the group of frame portions; and, generating the file comprising the generated tracks.
According to another aspect of the invention, there is provided a bitstream of logical units, the bitstream comprising encoded video data comprising frames, frames being spatially divided into frame portions, frame portions being grouped into groups of frame portions, the bitstream comprising: a first logical unit comprising a group of frame portions; a second logical unit comprising a filter parameter set applying to samples of the group of frame portions and a filter parameter set identifier determined based on a first identification information of the filter parameter set and on a second identification information associated with an index of the group of frame portions; and, a logical unit comprising the association between the index of the group of frame portions and the second identification information.
According to another aspect of the invention, there is provided a computer program product for a programmable apparatus, the computer program product comprising a sequence of instructions for implementing a method according to the invention, when loaded into and executed by the programmable apparatus.
According to another aspect of the invention, there is provided a computer-readable storage medium storing instructions of a computer program for implementing a method according to the invention.
According to another aspect of the invention, there is provided a computer program which upon execution causes the methods of the invention to be performed.
At least parts of the methods according to the invention may be computer implemented. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, microcode, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit", "module" or "system". Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
Since the present invention can be implemented in software, the present invention can be embodied as computer-readable code for provision to a programmable apparatus on any suitable carrier medium. A tangible, non-transitory carrier medium may comprise a storage medium such as a floppy disk, a CD-ROM, a hard disk drive, a magnetic tape device or a solid-state memory device and the like. A transient carrier medium may include a signal such as an electrical signal, an electronic signal, an optical signal, an acoustic signal, a magnetic signal or an electromagnetic signal, e.g. a microwave or RF signal.
BRIEF DESCRIPTION OF DRAWINGS
Embodiments of the invention will now be described, by way of example only, and with reference to the following drawings in which:
Figures 1a and 1b illustrate two different application examples for the combination of regions of interest;
Figure 2 illustrates some partitioning in encoding systems;
Figure 3 illustrates the organisation of the bitstream in the exemplary coding system VVC;
Figure 4 illustrates an example of a process of generating a video bitstream composed of different regions of interest from one or several original bitstreams;
Figure 5 illustrates issues with the APS NAL unit when merging tile groups from different bitstreams;
Figure 6 illustrates the encoding of an APS extension identifier according to an embodiment of the invention;
Figure 7 illustrates another embodiment where the second identification information is implemented as an offset to be applied to the APS identifier;
Figure 8 illustrates an embodiment where the APS associated with different tile groups are merged;
Figure 9 illustrates an embodiment where a plurality of APS can be associated with a tile group;
Figure 10 illustrates the main steps of an encoding process according to an embodiment of the invention;
Figure 11 illustrates the main steps of a decoding process according to an embodiment of the invention;
Figure 12 illustrates the extraction and merge operation of two bitstreams stored in a file to form a resulting bitstream stored in a resulting file in an embodiment of the invention;
Figure 13 illustrates the main steps of the extraction and merge process at file format level in an embodiment of the invention;
Figure 14 is a schematic block diagram of a computing device for implementation of one or more embodiments of the invention.
DETAILED DESCRIPTION OF THE INVENTION
Figure 5 illustrates issues with APS NAL unit when merging tile groups from different bitstreams.
Adaptive loop filtering (ALF) may be used as an in-loop filter for each frame. ALF requires the transmission of a set of parameters named ALF parameters. The ALF parameters are typically transmitted in a dedicated parameter set called APS, for ALF Parameter Set. The APS is transmitted as a non-VCL NAL unit. It contains an identifier of the APS and the ALF parameters to be used in one or several tile groups of one or several pictures. The identifier is a value comprised in the range 0 to 31. The update mechanism is the following: when a new APS is received with the same identifier as a previous one, it replaces the previous one. The APS can change very rapidly; for each picture, the ALF parameters may be recomputed and new APS may be generated either as replacements or in addition to previous ones. The APS may typically take the following syntax:

adaptation_parameter_set_rbsp( ) {                              Descriptor
    adaptation_parameter_set_id                                 ue(v)
    alf_data( )
}

A tile group header comprises a flag, typically called tile_group_alf_enabled_flag, to indicate if the ALF filter is used. When the ALF filter is used, the tile group header comprises the identifier of the APS to be used. In each successive picture, a tile group with the same index is likely to change its APS identifier. These syntax elements of the tile group header are typically encoded according to the following syntax:

tile_group_header( ) {                                          Descriptor
    if( sps_alf_enabled_flag ) {
        tile_group_alf_enabled_flag                             u(1)
        if( tile_group_alf_enabled_flag )
            tile_group_aps_id                                   ue(v)
    }
}
When merging different tile groups from different bitstreams, this design generates a possibility of collision between the APS identifiers. Figure 5 illustrates an example of such collision. In Figure 5, the tile group 3 of a first bitstream 500 refers to an APS 510 having an identifier with the value 0 in bitstream 500. The tile group 4 in a second bitstream 501 also refers to an APS 511 having an identifier having the value 0 in bitstream 501. APS 510 and APS 511, while having the same identifier "0", are likely to contain different ALF parameters as they are defined in different bitstreams.
When generating the resulting bitstream 502, it is necessary to modify the identifier of at least one of the APS 520 and 521 in order to provide each tile group with the right ALF parameters. In the example, APS 521 corresponds to APS 511 with an amended identifier now taking the value "1". To do so, it is necessary to read, decode, amend and re-encode the APS 521 with the new identifier. This is not too complex an operation, as APS are relatively small NAL units with mainly fixed-length elements. It is also necessary to change the APS identifier referenced in the tile group header of the tile group 4 to correctly reference the APS 521 with its new identifier. This is a much more complex operation, as the tile group header is a complex structure with a lot of variable-length elements. This means that the complete header needs to be decoded, amended and re-encoded, especially since the APS identifier is encoded in the last part of the tile group header.
In order to improve the merging operation, it may be contemplated to amend the structure of the tile group header. For example, the APS identifier may be encoded at the beginning of the tile group header using a fixed length syntax element. By doing so, the rewriting of the tile group header would only need to decode this first syntax element, to amend it and then to copy the rest of the tile group header. However, this copy would still be a costly operation due to the size of the tile group header and the tile group payload.
It may also be contemplated to increase the range of possible values for the APS identifier. The length of the APS identifier field could be indicated in the PPS. With this improvement, it would be possible for several communicating encoders to use different sub-ranges of APS identifiers for encoding bitstreams, in order to allow the merge of tile groups from these bitstreams with no collision in APS identifiers. However, this solution has some drawbacks. It increases the number of bits needed for the encoding of the APS identifier that is present in each tile group, so typically several times per picture. This implies a decrease of the compression ratio, which is not desirable. Moreover, it may not be possible to know at encoding time all the merging operations that will be necessary, in order to manage the sub-ranges of APS identifiers and plan all the merging operations. It may also be contemplated to randomly generate the APS identifiers in order to decrease the risk of collision. However, due to the high number of APS needed to encode a typical bitstream, this is unlikely to solve the collision problem entirely.
According to an embodiment of the invention, the APS identifier that is based on the original bitstream from which the APS and associated tile groups are issued forms a first identification information. It is completed with a second identification information. In this embodiment, each APS comprises the APS identifier and this second identification information, each tile group comprises only the APS identifier while the PPS, or another parameter set, comprises for each tile group, the second identification information.
According to this embodiment, the merge operation comprises the insertion of the second identification information in each APS, based on the original bitstream it comes from, and the insertion in the PPS of the second identification information associated with each tile group. The tile group NAL unit is not modified, and the tile group header keeps its APS identifier.
At decoding, when decoding a tile group, the decoder needs to identify the right APS corresponding to the tile group. This is done by obtaining the APS identifier from the tile group header. Then the second identification information associated with this tile group is obtained from the PPS. The right APS is then identified by both the APS identifier and the second identification information. It must be noted that collisions are solved because, even in case of APS identifier collision, as both APS come from two different original bitstreams, the associated second identification information is different, meaning that the identification based on both the APS identifier and the second identification information correctly identifies the right APS. This solution does not imply the rewriting of the tile group header, thus simplifying the merging operation.
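The lookup described above can be sketched in Python. This is an illustrative model only, not a normative decoder: the function, dictionary and parameter names are hypothetical. The point is that the APS store is keyed by the pair (APS identifier, second identification information), so two APS with colliding identifiers from different original bitstreams remain distinguishable.

```python
# Hypothetical sketch of the decoder-side APS resolution.
# aps_store maps (aps_id, aps_extension_id) -> ALF parameters.

def resolve_aps(aps_store, tile_group_aps_id, tile_group_idx, pps_extension_table):
    # The extension identifier comes from the PPS association table;
    # it is inferred to be 0 when absent, as in the semantics above.
    ext_id = pps_extension_table.get(tile_group_idx, 0)
    return aps_store[(tile_group_aps_id, ext_id)]

# Two APS with the same identifier 0, coming from two original bitstreams:
aps_store = {(0, 0): "ALF params from bitstream A",
             (0, 1): "ALF params from bitstream B"}
pps_extension_table = {3: 0, 4: 1}  # tile group index -> aps_extension_id

assert resolve_aps(aps_store, 0, 3, pps_extension_table) == "ALF params from bitstream A"
assert resolve_aps(aps_store, 0, 4, pps_extension_table) == "ALF params from bitstream B"
```

Note that the tile group header itself only carries tile_group_aps_id; the disambiguating extension identifier lives entirely in the PPS, which is why the tile group NAL units need no rewriting.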
In a first example of this embodiment, the second identification information is implemented as an APS extension identifier. The syntax of the APS can be, for example:

adaptation_parameter_set_rbsp( ) {                              Descriptor
    adaptation_parameter_set_id                                 u(5)
    aps_id_extension_flag                                       u(1)
    if( aps_id_extension_flag ) {
        aps_extension_id                                        u(5)
    }
    alf_data( )
}

The presence of the APS extension identifier in the APS is signalled using a flag, for example named aps_id_extension_flag, encoded on one bit. The APS extension identifier, for example called aps_extension_id, is encoded on a fixed length, for example 5 bits. In a variant, the encoding length in bits is signalled in one of the Parameter Set NAL units, for instance the SPS or the PPS; in that case the coding method (descriptor column) of aps_extension_id in the table above is u(v). In yet another variant, when aps_id_extension_flag equals 1, the aps_extension_id syntax element is preceded by an aps_extension_length syntax element that specifies the length in bits of aps_extension_id. In another variant, exp-golomb coding is used and the new syntax element (aps_extension_id) is encoded for instance using the ue(v) coding method.
For example, the semantics of syntax elements are the following: adaptation_parameter_set_id provides an identifier for the APS for reference by other syntax elements. The value of adaptation_parameter_set_id shall be in the range of 0 to 31.
aps_id_extension_flag equal to 1 specifies the presence of aps_extension_id in the APS. aps_id_extension_flag equal to 0 specifies the absence of aps_extension_id.
aps_extension_id, when present, provides an extended identifier for reference by other syntax elements. The value of aps_extension_id shall be in the range of 0 to 31. When not present, the value of aps_extension_id is inferred to be equal to 0.
At decoding, the replacement rule of APS becomes that a new APS replaces a previous one if it has the same APS identifier and the same APS extension identifier.
The PPS contains an association, for each tile group, of the associated APS extension identifier, according, for example, to the following syntax:

pic_parameter_set_rbsp( ) {                                     Descriptor
    if( rect_tile_group_flag ) {
        signalled_tile_group_id_flag                            u(1)
        if( signalled_tile_group_id_flag ) {
            signalled_tile_group_id_length_minus1               ue(v)
            for( i = 0; i <= num_tile_groups_in_pic_minus1; i++ )
                tile_group_id[ i ]                              u(v)
        }
    }
    if( rect_tile_group_flag ) {
        signalled_aps_id_extension_flag                         u(1)
        if( signalled_aps_id_extension_flag ) {
            for( i = 0; i <= num_tile_groups_in_pic_minus1; i++ )
                tile_group_aps_extension_id[ i ]                u(5)
        }
    }
}
The presence of the APS extension identifier association table is indicated with the flag signalled_aps_id_extension_flag, encoded on one bit. When present (the flag equals 1), a table associating each tile group index with an APS extension identifier is encoded using fixed length encoding.
For example, the semantics of syntax elements are the following: signalled_aps_id_extension_flag equal to 1 specifies the presence of tile_group_aps_extension_id[ i ] in the PPS. signalled_aps_id_extension_flag equal to 0 specifies the absence of tile_group_aps_extension_id[ i ] in the PPS.
tile_group_aps_extension_id[ i ] specifies the tile group extension ID of the i-th tile group, when present. When not present, tile_group_aps_extension_id[ i ] is inferred to be equal to 0, for each i in the range of 0 to num_tile_groups_in_pic_minus1, inclusive.
In a variant, the length of the APS extension identifier field decreased by one is first encoded, using variable length encoding, before the table, for example according to the following syntax:

pic_parameter_set_rbsp( ) {                                     Descriptor
    if( rect_tile_group_flag ) {
        signalled_tile_group_id_flag                            u(1)
        if( signalled_tile_group_id_flag ) {
            signalled_tile_group_id_length_minus1               ue(v)
            for( i = 0; i <= num_tile_groups_in_pic_minus1; i++ )
                tile_group_id[ i ]                              u(v)
        }
    }
    if( rect_tile_group_flag ) {
        signalled_aps_id_extension_flag                         u(1)
        if( signalled_aps_id_extension_flag ) {
            signalled_aps_id_extension_length_minus1            ue(v)
            for( i = 0; i <= num_tile_groups_in_pic_minus1; i++ )
                tile_group_aps_extension_id[ i ]                u(v)
        }
    }
}

For example, the semantics of syntax elements are the following: signalled_aps_id_extension_flag equal to 1 specifies the presence of signalled_aps_id_extension_length_minus1 and tile_group_aps_extension_id[ i ] in the PPS. signalled_aps_id_extension_flag equal to 0 specifies the absence of signalled_aps_id_extension_length_minus1 and tile_group_aps_extension_id[ i ] in the PPS. signalled_aps_id_extension_length_minus1, when present, specifies the number of bits used to represent the syntax elements tile_group_aps_extension_id and aps_extension_id of the APS. The value of signalled_aps_id_extension_length_minus1 shall be in the range of 0 to 15, inclusive. When not present, the value of signalled_aps_id_extension_length_minus1 is inferred to be equal to Ceil( Log2( num_tile_groups_in_pic_minus1 + 1 ) ) - 1. tile_group_aps_extension_id[ i ] specifies the tile group extension ID of the i-th tile group, when present. When not present, tile_group_aps_extension_id[ i ] is inferred to be equal to 0, for each i in the range of 0 to num_tile_groups_in_pic_minus1, inclusive.
A variable length encoding of the extension identifier could also have been used; it would be more compact but more complex to parse.
No modification of the tile group header is contemplated. The presence of the APS identifier when combined with the APS extension identifier obtained from the PPS allows determining the right APS in all configurations.
tile_group_header( ) {                                          Descriptor
    if( sps_alf_enabled_flag ) {
        tile_group_alf_enabled_flag                             u(1)
        if( tile_group_alf_enabled_flag )
            tile_group_aps_id                                   ue(v)
    }
}

The semantics of some syntax elements of the tile group header are the following: tile_group_aps_id specifies the identifier of the APS in use.
The variable tileGroupExtensionIdx, which specifies the index of the tile group APS extension identifier, is derived as follows:

if( rect_tile_group_flag ) {
    tileGroupExtensionIdx = 0
    while( tile_group_address != tile_group_id[ tileGroupExtensionIdx ] )
        tileGroupExtensionIdx++
} else {
    tileGroupExtensionIdx = 0
}

The APS in use is the APS NAL unit having adaptation_parameter_set_id equal to tile_group_aps_id and aps_extension_id equal to tile_group_aps_extension_id[ tileGroupExtensionIdx ].
The TemporalId of the APS NAL unit having adaptation_parameter_set_id equal to tile_group_aps_id and aps_extension_id equal to tile_group_aps_extension_id[ tileGroupExtensionIdx ] shall be less than or equal to the TemporalId of the coded tile group NAL unit.
When multiple APSs with the same value of adaptation_parameter_set_id and aps_extension_id are referred to by two or more tile groups of the same picture, the multiple APSs with the same value of adaptation_parameter_set_id and aps_extension_id shall have the same content.
Figure 6 illustrates this embodiment. Original bitstreams 600 and 601 with respective tile groups 3 and 4 to be merged, each referring to an APS, respectively 610 and 611, having both an APS identifier with a value 0, are identical to those of Figure 5.
In the resulting bitstream 602, both tile groups 3 and 4 are unmodified and continue to refer to the associated APS using the APS identifier with a value of 0. What has changed is that now the APS 620, originating from bitstream 600 and corresponding to APS 610, comprises both an APS identifier with a value of 0 and an APS extension identifier with a value of 0. The APS 621, corresponding to APS 611 from bitstream 601, comprises an APS identifier with a value of 0 and an APS extension identifier with a value of 1. The PPS comprises a table 630 that associates the tile group 3 with the APS extension identifier 0 and the tile group 4 with the APS extension identifier 1. At decoding, the decoder is therefore able to decode the tile group 3 with the correct identification of the associated APS 620, based on the APS identifier stored in the tile group 3 and the associated APS extension identifier from the PPS. The same is true for the decoding of the tile group 4.
It is to be noted that this mechanism works even if the APS identifier changes from one picture to another for the tile groups with the same identifier. The APS extension identifier will stay the same, still allowing the identification of the right APS.
This proposed embodiment allows solving the APS identifier collisions while keeping the tile group structure intact. Accordingly, the merge process of tile groups from different original bitstreams is simplified.
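The merge operation of Figure 6 can be modelled as follows. This is a hypothetical sketch, not an implementation of the standard: each original bitstream is simply assigned its own extension identifier, its APS are re-keyed by the pair (identifier, extension identifier), and the PPS association table is built, while the tile group NAL units themselves are left untouched.

```python
# Hypothetical sketch of the merge operation.
# Each original bitstream is a dict with:
#   "tile_groups": tile group index -> APS identifier referenced in its header
#   "aps":         APS identifier   -> ALF parameters

def merge_bitstreams(originals):
    merged_aps = {}   # (aps_id, aps_extension_id) -> ALF parameters
    pps_table = {}    # tile group index -> aps_extension_id
    for ext_id, stream in enumerate(originals):
        # One extension identifier per original bitstream avoids collisions.
        for aps_id, params in stream["aps"].items():
            merged_aps[(aps_id, ext_id)] = params
        for tg_idx in stream["tile_groups"]:
            pps_table[tg_idx] = ext_id
    return merged_aps, pps_table

# The Figure 6 configuration: both APS have identifier 0.
stream_a = {"tile_groups": {3: 0}, "aps": {0: "ALF params A"}}
stream_b = {"tile_groups": {4: 0}, "aps": {0: "ALF params B"}}
merged_aps, pps_table = merge_bitstreams([stream_a, stream_b])

assert merged_aps == {(0, 0): "ALF params A", (0, 1): "ALF params B"}
assert pps_table == {3: 0, 4: 1}
```

Only the APS NAL units (small, mainly fixed-length) and the PPS are rewritten; the costly tile group headers are copied verbatim.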
In a variant of this embodiment, the APS extension identifier is called an APS group identifier. The semantics are the same, except the naming of syntax elements, for which extension is replaced with group. In another variant, extension is replaced with extended.
In order to simplify the parsing of the PPS, the different loops could be merged, as illustrated by the following syntax when the fixed encoding length equal to 5 is used:

pic_parameter_set_rbsp( ) {                                     Descriptor
    if( rect_tile_group_flag ) {
        signalled_tile_group_id_flag                            u(1)
        if( signalled_tile_group_id_flag )
            signalled_tile_group_id_length_minus1               ue(v)
        signalled_aps_id_extension_flag                         u(1)
        for( i = 0; i <= num_tile_groups_in_pic_minus1; i++ ) {
            if( signalled_tile_group_id_flag )
                tile_group_id[ i ]                              u(v)
            if( signalled_aps_id_extension_flag )
                tile_group_aps_extension_id[ i ]                u(5)
        }
    }
}
In a variant, the signalled_aps_group_id_length_minus1 specifies the length of the identifier extension code; the syntax of the PPS is illustrated by the following syntax:

pic_parameter_set_rbsp( ) {                                     Descriptor
    if( rect_tile_group_flag ) {
        signalled_tile_group_id_flag                            u(1)
        if( signalled_tile_group_id_flag )
            signalled_tile_group_id_length_minus1               ue(v)
        signalled_aps_group_flag                                u(1)
        if( signalled_aps_group_flag )
            signalled_aps_group_id_length_minus1                ue(v)
        for( i = 0; i <= num_tile_groups_in_pic_minus1; i++ ) {
            if( signalled_tile_group_id_flag )
                tile_group_id[ i ]                              u(v)
            if( signalled_aps_group_flag ) {
                tile_group_aps_group_id[ i ]                    u(v)
            }
        }
    }
}

Figure 7 illustrates another variant of this embodiment where the second identification information is implemented as an offset to be applied to the APS identifier.
According to this embodiment, the APS contains an identifier field that is named adaptation_parameter_set_id. It corresponds to the original APS identifier as defined in the original bitstream the APS comes from.
adaptation_parameter_set_rbsp( ) {                              Descriptor
    adaptation_parameter_set_id                                 u(5)
    alf_data( )
}

For example, the semantics of syntax elements are the following: adaptation_parameter_set_id provides an identifier for the APS for reference by other syntax elements. The value of adaptation_parameter_set_id shall be in the range of 0 to 31.
The PPS associates each tile group with a signed offset that is computed to avoid APS identifier collision in the merged bitstream. This offset corresponds to the tile_group_aps_offset syntax element described in the table below.
pic_parameter_set_rbsp( ) {                                     Descriptor
    if( rect_tile_group_flag ) {
        signalled_tile_group_id_flag                            u(1)
        if( signalled_tile_group_id_flag ) {
            signalled_tile_group_id_length_minus1               ue(v)
            for( i = 0; i <= num_tile_groups_in_pic_minus1; i++ )
                tile_group_id[ i ]                              u(v)
        }
    }
    if( rect_tile_group_flag ) {
        signalled_aps_id_offset_flag                            u(1)
        if( signalled_aps_id_offset_flag ) {
            for( i = 0; i <= num_tile_groups_in_pic_minus1; i++ )
                tile_group_aps_offset[ i ]                      se(v)
        }
    }
}

For example, the semantics of syntax elements are the following: signalled_aps_id_offset_flag equal to 1 specifies the presence of tile_group_aps_offset_id[ i ] in the PPS. signalled_aps_id_offset_flag equal to 0 specifies the absence of tile_group_aps_offset_id[ i ] in the PPS.
tile_group_aps_offset_id[ i ] specifies the tile group ID offset of the i-th tile group, when present. tile_group_aps_offset_id should be in the range of -32 to 32. When not present, tile_group_aps_offset_id[ i ] is inferred to be equal to i, for each i in the range of 0 to num_tile_groups_in_pic_minus1, inclusive.
The tile group header is left unchanged indicating the APS identifier associated with the tile group.
At decoding, the decoder identifies the APS for a tile group by adding the offset obtained from the PPS to the APS identifier obtained from the tile group header to obtain the actual APS identifier comprised in the APS.
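The identifier arithmetic just described can be sketched as follows. This is a hypothetical model (names are illustrative): the APS store is keyed by the actual identifier carried in the APS, and the decoder reconstructs that identifier by adding the per-tile-group offset from the PPS to the identifier found in the tile group header.

```python
# Hypothetical sketch of the offset-based APS resolution.
# aps_store maps the actual APS identifier (as stored in the APS) -> ALF parameters.

def resolve_aps_with_offset(aps_store, tile_group_aps_id, tile_group_idx, pps_offset_table):
    # actual identifier = header identifier + signed offset from the PPS
    actual_id = tile_group_aps_id + pps_offset_table.get(tile_group_idx, 0)
    return aps_store[actual_id]

# The Figure 7 configuration: both tile group headers reference identifier 0,
# but the merged bitstream stores the second APS under identifier 3 (offset 3).
aps_store = {0: "ALF params from bitstream A", 3: "ALF params from bitstream B"}
pps_offset_table = {3: 0, 4: 3}  # tile group index -> signed offset

assert resolve_aps_with_offset(aps_store, 0, 3, pps_offset_table) == "ALF params from bitstream A"
assert resolve_aps_with_offset(aps_store, 0, 4, pps_offset_table) == "ALF params from bitstream B"
```

Compared with the extension-identifier variant, the APS here keeps a single flat identifier space; only the stored value is shifted at merge time.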
The semantics of some syntax elements of the tile group header are the following: tile_group_aps_id specifies the identifier of the APS in use.
The variable tileGroupOffsetIdx, which specifies the index of the tile group APS offset identifier, is derived as follows:

if( rect_tile_group_flag ) {
    tileGroupOffsetIdx = 0
    while( tile_group_address != tile_group_id[ tileGroupOffsetIdx ] )
        tileGroupOffsetIdx++
} else {
    tileGroupOffsetIdx = 0
}

The APS in use is the APS NAL unit having adaptation_parameter_set_id equal to tile_group_aps_id + tile_group_aps_offset_id[ tileGroupOffsetIdx ].
The TemporalId of the APS NAL unit having adaptation_parameter_set_id equal to tile_group_aps_id + tile_group_aps_offset_id[ tileGroupOffsetIdx ] shall be less than or equal to the TemporalId of the coded tile group NAL unit.
When multiple APSs with the same value of adaptation_parameter_set_id are referred to by two or more tile groups of the same picture, the multiple APSs with the same value of adaptation_parameter_set_id shall have the same content.
Figure 7 illustrates this embodiment. Original bitstreams 700 and 701 with respective tile groups 3 and 4 to be merged, each referring to an APS, respectively 710 and 711, having both an APS identifier with a value 0, are identical to those of Figure 5.
In the resulting bitstream 702, both tile groups 3 and 4 are unmodified and continue to refer to the associated APS using the APS identifier with a value of 0. What has changed is that now the APS 720, originating from bitstream 700 and corresponding to APS 710, comprises an APS identifier with a value of 0, corresponding to the original APS identifier 0 added to the offset 0. The APS 721, corresponding to APS 711 from bitstream 701, comprises an APS identifier with a value of 3, corresponding to the original APS identifier 0 added to the offset of 3. The PPS comprises a table 730 that associates the tile group 3 with the offset 0 and the tile group 4 with the offset 3. At decoding, the decoder is therefore able to decode the tile group 3 with the correct identification of the associated APS 720, based on the APS identifier stored in the tile group 3 added to the offset from the PPS. The same is true for the decoding of the tile group 4.
In a variant, the PPS provides an additional field called tile_group_aps_base_id, which is an integer that will be added to the APS identifier to obtain the actual identifier.
The goal is to keep the APS identifier stored in the APS structure and the offsets stored in the PPS association table small, to save on the encoding. The PPS syntax according to this variant may be:

pic_parameter_set_rbsp( ) {                                     Descriptor
    if( rect_tile_group_flag ) {
        signalled_tile_group_id_flag                            u(1)
        if( signalled_tile_group_id_flag ) {
            signalled_tile_group_id_length_minus1               ue(v)
            for( i = 0; i <= num_tile_groups_in_pic_minus1; i++ )
                tile_group_id[ i ]                              u(v)
        }
    }
    if( rect_tile_group_flag ) {
        signalled_aps_id_offset_flag                            u(1)
        if( signalled_aps_id_offset_flag ) {
            tile_group_aps_id_base                              ue(v)
            for( i = 0; i <= num_tile_groups_in_pic_minus1; i++ )
                tile_group_aps_offset[ i ]                      se(v)
        }
    }
}
For example, the semantics of syntax elements are the following: signalled_aps_id_offset_flag equal to 1 specifies the presence of tile_group_aps_id_base and tile_group_aps_offset_id[ i] in the PPS.
signalled_aps_id_offset_flag equal to 0 specifies the absence of tile_group_aps_id_base and tile_group_aps_offset_id[ i ] in the PPS. tile_group_aps_id_base is the base value of all the tile group APS identifiers. The value of tile_group_aps_id_base shall be in the range of 0 to 31, inclusive.
tile_group_aps_offset_id[ i ] specifies the tile group ID offset of the i-th tile group, when present. tile_group_aps_offset_id should be in the range of -32 to 32. When not present, tile_group_aps_offset_id[ i ] is inferred to be equal to i, for each i in the range of 0 to num_tile_groups_in_pic_minus1, inclusive.
According to this embodiment, the APS contains an identifier field named adaptation_parameter_set_id_minus_base, which is a signed integer. It corresponds to the original APS identifier, as defined in the original bitstream the APS comes from, from which the base value has been subtracted. The descriptor encoding se(v) corresponds to a variable length encoding of a signed integer.
adaptation_parameter_set_rbsp( ) {                              Descriptor
    adaptation_parameter_set_id_minus_base                      se(v)
    alf_data( )
}

For example, the semantics of syntax elements are the following: adaptation_parameter_set_id_minus_base provides an identifier for the APS for reference by other syntax elements. The value of adaptation_parameter_set_id_minus_base shall be in the range of -15 to +14, inclusive.
The semantics of some syntax elements of the tile group header are the following: tile_group_aps_id specifies the identifier of the APS in use.
The variable tileGroupOffsetIdx, which specifies the index of the tile group APS offset identifier, is derived as follows:

if( rect_tile_group_flag ) {
  tileGroupOffsetIdx = 0
  while( tile_group_address != tile_group_id[ tileGroupOffsetIdx ] )
    tileGroupOffsetIdx++
} else {
  tileGroupOffsetIdx = 0
}

The APS in use is the APS NAL unit having adaptation_parameter_set_id_minus_base + tile_group_aps_id_base equal to tile_group_aps_id_base + tile_group_aps_id + tile_group_aps_offset_id[ tileGroupOffsetIdx ].
The TemporalId of the APS NAL unit having adaptation_parameter_set_id_minus_base + tile_group_aps_id_base equal to tile_group_aps_id_base + tile_group_aps_id + tile_group_aps_offset_id[ tileGroupOffsetIdx ] shall be less than or equal to the TemporalId of the coded tile group NAL unit.
When multiple APSs with the same value of adaptation_parameter_set_id_minus_base are referred to by two or more tile groups of the same picture, the multiple APSs with the same value of adaptation_parameter_set_id_minus_base shall have the same content.
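The derivation above can be mirrored in a short Python sketch. This is purely illustrative: the function names and the dictionary layout of an APS are assumptions for the example, not part of the described syntax.

```python
def tile_group_offset_idx(rect_tile_group_flag, tile_group_address, tile_group_id):
    """Mirror of the tileGroupOffsetIdx derivation in the text."""
    if not rect_tile_group_flag:
        return 0
    idx = 0
    # Scan the PPS tile group id table for the address of this tile group.
    while tile_group_id[idx] != tile_group_address:
        idx += 1
    return idx


def aps_in_use(aps_list, aps_id_base, tile_group_aps_id, aps_offsets, offset_idx):
    """Return the APS NAL unit whose (adaptation_parameter_set_id_minus_base +
    tile_group_aps_id_base) equals the identifier derived for the tile group."""
    target = aps_id_base + tile_group_aps_id + aps_offsets[offset_idx]
    for aps in aps_list:
        if aps["adaptation_parameter_set_id_minus_base"] + aps_id_base == target:
            return aps
    return None
```

For instance, a tile group whose address maps to offset index 1 with a base of 4 and a signalled APS id of 0 resolves to the APS whose signed identifier plus the base equals 5.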
According to another embodiment, the APS NAL unit syntax is modified in order to allow the APS to store several ALF parameter sets. The idea is to merge the APSs from different bitstreams having the same APS identifier into a single APS, in the resulting bitstream, having the same identifier and storing the sets of ALF parameters that were included in the original APSs. An additional table is included in the resulting APS to indicate, for each tile group referring to this APS identifier, which set of ALF parameters must be used.
The syntax of the new APS may be as follows:

adaptation_parameter_set_rbsp( ) {                                   Descriptor
  adaptation_parameter_set_id_offset                                 se(v)
  extended_aps_flag                                                  u(1)
  if( extended_aps_flag ) {
    aps_num_tile_group_signaled_minus1                               ue(v)
    for( i = 0; i <= aps_num_tile_group_signaled_minus1; i++ ) {
      aps_tile_group_address[ i ]                                    u(v)
      aps_tile_group_alf_idx[ i ]                                    ue(v)
    }
  }
  aps_num_alf_data_minus1                                            ue(v)
  for( i = 0; i <= aps_num_alf_data_minus1; i++ ) {
    alf_data( i )
  }
  ...
}

For example, the semantics of syntax elements are the following: extended_aps_flag equal to 1 specifies the presence of aps_num_tile_group_signaled_minus1, aps_tile_group_address[ i ] and aps_tile_group_alf_idx[ i ]. extended_aps_flag equal to 0 specifies the absence of aps_num_tile_group_signaled_minus1, aps_tile_group_address[ i ] and aps_tile_group_alf_idx[ i ].
aps_num_tile_group_signaled_minus1 plus 1 specifies the number of aps_tile_group_address[ i ] and aps_tile_group_alf_idx[ i ] entries specified in the APS. aps_num_tile_group_signaled_minus1 shall be in the range of 0 to num_tile_groups_in_pic_minus1, inclusive.
aps_tile_group_address[i] specifies the tile group address of each tile group signalled in the APS.
In a variant, aps_tile_group_address[ i ] specifies the tile group index of each tile group signalled in the APS. The value of aps_tile_group_address[ i ] shall be in the range of 0 to num_tile_groups_in_pic_minus1, inclusive.
aps_tile_group_alf_idx[i] specifies the index of the set of ALF parameters in the APS to be used for the tile group with a tile_group_address equal to aps_tile_group_address[ i].
aps_tile_group_alf_idx[ i ] shall be in the range of 0 to aps_num_alf_data_minus1, inclusive.
In a variant, aps_tile_group_alf_idx[i] specifies the index of the set of ALF parameters in the APS to be used for the tile group with a tile group index equal to aps_tile_group_address[ i].
aps_tile_group_alf_idx[ i ] shall be in the range of 0 to aps_num_alf_data_minus1, inclusive. aps_num_alf_data_minus1 plus 1 specifies the number of ALF parameter sets specified in the APS.
The semantics of some syntax elements of the tile group header are the following: The TemporalId of the APS NAL unit having adaptation_parameter_set_id_minus_base + tile_group_aps_id_base equal to tile_group_aps_id_base + tile_group_aps_id + tile_group_aps_offset_id[ tileGroupOffsetIdx ] shall be less than or equal to the TemporalId of the coded tile group NAL unit.
When multiple APSs with the same value of adaptation_parameter_set_id_minus_base are referred to by two or more tile groups of the same picture, the multiple APSs with the same value of adaptation_parameter_set_id_minus_base shall have the same content.
Figure 8 illustrates this embodiment. A first bitstream 800 comprises a tile group 3 referring to an APS 810 containing a set of ALF parameters called ALF data 1. A second bitstream 801 comprises a tile group 4 referring to an APS 811 containing a set of ALF parameters called ALF data 2. Both APSs in the two bitstreams have the same APS identifier, with a value 0. The resulting bitstream 802 comprises both tile groups 3 and 4. These tile groups still refer to an APS with an APS identifier with a value 0. This APS 820 with an APS identifier with a value 0 comprises two ALF parameter sets, namely ALF data 1 and ALF data 2, indexed respectively 0 and 1. The APS also comprises a table that associates the tile group 3 with the index 0 of the ALF parameter set ALF data 1. The tile group 4 is associated with the index 1 of the ALF parameter set ALF data 2. Accordingly, the decoder is able to retrieve the APS from the APS identifier stored in the tile group header, and then to identify in the APS the ALF parameter set to be used for filtering, based on the tile group identifier. The merge process does not need to rewrite the tile group NAL units or the PPS. The whole mechanism involves only the APS.
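The merge of Figure 8 can be sketched as follows. The in-memory representation is an assumption for illustration (the dictionary keys mirror the syntax element names of the extended APS); `merge_aps` and `alf_for_tile_group` are hypothetical helper names.

```python
def merge_aps(streams):
    """Merge the APSs of several bitstreams sharing one APS identifier into a
    single extended APS holding all ALF data sets plus a tile-group ->
    ALF-data-index table.  `streams` is a list of
    (tile_group_address, alf_data) pairs, one per original bitstream."""
    merged = {"alf_data": [], "aps_tile_group_address": [],
              "aps_tile_group_alf_idx": []}
    for tile_group_address, alf_data in streams:
        merged["alf_data"].append(alf_data)
        merged["aps_tile_group_address"].append(tile_group_address)
        # The tile group points at the index of the ALF data just appended.
        merged["aps_tile_group_alf_idx"].append(len(merged["alf_data"]) - 1)
    return merged


def alf_for_tile_group(aps, tile_group_address):
    """Decoder-side lookup: find the ALF parameter set for a tile group."""
    i = aps["aps_tile_group_address"].index(tile_group_address)
    return aps["alf_data"][aps["aps_tile_group_alf_idx"][i]]
```

With the two streams of Figure 8, tile group 3 resolves to ALF data 1 and tile group 4 to ALF data 2, without rewriting any VCL NAL unit.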
Figure 9 illustrates an embodiment where a plurality of APSs can be associated with a tile group. This may allow applying different ALF parameter sets to different coding units within the tile group. A first bitstream 900 comprises a tile group 3 referring to three different APSs 910 with respective APS ids 0, 1, and 2. A second bitstream 901 comprises a tile group 4 referring to three different APSs 911 with respective APS ids 0, 2, and 3. When generating a resulting bitstream 902 comprising tile groups 3 and 4, APS identifier collisions may occur, as was the case with a single APS referred to in a tile group. The embodiments described above to solve the APS identifier collisions may be applied successively to each of the plurality of APSs referred to in the tile groups. For instance, an APS extension identifier may be used.
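Applying the earlier offset-based resolution to each entry of a tile group's APS identifier list can be sketched as follows (illustrative Python; the function name and parameters are assumptions, not part of the described syntax):

```python
def resolve_aps_id_list(tile_group_aps_ids, aps_id_base, aps_offset):
    """Resolve each APS identifier referred to by a tile group, reusing the
    single-APS collision-resolution rule (base + signalled id + offset)
    successively on every entry of the list."""
    return [aps_id_base + aps_id + aps_offset for aps_id in tile_group_aps_ids]
```

For example, the ids 0, 2 and 3 of tile group 4 in Figure 9, with a base of 4 and an offset of 1, all map to distinct identifiers that cannot collide with those of tile group 3.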
In a first embodiment of a tile group referring to a plurality of APSs, assuming a fixed number of APSs is provided, the syntax of the tile group header may be as follows:

tile_group_header( ) {                                               Descriptor
  ...
  if( sps_alf_enabled_flag ) {
    tile_group_alf_enabled_flag                                      u(1)
    if( tile_group_alf_enabled_flag )
      for( i = 0; i <= num_tile_group_aps_ids_minus1; i++ )
        tile_group_aps_id[ i ]                                       ue(v)
  }
  ...
}

In a second embodiment of a tile group referring to a plurality of APSs, assuming a variable number of APSs is provided, the syntax of the tile group header may be as follows:

tile_group_header( ) {                                               Descriptor
  ...
  if( sps_alf_enabled_flag ) {
    tile_group_alf_enabled_flag                                      u(1)
    if( tile_group_alf_enabled_flag ) {
      num_tile_group_aps_ids_minus1                                  ue(v)
      for( i = 0; i <= num_tile_group_aps_ids_minus1; i++ )
        tile_group_aps_id[ i ]                                       ue(v)
    }
  }
  ...
}

Figure 10 illustrates the main steps of an encoding process according to an embodiment of the invention.
The described encoding process concerns the encoding of a single bitstream according to an embodiment of the invention. The obtained encoded bitstream may be used in a merging operation as described above, either as an original bitstream or as the resulting bitstream.
In a step 1000, a tile partitioning of the frames is determined. For instance, the encoder defines the number of columns and rows so that each region of interest of the video is covered by at least one tile. In another example, the encoder is encoding an omnidirectional video where each tile corresponds to a predetermined field of view in the video. The tile partitioning of the frame according to a tile grid is typically represented in a parameter set NAL unit, for example a PPS, according to the syntax presented with reference to Figure 3.
In a step 1001, a set of tile groups is defined, each tile group comprising one or more tiles. In a particular embodiment, a tile group is defined for each tile of the frame. Advantageously, in order to avoid some VCL NAL unit rewriting in the merge operation, a tile group identifier is defined for each tile group in the bitstream. The tile group identifiers are determined so as to be unique for each tile group. The uniqueness of the tile group identifiers may be defined at the level of a set of bitstreams comprising the bitstream currently encoded.
The number of bits used to encode the tile group identifier, corresponding to the length of the tile group identifier, is determined as a function of the number of tile groups in the encoded bitstream or as a function of a number of tile groups in a set of different bitstreams comprising the bitstream being currently encoded.
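One possible way to derive the identifier length and to keep identifiers unique across a set of bitstreams is sketched below. This is one illustrative policy under the assumption that identifiers are assigned consecutively from 0; it is not the normative derivation, and the function names are hypothetical.

```python
import math


def tile_group_id_length_bits(num_tile_groups):
    """Number of bits needed for tile group identifiers ranging over
    [0, num_tile_groups - 1]; at least one bit is always used."""
    return max(1, math.ceil(math.log2(num_tile_groups)))


def assign_unique_ids(num_groups_per_bitstream):
    """Assign globally unique tile group ids across a set of bitstreams so
    that tile groups never collide when later merged (one possible policy)."""
    ids, next_id = [], 0
    for n in num_groups_per_bitstream:
        ids.append(list(range(next_id, next_id + n)))
        next_id += n
    return ids
```

When uniqueness must hold over a set of bitstreams, the length is computed from the total number of tile groups, e.g. `tile_group_id_length_bits(sum(num_groups_per_bitstream))`.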
The length of the tile group identifier and the association of each tile group with an identifier are typically specified in a parameter set NAL unit such as the PPS.
In a step 1002, each tile group is associated with decoding context parameters, and in particular with an APS when adaptive loop filtering is to be applied. The association comprises the insertion in the tile group header of an APS identifier. Then, the PPS is generated comprising, for each tile group, a second identification information. The APS is generated with an APS identifier based on both the APS identifier inserted in the tile group header and the second identification information associated with the tile group in the PPS. The second identification information may be an APS extension identifier or an offset to be added to the APS identifier.
In a variant, the APS is defined with a plurality of ALF parameter sets, an association table being inserted to associate a tile group with an index of an ALF parameter set within the APS. The second identification information is the index of the ALF data associated with the tile group.
In a step 1003, the samples of each tile group are encoded according to the parameters defined in the different parameter sets. In particular, the encoding will be based on the ALF parameters in the APS associated with the tile group. A complete bitstream is generated comprising both the non-VCL NAL units corresponding to the different parameter sets and the VCL NAL units corresponding to the encoded data of the different tile groups.
Figure 11 illustrates the main steps of a decoding process according to an embodiment of the invention.
In a step 1100, the decoder parses the bitstream in order to determine the tile partitioning of the frames. This information is obtained from a parameter set, typically from the PPS NAL unit. The syntax elements of the PPS are parsed and decoded to determine the grid of tiles.
In a step 1101, the decoder determines the tile group partitioning of the frame, and in particular obtains the number of tile groups together with an identification information for each tile group. This information is valid for at least one frame, but generally stays valid for many frames. It may take the form of the tile group identifier that may be obtained from a parameter set such as the PPS NAL unit, as described in Figures 6, 7, 8, and 9.
In a step 1102, the decoder parses the bitstream to determine the APS identifier that is associated with each tile group. This is typically done by extracting an APS identification information from the tile group header and by combining this information with a second identification information associated with the tile group in a parameter set, typically in a PPS. Based on both the APS identification information and the second identification information, an actual APS identifier is determined that allows the determination of an APS NAL unit associated with the tile group.
In an alternate embodiment, the decoder parses the tile group header to determine an APS identifier associated with the tile group. Then, the decoder parses the APS NAL unit to determine an ALF parameter set in the APS that is associated with the tile group identifier.
In a step 1103, the decoder decodes the VCL NAL units corresponding to the tile groups according to the parameters determined in the previous steps. In particular, the decoding may include an adaptive loop filtering step with parameters obtained from the APS identified in the previous steps as being associated with the tile group.
Figure 12 illustrates the merge operation of two bitstreams stored in a file to form a resulting bitstream stored in a resulting file in an embodiment of the invention.
Figure 13 illustrates the main steps of the merge process at file format level in an embodiment of the invention.
Figure 12 illustrates the merge of two ISO BMFF files 1200 and 1201 resulting in a new ISO BMFF file 1202 according to the method of Figure 13.
The encapsulation of the VVC streams consists, in this embodiment, in defining one tile track for each tile group of the stream and one tile base track for the NAL units common to the tile groups. It would also be possible to group more than one tile group in one tile track. For example, the file 1200 contains two tile groups, one with the identifier '1.1' and another one with the identifier '1.2'. The samples corresponding to each tile group '1.1' and '1.2' are each described in one tile track, similarly to the tile tracks of ISO/IEC 14496-15. While tile tracks were initially designed for HEVC, the VVC tile groups could be encapsulated in tile tracks. This VVC tile track could be differentiated from an HEVC tile track by defining a new sample entry, for instance 'vvt1' instead of 'hvt1'. Similarly, the tile base track defined for HEVC is extended to support the VVC format. This VVC tile base track could be differentiated from an HEVC tile base track by defining different sample entries. The VVC tile base track describes the NAL units common to the two tile groups. Typically, it contains mainly non-VCL NAL units, such as the Parameter Set NAL units and the SEI NAL units.

First, the merging method consists in determining, in step 1300, the set of tile tracks from the two streams to be merged in a single bitstream. For instance, it corresponds to the tile track of the tile group with the identifier '2.1' of the file 1201 and of the tile group with the identifier '1.2' of the file 1200.
Then, the method in a step 1301 determines the new decoding locations of the tile groups and generates new Parameter Set NAL units (i.e. PPS and APS) to describe these new decoding locations in the resulting stream, according to the embodiments described above. Since all the modifications consist in modifying only the non-VCL NAL units, this is equivalent to generating, in a step 1302, a new tile base track. The samples of the original tile tracks corresponding to the extracted tile groups remain identical. The tile tracks of the file 1202 reference the tile base track with a track reference of type 'tbas'. The tile base track references as well the tile tracks with a track reference of type 'sabt'.
The advantage of this method is that combining two streams consists mainly in generating a new tile base track, updating the track reference boxes, and copying as-is the tile track samples corresponding to the selected tile groups. The processing is simplified since the rewriting of the tile track samples is avoided compared to the prior art.
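The file-level merge of Figures 12 and 13 can be sketched as follows. The dictionary layout, the `new_parameter_sets` placeholder and the HEVC-style 'tbas'/'sabt' track-reference handling are assumptions for illustration only.

```python
def new_parameter_sets(tile_tracks):
    """Placeholder for generating the new PPS/APS NAL units that describe
    the new decoding locations of the selected tile groups (step 1301)."""
    return ["PPS", "APS"]


def merge_tile_tracks(file_a, file_b, selected_ids):
    """Merge two encapsulated streams: keep the selected tile tracks as-is,
    build a new tile base track carrying the rewritten parameter sets
    (step 1302), and wire the track references in both directions."""
    tile_tracks = [t for f in (file_a, file_b) for t in f["tile_tracks"]
                   if t["tile_group_id"] in selected_ids]
    tile_base_track = {
        "parameter_sets": new_parameter_sets(tile_tracks),
        # The base track lists the tile tracks it aggregates.
        "tref": {"sabt": [t["tile_group_id"] for t in tile_tracks]},
    }
    for t in tile_tracks:
        # Each tile track points back to its base track; samples are untouched.
        t["tref"] = {"tbas": ["base"]}
    return {"tile_base_track": tile_base_track, "tile_tracks": tile_tracks}
```

For instance, selecting tile groups '1.2' and '2.1' from the two files produces a new file whose only rewritten component is the tile base track.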
Figure 14 is a schematic block diagram of a computing device 1400 for implementation of one or more embodiments of the invention. The computing device 1400 may be a device such as a microcomputer, a workstation or a light portable device.
The computing device 1400 comprises a communication bus connected to:
- a central processing unit 1401, such as a microprocessor, denoted CPU;
- a random access memory 1402, denoted RAM, for storing the executable code of the method of embodiments of the invention as well as the registers adapted to record variables and parameters necessary for implementing the method according to embodiments of the invention, the memory capacity thereof can be expanded by an optional RAM connected to an expansion port, for example;
- a read only memory 1403, denoted ROM, for storing computer programs for implementing embodiments of the invention;
- a network interface 1404, typically connected to a communication network over which digital data to be processed are transmitted or received. The network interface 1404 can be a single network interface, or composed of a set of different network interfaces (for instance wired and wireless interfaces, or different kinds of wired or wireless interfaces). Data packets are written to the network interface for transmission or are read from the network interface for reception under the control of the software application running in the CPU 1401;
- a user interface 1405, which may be used for receiving inputs from a user or to display information to a user;
- a hard disk 1406, denoted HD, which may be provided as a mass storage device;
- an I/O module 1407, which may be used for receiving/sending data from/to external devices such as a video source or display.
The executable code may be stored either in read only memory 1403, on the hard disk 1406 or on a removable digital medium such as for example a disk. According to a variant, the executable code of the programs can be received by means of a communication network, via the network interface 1404, in order to be stored in one of the storage means of the communication device 1400, such as the hard disk 1406, before being executed.
The central processing unit 1401 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to embodiments of the invention, which instructions are stored in one of the aforementioned storage means. After powering on, the CPU 1401 is capable of executing instructions from main RAM memory 1402 relating to a software application after those instructions have been loaded from the program ROM 1403 or the hard disk (HD) 1406, for example.
Such a software application, when executed by the CPU 1401, causes the steps of the flowcharts of the invention to be performed.
Any step of the algorithms of the invention may be implemented in software by execution of a set of instructions or program by a programmable computing machine, such as a PC ("Personal Computer"), a DSP ("Digital Signal Processor") or a microcontroller; or else implemented in hardware by a machine or a dedicated component, such as an FPGA ("Field-Programmable Gate Array") or an ASIC ("Application-Specific Integrated Circuit").
Although the present invention has been described herein above with reference to specific embodiments, the present invention is not limited to the specific embodiments, and modifications will be apparent to a skilled person in the art which lie within the scope of the present invention.
Many further modifications and variations will suggest themselves to those versed in the art upon making reference to the foregoing illustrative embodiments, which are given by way of example only and which are not intended to limit the scope of the invention, that being determined solely by the appended claims. In particular the different features from different embodiments may be interchanged, where appropriate.
Each of the embodiments of the invention described above can be implemented solely or as a combination of a plurality of the embodiments. Also, features from different embodiments can be combined where necessary or where the combination of elements or features from individual embodiments in a single embodiment is beneficial.
Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.
For example, a Tile Group could also be a slice, a tile set, a motion constrained tile set (MCTS), a region of interest or a sub picture.
The information coded in the Tile Group may also be encoded in all the Tile Group Segment headers. Alternatively, the information is encoded only in the independent tile group segment header to reduce the size of the dependent tile group segment headers.
The information coded in the Picture Parameter Set PPS could also be encoded in other non-VCL units like a Video Parameter Set VPS, a Sequence Parameter Set SPS or the DPS, or in new units like a Layer Parameter Set or a Tile Group Parameter Set. These units define parameters valid for several frames and thus they are at a higher hierarchical level than the tile group units or the APS units in the video bitstream. The tile group units are valid only inside one frame. The APS units can be valid for some frames but their usage changes rapidly from one frame to another.
The Adaptation Parameter Set unit (APS) contains parameters defined for the Adaptive Loop Filter (ALF). In some variants, the APS may contain several loop filters parameter sets with different characteristics. The CTU using a particular APS can then select which particular loop filter parameter set is used. In another variant, the video can also use other types of filters (SAO, deblocking filters, post processing filter, denoising...). Some parameters for some other filters (in-loop and out of loop filters) could also be encoded and stored in some other Parameter Set NAL units (filter parameter set units) referenced by the tile group. The same invention could be applied to these new types of units.
In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. The mere fact that different features are recited in mutually different dependent claims does not indicate that a combination of these features cannot be advantageously used.

Claims (25)

  1. A method of encoding video data comprising frames into a bitstream of logical units, frames being spatially divided into frame portions, frame portions being grouped into groups of frame portions, the method comprising:
- determining a filter parameter set applying to samples of a group of frame portions;
- determining a first identification information of the determined filter parameter set;
- determining a second identification information associated with an index of the group of frame portions;
- encoding the group of frame portions into a first logical unit comprising the first identification information;
- encoding the filter parameter set into a second logical unit comprising a filter parameter set identifier determined based on the first identification information and on the second identification information; and,
- encoding the association between the index of the group of frame portions and the second identification information into a logical unit.
  2. The method of claim 1, wherein:
- the second identification information is an extension identifier; and,
- the filter parameter set identifier comprises the first identification information and the extension identifier.
  3. The method of claim 1, wherein:
- the second identification information is an offset; and,
- the filter parameter set identifier is the addition of the first identification information and of the offset.
  4. The method of claim 1, wherein:
- the second identification information is an index of a filter parameter set.
  5. The method of claim 1, wherein the association between the index of the group of frame portions and the second identification information is encoded into a third logical unit.
  6. The method of claim 5, wherein the second and third logical units are parameter set logical units applying at different levels of the bitstream.
  7. The method of claim 1, wherein the association between the index of the group of frame portions and the second identification information is encoded into the second logical unit.
  8. The method of claim 4, wherein a plurality of filter parameter sets are determined, the method further comprising:
- encoding the plurality of filter parameter sets into the second logical unit, each filter parameter set being associated with an index, the second logical unit comprising, for each group of frame portions, the association of an index of the group of frame portions and the index of a filter parameter set.
  9. A method for decoding a bitstream of logical units of video data comprising frames, frames being spatially divided into frame portions, frame portions being grouped into groups of frame portions, the method comprising:
- parsing a first logical unit comprising a group of frame portions to determine a first identification information of a filter parameter set applying to samples of the group of frame portions;
- parsing a second logical unit comprising the association between an index of the group of frame portions and a second identification information;
- determining a filter parameter set identifier based on the first identification information and the second identification information;
- decoding a logical unit comprising the filter parameter set identified by the filter parameter set identifier;
- decoding the group of frame portions comprised in the first logical unit using the decoded filter parameter set.
  10. The method of claim 9, wherein:
- the second identification information is an extension identifier; and,
- the filter parameter set identifier comprises the first identification information and the extension identifier.
  11. The method of claim 9, wherein:
- the second identification information is an offset; and,
- the filter parameter set identifier is the addition of the first identification information and the offset.
  12. The method of claim 9, wherein:
- the second identification information is an index of a filter parameter set.
  13. The method of claim 9, wherein the logical unit comprising the filter parameter set is a third logical unit.
  14. The method of claim 13, wherein the second and third logical units are parameter set logical units applying at different levels of the bitstream.
  15. The method of claim 9, wherein the logical unit comprising the filter parameter set is the second logical unit.
  16. The method of claim 12, wherein a plurality of filter parameter sets are determined, the method further comprising:
- decoding the plurality of filter parameter sets from the second logical unit, each filter parameter set being associated with an index, the second logical unit comprising, for each group of frame portions, the association of an index of the group of frame portions and the index of a filter parameter set.
  17. A method for merging groups of frame portions from a plurality of original bitstreams of video data into a resulting bitstream, bitstreams being composed of logical units comprising frames, frames being divided into frame portions, frame portions being grouped into groups of frame portions, the method comprising:
- parsing the logical units comprising the groups of frame portions to determine a first identification information of a filter parameter set associated with each group of frame portions;
- extracting logical units comprising a filter parameter set applying to samples of a group of frame portions, the logical unit being identified by the first identification information;
- encoding a logical unit comprising the association of a group of frame portions index for each group of frame portions with a second identification information;
- encoding each extracted logical unit comprising a filter parameter set into a logical unit comprising the filter parameter set and a filter parameter set identifier determined based on the first identification information and the second identification information;
- generating the resulting bitstream comprising the logical units comprising the groups of frame portions, the encoded logical unit comprising the association of the groups of frame portions indexes with a second identification information, and the encoded logical units comprising the filter parameter sets.
  18. The method of claim 17, wherein:
- the second identification information is an extension identifier; and
- the filter parameter set identifier comprises the first identification information and the extension identifier.
  19. The method of claim 17, wherein:
- the second identification information is an offset; and
- the filter parameter set identifier is the addition of the first identification information and of the offset.
  20. The method of claim 17, wherein:
- the second identification information is an index of a filter parameter set.
  21. A method of generating a file comprising a bitstream of logical units of encoded video data comprising frames, frames being spatially divided into frame portions, frame portions being grouped into groups of frame portions, the method comprising:
- encoding the bitstream according to any one of claims 1 to 8;
- generating a first track comprising the logical units containing the filter parameter sets, and the logical unit containing the association between the indexes of the groups of frame portions and the second identification information;
- generating, for a group of frame portions, a track containing the logical unit containing the group of frame portions; and,
- generating the file comprising the generated tracks.
  22. A bitstream of logical units, the bitstream comprising encoded video data comprising frames, frames being spatially divided into frame portions, frame portions being grouped into groups of frame portions, the bitstream comprising:
- a first logical unit comprising a group of frame portions;
- a second logical unit comprising a filter parameter set applying to samples of the group of frame portions and a filter parameter set identifier determined based on a first identification information of the filter parameter set and on a second identification information associated with an index of the group of frame portions; and,
- a logical unit comprising the association between the index of the group of frame portions and the second identification information.
  23. A computer program product for a programmable apparatus, the computer program product comprising a sequence of instructions for implementing a method according to any one of claims 1 to 21, when loaded into and executed by the programmable apparatus.
  24. A computer-readable storage medium storing instructions of a computer program for implementing a method according to any one of claims 1 to 21.
  25. A computer program which upon execution causes the method of any one of claims 1 to 21 to be performed.
GB1903379.4A 2019-03-01 2019-03-12 Method and apparatus for encoding and decoding a video bitstream for merging regions of interest Withdrawn GB2581852A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
GBGB1918658.4A GB201918658D0 (en) 2019-03-01 2019-12-17 Method and apparatus for encoding and decoding a video bitstream for merging regions of interest
GB2000479.2A GB2582206B (en) 2019-03-01 2020-01-13 Method and apparatus for encoding and decoding a video bitstream for merging regions of interest
PCT/EP2020/054831 WO2020178065A1 (en) 2019-03-01 2020-02-25 Method and apparatus for encoding and decoding a video bitstream for merging regions of interest

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GBGB1902829.9A GB201902829D0 (en) 2019-03-01 2019-03-01 Method and apparatus for encoding and decoding a video bitsream for merging regions of interest

Publications (2)

Publication Number Publication Date
GB201903379D0 GB201903379D0 (en) 2019-04-24
GB2581852A true GB2581852A (en) 2020-09-02

Family

ID=66377319

Family Applications (5)

Application Number Title Priority Date Filing Date
GBGB1902829.9A Ceased GB201902829D0 (en) 2019-03-01 2019-03-01 Method and apparatus for encoding and decoding a video bitsream for merging regions of interest
GB1903379.4A Withdrawn GB2581852A (en) 2019-03-01 2019-03-12 Method and apparatus for encoding and decoding a video bitstream for merging regions of interest
GB1904461.9A Withdrawn GB2581855A (en) 2019-03-01 2019-03-29 Method and apparatus for encoding and decoding a video bitstream for merging regions of interest
GBGB1918658.4A Ceased GB201918658D0 (en) 2019-03-01 2019-12-17 Method and apparatus for encoding and decoding a video bitstream for merging regions of interest
GB2000479.2A Active GB2582206B (en) 2019-03-01 2020-01-13 Method and apparatus for encoding and decoding a video bitstream for merging regions of interest

Family Applications Before (1)

Application Number Title Priority Date Filing Date
GBGB1902829.9A Ceased GB201902829D0 (en) 2019-03-01 2019-03-01 Method and apparatus for encoding and decoding a video bitsream for merging regions of interest

Family Applications After (3)

Application Number Title Priority Date Filing Date
GB1904461.9A Withdrawn GB2581855A (en) 2019-03-01 2019-03-29 Method and apparatus for encoding and decoding a video bitstream for merging regions of interest
GBGB1918658.4A Ceased GB201918658D0 (en) 2019-03-01 2019-12-17 Method and apparatus for encoding and decoding a video bitstream for merging regions of interest
GB2000479.2A Active GB2582206B (en) 2019-03-01 2020-01-13 Method and apparatus for encoding and decoding a video bitstream for merging regions of interest

Country Status (2)

Country Link
GB (5) GB201902829D0 (en)
WO (1) WO2020178065A1 (en)


Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11683487B2 (en) * 2019-03-26 2023-06-20 Qualcomm Incorporated Block-based adaptive loop filter (ALF) with adaptive parameter set (APS) in video coding
CN116600144A (en) * 2019-06-24 2023-08-15 Lg电子株式会社 Image encoding and decoding method, computer readable storage medium, and data transmission method
CN114760469A (en) * 2019-08-16 2022-07-15 华为技术有限公司 ALF APS constraints in video coding
WO2021073630A1 (en) * 2019-10-18 2021-04-22 Beijing Bytedance Network Technology Co., Ltd. Syntax constraints in parameter set signaling of subpictures
EP4210339A4 (en) * 2020-09-04 2024-03-20 Panasonic Ip Corp America Reproduction device, transmission device, reproduction method, and transmission method
WO2023047014A1 (en) * 2021-09-23 2023-03-30 Nokia Technologies Oy An apparatus, a method and a computer program for video coding and decoding

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018129168A1 (en) * 2017-01-04 2018-07-12 Qualcomm Incorporated Modified adaptive loop filter temporal prediction for temporal scalability support
WO2019147813A1 (en) * 2018-01-26 2019-08-01 Qualcomm Incorporated Adaptive loop filtering on deblocking filter results in video coding

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9516308B2 (en) * 2012-04-27 2016-12-06 Qualcomm Incorporated Parameter set updates in video coding
GB2550604A (en) * 2016-05-24 2017-11-29 Canon Kk Method, device, and computer program for encapsulating and parsing timed media data


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220337846A1 (en) * 2021-04-13 2022-10-20 Canon Kabushiki Kaisha Method and apparatus for encapsulating encoded media data in a media file
GB2605955A (en) * 2021-04-13 2022-10-26 Canon Kk Method and apparatus for encapsulating encoded media data in a media file

Also Published As

Publication number Publication date
WO2020178065A1 (en) 2020-09-10
GB201904461D0 (en) 2019-05-15
GB201918658D0 (en) 2020-01-29
GB2582206B (en) 2022-04-27
GB2581855A (en) 2020-09-02
GB201903379D0 (en) 2019-04-24
GB2582206A (en) 2020-09-16
GB201902829D0 (en) 2019-04-17
GB202000479D0 (en) 2020-02-26

Similar Documents

Publication Publication Date Title
US20220217355A1 (en) Method and apparatus for encoding and decoding a video bitstream for merging regions of interest
GB2581852A (en) Method and apparatus for encoding and decoding a video bitstream for merging regions of interest
US11252428B2 (en) Method and device for encoding and decoding parameter sets at slice level
US20220329792A1 (en) Method and apparatus for encoding and decoding a video stream with subpictures
KR102539065B1 (en) Efficient scalable coding concept
US10390024B2 (en) Region of interest scalability with SHVC
US20230060709A1 (en) Video coding supporting subpictures, slices and tiles
KR102562028B1 (en) Method and apparatus for encoding or decoding video data having frame parts
US20210067834A1 (en) Method and apparatus for processing video bitstream, network device, and readable storage medium
US20210092359A1 (en) Method, device, and computer program for coding and decoding a picture
ITTO20120901A1 (en) PROCEDURE FOR CODING AND DECODING A DIGITAL VIDEO AND ITS CODIFICATION AND DECODING DEVICES
JP2023024970A (en) Improved tile address signaling in video encoding and decoding
WO2020178144A1 (en) Method and apparatus for encoding and decoding a video bitstream for merging regions of interest
CN115225905A (en) Level information of sub-picture track
GB2584723A (en) Method, device, and computer program for coding and decoding a picture
EP4035372A1 (en) Segment position signalling with subpicture slice position deriving
BR112016030188B1 (en) COMPLIANCE AND INOPERABILITY ADVANCEMENTS IN MULTI-LAYER VIDEO ENCODING

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)