WO2019159820A1

WO2019159820A1 - Moving image encoding device and moving image decoding device

Info

Publication number: WO2019159820A1
Application number: PCT/JP2019/004497
Authority: WO
Inventors: 将伸八杉; 知宏猪飼; 友子青野
Original assignee: シャープ株式会社
Priority date: 2018-02-14
Filing date: 2019-02-07
Publication date: 2019-08-22

Abstract

When each tile is independently encoded/decoded while suppressing a reduction in encoding efficiency, distortion occurs in tile boundaries. This moving image decoding device, which divides an image into tiles and decodes a moving image by tile unit, is provided with: a header information decoding unit which decodes header information from an encoded stream and calculates tile information; a tile decoding unit which decodes encoded data for each tile and generates a decoded image of the tile; and a synthesis unit which synthesizes the decoded images of the tiles with reference to the tile information to generate a display image, wherein the tile includes an area overlapping an adjacent tile, and the synthesis unit filtering-processes a plurality of pixel values of each pixel in the overlapping area, and generates the display image by using pixel values of the decoded image of the tile and the filtering-processed pixel values.

Description

Video encoding apparatus and video decoding apparatus

One embodiment of the present invention relates to a video decoding device and a video encoding device.

In order to efficiently transmit or record a moving image, a moving image encoding device that generates encoded data by encoding the moving image, and a moving image that generates a decoded image by decoding the encoded data An image decoding device is used.

Specific examples of the moving image encoding method include a method proposed in H.264 / AVC and HEVC (High-Efficiency Video Coding).

In such a moving image coding system, an image (picture) constituting a moving image is a slice obtained by dividing the image, a coding tree unit (CTU: Coding Tree Unit obtained by dividing the slice). ), A coding unit obtained by dividing the coding tree unit (sometimes called a coding unit (CU)), and a prediction unit that is a block obtained by dividing the coding unit (PU: PredictionUnit) and a hierarchical structure composed of transform units (TU: Transform Unit) are managed and encoded / decoded for each CU.

In such a moving image coding method, a predicted image is usually generated based on a local decoded image obtained by encoding / decoding an input image, and the predicted image is generated from the input image (original image). A prediction residual obtained by subtraction (sometimes referred to as “difference image” or “residual image”) is encoded. Examples of the method for generating a predicted image include inter-screen prediction (inter prediction) and intra-screen prediction (intra prediction) (Non-Patent Document 1).

In recent years, with the evolution of processors such as multi-core CPUs and GPUs, configurations and algorithms that facilitate parallel processing have been adopted in video encoding and decoding processing. As an example of a configuration that facilitates parallelization, a screen (picture) division unit called a tile is introduced. Unlike a slice, a tile is obtained by dividing a picture into rectangular areas, and can be encoded and decoded independently for each tile (Patent Document 1, Non-Patent Document 2).

Furthermore, in recent years, the resolution of moving images, such as 4K, 8K, VR, and 360-degree moving images such as 360-degree moving images, has been increasing, and the standardization of projection formats has been progressing. Reference 3). When viewing these on a smartphone or HMD (Head-Mount-Display), a part of the high-resolution video is cut out and displayed on the display. Smartphones and HMDs are not large in battery capacity, and a mechanism is expected that allows users to view a video with a minimum of decoding processing by extracting a part of the area required for display.

Japanese Patent Gazette "Patent 6241504"

As described above, a tile is obtained by dividing a picture into rectangular areas, and can be decoded in the spatial and temporal directions without referring to information outside the tile (prediction mode, MV, pixel value). However, since the adjacent tile information of the target tile and the adjacent tile information of the collocated tile (the tile at the same position on the picture different from the target tile) are not referred to at all, distortion caused by discontinuity of the tile boundary ( In the following, this will be referred to as tile distortion), and the tile distortion is very easy to visually recognize. Also, the coding efficiency is reduced.

In addition, there is a limitation that the tile size is an integer multiple of the CTU, and it is difficult to divide into the same size for load balancing and to configure tiles that match the face size of 360-degree movies. .

Therefore, the present invention has been made in view of the above problems, and its purpose is to independently encode and decode each tile in the spatial direction and the temporal direction while suppressing a decrease in encoding efficiency. It is to provide a mechanism for removing or suppressing tile distortion. It also provides tile partitioning that is not limited to an integer multiple of the CTU.

A moving image decoding apparatus according to an aspect of the present invention is a moving image decoding apparatus that divides an image into tiles and decodes moving images in tile units, decodes header information from an encoded stream, and calculates tile information. A header information decoding unit that decodes encoded data for each tile, generates a decoded image of the tile, and combines the decoded image of the tile with reference to the tile information to generate a display image The tile is composed of a tile active area that is a unit for dividing a picture without overlapping and a hidden area (tile extension area), and the tile active area is added to the tile active area. Is decoded in units of CTUs.

According to one aspect of the present invention, there are provided a mechanism for ensuring the independence of decoding of each tile and a mechanism for removing and suppressing tile distortion in a moving image. As a result, it is possible to greatly reduce the amount of processing when selecting and decoding an area necessary for display or the like, and it is possible to display an image without distortion at the tile boundary.

It is the schematic which shows the structure of the image transmission system which concerns on this embodiment. It is a figure which shows the hierarchical structure of the data of the encoding stream which concerns on this embodiment. It is a figure explaining a tile. It is a syntax table regarding tile information and the like. It is another syntax table regarding tile information and the like. It is a figure explaining the reference of the time direction of a tile. This is an example in which a picture is divided into M * N tiles with duplication allowed. It is a figure explaining the filter process of the overlap area | region of the tile adjacent to a horizontal direction. It is a block diagram which shows the structure of the moving image decoding apparatus which concerns on this invention. It is a figure which shows the structure of the tile decoding part which concerns on this embodiment. It is a block diagram which shows the structure of the moving image encoder which concerns on this invention. It is a block diagram which shows the structure of the tile encoding part which concerns on this embodiment. This is another example in which a picture is divided into M * N tiles allowing duplication. It is a flowchart explaining operation | movement of a moving image encoder and a moving image decoder. It is an example of the table of a weighting coefficient. It is a figure explaining the filter process of the overlap area | region of the tile adjacent to a perpendicular direction. It is an example which packs a projection image and produces | generates a two-dimensional image. It is another example which produces | generates a two-dimensional image by packing a projection image. It is another example which produces | generates a two-dimensional image by packing a projection image. It is another syntax table regarding tile information and the like. It is a figure which shows the tile division | segmentation of the picture when a tile size is an integral multiple of CTU, and the CTU division | segmentation of a tile. It is a figure which shows the tile division | segmentation of the picture which concerns on this embodiment, and the CTU division | segmentation of a tile. This is a syntax example of slice data and CTU data when the tile size is an integral multiple of CTU. It is a syntax example of slice data and CTU data according to the present embodiment. This is a syntax for explaining an example of dividing a picture into tiles regardless of multiples of CTUs. It is a figure explaining an example which divides | segments a picture into a tile irrespective of the multiple of CTU. It is a figure explaining another example which divides | segments a picture into a tile irrespective of the multiple of CTU. It is the syntax explaining another example which divides | segments a picture into a tile irrespective of the multiple of CTU. This is a syntax of an example of quadtree division of CTU when a picture is divided into tiles regardless of multiples of CTU. This is a syntax of an example of binary tree division of CTU when a picture is divided into tiles regardless of multiples of CTU. It is a figure explaining another example which divides | segments a picture into a region and a tile irrespective of the multiple of CTU. This is a syntax for explaining an example of dividing a region into tiles regardless of a multiple of CTU. This is an example of the syntax of a CTU when a picture is divided into tiles regardless of a multiple of the CTU. This is a syntax for explaining another example of dividing a region into tiles regardless of a multiple of CTU. It is an example explaining the notification method of the tile of an invalid area | region. It is a figure explaining another example which divides | segments a picture into a tile irrespective of the multiple of CTU. It is the syntax explaining another example which divides | segments a picture into a tile irrespective of the multiple of CTU. It is the figure shown about the structure of the transmitter which mounts the moving image encoder which concerns on this embodiment, and the receiver which mounts a moving image decoder. (A) shows a transmitting apparatus equipped with a moving picture coding apparatus, and (b) shows a receiving apparatus equipped with a moving picture decoding apparatus. It is the figure shown about the structure of the recording device carrying the moving image encoder which concerns on this embodiment, and the reproducing | regenerating apparatus carrying a moving image decoding apparatus. (A) shows a recording apparatus equipped with a moving picture coding apparatus, and (b) shows a reproduction apparatus equipped with a moving picture decoding apparatus.

(Embodiment 1)
Hereinafter, embodiments of the present invention will be described with reference to the drawings.

FIG. 1 is a schematic diagram showing a configuration of an image transmission system 1 according to the present embodiment.

The image transmission system 1 is a system that transmits a code obtained by encoding an encoding target image, decodes the transmitted code, and displays an image. The image transmission system 1 includes a moving image encoding device (image encoding device) 11, a network 21, a moving image decoding device (image decoding device) 31, and a moving image display device (image display device) 41.

The image T is input to the moving image encoding device 11.

The network 21 transmits the encoded stream Te generated by the video encoding device 11 to the video decoding device 31. The network 21 is the Internet, a wide area network (WAN: Wide Area Network), a small network (LAN: Local Area Network), or a combination thereof. The network 21 is not necessarily limited to a bidirectional communication network, and may be a unidirectional communication network that transmits broadcast waves such as terrestrial digital broadcasting and satellite broadcasting. The network 21 may be replaced with a storage medium that records an encoded stream Te such as a DVD (Digital Versatile Disc) or a BD (Blue-ray Disc).

The video decoding device 31 decodes each of the encoded streams Te transmitted by the network 21, and generates one or a plurality of decoded images Td that are respectively decoded.

The moving image display device 41 displays all or a part of one or a plurality of decoded images Td generated by the moving image decoding device 31. The moving image display device 41 includes, for example, a display device such as a liquid crystal display or an organic EL (Electro-luminescence) display. Examples of the display form include stationary, mobile, and HMD.

<Operator>
The operators used in this specification are described below.

>> is right bit shift, << is left bit shift, & is bitwise AND, | is bitwise OR, | = is OR assignment operator.

X? Y: z is a ternary operator that takes y when x is true (non-zero) and takes z when x is false (0).

Clip3 (a, b, c) is a function that clips c to a value between a and b, and returns a if c <a, b if c> b, otherwise Is a function that returns c (where a <= b).

Abs (a) is a function that returns the absolute value of a.

Int (a) is a function that returns an integer value of a.

Floor (a) is a function that returns the largest integer less than or equal to a.

Ceil (a) is a function that returns the smallest integer greater than or equal to a.

A / d represents the division of a by d.

<Structure of encoded stream Te>
Prior to detailed description of the video encoding device 11 and the video decoding device 31 according to the present embodiment, data of the encoded stream Te generated by the video encoding device 11 and decoded by the video decoding device 31. The structure will be described.

FIG. 2 is a diagram showing a hierarchical structure of data in the encoded stream Te. The encoded stream Te illustratively includes a sequence and a plurality of pictures constituting the sequence. (A) to (f) of FIG. 2 respectively show an encoded video sequence defining a sequence SEQ, an encoded picture defining a picture PICT, an encoded slice defining a slice S, and an encoded slice defining a slice data It is a figure which shows the coding tree unit (Coding | Tree | Unit: CTU) contained in data and coding slice data, and the coding unit (Coding | Unit: CU) contained in CTU.

(Encoded video sequence)
In the encoded video sequence, a set of data referred to by the video decoding device 31 for decoding the sequence SEQ to be processed is defined. As shown in FIG. 2A, the sequence SEQ includes a video parameter set VPS (Video Parameter Set), a sequence parameter set SPS (Sequence Parameter Set), a picture parameter set PPS (Picture Parameter Set), a picture PICT, It includes supplemental enhancement information (SEI).

The video parameter set VPS is a set of encoding parameters common to a plurality of moving images in a moving image composed of a plurality of layers, and encoding related to a plurality of layers and individual layers included in the moving image. A set of parameters is defined.

In the sequence parameter set SPS, a set of encoding parameters referred to by the video decoding device 31 for decoding the target sequence is defined. For example, the width and height of the picture are defined. A plurality of SPSs may exist. In that case, one of a plurality of SPSs is selected from the PPS.

In the picture parameter set PPS, a set of encoding parameters referred to by the video decoding device 31 for decoding each picture in the target sequence is defined. For example, a quantization width reference value (pic_init_qp_minus26) used for picture decoding and a flag (weighted_pred_flag) indicating application of weighted prediction are included. There may be a plurality of PPSs. In that case, one of a plurality of PPSs is selected from each slice header in the target sequence.

(Encoded picture)
In the coded picture, a set of data referred to by the video decoding device 31 for decoding the picture PICT to be processed is defined. The picture PICT includes slices S0 to SNS-1 as shown in FIG. 2B (NS is the total number of slices included in the picture PICT).

In addition, hereinafter, when it is not necessary to distinguish each of the slices S0 to SNS-1, the subscripts may be omitted. The same applies to data included in an encoded stream Te described below and to which other subscripts are attached.

(Encoded slice)
In the coded slice, a set of data referred to by the video decoding device 31 for decoding the slice S to be processed is defined. As shown in FIG. 2C, the slice S includes a slice header SH and slice data SDATA.

The slice header SH includes a coding parameter group that is referred to by the video decoding device 31 in order to determine a decoding method of the target slice. Slice type designation information (slice_type) for designating a slice type is an example of an encoding parameter included in the slice header SH.

As slice types that can be specified by the slice type specification information, (1) I slice using only intra prediction at the time of encoding, (2) P slice using unidirectional prediction or intra prediction at the time of encoding, (3) B-slice using unidirectional prediction, bidirectional prediction, or intra prediction at the time of encoding may be used. Note that inter prediction is not limited to single prediction and bi-prediction, and a predicted image may be generated using more reference pictures. Hereinafter, the P and B slices refer to slices including blocks that can use inter prediction.

Note that the slice header SH may include a reference (pic_parameter_set_id) to the picture parameter set PPS included in the encoded video sequence.

(Encoded slice data)
In the encoded slice data, a set of data referred to by the video decoding device 31 for decoding the slice data SDATA to be processed is defined. The slice data SDATA includes a coding tree unit CTU (CTU block) as shown in FIG. A CTU is a block of a fixed size (for example, 64x64) that constitutes a slice, and is sometimes called a maximum coding unit (LCU: Large Coding Unit).

(Encoding tree unit)
In FIG. 2 (e), a set of data referred to by the video decoding device 31 for decoding the CTU to be processed is defined. The CTU is divided into coding units CU which are basic units of the coding process by recursive quadtree division (QT division) or binary tree division (BT division). A tree structure obtained by recursive quadtree partitioning or binary tree partitioning is called a coding tree (CT), and a node of the tree structure is called a coding node (CN). An intermediate node of the quadtree and the binary tree is a CN, and the CTU itself is also defined as the highest CN.

CT includes, as CT information, a QT split flag (cu_split_flag) indicating whether or not to perform QT split, and a BT split mode (split_bt_mode) indicating a split method of BT split. cu_split_flag and / or split_bt_mode are transmitted for each CN. When cu_split_flag is 1, CN is divided into four CNs. When cu_split_flag is 0, when split_bt_mode is 1, CN is horizontally divided into two CNs, when split_bt_mode is 2, CN is vertically divided into two CNs, and when split_bt_mode is 0, CN is not divided and has one CU as a node. CU is a terminal node (leaf node) of CN and is not further divided.

If the CTU size is 64x64 pixels, the CU size is 64x64 pixels, 64x32 pixels, 32x64 pixels, 32x32 pixels, 64x16 pixels, 16x64 pixels, 32x16 pixels, 16x32 pixels, 16x16 pixels, 64x8 pixels, 8x64 pixels , 32x8 pixels, 8x32 pixels, 16x8 pixels, 8x16 pixels, 8x8 pixels, 64x4 pixels, 4x64 pixels, 32x4 pixels, 4x32 pixels, 16x4 pixels, 4x16 pixels, 8x4 pixels, 4x8 pixels, and 4x4 pixels .

(Encoding unit)
In FIG. 2F, a set of data referred to by the video decoding device 31 for decoding the CU to be processed is defined. Specifically, the CU includes a prediction tree (PT: Prediction Tree), a transform tree (TT: Transform Tree), and a CU header CUH. In the CU header, a prediction mode, a division method (PU division mode), and the like are defined.

In PT, prediction parameters (reference picture index, motion vector, etc.) of each prediction unit (PU: Prediction Unit) obtained by dividing CU into one or a plurality are defined. In other words, the PU is one or more non-overlapping areas constituting the CU. The PT includes one or more PUs obtained by the above division. Hereinafter, a prediction unit obtained by further dividing the PU is referred to as a “sub-block”. The sub block is composed of a plurality of pixels. When the size of the PU and the sub block is equal, there is one sub block in the PU. If the PU is larger than the size of the sub-block, the PU is divided into sub-blocks. For example, when the PU is 8x8 and the sub-block is 4x4, the PU is divided into four sub-blocks that are divided into two horizontally and vertically divided into two.

The prediction process may be performed for each PU (or sub block).

There are roughly two types of prediction in PT: intra prediction and inter prediction. Intra prediction is prediction within the same picture, and inter prediction refers to prediction processing performed between different pictures (for example, between display times and between layer images).

In the case of intra prediction, there are 2Nx2N (the same size as the encoding unit) and NxN division methods.

In the case of inter prediction, the division method is encoded by the PU division mode (part_mode) of encoded data, and 2Nx2N (same size as the encoding unit), 2NxN, 2NxnU, 2NxnD, Nx2N, nLx2N, nRx2N, NxN, etc. There is. 2NxN and Nx2N indicate 1: 1 symmetrical division, and 2NxnU, 2NxnD and nLx2N and nRx2N indicate 1: 3 and 3: 1 asymmetric division. The PUs included in the CU are expressed as PU0, PU1, PU2, and PU3 in this order.

In TT, a CU is divided into one or a plurality of transform units (TU: Transform Unit), and the position and size of each TU are defined. In other words, the TU is one or more non-overlapping areas constituting the CU. Further, TT includes one or a plurality of TUs obtained by the above division.

There are two types of partitioning in TT: one that allocates an area of the same size as the CU as a TU, and one that uses recursive quadtree partitioning, similar to the above-described CU partitioning.

Conversion processing is performed for each TU.

(Prediction parameter)
The predicted image of the PU is derived by a prediction parameter associated with the PU. The prediction parameters include a prediction parameter for intra prediction or a prediction parameter for inter prediction. Hereinafter, prediction parameters for inter prediction (inter prediction parameters) will be described. The inter prediction parameter includes prediction list use flags predFlagL0 and predFlagL1, reference picture indexes refIdxL0 and refIdxL1, and motion vectors mvL0 and mvL1. The prediction list use flags predFlagL0 and predFlagL1 are flags indicating whether or not reference picture lists called L0 list and L1 list are used, respectively. When the value is 1, a corresponding reference picture list is used. In this specification, when “flag indicating whether or not it is XX” is described, when the flag is not 0 (for example, 1) is XX, 0 is not XX, and logical negation, logical product, etc. 1 is treated as true and 0 is treated as false (the same applies hereinafter). However, other values can be used as true values and false values in an actual apparatus or method.

(Reference picture list)
The reference picture list is a list including reference pictures stored in the reference picture memory 306.

(Merge prediction and AMVP prediction)
The prediction parameter decoding (encoding) method includes a merge prediction (merge) mode and an AMVP (Adaptive Motion Vector Prediction) mode. The merge flag merge_flag is a flag for identifying these. The merge mode is a mode in which the prediction list use flag predFlagLX (or inter prediction identifier inter_pred_idc), the reference picture index refIdxLX, and the motion vector mvLX are not included in the encoded data and are derived from the prediction parameters of already processed neighboring PUs. The AMVP mode is a mode in which the inter prediction identifier inter_pred_idc, the reference picture index refIdxLX, and the motion vector mvLX are included in the encoded data. Note that the motion vector mvLX is encoded as a prediction vector index mvp_lX_idx for identifying the prediction vector mvpLX and a difference vector mvdLX.

(Motion vector)
The motion vector mvLX indicates a shift amount between blocks on two different pictures. A prediction vector and a difference vector related to the motion vector mvLX are referred to as a prediction vector mvpLX and a difference vector mvdLX, respectively.

(Intra prediction)
The intra prediction parameter is a parameter used for processing of predicting a CU with information in a picture, for example, an intra prediction mode IntraPredMode, and the luminance intra prediction mode IntraPredModeY and the color difference intra prediction mode IntraPredModeC may be different. There are, for example, 67 types of intra prediction modes, which include planar prediction, DC prediction, and angular (direction) prediction. As the color difference prediction mode IntraPredModeC, for example, any one of planar prediction, DC prediction, Angular prediction, direct mode (a mode using a luminance prediction mode), and LM prediction (a mode in which linear prediction is performed from luminance pixels) is used.

The luminance intra prediction mode IntraPredModeY is derived using an MPM (Most Probable Mode) candidate list composed of intra prediction modes estimated to have a high probability of being applied to the target block, and prediction modes not included in the MPM candidate list May be derived from REM. Which method is used is notified by the flag prev_intra_luma_pred_flag, and in the former case, IntraPredModeY is derived using the index mpm_idx and the MPM candidate list derived from the intra prediction mode of the adjacent block. In the latter case, the intra prediction mode is derived using the flag rem_selected_mode_flag and the modes rem_selected_mode and rem_non_selected_mode.

The color difference intra prediction mode IntraPredModeC is derived using a flag not_lm_chroma_flag indicating whether to use LM prediction, or is derived using a flag not_dm_chroma_flag indicating whether to use the direct mode. It may be derived using an index chroma_intra_mode_idx that directly specifies the prediction mode.

(Loop filter)
The loop filter is a filter provided in the encoding loop, which removes block distortion and ringing distortion and improves image quality. The loop filter mainly includes a deblocking filter, a sample adaptive offset (SAO), and an adaptive loop filter (ALF).

The deblocking filter is a pixel for luminance and color difference components with respect to a block boundary when a difference between pre-deblock pixel values of pixels of luminance components adjacent to each other via the block boundary is smaller than a predetermined threshold. Is subjected to a deblocking process, thereby filtering an image near the block boundary.

SAO is a filter that is applied after the deblocking filter and has the effect of removing ringing distortion and quantization distortion. SAO is a process in units of CTUs, and is a filter that classifies pixel values into several categories and adds or subtracts offsets in units of pixels for each category. In the SAO edge offset (EO) processing, an offset value to be added to the pixel value is determined according to the magnitude relationship between the target pixel and the adjacent pixel (reference pixel).

The ALF generates an ALF-completed decoded image by performing an adaptive filtering process using an ALF parameter (filter coefficient) ALFP decoded from the encoded stream Te on the ALF pre-decoded image.

(Entropy coding)
Entropy coding includes a variable length coding method using a context (probability model) adaptively selected according to the type of syntax and the surrounding situation, a predetermined table, or a calculation formula. There is a method for variable-length coding of syntax using. In the former CABAC (Context Adaptive Binary Arithmetic Coding), an updated probability model is stored in memory for each encoded or decoded picture. In the P picture or B picture using the subsequent inter prediction, the initial state of the context of the target picture uses the same slice type and the same slice level quantization parameter from the probability model stored in the memory. A picture probability model is selected and used for encoding and decoding.

(tile)
FIG. 3A is a diagram illustrating an example in which a picture is divided into N tiles (solid-line rectangle, N = 9 is an example). The tile is further divided into a plurality of CTUs (dashed rectangles). As shown in the center of FIG. 3A, the upper left coordinate of the tile is denoted as (xTs, yTs), the width as wT, and the height as hT. The width of the picture is written as wPict and the height is written as hPict. Note that information on the number of tile divisions and the size is referred to as tile information, and details will be described later. The unit of xTs, yTs, wT, hT, wPict, and hPict is a pixel. The picture width and height are set in pic_width_in_luma_samples and pic_height_in_luma_samples, which are notified by sequence_parameter_set_rbsp () (referred to as SPS) shown in FIG.

wPict = pic_width_in_luma_samples
hPict = pic_height_in_luma_samples
FIG. 3B is a diagram showing the CTU encoding and decoding order when a picture is divided into tiles. The number described in each tile is TileId (the identifier of the tile in the picture), and the number TileId may be assigned to the tile in the picture from the upper left to the lower right in the raster scan order. Further, the CTU is processed in the order of raster scanning from the upper left to the lower right in each tile, and when the processing in one tile is completed, the CTU in the next tile is processed.

FIG. 3 (c) is a diagram showing tiles continuous in the time direction. As shown in FIG. 3C, the video sequence is composed of a plurality of pictures that are continuous in the time direction. The tile sequence is composed of tiles at one or more times that are continuous in the time direction. Tile (n, tk) in the figure represents a tile with TileId = n at time tk. Note that CVS (Coded | Video | Sequence) in a figure is a picture group from a certain intra picture to the picture immediately before another intra picture in decoding order.

FIG. 4 shows an example of syntax related to tile information and the like.

The parameter tile_parameters () related to the tile is notified by PPS (pic_parameter_set_rbsp ()) shown in FIG. 4 (b). Hereinafter, to notify the parameter means to include the parameter in the encoded data (bitstream). The moving image encoding apparatus encodes the parameter, and the moving image decoding apparatus decodes the parameter. As shown in FIG. 4C, when tile_enabled_flag indicating whether or not a tile is 1, tile_parameters () is notified of tile information tile_info (). Also, when tile_enabled_flag is 1, independent_tiles_flag indicating whether or not tiles can be decoded independently over a plurality of temporally continuous pictures is notified. When independent_tiles_flag is 0, tiles are decoded with reference to adjacent tiles in the reference picture (cannot be independently decoded). When independent_tiles_flag is 1, decoding is performed without referring to adjacent tiles in the reference picture. When tiles are used, decoding is performed without referring to adjacent tiles in the target picture regardless of the value of independent_tiles_flag, so that a plurality of tiles can be decoded in parallel. As shown in FIG. 4 (c), when independent_tiles_flag is 0, loop_filter_across_tiles_enable_flag indicating the on / off of the loop filter at the tile boundary applied to the reference picture is transmitted (present). When independent_tiles_flag is 1, loop_filter_across_tiles_enable_flag may not be transmitted (present) and may always be 0.

When tiles are processed independently through a sequence, the independent tile flag independent_tiles_flag may be notified by SPS as shown in FIG. 4 (a). The independent_tiles_flag will be described later.

The tile information tile_info () is, for example, num_tile_columns_minus1, num_tile_rows_minus1, uniform_spacing_flag, column_width_minus1 [i], row_height_minus1 [i] as shown in FIG. 4 (d), and may include overlap_tiles_flag and overlap information. Here, num_tile_columns_minus1 and num_tile_rows_minus1 are values obtained by subtracting 1 from the number of horizontal and vertical tiles M and N in the picture, respectively. uniform_spacing_flag is a flag indicating whether or not a picture is equally tiled. When the uniform_spacing_flag value is 1, the width and height of each tile of the picture are set to be equal, so the tile width is determined from the number of tiles in the picture in the horizontal and vertical directions. The height can be derived.

M = num_tile_columns_minus1 + 1
N = num_tile_rows_minus1 + 1
wT [m] = floor (wPict / M) (0 <= m <M-1) (Formula TSP-1)
wT [M-1] = wPict-Σ (wT [m]) (0 <= m <M-1)
hT [n] = floor (hPict / N) (0 <= n <N-1)
hT [N-1] = hPict-Σ (hT [n]) (0 <= n <N-1)
Alternatively, wT [m] and hT [n] may be expressed by the following equations.

wT [m] = ceil (wPict / M) (0 <= m <M-1) (Formula TSP-2)
hT [n] = ceil (hPict / N) (0 <= n <N-1)
Alternatively, wT [m] and hT [n] may be expressed by the following equations.

for (m = 0; m <M; m ++)
wT [m] = ((m + 1) * wPict) / M- (m * wPict) / M (Formula TSP-3)
for (n = 0; n <N; n ++)
hT [n] = ((n + 1) * hPict) / N- (n * hPict) / N
The tile size may be a multiple of a tile unit size (minimum tile size) wUnitTile and hUnitTile. In this case, it derives below.

wT [m] = floor (wPict / M / wUnitTile) * wUnitTile (0 <= m <M) (Formula TSP-4)
hT [n] = floor (hPict / N / hUnitTile) * hUnitTile (0 <= n <N)
Or you may express with the following formula.

wT [m] = ceil (wPict / M / wUnitTile) * wUnitTile (0 <= m <M) (Formula TSP-5)
hT [n] = ceil (hPict / N / hUnitTile) * hUnitTile (0 <= n <N)
for (m = 0; m <M; m ++)
wT [m] = ((m + 1) * wPict / M / wUnitTile-m * wPict / M / wUnitTile) * wUnitTIle (Formula TSP-6)
for (n = 0; n <N; n ++)
hT [n] = ((n + 1) * hPict / N / hUnitTIle-n * hPict / N / hUnitTIle) * hUnitTile
If wPict and hPict are not integer multiples of M and N, respectively, it is preferable to allocate the surplus number of pixels to a part of wT [m] or hT [n]. For example, when wPict = 500 and M = 3, two pixels are left, so wT [0] and wT [1] are increased by one. Alternatively, wT [M-1] and wT [M-2] are increased by 1 in reverse order from M-1. Alternatively, specific elements such as wT [0] and wT [M-1] may be increased by 2.
If the uniform_spacing_flag value is 0, the width and height of each tile in the picture need not be set equal. In the moving image encoding apparatus, the width of each tile column_width_minus1 [i] (value expressed in wUnitTile in wT in FIG. 3) and the height row_height_minus1 [i] (in FIG. 3, hT expressed in hUnitTile as a unit) Value) for each tile. In the moving picture decoding apparatus, the tile sizes wT [m] and hT [n] are decoded as follows for each tile based on the encoded (column_width_minus1 [], row_width_minus1 []).

wT [m] = (column_width_minus1 [m] +1) * wUnitTile (0 <= m <M) (Formula TSP-7)
hT [n] = (row_height_minus1 [m] +1) * hUnitTile (0 <= n <N)
Here, wUnitTile and hUnitTile are the unit size (minimum size) of the tile. In addition, the tile size may be an integral multiple of the minimum CU size MIN_CU_SIZE (= 1 << log2CUSize) (wUnitTile = hUnitTile = MIN_CU_SIZE), and the tile sizes wT [m] and hT [n] may be decoded as follows: .

wT [m] = ((column_width_minus1 [m] +1) << log2CUSize) (0 <= m <M) (Formula TSP-8)
hT [n] = ((row_height_minus1 [m] +1) << log2CUSize) (0 <= n <N)
Further, the tile size may be an integral multiple of the CTU size (wCTU, hCTU) (wUnitTile = wCTU, hUnitTile = hCTU), and the tile sizes wT [m] and hT [n] may be decoded as follows.

wT [m] = (column_width_minus1 [m] +1) * wCTU (0 <= m <M) (Formula TSP-9)
hT [n] = (row_height_minus1 [m] +1) * hCTU (0 <= n <N)
overlap_tiles_flag indicates whether or not an area near a tile boundary overlaps with an adjacent tile. When overlap_tiles_flag is 1, it indicates that it overlaps with an adjacent tile, and overlap information overlap_tiles_info () shown in FIG. 5 (f) is notified. When overlap_tiles_flag is 0, it does not overlap with adjacent tiles. Here, the overlap means that two or more tiles include a region of the same image, and the overlap region indicates a region included in two or more tiles.

The overlap information overlap_tiles_info () includes uniform_overlap_flag and information indicating the width and height of the overlap area. uniform_overlap_flag is a flag indicating whether the width or height of the overlap area of each tile is equal. When all the widths or all the heights of the overlap area of each tile are equal, uniform_overlap_flag is set to 1, and syntaxes tile_overlap_width_div2 and tile_overlap_height_div2 indicating the width and height of the overlap area are notified. If the width or height of the overlap area of each tile is different, set uniform_overlap_flag to 0 and notify the syntax tile_overlap_width_div2 [m] and tile_overlap_height_div2 [n] indicating the width and height of the overlap area of each tile . When uniform_overlap_flag is 1, the following relationship holds.

tile_overlap_width_div2 [m] = tile_overlap_width_div2 (0 <= m <M-1)
tile_overlap_height_div2 [n] = tile_overlap_height_div2 (0 <= n <N-1)
The relationship between the actual overlap area width wOVLP and height hOVLP is shown by the following equation. These units are pixels.

wOVLP = tile_overlap_width_div2 [m] * 2
hOVLP = tile_overlap_height_div2 [n] * 2
If there is no overlap, overlap_tiles_flag is set to 0, and the width and height of the overlap area are set to 0. When overlap_tiles_flag is 0, tile_overlap_width_div2 and tile_overlap_height_div2 are not included in the encoded data, and tile_overlap_width_div2 = 0 and tile_overlap_height_div2 = 0 are derived.

In the above, considering the case of YUV4: 2: 0, the width and height of the overlap area were multiples of 2, but the overlap area height and YUV4: 4: 4 in the case of YUV4: 2: 2 In this case, the width and height of the overlap region may be notified in units of pixels without making each pixel unit a multiple of two. The parameters represented by "_div2" are also expressed in 2 pixel units or 1 pixel unit depending on the color difference format (4: 2: 0, 4: 2: 2, 4: 4: 4). You may switch whether to represent.

The tile identifier TileId of the position (m, n) may be calculated as follows.

TileId = n * M + m
Alternatively, when TileId is known, (m, n) indicating the position of the tile may be calculated from TileId.

m = TileId% M
n = TileId / M
In the slice data (slice_segment_data ()) of FIG. 5 (g), M * N tiles starting from the position of (xTsmn, yTsmn) on the picture using the tile information notified by the PPS shown in FIG. 4 (b) Every time, the tile syntax Tile (m, n) is notified. Specifically, with (xTsmn, yTsmn) on the picture as the upper left coordinates (0, 0) of each tile, the tile is divided into CTUs (width wCTU, height hCTU) as shown in FIG. The encoded data coding_quadtree () of each CTU may be notified. Here, (xTsmn, yTsmn) is (xTs00, yTs00) to (xTsM-1 N-1, yTsM-1 N-1).

Note that tile_info () shown in FIG. 25 may be notified instead of tile_info () shown in FIG. 4 (d). The difference between tile_info () in FIG. 4 (d) and tile_info () in FIG. 25 (a) is that in FIG. 4 (d), column_width_minus1 [which represents the tile width and height in the minimum tile unit or CTU unit. i] and row_height_minus1 [i] are notified, but in FIG. 25 (a), when overlap_tiles_flag is not 0, that is, overlap, column_width_in_luma_samples_div2_minus1 [i], row_height_in_luma_samples_div2_minus1 When [i] is notified and overlap_tiles_flag is 0, column_width_minus1 [i] and row_height_minus1 [i] that represent the tile width and height in the minimum tile unit or CTU unit, as in FIG. It is to notify. column_width_in_luma_samples_div2_minus1 [i] and row_height_in_luma_samples_div2_minus1 [i] are values obtained by dividing the width and height of the tile pixel unit by two. In this case, the width wT [m] and the height hT [n] of the tile in pixel units are expressed by the following equations.

wT [m] = column_width_in_luma_samples_div2_minus1 [m] * 2 + 1 (Formula TSP-10)
hT [n] = row_height_in_luma_samples_div2_minus1 [n] * 2 + 1
Note that column_width_in_luma_samples_div2_minus1 [m] and row_height_in_luma_samples_div2_minus1 [n] * 2 are either expressed in units of 2 pixels depending on the color difference format (4: 2: 0, 4: 2: 2, 4: 4: 4), or 1 It may be switched whether it is expressed in units of pixels.

Also, column_width_in_luma_samples_div2_minus1 [i] and row_height_in_luma_samples_div2_minus1 [i] may be fixed length encoding (f (n)) instead of variable length encoding (ue (v)). Since the values are expressed in units of pixels, the value of these syntaxes tends to be large, and the code amount is smaller in the fixed-length coding than in the variable-length coding.

In FIG. 25 (a), the unit of the width and height of the tile is switched depending on whether or not there is an overlap area. You may switch.

In FIG. 5 (f), a value obtained by dividing the overlap width and height in pixel units by 2 is notified as shown in the following equation.

wOVLP [m] = tile_overlap_width_div2 * 2 (Formula OVLP-1)
hOVLP [m] = tile_overlap_height_div2 * 2
In addition, as shown in FIG. 25B, a value obtained by subtracting 1 from the width and height of the overlap in pixel units may be notified.

wOVLP [m] = tile_overlap_width_minus1 + 1 (Formula OVLP-2)
hOVLP [m] = tile_overlap_height_minus1 + 1
(Tile boundary limit)
Since tile information is notified by PPS, the position and size of tiles can be changed for each picture. On the other hand, when the tile sequence is decoded independently, that is, when tiles having the same TileId can be decoded without referring to information on tiles having different TileId, the tile position and size are not changed for each picture. That is, when each tile refers to a picture (reference picture) at a different time, the same tile division may be applied to all pictures in the CVS. In this case, the tiles having the same TileId are set to have the same upper left coordinate, width, and height throughout all the CVS pictures.

The fact that tile information does not change through CVS is notified by setting the value of tiles_fixed_structure_flag of vui_parameter () shown in FIG. 4 (e) to 1. That is, when the value of tiles_fixed_structure_flag is 1, num_tile_columns_minus1, num_tile_rows_minus1, uniform_spacing_flag, column_width_minus1 [i], row_height_minus1 [i], overlap_tiles_flag, and loop_filter_acrossflags_enabled_tile_enabled_tile_enabled_tile_values are unique. When the value of tiles_fixed_structure_flag is 1, tiles with the same TileId in CVS are over the tile position (tile upper left coordinates, width, height) on the picture, even in pictures with different times (POC: Picture Order Count). The lap information is not changed. Further, when the value of tiles_fixed_structure_flag is 0, the size of the tile sequence may be different depending on the time.

FIG. 4A is a syntax table excerpted from a part of the sequence parameter set SPS. The independent tile flag independent_tiles_flag is a flag indicating whether the tile sequence can be independently encoded and decoded not only in the target picture (in the spatial direction) but also in the temporally continuous sequence (in the temporal direction). When the value of independent_tiles_flag is 1, it means that the tile sequence can be encoded and decoded independently, and the following restrictions may be imposed on the encoding / decoding of tiles and the syntax of encoded data.
(Constraint 1) In CVS, tiles do not refer to information on tiles having different TileIds.
(Constraint 2) The number of tiles in the horizontal and vertical directions, the tile width, the tile height, and the overlap area width and height in the picture notified by PPS through CVS are equal. Within CVS, tiles with the same TileId are not changed in tile position (tile upper left coordinates, width, height) on the picture, even in pictures with different time (POC). The value of tiles_fixed_structure_flag of vui_parameter () is set to 1.

The above (Restriction 1) “Tiles do not refer to information on tiles with different TileIds” will be described in detail.

FIG. 6 is a diagram for explaining reference to tiles in the temporal direction (between different pictures). FIG. 6A shows an example in which the intra picture Pict (t0) at time t0 is divided into N tiles. FIG. 6B shows an example in which the inter picture Pict (t1) at time t1 = t0 + 1 is divided into N tiles. Pict (t1) refers to Pict (t0). In the figure, Tile (n, t) represents a tile of TileId = n (n = 0..N-1) at time t. From the above (Constraint 2), the upper left coordinates, width, and height of the tile of TileId = n are equal at any time.

In FIG. 6 (b), CU1, CU2, and CU3 in tile Tile (n, t1) refer to blocks BLK1, BLK2, and BLK3 in FIG. 6 (a). In this case, BLK1 and BLK3 are blocks included in tiles outside the tile Tile (n, t0). To refer to them, not only Tile (n, t0) but also the entire Pict (t0) at time t0 Need to be decrypted. In other words, tile Tile (n, t1) cannot be decoded by simply decoding the tile sequence corresponding to TileId = n at times t0 and t1, and tile sequences other than TileId = n can also be decoded. is necessary. Therefore, in order to independently decode the tile sequence, the reference pixel in the reference picture to be referred to when deriving the motion compensated image of the CU in the tile is included in the collocated tile (the tile at the same position on the reference picture). There is a need.

Also, if the value of independent_tiles_flag is 0, it means that the tile sequence may not be decoded independently.

As explained above, by turning off intra prediction and loop filter that refer to pixels outside the tile at the tile boundary, information on tiles adjacent to the target tile is not referred to, and inter prediction is used to specify the range to be referred to. By limiting to collocated tiles, only tiles at arbitrary positions can be encoded / decoded.

In particular, when viewing a high-resolution image such as 8K on a mobile device or viewing a VR or 360-degree video on an HMD, it is common to extract and view only a specific area on the screen. . When only a specific area in the screen is viewed, only a part of the moving image needs to be decoded, so that power consumption required for processing can be suppressed and the viewing time can be extended.

On the other hand, since the tile adjacent to the target tile and the tile adjacent to the collocated tile are not referred to, the pixel value becomes discontinuous at the tile boundary, and tile distortion occurs. In the following, a technique that does not cause tile distortion while encoding and decoding individual tiles independently will be described.

In Embodiment 1 of the present application, when a picture is divided into tiles, as shown in FIG. 7, tiles are generated by dividing an area on the picture while allowing overlap.

FIG. 7 (a) is a diagram in which a picture (width wPict, height hPict) is divided into M * N tiles. The tile at position (m, n) is represented by Tile [m] [n]. Here, 0 <= m <M and 0 <= n <N. In FIG. 7A, M = 3 and N = 2. The width and height of the tile Tile [m] [n] are represented as wT [m] and hT [n], and the upper left coordinates (positions indicated by black circles in FIG. 7A) are represented as (xTsmn, yTsmn). The shaded area in the figure is an area where a plurality of tiles overlap (overlap). The units of wPict, hPict, wT [m], hT [n], xTsmn, and yTsmn are pixels.

FIG. 7B is a diagram showing a relationship between two adjacent tiles Tile [0] [0] and Tile [1] [0]. The hatched area at the right end of Tile [0] [0] is an area that overlaps Tile [1] [0], and the shaded area at the bottom is an area that overlaps Tile [0] [1]. The width wT [0] and height hT [0] of Tile [0] [0] are the width of the tile including the area overlapping with Tile [1] [0] and Tile [0] [1]. Indicates the height. Similarly, the left hatched area of Tile [1] [0] is an area that overlaps Tile [0] [0], and the right hatched area is an area that overlaps Tile [2] [0]. The hatched area at the bottom is an area that overlaps Tile [1] [1]. And the width wT [1] and height hT [0] of Tile [1] [0] are respectively over Tile [0] [0], Tile [2] [0], and Tile [1] [1]. Includes the area to wrap.

That is, the hatched portion on the right side of Tile [0] [0] is an area that is encoded (overlapping) with Tile [0] [0] and Tile [1] [0].

And, in the configuration in which the size of each tile is CTU unit, the width and height of each tile is an integral multiple of the width and height of the CTU.

wT [m] = wCTU * a
hT [n] = hCTU * b
Here, wCTU and hCTU are the width and height of the CTU, and a and b are positive integers. Even if the size of each tile is CTU unit, the width of the tile at the right end of the picture and the height of the tile at the bottom end may not be an integral multiple of the CTU. In this manner, crop offset areas are provided at the right and bottom edges of the picture (horizontal line area in FIG. 7A), and the width and height obtained by adding the tile and the crop offset area are set to integer multiples of the CTU. The crop offset area is not intended to be displayed, and is an area used for increasing the size of the area to be processed for the sake of convenience so as to facilitate processing in units of CTUs. For output, for example, gray (Y, Cb, Cr) = (1 << (bitDepthY-1), 1 << (bitDepthCb-1), 1 << (bitDepthCr-1)) as a pixel value for convenience. Alternatively, the value obtained by padding the pixel values at the right end / bottom end of the picture is set. Further, the upper left coordinates (xTsmn, yTsmn) of each tile at the (m, n) position in tile units are not necessarily a position that is an integer multiple of the CTU. As will be described later, a net display area obtained by subtracting an overlap area indicated by (wOVLP, hOVLP) from a tile effective area indicated by a size of (wT, hT) may be called a tile active area.

For example, if the picture is (wPict, hPict) = (1920,1080), (wCTU, hCTU) = (128,128), overlap area width wOVLP = 4, overlap area height hOVLP = 4, tile information is You may set as follows.

M = 3
N = 2
uniform_spacing_flag = 0
wT [0] = 768
wT [1] = 640
wT [2] = 520
hT [0] = 640
hT [1] = 444
overlap_tiles_flag = 1
uniform_overlap_flag = 1
tile_overlap_width_div2 = 2
tile_overlap_height_div2 = 2
Since column_width_minus1 [2] and row_height_minus1 [1] correspond to integer multiples of the CTU, a crop offset area may be provided and the tile size may be an integer multiple of the CTU size. In this case, the width wCRP [2] and the height hCRP [1] of the crop offset area are set as follows. The unit of wCRP [] and hCRP [] is a pixel.

wCRP [2] = 120
hCRP [1] = 68
When the width wT [2] and height hT [1] of the tile and the width wCRP [2] and height hCRP [1] of the crop offset area are added, the CTU size is obtained.

wT [2] * wCTU + wCRP [2] = 520 + 120 = 640 = 128 * 5
hT [1] * hCTU + hCRP [1] = 444 + 68 = 512 = 128 * 4
The tile size is not limited to the CTU size, and may be a tile unit size (wUnitTile, hUnitTile), an integer multiple of the minimum CU size MIN_CU_SIZE, or the like.

The size of the crop offset area can be derived based on the size of the tile from the constraint that the added value of the size of the tile and the crop offset area is an integer multiple of the CTU.

Also, the upper left coordinates (xTsmn, yTsmn) of each tile in the picture indicated by the tile unit position (m, n) set in raster order are calculated by the following formula. The upper left coordinate of each tile is also the upper left coordinate of the CTU at the beginning of the tile.

xTsmn = ΣwT [m-1] -wOVLP * m (If 1 <= m <M, Σ is the sum of 1 to m) (Formula TLA-1)
0 (when m = 0)
yTsmn = ΣhT [n-1] -hOVLP * n (If 1 <= n <N, Σ is the sum of 1 to n)
0 (when n = 0)
More specifically, it is as follows.

(xTs00, yTs00) = (0,0)
(xTs10, yTs10) = (764,0)
(xTs20, yTs20) = (1400,0)
(xTs01, yTs01) = (0,636)
(xTs11, yTs11) = (764,636)
(xTs21, yTs21) = (1400,636)
As described above, the upper left coordinate of each tile in the picture (the upper left coordinate of the CTU at the head of the tile) is not necessarily an integer multiple of the CTU in the picture.

When encoding / decoding each tile, the overlap region of the tile is encoded / decoded for each tile, and a plurality of decoded images are generated. For example, in FIG. 7A, the overlap region of Tile [0] [0] and Tile [1] [0] is encoded and decoded once for each tile, so that two decoded images are generated. . In addition, since the overlap region of Tile [0] [0] and Tile [0] [1] is encoded and decoded once for each tile, two decoded images are generated. The overlap area of Tile [0] [0], Tile [1] [0], Tile [0] [1], and Tile [1] [1] is encoded and decoded once for each tile. Therefore, four decoded images are generated. For these regions, a composite image (display image) without tile distortion can be generated by performing a composite process (filtering of tile boundaries) after decoding. An example is shown in FIG. In FIG. 8A, a composite image is generated by calculating a weighted sum of two decoded images. A method for synthesizing images will be described later.

(Configuration of video decoding device)
FIG. 9 (a) shows a video decoding device (image decoding device) 31 of the present invention. The moving picture decoding apparatus 31 includes a header information decoding unit 2001, tile decoding units 2002a to 2002n, and a tile synthesis unit 2003.

The header information decoding unit 2001 decodes header information from an encoded stream Te input from the outside and encoded in units of NAL (network abstraction) layers. The header information decoding unit 2001 derives a tile (TileId) necessary for display from control information indicating an image area to be displayed on a display or the like input from the outside. In addition, the header information decoding unit 2001 extracts an encoded tile necessary for display from the encoded stream Te, and transmits the encoded tile to the tile decoding units 2002a to 2002n. Also, the header information decoding unit 2001 transmits tile information (information related to tile division) obtained by decoding the PPS and TileId of the tile decoded by the tile decoding unit 2002 to the tile synthesis unit 2003. Specifically, tile information is num_tile_columns_minus1, num_tile_rows_minus1, uniform_spacing_flag, column_width_minus1 [i], row_height_minus1 [i], overlap_tiles_flag, the number of tiles in the horizontal direction, calculated from the syntax of overlap information, etc. N, tile width wT [m] and height hT [n], overlap area width wOVLP [m] and height hOVLP [n], and the like. Also, the width wCRP [m] and height hCRP [n] of the crop offset area are derived from these pieces of information.

The tile decoding units 2002a to 2002n decode the encoded tiles and transmit the decoded tiles to the tile synthesis unit 2003.

Here, since the tile decoding units 2002a to 2002n perform the decoding process with the tile sequence as one independent video sequence, the tile decoding units 2002a to 2002n do not refer to the prediction information between the tile sequences in time and space when performing the decoding process. . That is, the tile decoding units 2002a to 2002n do not refer to tiles of another tile sequence (having different TileId) when decoding tiles in a certain picture.

In this way, since the tile decoding units 2002a to 2002n each decode the tile, it is possible to decode a plurality of tiles in parallel or to decode only one tile independently. As a result, according to the tile decoding units 2002a to 2002n, it is possible to efficiently execute the decoding process, such as decoding an image necessary for display by executing only the minimum necessary decoding process.

(Configuration of tile decoding unit)
The configuration of the tile decoding units 2002a to 2002n will be described. FIG. 10 is a block diagram showing the configuration of 2002, which is one of the tile decoding units 2002a to 2002n. The tile decoding unit 2002 includes an entropy decoding unit 301, a prediction parameter decoding unit (prediction image decoding device) 302, a loop filter 305, a reference picture memory 306, a prediction parameter memory 307, a prediction image generation unit (prediction image generation device) 308, and inversely. A quantization / inverse transform unit 311 and an adder 312 are included. Note that there is a configuration in which the tile decoding unit 2002 does not include the loop filter 305 in accordance with the tile encoding unit 2012 described later.

The prediction parameter decoding unit 302 includes an inter prediction parameter decoding unit 303 and an intra prediction parameter decoding unit 304. The predicted image generation unit 308 includes an inter predicted image generation unit 309 and an intra predicted image generation unit 310.

Hereinafter, an example in which CTU, CU, PU, and TU are used as processing units will be described. However, the present invention is not limited to this example, and processing may be performed in units of CUs instead of units of TUs or PUs. Alternatively, CTU, CU, PU, and TU may be read as blocks, and processing in units of blocks may be performed.

The entropy decoding unit 301 performs entropy decoding on the coded stream Te input from the outside, and separates and decodes individual codes (syntax elements). The separated code includes a prediction parameter for generating a prediction image and residual information for generating a difference image.

The entropy decoding unit 301 outputs a part of the separated code to the prediction parameter decoding unit 302. Some of the separated codes are, for example, a prediction mode predMode, a PU partition mode part_mode, a reference picture index ref_idx_lX, a prediction vector index mvp_lX_idx, and a difference vector mvdLX. Control of which code is decoded is performed based on an instruction from the prediction parameter decoding unit 302. The entropy decoding unit 301 outputs the quantized transform coefficient to the inverse quantization / inverse transform unit 311. In the coding process, this quantized transform coefficient is DCT (Discrete Cosine Transform), DST (Discrete Sine Transform), KLT (Karyhnen Loeve Transform) ) Etc., and is a coefficient obtained by performing quantization and quantizing.

The inter prediction parameter decoding unit 303 decodes the inter prediction parameter with reference to the prediction parameter stored in the prediction parameter memory 307 based on the code input from the entropy decoding unit 301. Also, the inter prediction parameter decoding unit 303 outputs the decoded inter prediction parameters to the prediction image generation unit 308 and stores them in the prediction parameter memory 307.

The intra prediction parameter decoding unit 304 refers to the prediction parameter stored in the prediction parameter memory 307 on the basis of the code input from the entropy decoding unit 301 and decodes the intra prediction parameter. The intra prediction parameter decoding unit 304 outputs the decoded intra prediction parameter to the prediction image generation unit 308 and stores it in the prediction parameter memory 307.

The loop filter 305 applies filters such as a deblocking filter, a sample adaptive offset (SAO), and an adaptive loop filter (ALF) to the decoded image of the CU generated by the adding unit 312. As long as the loop filter 305 is paired with the tile encoding unit 2012, the above three types of filters do not necessarily have to be included, and for example, a configuration with only a deblocking filter may be used.

The reference picture memory 306 stores the decoded image of the CU generated by the adding unit 312 at a predetermined position for each decoding target picture and CTU or CU.

The prediction parameter memory 307 stores the prediction parameter at a predetermined position for each decoding target picture and PU (or sub-block, fixed-size block, pixel). Specifically, the prediction parameter memory 307 stores the inter prediction parameter decoded by the inter prediction parameter decoding unit 303, the intra prediction parameter decoded by the intra prediction parameter decoding unit 304, and the prediction mode predMode separated by the entropy decoding unit 301. .

The prediction image generation unit 308 receives the prediction mode predMode input from the entropy decoding unit 301 and the prediction parameter from the prediction parameter decoding unit 302. Further, the predicted image generation unit 308 reads a reference picture from the reference picture memory 306. The prediction image generation unit 308 generates a prediction image of a PU (block) or a sub-block using the input prediction parameter and the read reference picture (reference picture block) in the prediction mode indicated by the prediction mode predMode.

Here, when the prediction mode predMode indicates the inter prediction mode, the inter prediction image generation unit 309 uses the inter prediction parameter input from the inter prediction parameter decoding unit 303 and the read reference picture (reference picture block). To generate a prediction image of PU (block) or sub-block.

The inter prediction image generation unit 309 performs a motion vector on the basis of the decoding target PU from the reference picture indicated by the reference picture index refIdxLX for a reference picture list (L0 list or L1 list) having a prediction list use flag predFlagLX of 1. The reference picture block at the position indicated by mvLX is read from the reference picture memory 306. The inter predicted image generation unit 309 performs interpolation based on the read reference picture block, and generates a PU predicted image (interpolated image, motion compensated image). The inter prediction image generation unit 309 outputs the generated prediction image of the PU to the addition unit 312. Here, a reference picture block is a set of pixels on a reference picture (usually called a block because it is a rectangle), and is an area that is referred to in order to generate a predicted image of a PU or sub-block.

(Tile border padding)
The reference picture block (reference block) is a motion vector on the reference picture indicated by the reference picture index refIdxLX with respect to the reference picture list of the prediction list use flag predFlagLX = 1 and based on the position of the target CU (block) This is the block at the position indicated by mvLX. As already described, there is no guarantee that the pixel of the reference block is located within the tile (collocated tile) on the reference picture having the same TileId as the target tile. Therefore, as an example, in the reference picture, reference is made without referring to the pixel values outside the collocated tile by padding the outside of each tile (complementing with the pixel value at the tile boundary) as shown in FIG. You can read the block.

Tile boundary padding (outside tile padding) is the following position (xRef + i, yRef + j) as the pixel value of the reference pixel position (xIntL + i, yIntL + j) in motion compensation by the inter prediction image generation unit 309. ) Pixel value refImg [xRef + i] [yRef + j]. That is, when referring to the reference pixel, the reference position is clipped at the positions of the upper, lower, left and right boundary pixels of the tile.

xRef + i = Clip3 (xTs, xYs + wT-1, xIntL + i)
yRef + j = Clip3 (yTs, yYs + hT-1, yIntL + j)
Here, (xTs, yTs) is the upper left coordinates of the target tile where the target block is located, and wT and hT are the width and height of the target tile. These units are pixels.

XIntL and yIntL are (xb, yb) as the upper left coordinates of the target block with reference to the upper left coordinates of the picture, and (mvLX [0], mvLX [1]) as the motion vectors.
xIntL = xb + (mvLX [0] >> log2 (MVUNIT))
yIntL = yb + (mvLX [1] >> log2 (MVUNIT))
It may be derived by Here, MVUNIT indicates that the accuracy of the motion vector is 1 / MVUNIT pel.

By reading the pixel value of the coordinates (xRef + i, yRef + j), the padding of FIG. 6 (c) can be realized.

By padding tile boundaries when independent_tiles_flag = 1, even if the motion vector points outside the collocated tile in inter prediction, the pixel value in the collocated tile is used to replace the reference pixel, so the tile sequence is inter predicted independently. Can be used for decoding.

(Tile boundary motion vector restriction)
As another method for restricting tile boundary padding, there is a tile boundary motion vector restriction. In this process, in the motion compensation by the inter prediction image generation unit 309, the motion vector is limited (clipped) so that the position of the reference pixel (xIntL + i, yIntL + j) falls within the collocated tile.

In this process, the upper left coordinates (xb, yb) of the target block (target subblock or target block), the block size (BW, BH), the upper left coordinates (xTs, yTs) of the target tile, the width and height of the target tile When w is wT and hT, the motion vector mvLX of the block is input and a limited motion vector mvLX is output.

The left end posL, right end posR, upper end posU, and lower end posD of the reference pixel in the interpolation image generation of the target block are as follows. NTAP is the number of filter taps used for generating the interpolation image. Here, MVUNIT indicates that the accuracy of the motion vector is 1 / MVUNIT pel.

posL = xb + (mvLX [0] >> log2 (MVUNIT))-NTAP / 2 + 1)
posR = xb + BW-1 + (mvLX [0] >> log2 (MVUNIT)) + NTAP / 2
posU = yb + (mvLX [1] >> log2 (MVUNIT))-NTAP / 2 + 1
posD = yb + BH-1 + (mvLX [1] >> log2 (MVUNIT)) + NTAP / 2
The restrictions for the reference pixel to enter the collocated tile are as follows.

posL> = xTs
posR <= xTs + wT-1
posU> = yTs
posD <= yTs + hT-1
It is. The following modifications can be made.

posL = xb + (mvLX [0] >> log2 (MVUNIT))-NTAP / 2 + 1> = xTs
(mvLX [0] >> log2 (MVUNIT))> = xTs-xb + NTAP / 2-1
posR = xb + BW-1 + (mvLX [0] >> log2 (MVUNIT)) + NTAP / 2 <= xTs + wT-1
(mvLX [0] >> log2 (MVUNIT)) <= xTs + wT-1-xb-BW + 1-NTAP / 2
posU = yb + (mvLX [1] >> log2 (MVUNIT))-NTAP / 2 + 1> = yTs
(mvLX [0] >> log2 (MVUNIT))> = yTs-yb + NTAP / 2-1
posD = yb + BH-1 + (mvLX [1] >> log2 (MVUNIT)) + NTAP / 2 <= yTs + hT-1
(mvLX [1] >> log2 (MVUNIT)) <= yTs + hT-1-yb-BH + 1-NTAP / 2
Therefore, the motion vector limit can be derived by the following equation.

mvLX [0] = Clip3 (vxmin, vxmax, mvLX [0])
mvLX [1] = Clip3 (vymin, vymax, mvLX [1])
Where vxmin = (xTs-xb + NTAP / 2-1) << log2 (MVUNIT)
vxmax = (xTs + wT-xb-BW-NTAP / 2) << log2 (MVUNIT)
vymin = (yTs-yb + NTAP / 2-1) << log2 (MVUNIT)
vymax = (yTs + hT-yb-BH-NTAP / 2) << log2 (MVUNIT)
When independent_tiles_flag = 1, by limiting the motion vector, the motion vector can always point in the collocated tile in the inter prediction. With this configuration, tile sequences can be independently decoded using inter prediction.

When the prediction mode predMode indicates the intra prediction mode, the intra predicted image generation unit 310 performs intra prediction using the intra prediction parameter input from the intra prediction parameter decoding unit 304 and the read reference picture. Specifically, the intra predicted image generation unit 310 reads, from the reference picture memory 306, neighboring PUs that are pictures to be decoded and are in a predetermined range from the decoding target PUs among the PUs that have already been decoded. The predetermined range is, for example, one of the left, upper left, upper, and upper right adjacent PUs when the decoding target PU sequentially moves in the so-called raster scan order, and differs depending on the intra prediction mode. The raster scan order is an order in which each row is sequentially moved from the left end to the right end in each picture from the upper end to the lower end.

The intra-predicted image generation unit 310 performs prediction in the prediction mode indicated by the intra-prediction mode IntraPredMode based on the read adjacent PU, and generates a predicted image of the PU. The intra predicted image generation unit 310 outputs the generated predicted image of the PU to the adding unit 312.

The inverse quantization / inverse transform unit 311 performs inverse quantization on the quantized transform coefficient input from the entropy decoding unit 301 to obtain a transform coefficient. The inverse quantization / inverse transform unit 311 performs inverse frequency transform such as inverse DCT, inverse DST, inverse KLT on the obtained transform coefficient, and calculates a prediction residual signal. The inverse quantization / inverse transform unit 311 outputs the calculated residual signal to the adder 312.

The addition unit 312 adds the prediction image of the PU input from the inter prediction image generation unit 309 or the intra prediction image generation unit 310 and the residual signal input from the inverse quantization / inverse conversion unit 311 for each pixel, Generate a decoded PU image. The adding unit 312 outputs the generated decoded image of the block to at least one of a deblocking filter, a SAO (sample adaptive offset) unit, and an ALF.

(Configuration of tile composition unit)
The tile synthesis unit 2003 generates and synthesizes a decoded image Td by referring to the tile information transmitted from the header information decoding unit 2001, the tile TileId necessary for display, and the tile decoded by the tile decoding units 2002a to 2002n. Outputs an image (display image). As shown in FIG. 9B, the tile composition unit 2003 includes a smoothing processing unit 20031 and a composition unit 20032. When overlap_tiles_flag is 1, the smoothing processing unit 20031 may perform filter processing (averaging processing, weighted averaging processing) using the overlap region of each tile decoded by the tile decoding unit 2002 . That is, one pixel may be derived using pixels of two or more tiles corresponding to the overlap region. For example, the pixel value tmp of the overlap region after the filter processing of two tiles Tile [m−1] [n] and Tile [m] [n] adjacent in the horizontal direction is calculated by the following equation.

tmp [m] [n] [x] [y] = (Tile [m] [n] [x] [y] + Tile [m-1] [n] [wT [m-1] -wOVLP [m- 1] + x] [y] +1) >> 1 (Formula FLT-1)
Here, wT [m−1] −wOVLP [m−1] + x indicates a position right by x starting from the tile position wT [m−1] −wOVLP [m−1]. tmp [m] [n] [x] [y] is the filter of the overlap area located at (x, y) with the upper left coordinate of the tile at (0,0) in the tile at (m, n) position This represents the subsequent pixel value. Tile [m] [n] [x] [y] represents the pixel value of the tile located at (x, y), where the tile upper left coordinate is (0,0) in the tile at (m, n) position . Further, the pixel value tmp of the overlap region after the filter processing of two tiles Tile [m] [n−1] and Tile [m] [n] adjacent in the vertical direction is calculated by the following equation.

tmp [m] [n] [x] [y] = (Tile [m] [n] [x] [y] + Tile [m] [n-1] [x] [hT [n-1] -hOVLP [n-1] + y] +1) >> 1 (Formula FLT-2)
The four tiles Tile [m-1] [n-1], Tile [m-1] [n], Tile [m] [n-1], and Tile [m] [n] The pixel value is calculated by the following formula.

tmp [m] [n] [x] [y] = (Tile [m] [n] [x] [y] + Tile [m] [n-1] [x] [hT [n-1] -hOVLP [n-1] + y] + Tile [m-1] [n] [wT [m-1] -wOVLP [m-1] + x] [y] + Tile [m-1] [n-1] [wT [m-1] -wOVLP [m-1] + x] [hT [n-1] -hOVLP [n-1] + y] +2) >> 2 (Formula FLT-3)
The smoothing processing unit 20031 (filter processing unit, averaging processing unit, weighted averaging processing unit) outputs the pixel value of the tile and the pixel value of the overlapped area (in this case, tmp) to the combining unit 20032. To do.

The composition unit 20032 generates a predetermined area specified by the picture or control information (TileId) from the pixel value of the tile and the pixel value of the overlap area. The entire composite image or the predetermined region Rec [x] [y] is represented by, for example, the simple average of the following equation.

Rec [xTsmn + x] [yTsmn + y] = tmp [m] [n] [x] [y] (m! = 0 && 0 <= x <wOVLP [m-1], or n! = 0 && 0 <= y <hOVLP [n-1])
Rec [xTsmn + x] [yTsmn + y] = Tile [m] [n] [x] [y] (Other than above, m = 0 or n = 0, 0 <= x <wT [0] -wOVLP [ 0], 0 <= y <hT [0] -hOVLP [0])
Rec [xTsmn + x] [yTsmn + y] = Tile [m] [n] [x] [y] (other than above)
When the output of the combining unit is a picture, the corresponding tiles are all tiles (0 <= m <M, 0 <= n <N), and when the output of the combining unit is a predetermined area, TileId indicates ( Composite tiles corresponding to m, n). Since these processes are performed outside the tile decoding unit, the synthesized image is not used for decoding the tiles.

Through the above processing, tile distortion can be removed by averaging tile boundaries that have been decoded in duplicate while decoding tiles independently.

(Configuration of video encoding device)
FIG. 11 (a) shows the moving picture encoding apparatus 11 of the present invention. The moving image encoding apparatus 11 includes a picture dividing unit 2010, a header information generating unit 2011, tile encoding units 2012a to 2012n, and an encoded stream generating unit 2013.

The picture dividing unit 2010 divides the tile into a plurality of tiles and transmits the tiles to the tile encoding units 2012a to 2012n. The header information generation unit 2011 generates tile information (TileId, number of tile divisions, size, overlap information) from the divided tiles, and transmits the generated tile information to the encoded stream generation unit 2013 as header information. The tile division when the tiles overlap will be described later.

The tile encoders 2012a to 2012n encode each tile. Further, the tile encoding units 2012a to 2012n encode tiles in units of tile sequences. Thus, according to the tile encoding units 2012a to 2012n, tiles can be encoded in parallel.

Here, the tile encoders 2012a to 2012n perform the encoding process on the tile sequence as in the case of one independent video sequence, and the prediction information of the tile sequences having different TileIds is temporally processed when the encoding process is performed. Neither spatially nor spatially. That is, when encoding a tile in a certain picture, the tile encoding units 2012a to 2012n do not refer to another tile both spatially and temporally.

The encoded stream generation unit 2013 generates header information including tile information transmitted from the header information generation unit 2011, and tile encoding units 2012a to 2012n encode the tiles to generate an encoded stream Te in units of NAL units. .

As described above, since the tile encoding units 2012a to 2012n can independently encode each tile, a plurality of tiles can be encoded in parallel, or a plurality of tiles can be decoded in parallel on the decoding device side. Or only one tile can be decoded independently.

(Picture division)
The picture dividing unit 2010 in FIG. 11 (a) includes a tile information calculating unit 20101 and a picture dividing unit A 20102 shown in FIG. 11 (b).

The tile information calculation unit 20101 includes a picture width wPict and a height hPict, a tile unit size width wUnitTile and a height hUnitTile, a horizontal number M of tiles to be divided, a vertical number N, and an overlap area width wOVLP [ The width wT [m] and height hT [n] of the tile and the width wCRP [m] and height hCRP [n] of the crop offset area are derived from m] and the height hOVLP [n]. Here, an example is shown in which the width and height of the overlap region are set to fixed values wOVLP and hOVLP.

wT [m] = ceil ((wPict + 1) / wUnitTile / M) * wUnitTile (0 <= m <= M-2)
wT [M-1] = wPict-ΣwCRP [m] + (M-1) * wOVLP (Σ is the sum of m-2 from m = 0)
hT [n] = ceil ((hPict + 1) / hUnitTile / N) * hUnitTile (0 <= n <= N-2)
hT [N-1] = hPict-ΣhCRP [n] + (N-1) * hOVLP (Σ is the sum from n = 0 to N-2)
wCRP [M-1] = ceil (wT [M-1] / wUnitTile) * wUnitTile-wT [M-1]
hCRP [N-1] = ceil (hT [N-1] / hUnitTile) * hUnitTile-hT [N-1]
Note that the formula for calculating wT [m] and hT [n] may be any one of (Formula TSP-1) to (Formula TSP-10).

Also, the width PicWidthInCtbsY and the height PicHeightInCtbsY of the picture in CTU units are expressed by the following equations.

PicWidthInCtbsY = ΣTileWidthinCtbs [m] (Σ is the sum of m = 0..M-1)
PicHeightInCtbsY = ΣTileHeightinCtbs [n] (Σ is the sum of n = 0..N-1)
Here, TileWidthinCtbs [m] and TileHeightinCtbs [n] are parameters representing the width and height of the tile in CTU units.

TileWidthinCtbs [m] = ceil (wT [m] / M)
TileHeightinCtbs [n] = ceil (hT [n] / N)
The larger the width and height of the overlap region, the greater the effect of removing tile distortion, but the code amount increases and the coding efficiency is sacrificed. A suitable overlap region width wOVLP [m] and height hOVLP [n] may be 2-6. The unit size of tiles may be CTU size (wUnitTile = wCTU, hUnitTile = hCTU), and the width wOVLPL [m] and height hOVLP [n] of the overlap area are all the same (for example, wOVLP = hOVLP = sOVLP) It may be. The following is an example of the tile information calculation formula of FIG. 7 when the overlap area width wOVLPL [m] and height hOVLP [n] are all set to sOVLP.

wT [m] = ceil ((wPict + 1) / wCTU / M) * wCTU (0 <= m <= M-2)
wT [M-1] = wPict-ΣwCRP [m] + (M-1) * sOVLP (Σ is the sum of m-2 from m = 0)
hT [n] = ceil ((hPict + 1) / hCTU / N) * hCTU (0 <= n <= N-2)
hT [N-1] = hPict-ΣhCRP [n] + (N-1) * sOVLP (Σ is the sum from n = 0 to N-2)
wCRP [M-1] = ceil (wT [M-1] / wCTU) * wCTU-wT [M-1]
hCRP [N-1] = ceil (hT [N-1] / hCTU) * hCTU-hT [N-1]
The tile information calculation unit 20101 outputs the calculated tile information to the picture division unit A 20102 and the header information generation unit 2011.

The picture dividing unit A 20102 divides a picture into tiles using the tile information calculated by the tile information calculating unit 20101. In other words, Tile [m] [n] extracts the area of xTsmn .. (xTsmn + wT [m] -1) and yTsmn .. (yTsmn + hT [n] -1) on the picture. And output to the tile encoding unit 2012. Note that a crop offset area of wCRP [M-1] and hCRP [N-1] is added to the right and bottom tiles of the picture, and then output to the tile encoding unit 2012.

(Header information generator)
In the header information generation unit 2011, the parameter set and tile information are converted into a syntax expression and output to the encoded stream generation unit 2013. The syntax expression of tile information is shown below.

num_tile_columns_minus1 = M-1
num_tile_rows_minus1 = N-1
uniform_spacing_flag = 1 (when all wT [m] are equal and all hT [n] are equal)
column_width_minus1 = ceil (wT [0] / wUnitTile) -1
row_height_minus1 = ceil (hT [0] / hUnitTile) -1
uniform_spacing_flag = 0 (other than above)
column_width_minus1 [m] = ceil (wT [m] / wUnitTile) -1
row_height_minus1 [n] = ceil (hT [n] / hUnitTile) -1
overlap_tiles_flag = 1
uniform_overlap_flag = 1
tile_overlap_width_div2 = sOVLP / 2
tile_overlap_height_div2 = sOVLP / 2
(Configuration of tile coding unit)
Next, the configuration of the tile encoding units 2012a to 2012n will be described. FIG. 12 is a block diagram illustrating a configuration of 2012, which is one of the tile encoding units 2012a to 2012n. The tile encoding unit 2012 includes a prediction image generation unit 101, a subtraction unit 102, a transform / quantization unit 103, an entropy encoding unit 104, an inverse quantization / inverse transform unit 105, an addition unit 106, a loop filter 107, and a prediction parameter memory. (Prediction parameter storage unit, frame memory) 108, reference picture memory (reference image storage unit, frame memory) 109, encoding parameter determination unit 110, and prediction parameter encoding unit 111. The prediction parameter encoding unit 111 includes an inter prediction parameter encoding unit 112 and an intra prediction parameter encoding unit 113. The tile encoding unit 2012 may be configured not to include the loop filter 107.

The predicted image generation unit 101 generates a predicted image of the PU for each picture of the image T for each CU that is an area obtained by dividing the picture. Here, the predicted image generation unit 101 reads a decoded block from the reference picture memory 109 based on the prediction parameter input from the prediction parameter encoding unit 111. For example, in the case of inter prediction, the predicted image generation unit 101 reads out a block at a position on a reference picture indicated by a motion vector with the target PU as a starting point. In the case of intra prediction, the pixel value of the adjacent PU used in the intra prediction mode is read from the reference picture memory 109 to generate a predicted image of the PU. The prediction image generation unit 101 generates a prediction image of the PU using one prediction method among a plurality of prediction methods for the read reference picture block. The predicted image generation unit 101 outputs the generated predicted image of the PU to the subtraction unit 102.

Note that the predicted image generation unit 101 includes the padding process at the tile boundary and has the same operation as the predicted image generation unit 308 already described, and a description thereof will be omitted.

The subtraction unit 102 subtracts the signal value of the prediction image of the PU input from the prediction image generation unit 101 from the pixel value at the corresponding PU position of the image T to generate a residual signal. The subtraction unit 102 outputs the generated residual signal to the transform / quantization unit 103.

The transform / quantization unit 103 performs frequency transform on the prediction residual signal input from the subtraction unit 102, and calculates a transform coefficient. The transform / quantization unit 103 quantizes the calculated transform coefficient to obtain a quantized transform coefficient. The transform / quantization unit 103 outputs the obtained quantized transform coefficient to the entropy coding unit 104 and the inverse quantization / inverse transform unit 105.

The entropy encoding unit 104 receives the quantized transform coefficient from the transform / quantization unit 103 and the prediction parameter from the prediction parameter encoding unit 111.

The entropy encoding unit 104 entropy-encodes the input division information, prediction parameters, quantization transform coefficients, and the like to generate an encoded stream Te, and outputs the generated encoded stream Te to the outside.

The inverse quantization / inverse transform unit 105 is the same as the inverse quantization / inverse transform unit 311 (FIG. 10) in the tile decoding unit 2002, and the quantized transform coefficient input from the transform / quantization unit 103 is inversely quantized. To obtain the conversion coefficient. The inverse quantization / inverse transform unit 105 performs inverse transform on the obtained transform coefficient to calculate a residual signal. The inverse quantization / inverse transform unit 105 outputs the calculated residual signal to the addition unit 106.

The addition unit 106 adds the signal value of the prediction image of the PU input from the prediction image generation unit 101 and the signal value of the residual signal input from the inverse quantization / inverse conversion unit 105 for each pixel, and generates a decoded image. Is generated. The adding unit 106 stores the generated decoded image in the reference picture memory 109.

The loop filter 107 performs a deblocking filter, a sample adaptive offset (SAO), and an adaptive loop filter (ALF) on the decoded image generated by the adding unit 106. Note that the loop filter 107 does not necessarily include the above three types of filters, and may have a configuration including only a deblocking filter, for example.

The prediction parameter memory 108 stores the prediction parameter generated by the encoding parameter determination unit 110 at a predetermined position for each encoding target picture and CU.

The reference picture memory 109 stores the decoded image generated by the loop filter 107 at a predetermined position for each picture to be encoded and each CU.

The encoding parameter determination unit 110 selects one set from among a plurality of sets of encoding parameters. The encoding parameter is the above-described QT or BT partition parameter, prediction parameter, or parameter to be encoded that is generated in association with these parameters. The predicted image generation unit 101 generates a predicted image of the PU using each of these encoding parameter sets.

The encoding parameter determination unit 110 calculates an RD cost value indicating the amount of information and the encoding error for each of a plurality of sets. The RD cost value is, for example, the sum of a code amount and a square error multiplied by a coefficient λ. The code amount is the information amount of the encoded stream Te obtained by entropy encoding the residual signal and the encoding parameter. The square error is the sum between pixels regarding the square value of the residual value of the residual signal calculated by the subtracting unit 102. The coefficient λ is a real number larger than a preset zero. The encoding parameter determination unit 110 selects a set of encoding parameters that minimizes the calculated RD cost value. As a result, the entropy encoding unit 104 outputs the selected set of encoding parameters to the outside as the encoded stream Te, and does not output the set of unselected encoding parameters. The encoding parameter determination unit 110 stores the determined encoding parameter in the prediction parameter memory 108.

The prediction parameter encoding unit 111 derives a format for encoding from the parameters input from the encoding parameter determination unit 110 and outputs the format to the entropy encoding unit 104. Deriving the format for encoding is, for example, deriving a difference vector from a motion vector and a prediction vector. Also, the prediction parameter encoding unit 111 derives parameters necessary for generating a prediction image from the parameters input from the encoding parameter determination unit 110 and outputs the parameters to the prediction image generation unit 101. The parameter necessary for generating the predicted image is, for example, a motion vector in units of sub-blocks.

The inter prediction parameter encoding unit 112 derives an inter prediction parameter such as a difference vector based on the prediction parameter input from the encoding parameter determination unit 110. The inter prediction parameter encoding unit 112 is partially the same as the configuration in which the inter prediction parameter decoding unit 303 derives the inter prediction parameters as a configuration for deriving parameters necessary for generating the prediction image to be output to the prediction image generating unit 101. Includes configuration.

In addition, the intra prediction parameter encoding unit 113 has a configuration in which the intra prediction parameter decoding unit 304 derives an intra prediction parameter as a configuration for deriving a prediction parameter necessary for generating a prediction image to be output to the prediction image generating unit 101. Some parts have the same configuration.

Through the above processing, tile distortion can be removed by filtering a plurality of overlapping tile boundaries on the video decoding device side while encoding the tiles independently.

(Modification 1)
In Modification 1 of the present application, the method of dividing a picture into tiles is changed from the dividing method shown in FIG. 7 to the dividing method shown in FIG. 7 differs from FIG. 13 in that FIG. 7 is a tile including an overlap area, and FIG. 13 is a tile including a crop offset area that is an unused area in addition to the overlap area. That is, all the tiles including the tiles at the screen edge may include the crop offset area. Fig. 13 (b) is a diagram showing Tile [0] [0] and Tile [1] [0] that are adjacent in the horizontal direction. The tile is the overlap area (shaded area) and the crop offset area (horizontal line area). including. Further, the width wT [m] and the height hT [n] of the tile and the width wCRP [m] and the height hCRP [n] of the crop offset area have the following relationship.

wT [m] + wCRP [m] = wTile [m] = wCTU * a (Formula TCS-1)
hT [n] + hCRP [n] = hTile [n] = hCTU * b
wTile [m] = TileWidthinCtbs [m] << CtbLog2SizeY
hTile [n] = TileHeightinCtbs [n] << CtbLog2SizeY
wCRP [m] = wTile [m] -wT [m] (Formula CRP-1)
hCRP [n] = hTile [n] -hT [n]
Here, wTile [m] and hTile [n] are the width and height of the tile to be encoded. The rest is the same as in the second embodiment.

分割 By dividing the picture into tiles as shown in Fig. 13, the upper left coordinate of the tile can be set at a position that is an integral multiple of the CTU. Therefore, in addition to the effect of the second embodiment, there is an effect that access to individual tiles is simplified.

(Details of tile division not depending on integer multiple of CTU)
Further, the operation and effect of tile division not depending on an integer multiple of CTU will be described with reference to FIGS. FIG. 21 is a diagram illustrating picture division in which the tile size is limited to an integral multiple of the CTU except for picture boundaries. FIG. 21A is a diagram in which a tile size is an integral multiple of a CTU and a 1920 × 1080 HD image is divided into 4 × 3 tiles. As shown in the figure, if the CTU size is 128x128, for example, if the tile size is an integer multiple of the CTU, even if you try to divide it into 4x3 tiles, it cannot be divided into equal sizes (divided into 512x384, 384x384, 512x312, 384x312) Therefore, there is a problem that load balancing cannot be performed even if the processing is divided into a plurality of processors and hardware. FIG. 21 (b) is a diagram showing CTU partitioning of each tile. Tiles that do not reach picture boundaries are divided into an integer number of CTUs. When dividing a picture boundary tile into CTU units, the area outside the picture is treated as a crop offset area.

FIG. 22 (a) is a technique of the present embodiment, in which a 1920x1080 HD image is divided into 4x3 tiles. When dividing into 4x3 tiles, all the tiles can be divided into equal sizes (divided into 480x360), which has the effect of being able to equally load balance multiple processors and hardware. The tile size can take a size other than an integer multiple of the CTU regardless of the picture boundary. FIG. 22 (b) is a diagram showing CTU division of each tile. When dividing into CTUs, if the tile size is not an integral multiple of the CTU size, a crop offset area is provided outside the tile. In particular, as shown in TILE B, the CTU is divided based on the upper left of each tile. Therefore, the upper left coordinate of the CTU is not limited to an integer multiple of the CTU size.

Fig. 23 shows an example of the syntax of slice data at a tile size that is an integral multiple of the CTU. Call the syntax coding_tree_unit of CTU data, which is encoded data in CTU units, for the number of CTUs in the slice data. In coding_tree_unit, when the tile size is an integer multiple of CTU, the upper left coordinates (xCtb, yCtb) of the CTU can be uniquely derived from the CTU address CtbAddrInRs in the picture because the picture is divided in CTU units. That is, in coding_tree_unit, the upper left coordinates (xCtb, yCtb) of the CTU are set to 1 << CtbLog2SizeY times so as to be an integer multiple of the CTU based on the intra-picture CTU address CtbAddrInRs. Here, CtbAddrInTs is a tile scan address for performing raster scan of the CTU in tile units. CtbAddrInRs represents the raster scan address of the CTU in units of pictures, and is 0 to PicSizeInCtbsY-1.

PicSizeInCtbsY = PicWidthInCtbsY * PicHeightInCtbsY
FIG. 24 shows a syntax example of slice data in the present embodiment. Also in this embodiment, CTU data syntax coding_tree_unit, which is encoded data in CTU units, is called for the number of CTUs in the slice data. In the embodiment, since the picture is not divided in units of CTUs, the upper left coordinates (xCtb, yCtb) of the CTU cannot be uniquely derived from the intra-picture CTU address CtbAddrInRs. Therefore, CTU coordinates are derived based on the upper left coordinates of the tile. Specifically, when the ID of the target tile is TileId and the upper left coordinates of the target tile are indicated by (TileAddrX [TileId], TileAddrY [TileId]), the CTU coordinates are derived using the following formula.

xCtb = (CtbAddrInTile% TileWidthinCtbs [TildId]) << CtbLog2SizeY + TileAddrX [TileId]
yCtb = (CtbAddrInTile / TileWidthinCtbs [TildId]) << CtbLog2SizeY + TileAddrY [TileId]
Here, CtbAddrInTile is the raster scan position within the tile of the CTU, where the top of the tile is 0. If the CTU address at the top of the tile is firstCtbAddrInTs, CtbAddrInTile is expressed by the following equation. Here, CtbAddrInTs is a tile scan address through a picture.

CtbAddrInTile = CtbAddrInTs-firstCtbAddrInTile
That is, in this embodiment, the CTU in-tile coordinates ((CtbAddrInTile% TileWidthinCtbs [TildId]) << CtbLog2SizeY, (CtbAddrInTile / TileWidthinCtbs [TildId]) << CtbLog2SizeY) derived from the CTU address CtbAddrInTile in the tile Then, using the in-picture coordinates (TileAddrX [TileId], TileAddrY [TileId]) at the upper left position of the tile, the in-picture coordinates at the CTU position are derived. That is, the upper left coordinates (xCtb, yCtb) of the CTU may be derived from the sum of the in-tile coordinates of the CTU and the in-picture coordinates at the head of the tile.

Here, the upper left coordinates (TileAddrX [TileId], TileAddrY [TileId]) of the tile of TileId are expressed below using the upper left coordinates (xTsmn, yTsmn) of the tile at the position (m, n) already described. May be.

TileId = n * M + m
TileAddrX [TileId] = xTsmn
TileAddrY [TileId] = yTsmn
TileWidthinCtbs [TileId] = ceil (wT [m] / wCTU)
TileHeightinCtbs [TileId] = ceil (hT [n] / hCTU)
That is, the CTU coordinates may be derived using the following equation.

xCtb = (CtbAddrInTile% ceil (wT [m] / wCTU)) << CtbLog2SizeY + xTsmn
yCtb = (CtbAddrInTile / ceil (wT [m] / hCTU)) << CtbLog2SizeY + yTsmn
Further, CTU coordinates may be derived using the syntax of column_width_minus1 and row_height_minus1.

xCtb = (CtbAddrInTile% (column_width_minus1 [m] +1)) << CtbLog2SizeY + xTsmn
yCtb = (CtbAddrInTile / (column_width_minus1 [m] +1)) << CtbLog2SizeY + yTsmn
In the configuration of the above embodiment, since the CTU coordinate is derived based on the upper left coordinate of the tile, even when the tile is positioned regardless of the unit in which the picture is divided, the processing in units of CTU can be performed. In the case of Embodiment 4 or later that introduces a region to be described later, this embodiment in which the upper left coordinate of the tile can be located at an arbitrary place is particularly effective.

The operations of the video encoding device 11 and the video decoding device 31 described above will be described with reference to the flowchart of FIG.

FIG. 14 (a) shows the flow of processing of the video encoding device 11.

The tile information calculation unit 20101 sets the number of tiles and the overlap area, and calculates information about the tile (width, height, upper left coordinates, and crop offset area if any) (S1500).

The picture dividing unit A 20102 divides the picture into tiles allowing overlap as shown in FIG. 7 or FIG. 13 (S1502).

The header information generation unit 2011 generates the tile information syntax and generates header information such as SPS, PPS, and slice header (S1504).

The tile encoding unit 2012 encodes each tile (S1506).

The encoded stream generation unit 2013 generates an encoded stream Te from the header information and the encoded stream of each tile (S1508).

FIG. 14 (b) shows the processing flow of the video decoding device 31.

The header information decoding unit 2001 decodes the header, and sets or calculates tile information (number of tiles, width, height, upper left coordinates, overlap width, height, if any, crop offset area). In addition, a tile identifier necessary for covering the display area designated from the outside is derived (S1520).

The tile decoding unit 2002 decodes each tile (S1522).

The smoothing processing unit 20031 performs a filtering process on the overlap area of each tile (S1524).

The synthesizing unit 20032 synthesizes each tile including the filtered area to generate a picture (S1526).

(Embodiment 2)
In the second embodiment of the present application, filter processing will be described.

In the filter processing of the first embodiment, the pixel values of the areas adjacent to the tile boundary are calculated by simply averaging the pixel values of the plurality of overlapping areas. In the second embodiment, an example will be described in which filtering is performed by a weighted sum that changes the weight depending on the distance from the tile boundary.

The smoothing processing unit 20031 of the tile composition unit 2003 shown in FIG. Operations other than the tile composition unit 2003 are the same as those described in the first embodiment, and a description thereof will be omitted.

The smoothing processing unit 20031 sets a weighting coefficient ww [x] according to the distance from the tile boundary as shown in FIG. FIG. 8A is a diagram for explaining the filter processing of the overlapping region of two tiles Tile [m−1] [n] and Tile [m] [n] adjacent in the horizontal direction in FIG. The weighting coefficient of Tile [m] [n] is ww [x], and the weighting coefficient of Tile [m-1] [n] is 1-ww [x]. Here, 0 <ww [x] <1. In Tile [m] [n] and Tile [m-1] [n], the weight coefficient ww [x] is set to 0 or 1 for the pixels outside the overlap area, and the weight coefficient in the overlap area is linearly interpolated. Is derived by

ww [x] = 1 / (wOVLP + 1) * (x + 1) (0 <= x <wOVLP)
Using this weighting factor, the pixel value of the overlap region of the tiles Tile [m−1] [n] and Tile [m] [n] is calculated by the following equation.

xx = wT [m-1] -wOVLP + x
Tile [m-1] [n] [xx] [y] = Tile [m-1] [n] [xx] [y] * (1-ww [x]) + Tile [m] [n] [x ] [y] * ww [x]
(1 <m <M-1)
In the above equation, the pixel value of the overlap region on the right side of Tile [m−1] [n] (OVLP_RIGHT in FIG. 8A) is replaced with the pixel value after filtering.

Similarly, the filtering process of the overlapping area of two tiles adjacent in the vertical direction shown in FIG. 16 (a) will be described. FIG. 16 (a) is a diagram in which Tile [m] [n-1] and Tile [m] [n] are extracted from the tiles shown in FIG. If the weighting factor for Tile [m] [n] is wh [y] and the weighting factor for Tile [m] [n-1] is 1-wh [y] (0 <wh [y] <1), then Tile [ In m] [n] and Tile [m] [n-1], the weighting factor wh [y] is set to 0 or 1 for the pixels outside the overlapping region, and the weighting factor for the overlapping region is derived by linear interpolation. Is done.

wh [y] = 1 / (hOVLP + 1) * (y + 1) (0 <= y <hOVLP)
Using this weighting factor, the pixel value of the overlap region of tiles Tile [m] [n-1] and Tile [m] [n] is calculated by the following equation.

yy = hT [n-1] -hOVLP + y
Tile [m] [n-1] [x] [yy] = Tile [m] [n-1] [x] [yy] * (1-wh [y]) + Tile [m] [n] [x ] [y] * wh [y]
(1 <n <N-1)
In the above equation, the pixel value in the overlap region (OVLP_BOTTOM) below Tile [m] [n-1] is replaced with the pixel value after filtering.

The synthesizing unit 20032 synthesizes the non-overlap area of each tile and the overlap area filtered by the smoothing processing unit 20031, and generates a synthesized image (display image) Rec [] [].

Rec [xTsmn + x] [yTsmn + y] = Tile [0] [0] [x] [y] (m = n = 0, 0 <= x <wT [0], 0 <= y <hT [0 ])
Rec [xTsmn + x] [yTsmn + y] = Tile [m] [0] [x] [y] (m! = 0, n = 0, wOVLP <= x <wT [m], 0 <= y < hT [n])
Rec [xTsmn + x] [yTsmn + y] = Tile [0] [n] [x] [y] (m = 0, n! = 0, 0 <= x <wT [m], hOVLP <= y < hT [n])
Rec [xTsmn + x] [yTsmn + y] = Tile [m] [n] [x] [y] (m! = 0, n! = 0, wOVLP <= x <wOVLP [m], hOVLP <= y <hT [n])
The pixel value after filtering is set in the overlap area (OVLP_RIGHT in FIG. 8 and OVLP_BOTTOM in FIG. 16) on the left side or the upper side of Tile [m] [n]. Using these pixel values, the overlap region (OVLP_LEFT in FIG. 8, OVLP_ABOVE in FIG. 16) on the left side or the upper side of Tile [m] [n] is not used.

It should be noted that the pixel value of the overlap area (OVLP_RIGHT in FIG. 8, OVLP_BOTTOM in FIG. 16) on the left or upper tile of Tile [m] [n] is not replaced with the pixel value after the filter processing, but Tile [ The pixel values in the overlap region (OVLP_LEFT in FIG. 8, OVLP_ABOVE in FIG. 16) on the left side or the upper side of m] [n] may be replaced with the pixel values after filtering. In that case, when compositing a picture, the pixel value of the overlap region (OVLP_LEFT in FIG. 8, OVLP_ABOVE in FIG. 16) on the left side or the upper side of Tile [m] [n] is used, and Tile [m] [n] The overlap area of the tile on the left side or the lower side (OVLP_RIGHT in FIG. 8 and OVLP_BOTTOM in FIG. 16) is not used. In addition to this, the pixel value after the filter processing may be directly stored in Rec [] [] instead of the image of each tile.

In the above formula, the weighting factors ww [] and wh [] are calculated. However, when the width and height of the overlap region are constant, the weighting factor may be obtained by referring to a table prepared in advance. An example of the weighting coefficient table is shown in FIG. For example, when wOVLP = 4, ww [] = {0.2,0.4,0.6,0.8}.

Alternatively, the weighting factor may be replaced with a value approximated by multiplication and shift operation without using division. FIG. 15 (b) shows an example of a table in which weighting factors are represented by positive numbers WGT [] and shift WSHT. For example, when hOVLP = 4, wh [] = {0.125,0.375,0.625,0.875} = {1 >> 3,3 >> 3,5 >> 3,7 >> 3} and WGT [] = { 2,3,5,7} and WSHT = 3. That is, the weighting coefficient can be expressed as WGT [] >> WSHT. In the above example, WSHT = 3.

It should be noted that the weight may be obtained by a method other than linear interpolation, and the interpolation formula or table may be changed based on the coordinates.

FIGS. 8B and 16B are diagrams for explaining the filtering process of the overlap area in FIG. 13 showing an example in which the width or height of the crop offset area is included in the width or height of the tile. is there. Since the crop offset area is not subject to filter processing or picture composition / display, the tile filter processing in FIG. 13 is performed only on the overlap area as shown in FIGS. 8 (b) and 16 (b). The processing is the same as the processing for the overlap region in FIGS. 8 (a) and 16 (a). Therefore, the description of the second embodiment can be used as it is.

(Additional explanation 1)
In the additional explanation 1, the tile division method of a picture and the CTU division method of a tile will be described again by using another representation method for the tiles described in the first and second embodiments. In the first and second embodiments, the tile has been described as an area including a tile, an overlap area, and a crop offset area. In the additional description 1, the tile will be described as an area including a tile active area and a tile extension area. The tile active area is a net display area that does not include an overlap area. The tile extension area is an area composed of an overlap area and a crop offset area.

As a flag indicating the presence / absence of a tile extension area, cropoffset_flag obtained by replacing overlap_tiles_flag notified by tile_info () in FIG. 25 (a) may be used. When cropoffset_flag is 0, the tile extension area does not exist, and otherwise, the tile extension area exists.

FIG. 26 shows an example of dividing a picture into tiles regardless of multiples of CTUs. As shown in FIG. 26 (a), a picture is divided into tiles (tile active areas) that do not depend on multiples of CTUs. The tile active area is an area that configures pictures without overlapping. In other words, a picture is divided into “tile active areas” without overlapping. If the width and height of the tile active area are wAT [m] and hAT [n], and the width and height of the picture are wPict and hPict, they can be expressed by the following equations.

wPict = ΣwAT [m] (Σ is the sum of m = 0..M-1)
hPict = ΣhAT [n] (Σ is the sum of n = 0..N-1)
When uniform_spacing_flag is not 0, that is, when the tile active area is approximately the same size, it can be expressed by the following equation. M and N are the number of tiles in the horizontal and vertical directions.

for (m = 0; m <M; m ++)
wAT [m] = ((m + 1) * wPict) / M- (m * wPict) / M (Formula TAS-1)
for (n = 0; n <N; n ++)
hAT [n] = ((n + 1) * hPict) / N- (n * hPict) / N
Alternatively, the tile active area may be represented by the following expression as a multiple of tile unit size (minimum tile size) wUnitTile and hUnitTile.

wAT [m] = floor (wPict / M / wUnitTile) * wUnitTile (0 <= m <M) (Formula TAS-2)
hAT [n] = floor (hPict / N / hUnitTile) * hUnitTile (0 <= n <N)
Or wAT [m] = ceil (wPict / M / wUnitTile) * wUnitTile (0 <= m <M) (Formula TAS-3)
hAT [n] = ceil (hPict / N / hUnitTile) * hUnitTile (0 <= n <N)
Or for (m = 0; m <M; m ++)
wAT [m] = ((m + 1) * wPict / M / wUnitTile-m * wPict / M / wUnitTile) * wUnitTIle (expression TAS-4)
for (n = 0; n <N; n ++)
hAT [n] = ((n + 1) * hPict / N / hUnitTile-n * hPict / N / hUnitTile) * hUnitTile
When uniform_spacing_flag is 0, the size of the tile active area can be expressed by the following equation.

wAT [m] = column_width_in_luma_samples_div2_minus1 [m] * 2 (Formula TAS-5)
hAT [n] = row_height_in_luma_samples_div2_minus1 [n] * 2
When encoding a tile, the tile is actually encoded in units of CTUs. At this time, an image obtained by adding an extension area to the “tile active area” may be encoded. The extension area added at this time is called a “tile extension area”. The “tile extension area” corresponds to the areas named the overlap area and the crop offset area in the first and second embodiments. The tile extension area is not necessarily used for decoding and output, and may be treated as an area discarded after decoding. Further, a part or all of the tile extension area may be used for reference (decoding) of a subsequent picture, or may be used for generation of an output image. The “tile active area” and the “tile extension area” are collectively referred to as a “tile coding area”. The “tile encoding area” is an area that is actually encoded.

Among the tile extension areas, areas used for reference and decoding are called overlap areas, and areas not referenced and decoded are called crop offset areas (tile invalid areas). The first embodiment describes the case where all the tile extension areas are referred to and decoded, and the tile extension areas are overlap areas. In the first modification, an example has been described in which a part of the tile extension area is referred to as an overlap area and used for decoding, and the remaining part is referred to as a crop offset area and is not used for decoding. In addition, the “tile coding area” may be rephrased to be composed of a “tile effective area” used for decoding / output and a tile crop area (tile invalid area) not used for decoding / output. The tile effective area is composed of a tile active area which is a unit for dividing a picture and an overlap area.

FIG. 26 (b) is a diagram illustrating tiles that are actually encoded (also referred to as tile encoding areas). As shown in FIG. 26 (b), the tile (tile coding area) is a rectangle having an upper left coordinate (xTsmn, yTsmn), a width wTile [m], and a height hTile [n], and the tile active area Tile [m] [n] (a rectangle with a width wAT [m] and a height hAT [n]) and a tile extension area (a tile other than the tile active area, an area with a width wCRP [m] and a height hCRP [n]).

wTile [m] = wAT [m] + wCRP [m]
hTile [n] = hAT [n] + hCRP [n]
Alternatively, the tile coding area may be expressed by the following expression using the width TileWidthinCtbs [m] and the height TileHeightinCtbs [m] of the tile active area in CTU units.

TileWidthinCtbs [m] = ceil (wAT [m] / M)
TileHeightinCtbs [n] = ceil (hAT [n] / N)
wTile [m] = TileWidthinCtbs [m] << CtbLog2SizeY
hTile [n] = TileHeightinCtbs [n] << CtbLog2SizeY
FIG. 26 (c) is an example of dividing a tile into CTUs. Divided into CTUs, starting from the upper left coordinate of the tile. As shown in FIG. 26 (c), the size of the tile active area may be an integer multiple of the CTU size or may not be an integer multiple of the CTU size. Since the picture is divided into tile active areas, the upper left coordinates (xTsmn, yTsmn) of the tile at the position (m, n) in tile units are the sum of the tile active areas (wAT [i], hAT [i]). Matches.

xTsmn = ΣwAT [i] (Σ is the sum of i = 0..m-1) (Formula TLA-2)
yTsmn = ΣhAT [i] (Σ is the sum of i = 0..n-1)
The size of the tile effective area obtained by adding the tile active area and the overlap area may be an integer multiple of the CTU size or may not be an integer multiple of the CTU size.

FIG. 27 shows an example in which the tile extension area is composed of an overlap area and a crop offset area. In FIG. 27, the overlap area is a hatched area outside the tile active area. The overlap area overlaps the tile active area of the adjacent tile. The width wOVLP [m] and height hOVLP [n] of the overlap area and the width wCRP [m] and height hCRP [n] of the tile extension area have the following relationship.

0 <= wOVLP [m] <= wCRP [m]
0 <= hOVLP [n] <= hCRP [n]
(Summary)
The tile coding area (wTile, hTile) includes a tile active area (wAT, hAT) that is a unit for dividing a picture and a hidden area (tile extension area).

Alternatively, the tile coding area (wTile, hTile) is derived from the tile effective area (wT, hT) used for decoding / output and the crop offset area not used for decoding / output, that is, the tile invalid area (wCRP, hCRP). In other words, it may be configured.

The overlap area is outside the tile active area (wAT, hAT), which is a unit for dividing a picture, but is included in the tile effective area (wT, hT) used for decoding / output.

Therefore, the tile effective area is wT [m] = wAT [m] + wOVLP [m]
hT [n] = hAT [n] + hOVLP [n]
In addition, it includes the crop area and becomes the tile coding area wTile [m] = wT [m] + wCRP [m] = wAT [m] + wOVLP [m] + wCRP [m]
hTile [n] = hT [n] + hCRP [n] = hAT [n] + hOVLP [n] + hCRP [n]
(Example of CTU unit processing)
FIG. 28 (a) shows an example of the syntax of slice data slice_segment_data (). The operations of the video encoding device 11 and the video decoding device 31 will be described below with reference to the syntax.

In the figure, coding_tree_unit () indicates the CTU syntax. CtbAddrInTs, CtbAddrInRs, and CtbAddrInTile are CTU addresses, CtbAddrInTs is a CTU address in the tile scan order in the picture, CtbAddrInRs is a CTU address in the raster scan order in the picture, and CtbAddrInTile is a CTU address in the tile scan order in the tile. After the last CTU of each tile, end_of_subset_one_bit is set to 1, and the encoded data is byte aligned.

FIG. 28 (b) is an example of CTU syntax coding_tree_unit (). In order to cope with the case where the upper left coordinate of the tile (tile coding area) is not an integer multiple of the CTU, the upper left coordinate (xCtb, yCtb) of the CTU is derived for each tile. Specifically, the CTU in-tile coordinates derived from the in-tile address CtbAddrInTile ((CtbAddrInTile% TileWidthinCtbs [TileId]) << CtbLog2SizeY, (CtbAddrInTile / TileWidthinCtbs [TileId]) << CtbLog2SizeY) Deriving the CTU coordinates of the tile in the picture by adding [TileId], TileAddrY [TileId]) xCtb = ((CtbAddrInTile% TileWidthinCtbs [TileId]) << CtbLog2SizeY) + TileAddrX [TileId]
yCtb = ((CtbAddrInTile / TileWidthinCtbs [TileId]) << CtbLog2SizeY) + TileAddrY [TileId]
Where TileWidthinCtbs [] is the width of the tile effective area in CTU units, wT [] and hT [] are the width and height in pixel units of the tile effective area, CtbLog2SizeY is the logarithmic value of the CTU size, and (TileAddrX, TileAddrY) is This is the upper left coordinate of the tile in pixel units. Note that the width and height (wTile [], hTile []) of the tile coding area may be used instead of the width and height (wT [], hT []) of the tile effective area.

FIG. 29 shows an example of syntax coding_quadtree () for dividing a block (CU or CTU) into quadtrees, and FIG. 30 shows an example of syntax coding_binarytree () for dividing a block into binary trees. In FIG. 29, the upper left coordinate of the tile does not correspond to a position that is an integral multiple of the CTU.Therefore, when using a tile, the upper left coordinate of the CTU (xCtb, yCtb) and the tile size are considered, as shown in the following formula. Notifies split_cu_flag indicating whether or not to perform quadtree partitioning.

if (x0 + (1 << log2CbSize) -xTile <= wT && y0 + (1 << log2CbSize) -yTile <= hT &&log2CbSize> MinCbLog2SizeY)
split_cu_flag [x0] [y0]
Where (x0, y0) is the upper left coordinate of the block, (xTile, yTile) is the upper left coordinate of the tile, log2CbSize is the logarithm of the block size, wT and hT are the width of the tile effective area (or tile coding area) Height, MinCbLog2SizeY is the logarithm of the minimum size of the block.

If the right coordinate x0 + (1 << log2CbSize) and the bottom coordinate y0 + (1 << log2CbSize) are smaller than the right coordinate xTile + wT and the bottom coordinate yTile + hT of the tile effective area, the target block is It exists in the tile effective area. When the block exists in the tile and the block size is larger than the minimum value (log2CbSize> MinCbLog2SizeY), a flag split_cu_flag indicating whether or not the block is further divided is notified. If the block is further divided into quadtrees, split_cu_flag is set to 1. If the block is not divided into quadtrees, split_cu_flag is set to 0. If split_cu_flag is 1, recursively calls coding_quadtree () to notify whether or not to further perform quadtree partitioning. If split_cu_flag is 0, coding_binarytree () is called to notify whether or not to perform binary tree splitting (decoding).

Also, as shown in FIG. 29, if any of the four blocks obtained by the quadtree partitioning is outside the tile effective area (or outside the tile encoding area), the block is not encoded. Specifically, coding_quadtree (x1, y0, log2CbSize-1, cqtDepth + 1, wT, hT, xTile, yTile) which is a block located at (x1, y0) obtained by quadtree partitioning is x1 Is encoded or decoded if is located within a tile.

if (x1-xTile <wT)
coding_quadtree (x1, y0, log2CbSize-1, cqtDepth + 1, wT, hT, xTile, yTile)
Similarly, coding_quadtree (x0, y1, log2CbSize-1, cqtDepth + 1, wT, hT, xTile, yTile), which is a block located at (x0, y1), is encoded when y1 is located in the tile or Decrypted.

if (y1-yTile <hT)
coding_quadtree (x0, y1, log2CbSize-1, cqtDepth + 1, wT, hT, xTile, yTile)
Similarly, coding_quadtree (x1, y1, log2CbSize-1, cqtDepth + 1, wT, hT, xTile, yTile), which is a block located at (x1, y1), is coded when x1, y1 is located within the tile. Or decoded.

if (x1-xTile <wT && y1-yTile <hT)
coding_quadtree (x1, y1, log2CbSize-1, cqtDepth + 1, wT, hT, xTile, yTile)
If tiles are not used, set (xTile, yTile) = (0,0), (wT, hT) = (pic_width_in_luma_samples, pic_height_in_luma_samples), and whether to perform quadtree partitioning under the following conditions: You may notify split_cu_flag which shows.

if (x0 + (1 << log2CbSize) <= pic_width_in_luma_samples && y0 + (1 << log2CbSize) <= pic_height_in_luma_samples &&log2CbSize> MinCbLog2SizeY)
Similarly, in the case of a binary tree, when using a tile, split_bt_mode indicating whether or not to perform further binary tree division is notified in consideration of the upper left coordinates (xCtb, yCtb) of the CTU and the tile size (decoding) To do). Specifically, split_bt_mode indicating whether or not to perform binary tree division may be notified by the following equation.

if (((1 <<log2CbHeight)> minBTSize || (1 <<log2CbWidth)> minBTSize) && ((1 << log2CbWidth) <= maxBTSize && (1 << log2CbHeight) <= maxBTSize) && (x0 + (1 <<log2CbWidth) -xTile <= wT && y0 + (1 << log2CbHeight) -yTile <= hT) && cbtDepth <maxBTDepth)
split_bt_mode
In other words, the block size is larger than the minimum size minBTSize that can be divided into binary trees and less than the maximum size maxBTSize that can be divided into binary trees, and the upper left coordinate of the lower or right block when the binary tree is divided is within the tile. If the depth of binary tree division is smaller than the maximum divisible depth, split_bt_mode indicating whether or not to perform binary tree division and the direction of the binary tree is notified. If the block is further divided into binary trees, split_bt_flag is set to 1. If the block is not divided into binary trees, split_bt_mode is set to 0. When split_bt_mode is 1, recursively calls coding_binarytree () to notify whether or not to perform binary tree division. When split_bt_mode is 0, coding_unit (x0, y0, log2CbWidth, log2CbHeight) is called to actually encode or decode the block.

Also, as shown in FIG. 30, if any of the two blocks obtained by binary tree partitioning is outside the tile effective area (or outside the tile encoding area), that block is not encoded. Specifically, among the blocks obtained by dividing the upper and lower parts, coding_binarytree (x0, y1, log2CbWidth, log2CbHeight-1, cqtDepth, cbtDepth + 1, wT, hT, xTile, which is a block located at (x0, y1) yTile) is encoded or decoded when y1 is located in the tile.

if (y1-yTile <hT)
coding_binarytree (x0, y1, log2CbWidth, log2CbHeight-1, cqtDepth, cbtDepth + 1, wT, hT, xTile, yTile)
Similarly, coding_binarytree (x1, y0, log2CbWidth-1, log2CbHeight, cqtDepth, cbtDepth + 1, wT, hT, xTile, yTile) which is a block located at (x1, y0) among the blocks obtained by right and left splitting is , X1 is encoded or decoded when located in the tile.

if (x1-xTile <wT)
coding_binarytree (x1, y0, log2CbWidth-1, log2CbHeight, cqtDepth, cbtDepth + 1, wT, hT, xTile, yTile)
By the coordinate calculation process and the division process described in the first and second embodiments and the above, the picture can be divided into tiles having a size that does not depend on a multiple of the CTU.

(Embodiment 3)
In Embodiment 3 of the present application, when a display (projection image) is a spherical surface, such as a 360-degree video or a VR video, an image mapped to a two-dimensional image is encoded in order to encode image data during transmission / storage. Processing will be described.

An example of generating a two-dimensional image by packing projection images is shown in FIGS. FIGS. 17 (a) and 17 (c) are ERP (Equi-Rectangular Projection) Formats, and the sphere is expressed as a rectangle by enlarging the region laterally away from the equator. FIG. 17 (c) shows a cube format. The vertical line area in FIG. 17 (c) is an area where no image data exists. Mapping and packing into a two-dimensional image as shown in FIG. 17 (a) are performed on the image as preprocessing before being input to the moving image encoding device 11. The picture dividing unit 2010 in FIG. 11 assigns tiles to the rectangles 1 to 11 in FIG. 17A and the rectangles 0 to 5 in FIG. 17C, and each tile is encoded by the tile encoding unit 2012. Is done.

Or, for example, FIG. 18 is a cubic-like ERP format, and the equator region is divided into 5 and 6, as shown in FIG. 18 (a). Then, packing is performed together with a rectangle corresponding to the polar area generated by rotation, and a rectangular area as shown in FIG. 18B is generated in the preprocessing. In FIG. 18B, the picture dividing unit 2010 in FIG. 11 tiles, for example, a rectangle 6, a rectangle composed of triangular regions 1 to 4, a rectangle 5 and a rectangle composed of triangular regions 7 to 10, respectively. Each tile is encoded by the tile encoding unit 2012.

Fig. 19 shows SPP (Segmented Sphere Projection) Format, where the polar region is represented by

circle regions

1 and 2 in Fig. 19 (a) and the equator region is represented by rectangles 3-6 in Fig. 19 (a). The vertical line area outside the circle is an invalid area without image data. The picture dividing unit 2010 in FIG. 11 assigns tiles to the

rectangles

1 and 2 and the rectangles 3 to 6 in which the circular area is expanded, and each tile is encoded by the tile encoding unit 2012.

In this way, when encoding an image in which a spherical surface is two-dimensionally mapped, as shown in FIG. 4D, when the image is divided into tiles, the number of tiles included in each tile row may be equal. On the other hand, as shown in FIGS. 17 (a), 17 (c), and 18 (b), when dividing an image into tiles, the number of tiles included in each tile row may not be equal. Alternatively, the number of tiles included in each tile row may not be equal. In such a case, as shown in FIG. 5 (i), the syntax of the tile information notifies information on the number of tiles in the vertical direction (num_tile_rows_minus1), and information on the height of the tile (row_height_minus1 [ i]), information on the number of tiles in the horizontal direction (num_tile_columns_minus1), and information on the width of the tile (column_width_minus1 [i]). Also, the overlap area information (overlap_tiles_info ()) shown in FIG. 5 (j) is notified. In overlap_tiles_info (), when the overlap width or height of all tiles is uniform (uniform_overlap_flag = 1), the same syntax as in FIG. 5 (f) is encoded. Otherwise (uniform_overlap_flag = 0), information on the overlap height (tile_overlap_height_div2 [i]) and information on the overlap width of each tile (tile_overlap_width_div2 [i]) are notified for each tile row.

The header information generation unit 2011 generates the syntax shown in FIGS. 5 (i) and 5 (j) and outputs it to the tile encoding unit 2012 and the encoded stream generation unit 2013.

9, the header information decoding unit 2001 decodes the syntaxes shown in FIGS. 5 (i) and 5 (j) and outputs the decoded syntaxes to the tile decoding unit 2002 and the tile synthesis unit 2003.

In this way, by reporting the number of tiles in the horizontal direction, the width, and the width of the overlap area for each tile row, 360-degree video and VR can be used without changing the encoding method of the two-dimensional image at the tool level. Video can be encoded / decoded.

(Embodiment 4)
In Embodiment 3, a picture is directly divided into tiles. In Embodiment 4 of the present application, a method of dividing a picture into regions and dividing the region into tiles will be described. In this embodiment, a picture is hierarchically divided into two stages using a region that can be arranged in a picture using a designated position and size, and tiles that are divided into rectangular sizes within the region. A region is a collection of continuous regions in a projection image or regions using the same mapping method.

FIG. 17 (b) is an example in which the picture is divided into tiles shown in FIG. 17 (a) by dividing the picture into three regions and further dividing each region into tiles.

FIG. 17 (d) is an example in which the picture is divided into tiles shown in FIG. 17 (c) by dividing the picture into three regions and further dividing each region into tiles. FIG. 17 (e) is another example in which each region is divided into tiles. Region 0 is divided into tiles Tile [0] [0] and invalid region tiles Tile [1] [0] to Tile [3] [0]. Region 1 is divided into tile Tile [0] [0] and tile Tile [1] [0]. Region 2 is divided into tile Tile [0] [0] and invalid area tiles Tile [1] [0], Tile [2] [0], and Tile [3] [0]. The region 1 may be processed as one tile Tile [0] [0].

As in FIG. 17 (d), FIG. 18 (c) is a region corresponding to 18 (b). Region 0 in FIG. 18 (c) corresponds to rectangle 6 in FIG. 18 (b), and region 1 corresponds to triangle regions 1 to 4, rectangle 5, and triangle regions 7 to 10 in FIG. 18 (b). Triangular areas 1 to 4, rectangular 5, rectangular 6, and triangular areas 7 to 10 are continuous areas in the projection image. FIG. 18 (d) shows an example in which each region is divided into tiles. Region 0 is divided into tiles Tile [0] [0], tiles Tile [1] [0], and Tile [2] [0]. Region 1 includes tile Tile [0] [0] that includes triangular areas 1 to 4, tile Tile [1] [0] that is rectangular 5, and tile Tile [2] [0] that includes triangular areas 7 to 10. Divided. Region 0 may be processed as one tile Tile [0] [0].

Fig. 19 (b) shows the region corresponding to Fig. 19 (a). Region 0 in FIG. 19B corresponds to the

circular regions

1 and 2 in FIG. 19A and the surrounding invalid regions, and region 1 corresponds to rectangles 3 to 6 in FIG. 19A. The rectangles 3 to 6 are continuous regions in the projection image, and the

circular regions

1 and 2 are not continuous regions in the projection image, but both are polar regions and the mapping method is the same. FIG. 19 (c) is an example in which each region is divided into tiles. The region 0 is divided into a tile area Tile [0] [0] of the invalid area around the circular area 1 and its surrounding area, and a tile Tile [1] [0] of the invalid area around the circular area 2 and its surrounding area. Region 1 is divided into tiles Tile [0] [0] to Tile [3] [0] assigned to rectangles 3-6.

FIG. 31 shows a hierarchical structure of pictures, regions, tiles, and CTUs. FIG. 31 (a) is a diagram showing one picture. FIG. 31 (b) is a diagram of regions (Region 0, Region 1, and Region 2) obtained by dividing this picture into three. FIG. 31 (c) is a diagram of tiles obtained by further dividing each region. FIG. 31 (d) is a diagram of a CTU obtained by further dividing the tile obtained by dividing Region0 in FIG. 31 (c).

As shown in FIG. 31 (d), the upper left coordinates (xRs0, yRs0), width wReg [0], and height hReg [0] of the region Region [0] may not be an integer multiple of the CTU. Also, the upper left coordinates (xTsmn, yTsmn), width wAT [m], and height hAT [n] of the tile active area Tile [m] [n] of the tile divided from Region Region [0] are not an integral multiple of CTU. May be.

Fig. 20 (k) shows the syntax for dividing a picture into regions and dividing the region into tiles. region_parameters () is a syntax indicating region information, and is called from PPS. In FIG. 4B described above, tile_parameters () is notified by PPS, but in this embodiment, region_parameters () is notified by PPS, and tile_parameters () is notified in region_parameters ().

In region_parameter () of Fig. 20 (k), num_region_minus1 indicates the value obtained by subtracting 1 from the number of regions. When num_region_minus1 is 0, there is one region, and the syntax notified thereafter is the same as when the picture is directly divided into tiles. When num_region_minus1 is larger than 0, the upper left coordinates (region_topleft_x [i], region_topleft_y [i]), width region_width_div2_minus1 and height region_height_div2_minus1 are notified in each region. region_width_div2_minus1 and region_height_div2_minus1 are values obtained by dividing the width and height of the region by 2, and the actual region width wReg and height hReg are expressed as follows.

wReg [p] = region_width_div2_minus1 [p] * 2 + 1
hReg [p] = region_height_div2_minus1 [p] * 2 + 1
The width wAT [m] and the height hAT [n] of the tile active area are the same as the width wPict of the picture in any of (Formula TAS-1) to (Formula TAS-4) already described when uniform_spacing_flag is 0. HPict may be derived by replacing the region widths wReg [p] and hReg [p]. If uniform_spacing_flag is not 0, it may be derived using (Expression TAS-5). Formulas in which wPict and hPict in (Formula TAS-1) are replaced with wReg [p] and hReg [p] are shown below.

for (m = 0; m <M; m ++)
wAT [m] = ((m + 1) * wReg [p]) / M- (m * wReg [p]) / M
for (n = 0; n <N; n ++)
hAT [n] = ((n + 1) * hReg [p]) / N- (n * hReg [p]) / N
Here, M and N indicate the number of tiles in the region in the horizontal direction and the number in the vertical direction. The upper left coordinates (xRsp, yRsp) of the region Region [p] are set as follows.

xRsp = region_topleft_x [p] (Formula REG-1)
yRsp = region_top_left_y [p]
Note that region_width_div2_minus1 [p] and region_height_div2_minus1 [p] are either expressed in 2 pixel units or in 1 pixel units depending on the color difference format (4: 2: 0, 4: 2: 2, 4: 4: 4) It may be switched whether or not.

In addition, in order to encode and decode regions in parallel, CABAC initialization is performed at the beginning of the region as well as at the beginning of the slice and tile.

The fill_color_present_flag is a flag indicating whether to notify a value to be set to the pixel value of the invalid tile area (invalid area) to a tile area (hereinafter, invalid tile) that is not encoded in the picture or region, and fill_color_present_flag Is 1, the pixel value (fill_color_y, fill_color_cb, fill_color_cr) of the invalid area is notified. When fill_color_present_flag is 0, the pixel value of the invalid area is black (0, (1 << bitdepth-1), (1 << bitdepth-1)) or gray ((1 << bitdepth-1), (1 Set to << bitdepth-1), (1 << bitdepth-1)), etc. Here, bitdepth is the bit depth of the pixel value.

Also notify tile_parameters () for each region. tile_parameters () and tile information tile_info () included in the tile_parameters () may be expressed by the syntax shown in FIGS. 4 (c) and 4 (d). The tile divides the region uniformly with the upper left coordinates of the region (region_topleft_x [i], region_topleft_y [i]) as (0,0).

FIG. 11 (c) is an example of the picture dividing unit 2010 of FIG. 11 (a) that implements the fourth embodiment. In FIG. 11 (c), the picture dividing unit 2010 includes a region information calculating unit 20103, a tile information calculating unit 20101, and a picture dividing unit B20104. The region information calculation unit 20103, for example, region information for dividing the input image into regions as shown in FIGS. 17 (d), 18 (c), and 19 (b) (number of regions, upper left coordinates, Width and height, pixel values set in the invalid area, etc.) are calculated. The tile information calculation unit 20101 refers to the region information calculated by the region information calculation unit 20103, replaces the picture with the region, and divides the region into tiles by the method described in the third embodiment (for example, FIG. 17 (e), FIG. 18 (d), FIG. 19 (c), FIG. 31 (c), etc.) are calculated. The picture division unit B20104 divides the picture into regions by referring to the region information, and divides the region into tiles with reference to the tile information.

The header information generation unit 2011 generates the syntax shown in FIG. 20 (k) and outputs it to the tile encoding unit 2012 and the encoded stream generation unit 2013.

The tile encoding unit 2012 encodes the divided tiles, and the encoded stream generation unit 2013 generates an encoded stream Te from the encoded stream of each tile.

9, the header information decoding unit 2001 decodes the syntax shown in FIG. 20 (k) and outputs the decoded syntax to the tile decoding unit 2002 and the tile composition unit 2003. The tile decoding unit 2002 decodes the encoded stream of the designated tile and outputs it to the tile synthesis unit 2003.

The smoothing processing unit 20031 of the tile composition unit 2003 outputs a tile obtained by filtering the overlap region to the composition unit 20032 if there is an overlap region of the tile, and performs tile decoding if there is no overlap region of the tile. The output tile of the part 2012 is output as it is to the composition part 20032. The tile synthesis unit 20032 synthesizes the decoded image of the designated area from the region information and tile information decoded by the header information decoding unit 2001.

In this way, after dividing a picture into regions and then dividing the region into tiles, the size of the tiles in the region can be set almost uniformly. Therefore, the tile information notified by the header can be reduced as compared with the third embodiment. In addition, since the projection image is almost discontinuous at the region boundary, there is no need to provide an overlap region, but at the tile boundary in the region, the projection image is often continuous, so an overlap region is necessary. It is. Therefore, redundant encoded streams can be reduced by not providing overlap regions at region boundaries.

Also, from the projection format (ERP, SSP, etc.) and the packing method as shown in FIGS. 17 to 19, the position where adjacent tiles are continuous in the projection image can be specified. Therefore, at the tile boundary, an overlap area is provided in the tile at a position where projection images are continuous, and if not, no overlap area is provided. For example, in Figure 18 (d), the boundary between Tile [0] [0] and Tile [1] [0] in region 0, the boundary between Tile [1] [0] and Tile [2] [0], and the region An overlap region is provided at the boundary between Tile [0] [0] and Tile [1] [0] and at the boundary between Tile [1] [0] and Tile [2] [0]. Also, for example, in FIG. 19C, no overlap region is provided at the boundary between Tile [0] [0] and Tile [1] [0] in region 0, and Tile [0] [0] in region 1 And Tile [1] [0], Tile [1] [0] and Tile [2] [0], and Tile [2] [0] and Tile [3] [0] Provide. When no overlap region is provided, the width wOVLP and height hOVLP of the overlap region are set to 0, and overlap_tiles_flag is set to 0.

In this way, when the overlap area is unnecessary, the information on the width and height of the overlap area is not encoded, so that the header information can be reduced. In addition, since the redundant code amount due to the multiple encoding of the same region is reduced by the overlap, it is possible to suppress a decrease in encoding efficiency.

Figure 32 shows the syntax for the region. In FIG. 32, while the flag end_of_region_flag indicating whether or not the end of the region is 0 (not the end of the region), the CTU syntax coding_tree_unit () and end_of_region_flag are notified. At the end position of the tile, end_of_subset_one_bit (= 1) indicating the end of the tile is notified, and byte alignment is performed. The end position of the tile is determined by the following formula.

if (tiles_enabled_flag &&CtbAddrInTile> = NumCtbInTile [TileId])
CtbAddrInTs indicates the CTU address through the picture, NumCtbInTile [] indicates the number of CTUs in the tile, and CtbAddrInTile indicates the CTU address in the tile. If CtbAddrInTile is greater than or equal to NumCtbInTile [], it represents the outside of the target tile, so it can be seen that it is the end of the target tile. In FIG. 32, the tile identifier TileId is incremented by 1 at the end of the tile. That is, TileId is unique within a region and is reset to 0 at the beginning of a different region.

Next, FIG. 33 shows the syntax coding_tree_unit () of the CTU when the tile is divided regardless of the multiple of the CTU. In order to cope with the case where the upper left coordinate of the tile (tile effective area) is not an integer multiple of the CTU, the upper left coordinate (xCtb, yCtb) of the CTU is derived for each tile. Specifically, the CTU in-tile coordinates derived from the in-tile address CtbAddrInTile ((CtbAddrInTile% TileWidthinCtbs [TileId]) << CtbLog2SizeY, (CtbAddrInTile / TileWidthinCtbs [TileId]) << CtbLog2SizeY) [TileId], TileAddrY [TileId]) and region upper left coordinates (RegionAddrX [RegId], RegionAddrY [RegId]) are added to derive the CTU coordinates of the tile in the picture xCtb = ((CtbAddrInTile% TileWidthinCtbs [TileId]) << CtbLog2SizeY) + TileAddrX [TileId] + RegionAddrX [RegID]
yCtb = ((CtbAddrInTile / TileWidthinCtbs [TileId]) << CtbLog2SizeY) + TileAddrY [TileId] + RegionAddrY [RegID]
Where TileWidthinCtbs [] is the width of the tile effective area in CTU units, wT [] and hT [] are the width and height in pixel units of the tile effective area, CtbLog2SizeY is the logarithmic value of the CTU size, and (TileAddrX, TileAddrY) is The upper left coordinates of the tile in pixel units, (RegionAddrX [RegId], RegionAddrY [RegId]) are the upper left coordinates of the region in pixel units. The upper left coordinates (TileAddrX, TileAddrY) of the tile in pixels and the upper left coordinates of the region (RegionAddrX [RegId], RegionAddrY [RegId]) are derived from (Expression TLA-1) and (Expression TLA-2) (xTsmn, yTsmn), (xRsp, yRsp) derived by (Expression REG-1) may be set. Note that the width and height (wTile [], hTile []) of the tile coding area may be used instead of the width and height (wT [], hT []) of the tile effective area.

Fig. 34 shows another syntax indicating a region. In FIG. 32, the slice is divided into regions and the regions are divided into tiles. However, in FIG. 34, the regions may be divided into slices and tiles. Region information (region shape and size) is notified by PPS as shown in FIG. 20 (k). When the end of the region or tile is detected in the process of decoding slice_segment_data (), end_of_region_flag (= 1) or end_of_subset_one_bit (= 1) is inserted and byte aligned. The end condition of the tile is the following equation, as in FIG.

if (tiles_enabled_flag &&CtbAddrInTile> = NumCtbInTile [RegId] [TileId])
When the CTU address in the tile reaches a predetermined value NumCtbInTile [RegId] [TileId] or more, the processing of the target tile ends, TileId is incremented, and processing of the next tile starts. The region end condition is when the following expression no longer holds.

while (TileId <NumTilesInRegion [RegId])
When TileId becomes equal to or greater than a predetermined value NumTilesInRegion [RegId], the processing of the target region ends, RegId is incremented, TileId and CtbAddrInTs are reset, and processing of the next region starts. As described above, TileId and CtbAddrInTs are reset in units of regions.

Note that the syntax of coding_tree_unit (TileId) called in Fig. 34 is the same as in Fig. 33, and the upper left coordinate of CTU is the tile or region in order to process regions and tiles that are not necessarily a multiple of CTU. Is calculated using the upper left coordinates of.

As described above, a region having a size that is not necessarily a multiple of CTU can be divided into tiles for encoding and decoding.

(Embodiment 5)
In the fifth embodiment, an example in which a tile in an invalid area is notified in the third and fourth embodiments will be described.

FIG. 17 (e) is a diagram in which FIG. 17 (c) is divided into regions and then divided into tiles.

Regions

0 and 2 are divided into 4 tiles, and region 1 is divided into 2 tiles. In

regions

0 and 2, tile Tile [0] [0] is an effective area having an area corresponding to the projection image, but tiles Tile [1] [0], Tile [2] [0], and Tile [3 ] [0] is an invalid area. Therefore, Tile [1] [0], Tile [2] [0], and Tile [3] [0] do not need to be encoded / decoded.

In the syntax shown in FIG. 20 (l), a tile tile_valid_flag for notifying a tile in an invalid area is included in tile information, a tile with a tile_valid_flag of 1 is decoded, and a tile with a tile_valid_flag of 0 is not decoded. The other syntax is the same as that shown in FIG. In FIG. 20 (l), the information on the tile width and height is notified of the information on the number of tiles in the vertical direction (num_tile_rows_minus1), the information on the tile height (row_height_minus1 [i]) for each tile row, Information on the number of horizontal tiles (num_tile_columns_minus1) and tile width information (column_width_minus1 [i]) are notified. As in FIG. 4 (d), tile height information (row_height_minus1 [i]) and tile height Information about the width of the tiles in the horizontal direction (column_width_minus1 [i]) may be notified for the number in the vertical direction and the number in the horizontal direction.

Also, the pixel value of the invalid area may be notified by setting fill_color_present_flag to 1 in FIG. 20 (k) and filling_color_y, fill_color_cb, and fill_color_cr.

As another example of the invalid area, there is Right-angled Triangular resion-wise packing for cube map projection Format shown in Fig. 35. As shown in Figure 35 (a), Right-angled Triangular resion-wise packing for cube map project 立方 Format only packs the cube surface (front, left, half of Top, Bottom) that can be seen from the right front. , Encode. The form of this packing is shown in FIG. 35 (b). The picture in FIG. 35 (b) consists of three regions. Region [0] consists of Front and Left in FIG. 35 (a). Region [1] consists of a half area (triangle area) of each of Top and Bottom in FIG. 35 (a) and a padding area between two triangles. Region [2] is an invalid region that does not exist in FIG. 35 (a), and is generated when the heights of region [0] and region [1] are different. As shown in Figure 35 (c), region [0] is the upper left coordinate (xRs [0], yRs [0]), width wReg [0], height hReg [0], and region [1] is upper left Coordinates (xRs [1], yRs [1]), width wReg [1], height hReg [1], region [2] is the upper left coordinate (xRs [2], yRs [2]), width wReg [ 2], height hReg [2].

11 generates the syntax shown in FIG. 20 (l) and outputs it to the tile encoding unit 2012 and the encoded stream generating unit 2013. The tile encoding unit 2012 encodes only valid tiles.

9, the header information decoding unit 2001 decodes the syntax shown in FIG. 20 (l) and outputs the decoded syntax to the tile decoding unit 2002 and the tile synthesis unit 2003. The tile decoding unit 2002 decodes an encoded stream of valid tiles and outputs the decoded stream to the tile synthesis unit 2003.

Other encoding / decoding processes are the same as those in the third and fourth embodiments.

By notifying the flag indicating the validity / invalidity of the tile, the moving image encoding device and the moving image decoding device perform only necessary encoding / decoding processing, so that useless processing can be reduced.

(Embodiment 6)
In the first to fifth embodiments, the technique for dividing an image into tiles and encoding them independently and decoding only the necessary tiles in order to display a desired region has been described. In the sixth embodiment, a technique for encoding and decoding independently for each region will be described. In this case, tiles obtained by dividing a region do not refer to adjacent tiles in the spatial direction, but can refer to tiles at different times belonging to the same region in the temporal direction. Further, a loop filter may be applied to the tile boundary. This is the same as the process of encoding / decoding a region as a conventional picture. Accordingly, for each region notified in the slice data (slice_segment_data ()) shown in FIG. 20 (m), encoding / decoding is completed in units of regions (Region ()). The region of width wReg [i] and height hReg [i] with the upper left coordinates (region_topleft_x [i], region_topleft_y [i]) as (0,0) is regarded as one picture, and in FIG. 20 (n) In the Region () shown, the syntax of Tile () shown in FIG. 5 (h) may be notified in the raster scan order. Note that the initial value of the quantization parameter defined by the slice may be used as the first quantization parameter of each region. When dividing a picture into regions, the picture may be processed as one slice. Also, instead of FIG. 20 (m), FIG. 20 (n), and FIG. 5 (h), the encoding process or the decoding process is performed independently for each region using the syntax shown in FIG. 32 or FIG. May be.

変更 By changing the independent processing for each tile to independent processing for each region, the information that can be referred to by each tile (information on adjacent tiles of the collocated tile in the region) increases. Therefore, it is possible to decode only a part of the screen while suppressing a decrease in encoding efficiency.

(Embodiment 7)
Another embodiment of the method for dividing a picture into tiles will be described with reference to FIG. In the first to sixth embodiments, the area including the overlap area and the crop offset area (tile invalid area) is defined in CTU units based on the upper left coordinates of the net display area (tile active area) that is not limited to an integer multiple of the CTU. Encoding and decoding processes were performed. The upper left coordinate of the tile active area is not limited to an integer multiple of the CTU.

In the tile dividing method of the seventh embodiment, a tile (tile coding area) obtained by adding a crop offset area to a tile effective area including a tile active area and an overlap area without overlapping as shown in FIG. An arranged picture is generated, and this picture is used as an input picture to the moving picture coding apparatus 11. In this input image, the upper left coordinate of the tile coding area is set at a position that is an integral multiple of the CTU, and the size of the tile coding area is an integral multiple of the CTU. In the picture width pic_width_in_luma_samples and height pic_height_in_luma_samples notified by the SPS in FIG. 4 (a), the following formula includes an overlap area and a crop offset area, not the net size of the picture (first picture size). Is set (second picture size).

wPict = pic_width_in_luma_samples = ΣwTile [m] -wCRP [M-1] = Σ (wAT [m] + wOVLP [m] + wCRP [m])-wCRP [M-1] (Formula TCS-2)
hPict = pic_height_in_luma_samples = ΣhTile [n] -hCRP [N-1] = Σ (hAT [n] + hOVLP [n] + hCRP [n])-hCRP [N-1]
The picture width wPict and height hPict do not include the crop offset areas (wCRP [M-1] and hCRP [N-1]) at the right and bottom edges of the picture. Further, the image decoding device 31 decodes the tile encoding area, filters the overlap area with the adjacent tile active area, and discards the crop offset area, thereby reducing the size of the original picture (first image). (Picture size) picture is output. By processing tiles in units of CTUs in this way, the conventional tile encoding unit 2012 and tile decoding unit 2002 can be used for encoding processing and decoding processing, and the encoding processing and decoding processing are complicated. You can also reduce the degree.

FIG. 36 (a) is a diagram in which a picture is divided into tiles that are allowed to overlap and are not limited to an integral multiple of the CTU, as in the first embodiment. The shaded area is an overlap area, which is an area overlapping with an adjacent tile active area. FIG. 36 (b) is a diagram in which one tile of FIG. 36 (a) is taken out. The tile (tile effective area) Tile [m] [n] has a width wT [m] and a height hT [n], and the width wOVLP [m] and height hOVLP [n] of the overlap area shown by diagonal lines is wT Included in [m] and hT [n], respectively. The width wT [m], the height hT [n], and the upper left coordinates (sTsmn, yTsmn) of the tile effective area take values not limited to integer multiples of the CTU. FIG. 36 (c) is a picture generated by setting the upper left coordinate of the tile effective area at a position that is an integral multiple of the CTU so that adjacent tile effective areas do not overlap. This picture is an input image to the moving image encoding device 11. When the tile effective area is arranged in this way, the encoding process or the decoding process is performed with a tile encoding area (xTsmn, yTsmn) having a position that is an integral multiple of the CTU and a size that is an integral multiple of the CTU ( For width wTile [m] and height hTile [n]). The tile coding area is an area obtained by combining the tile effective area and the crop offset area (tile invalid area) as shown in (Formula TCS-1) or Formula (TCS-3). The upper left coordinates (xTsmn, yTsmn) of the tile coding area shown in FIG. 36 (c) are expressed by the following equations.

xTsmn = ΣwTile [i] = Σceil (wT [i]) (Σ is the sum of i = 0..m-1)
yTsmn = ΣhTile [j] = Σceil (hT [j]) (Σ is the sum of j = 0..n-1)
FIG. 37 shows syntax other than the picture width pic_width_in_luma_samples and the height pic_height_in_luma_samples. The tile_info () in FIG. 37 differs from the tile_info () in FIG. 25A in that the total_cropoffset_width and total_cropoffset_height are notified when the uniform_spacing_flag is not 0. total_cropoffset_width is the sum of the widths wCRP [m] (m = 0..M-2) of M-1 crop offset areas, and total_cropoffset_height is the height hCRP [n] (n of N-1 crop offset areas = 0..N-2), and when uniform_spacing_flag is not 0, it is used to calculate the width wT [m] and height hT [n] of the tile effective area.

wPict1 = wPict-total_croppoffset_width
for (m = 0; m <M; m ++)
wT [m] = ((m + 1) * wPict1) / M- (m * wPict1) / M
hPict1 = hPict-total_croppoffset_height
for (n = 0; n <N; n ++)
hT [n] = ((n + 1) * hPict1) / N- (n * hPict1) / N
Here, wPict and hPict are the width and height (second picture size) of the input image calculated by (Formula TCS-2). If uniform_spacing_flag is 0, the width wT [m] and height hT [n] of the tile effective area are calculated by substituting column_width_in_luma_samples_div2_minus1 [m] and row_height_in_luma_samples_div2_minus1 [n] into (Formula TSP-10) Otherwise, it is calculated by substituting column_width_minus1 [m] and row_width_minus1 [n] into any one of (Formula TSP-7) to (Formula TSP-9). Note that overlap_tiles_flag is a flag indicating the presence or absence of a crop offset area including an overlap area. The other syntax is the same as that in FIG.

Regarding overlap information, uniform_overlap_flag, tile_overlap_width_minus1 [], and tile_overlap_height_minus1 [] are notified by overlap_tiles_info () in FIG. If 0 is allowed for the size (width or height) of the overlap, the overlap width (tile_overlap_width []) and height (tile_overlap_height []) without subtracting 1 may be notified. Further, if the overlap size is always the same, uniform_overlap_flag may not be sent, and only a set of tile_overlap_width_minus1 and tile_overlap_height_minus1 may be earthed. Using these values, for example, the width wOVLP [m] and the height hOVLP [n] of the overlap region may be calculated by (Expression OVLP-1) or (Expression OVLP-2). Further, for example, the width wCRP [m] and the height hCRP [n] of the crop offset area may be calculated by (Expression CRP-1).

On the other hand, in the syntax below slice data (slice_segment_data ()), the tile (tile coding area) processed by the tile coding unit 2012 or the tile decoding unit 2002 is an integer multiple of the CTU, and the top of the tile is an integer of the CTU. Since the double position is set, the conventional slice_segment_data () and coding_tree_unit () shown in FIG. 23 may be used.

Processing following slice data is the same as the conventional tile encoding unit 2012 and tile decoding unit 2002 that process tiles independently. However, since the encoding target is an input image including an overlap region and a crop offset region, the processing content of the picture dividing unit 2010 is different from the processing described in Embodiments 1 to 6 in the encoding processing. In the decryption processing, the processing content of the tile composition unit 2003 is different from the processing described in the first to sixth embodiments. These processes will be described below.

In the moving image encoding device 11, the tile information calculation unit 20101 of the picture dividing unit 2010 calculates the width wAT of the tile active area having no overlap as shown in FIG. 26 (a) from the picture size (first picture size). [m], height hAT [n], overlap area width wOVLP, height hOVLP, crop offset area width wCRP, height hCRP, tile effective area width wT [m], height hT [n], Tile information including the width wTile [m] and the height hTile [n] of the tile coding area is calculated.

The picture dividing unit A20102 of the picture dividing unit 2010 divides the picture into tile active areas according to the tile information calculated by the tile information calculating unit 20101, and includes the tile effective area Tile [m] [n ] Is copied to a memory of a size (second picture size) that can store the area of (wPict, hPict) calculated by (Expression TCS-2). The memory size may be set to a size (wPict + wCRP [M-1], hPict + hCRP [N-1]) obtained by expanding (wPict, hPict) to an integer multiple of CTU. As shown in FIG. 36 (c), the tile effective area Tile [m] [n] is arranged such that the upper left coordinate is an integer multiple of the CTU and the tile effective areas do not overlap. Next, the picture dividing unit 2010 sets pixel values in an area outside the tile effective area where no pixel value is set (crop offset area). The pixel value to be set may be a pixel value of a tile effective area that is in contact with the crop offset area. The pixel value vPic (x, y) at the pixel position (x, y) in the crop offset area is derived from the pixel value in the tile effective area by the following equation.

vPic [x] [y] = Tile [m] [n] [wT [m] -1] [y] (wT [m] <x <wTile [m], 0 <= y <hT [n])
vPic [x] [y] = Tile [m] [n] [x] [hT [n] -1] (0 <= x <wT [m] <x <hT [n] <y <hTile [n] )
vPic [x] [y] = Tile [m] [n] [wT [m] -1] [hT [n] -1] (wT [m] <x <wTile [m], hT [n] <y <hTile [n])
Alternatively, a predetermined value, for example, (Y, Cb, CR) = (2 ^ (NBIT-1), 2 ^ (NBIT-1), 2 ^ (NBIT-1)) may be used. Here, NBIT is the number of bits of the pixel value of the picture. The picture dividing unit A20102 outputs the input image having the second picture size generated in this way to the tile encoding unit 2012 for each tile encoding region. The tile encoding unit 2012 encodes each tile encoding area and generates an encoded stream of each tile encoding area. The encoded stream generation unit 2013 generates an encoded stream of an input image from the encoded stream of each tile encoding area.

In the moving picture decoding apparatus 31, the header information decoding unit 2001 decodes header information including tile information from the input encoded stream, and outputs an input stream of each tile encoding area to the tile decoding unit 2002. The tile decoding unit 2002 decodes each tile coding area from the input stream and outputs the decoded region to the tile synthesis unit 2003.

When the overlap_tiles_flag is 1, the smoothing processing unit 20031 uses the overlap region of each tile decoded by the tile decoding unit 2002, for example, filter processing shown in (Expression FLT-1) to (Expression FLT-3) (Averaging processing, weighted averaging processing) is performed, and the pixel value (in this case, tmp) of the overlapped region subjected to the filter processing is overwritten in the memory shown in FIG. For example, the filter processing result of the overlap area at the right end of Tile [0] [0] and the tile active area at the left end of Tile [1] [0] is displayed in the tile active area at the left end of Tile [1] [0]. Overwritten, the tiled active area at the top edge of Tile [0] [1] and the tile active area at the top edge of Tile [0] [1] Will be overwritten.

The synthesizer 20032 uses the tile active area (wAT [m]) from the memory of the second picture size wPict * hPict or from the memory of (wPict + wCRP [M-1]) * (hPict + hCRP [N-1}). , hAT [n]) is extracted, and a decoded image having the original picture size (first picture size) is synthesized even if it is arranged so as not to overlap. Here, the original picture size is the sum of the width and height of each tile active area (ΣwAT [m], ΣhAT [n]), which is the size of the display image.

By making it possible to process tiles in units of CTUs in this way, conventional tile encoding processing and tile decoding processing can be used for encoding processing and decoding processing, and encoding processing and decoding are possible. Processing complexity can also be reduced.

A moving image decoding apparatus according to an aspect of the present invention is a moving image decoding apparatus that divides an image into tiles and decodes moving images in tile units, decodes header information from an encoded stream, and calculates tile information. A header information decoding unit that decodes encoded data for each tile, generates a decoded image of the tile, and combines the decoded image of the tile with reference to the tile information to generate a display image The tile includes an area that overlaps with an adjacent tile, and the synthesis unit filters a plurality of pixel values of each pixel in the overlap area of the tile, and the decoded image of the tile A display image is generated using the pixel value and the filtered pixel value.

In the video decoding device according to an aspect of the present invention, the tile decoding unit decodes the target tile with reference to only the information on the target tile and the information on the collocated tile of the target tile.

In the video decoding device according to an aspect of the present invention, the tile information includes the number of tiles, the width, the height, the presence / absence of overlap between adjacent tiles, and the width of the overlap region when the tiles overlap. And height.

In the video decoding device according to an aspect of the present invention, the upper left coordinate of the tile is not limited to an integer multiple of the CTU.

In the video decoding device according to an aspect of the present invention, the tile includes an area that overlaps an adjacent tile and a crop offset area (tile invalid area), and the tile includes an overlapping area and a crop offset area. The size is an integral multiple of the CTU, and the upper left coordinate of the tile is limited to a position that is an integral multiple of the CTU.

In the moving picture decoding apparatus according to an aspect of the present invention, the filtering process of the synthesis unit is a simple average of pixel values of a plurality of overlap regions.

In the video decoding device according to an aspect of the present invention, the filtering process of the synthesis unit is a weighted sum that changes the weight depending on the distance from the tile boundary for the pixel values of the plurality of overlap regions. It is characterized by being.

A moving image encoding apparatus according to an aspect of the present invention is a moving image encoding apparatus that divides an image into tiles and encodes a moving image in units of tiles, and includes a tile information calculation unit that calculates tile information; A division unit that divides an image into tiles and a tile encoding unit that encodes tiles and generates an encoded stream are provided, and the division unit divides the tiles by allowing overlap.

In the video encoding device according to an aspect of the present invention, the tile encoding unit encodes the target tile with reference to only the information on the target tile and the information on the collocated tile of the target tile. And

In the video encoding apparatus according to an aspect of the present invention, the tile information includes the number of tiles, the width, the height, the presence / absence of overlap between adjacent tiles, and the overlap area when tiles overlap. Including width and height.

In the moving picture coding apparatus according to an aspect of the present invention, the dividing unit divides an image into tiles without limiting the upper left coordinates of the tiles to a position that is an integral multiple of a CTU.

In the video encoding device according to one aspect of the present invention,
In the tiles at the right and bottom edges of the image, the division unit has a crop offset area on the right and bottom tiles of the image when the width of the tile at the right edge of the image and the height of the tile at the bottom edge are not integer multiples of CTU. And the image is divided so that the width and height obtained by adding the tile and the crop offset area are an integral multiple of the CTU.

In the video encoding device according to an aspect of the present invention, the dividing unit divides the image into tiles that overlap with adjacent tiles and tiles that include a crop offset region, and the overlapping region and the crop offset region. The size of the tile including is an integer multiple of the CTU, and the upper left coordinate of the tile is set at a position that is an integer multiple of the CTU.

A moving image decoding apparatus according to an aspect of the present invention is a moving image decoding apparatus that divides an image into tiles and decodes moving images in tile units, decodes header information from an encoded stream, and calculates tile information. A header information decoding unit that decodes encoded data for each tile, generates a decoded image of the tile, and combines the decoded image of the tile with reference to the tile information to generate a display image The tile information includes information on the number and width of tiles included in each tile row, the number of tiles included in the tile row is different, and the combining unit includes at least a decoded image of the tiles. A display image is generated by using the pixel value of.

A moving image encoding apparatus according to an aspect of the present invention is a moving image encoding apparatus that divides an image into tiles and encodes a moving image in units of tiles, and includes a tile information calculation unit that calculates tile information; A header information generation unit that encodes header information including the tile information; a division unit that divides an image into tiles; and a tile encoding unit that encodes a tile and generates an encoded stream. The tile information calculation unit divides the image into tiles so that the number of tiles included in the tile row is different, the tile information calculation unit calculates tile information regarding the number and width of tiles included in each tile row, and the header information generation unit The syntax of the tile information is generated.

A moving image decoding apparatus according to an aspect of the present invention is a moving image decoding apparatus that divides an image into regions including one or more tiles and decodes a moving image in units of regions, and includes header information from an encoded stream. A header information decoding unit that decodes and calculates region information and tile information, a tile decoding unit that decodes encoded data for each tile and generates a decoded image of the tile, and refers to the region information and the tile information And a synthesis unit that synthesizes the decoded images of the tiles to generate a display image, and the synthesis unit generates a display image using at least pixel values of the decoded images of the tiles.

A moving image encoding apparatus according to an aspect of the present invention is a moving image encoding apparatus that divides an image into regions each including one or more tiles, and encodes a moving image in units of regions. Region information calculation unit for calculating the number of images, upper left coordinates, width and height, pixel values set in the invalid area, tile information calculation unit for calculating tile information, and header information including the region information and tile information Header information generation unit that generates the syntax of, a division unit that divides an image into regions, the region is divided into tiles starting from the upper left coordinates of the region, and tile encoding that encodes tiles and generates an encoded stream And a section.

In the video decoding device and the video encoding device according to an aspect of the present invention, the region information includes a flag for notifying whether or not each tile is included in an invalid area.

In the video decoding device according to an aspect of the present invention, when the flag included in the region information indicates that the target tile is included in an invalid area, the tile decoding unit does not decode the target tile, To do.

In the video decoding device according to one aspect of the present invention, the tile decoding unit decodes the target tile with reference to only information on the target tile, the collocated tile of the target tile, and tile information included in the same region. Features.

In the video encoding device according to an aspect of the present invention, the tile encoding unit encodes the target tile with reference to only information on the target tile, the collocated tile of the target tile, and tile information included in the same region. It is characterized by doing.

In the moving picture decoding apparatus according to an aspect of the present invention, the tile extension area overlaps with a tile active area of an adjacent tile, an overlap area used for reference and decoding, and a crop offset area that does not reference and decode ( Tile invalid area).

In the video decoding device according to an aspect of the present invention, the size of the tile active area and the overlap area is not an integer multiple of the CTU size, and the upper left coordinate of the tile is not limited to an integer multiple of the CTU. It is characterized by that.

A moving image decoding apparatus according to an aspect of the present invention is a moving image decoding apparatus that divides an image into tiles and decodes moving images in tile units, decodes header information from an encoded stream, and calculates tile information. A header information decoding unit that decodes encoded data for each tile, generates a decoded image of the tile, and combines the decoded image of the tile with reference to the tile information to generate a display image The tile is composed of a tile effective area used for decoding / output and a crop offset area (tile invalid area) not used for decoding / output, and the tile effective area is a unit for dividing a picture It is composed of the tile active area that overlaps with the tile active area of the adjacent tile and is used for reference and decoding, The serial tile effective area, characterized in that decoding by CTU units.

In the video decoding device according to an aspect of the present invention, the size of the tile effective area and the crop offset area is not an integer multiple of the CTU size, and the upper left coordinate of the tile is not limited to an integer multiple of the CTU. It is characterized by that.

In the moving picture decoding apparatus according to an aspect of the present invention, the synthesis unit performs filtering using a simple average of pixel values of a plurality of overlapping regions.

In the video decoding device according to an aspect of the present invention, the synthesis unit performs filtering using a weighted sum that changes a weight depending on a distance from a tile boundary with respect to pixel values of a plurality of overlap regions. It is characterized by processing.

A moving image encoding apparatus according to an aspect of the present invention is a moving image encoding apparatus that divides an image into tiles and encodes a moving image in units of tiles, and includes a tile information calculation unit that calculates tile information; The image processing apparatus includes a division unit that divides an image into tiles, and a tile encoding unit that encodes tiles and generates an encoded stream. The tiles are hidden from a tile active area that is a unit for dividing a picture without overlapping. A region obtained by adding the tile extension region to the tile active region is encoded in units of CTUs.

In the video encoding device according to an aspect of the present invention, the tile extension area overlaps with a tile active area of an adjacent tile, an overlap area used for reference or encoding, and a crop that is not referenced or encoded. It is composed of an offset area (tile invalid area).

In the video encoding device according to an aspect of the present invention, the size of the tile active area and the overlap area is not an integral multiple of the CTU size, and the upper left coordinate of the tile is limited to an integer multiple of the CTU. It is characterized by not being.

A moving image encoding apparatus according to an aspect of the present invention is a moving image encoding apparatus that divides an image into tiles and encodes a moving image in units of tiles, and includes a tile information calculation unit that calculates tile information; The image processing apparatus includes a division unit that divides an image into tiles, and a tile encoding unit that encodes tiles and generates an encoded stream. The tile includes a tile effective area used for encoding and output, and encoding and output. It is composed of crop offset areas (tile invalid areas) that are not used for the tile. The tile valid areas overlap with the tile active areas that are units for dividing a picture and the tile active areas of adjacent tiles, and are used for reference and encoding. The tile effective area is encoded by CTU encoding.

In the video encoding device according to an aspect of the present invention, the size obtained by adding the tile effective area and the crop offset area is not an integral multiple of the CTU size, and the upper left coordinate of the tile is limited to a position that is an integral multiple of the CTU. It is characterized by not being.

In the video encoding device according to an aspect of the present invention, the tile information includes the number of tiles, the width, the height, the presence / absence of overlap between adjacent tiles, and the overlap area when the tiles overlap. Including width and height.

A video decoding device according to an aspect of the present invention is a video decoding device that divides an image into regions each including one or more tiles, and decodes the video in units of regions.
A header information decoding unit that decodes header information from the encoded stream and calculates region information and tile information; a tile decoding unit that decodes encoded data for each tile and generates a decoded image of the tile; and the region information And a combining unit that combines the decoded image of the tile with reference to the tile information to generate a display image, and the size of the region is not an integer multiple of the CTU size, and the upper left coordinate is limited to a position that is an integer multiple of the CTU. It is characterized by not being.

In the video decoding device according to an aspect of the present invention, the tile is a region obtained by dividing a rectangular region including a region and a region (guard band) that is not displayed outside the region.

A moving image encoding apparatus according to an aspect of the present invention is a moving image encoding apparatus that divides an image into regions each including one or more tiles, and encodes a moving image in units of regions. Region information calculation unit for calculating the number of images, upper left coordinates, width and height, pixel values to be set in the invalid area, etc., a tile information calculation unit for calculating tile information, and a header including the region information and the tile information A header information generation unit that generates information syntax, a division unit that divides an image into regions, divides the region into tiles starting from the upper left coordinates of the region, and a tile code that encodes tiles and generates an encoded stream And the region size is not an integral multiple of the CTU size, and the upper left coordinate is not limited to a position that is an integral multiple of the CTU. To.

In the moving picture encoding apparatus according to an aspect of the present invention, the dividing unit divides a rectangular area, which is a combination of a region and a non-display area (guard band) outside the region, into tiles.

A moving picture decoding apparatus according to an aspect of the present invention is a moving picture decoding apparatus that divides an image into tiles (tile coding areas) and decodes moving pictures in units of tile coding areas. A header information decoding unit that decodes information and calculates tile information; a tile decoding unit that decodes encoded data for each tile and generates a decoded image of a tile encoding region; and the tile information with reference to the tile information. And a synthesizing unit that generates a display image by synthesizing a decoded image of the encoding area, and the tile encoding area includes a tile active area, an overlap area, and a crop offset area, and the tile active area is a first picture. The crop offset area is an encoding process for setting the size of the tile encoding area to an integer multiple of the CTU. Are invalid regions that are not related to, the upper left coordinates of the tile coding region position of integral multiple of the CTU, the size of the tile coding region characterized in that it is set to an integral multiple of the CTU.

The moving picture coding apparatus according to an aspect of the present invention generates a second picture in which tiles (tile coding areas) are arranged without overlapping from the first picture, and each tile coding area Tile information calculation for calculating a second picture size (second picture size) and tile information (tile active area, overlap area, crop offset area size) And a picture dividing unit that divides a tile active area obtained by dividing the first picture according to the tile information, and a second picture composed of an overlap area and a crop offset area outside the tile active area. A tile encoding unit that encodes the tile encoding area and generates an encoded stream. The block area is a unit for dividing the first picture without overlapping, and the crop offset area is an invalid area not related to the encoding process for setting the size of the tile encoding area to an integral multiple of the CTU. The size of the second picture is calculated by adding the tile active area, the tile overlap area, and the crop offset area, and the upper left coordinate of the tile coding area is an integer multiple of the CTU on the second picture. The tile coding area size is set to be an integral multiple of the CTU.

(Example of software implementation)
In addition, a part of the tile encoding unit 2012 and the tile decoding unit 2002 in the above-described embodiment, for example, the entropy decoding unit 301, the prediction parameter decoding unit 302, the loop filter 305, the predicted image generation unit 308, the inverse quantization / inverse transformation. Unit 311, addition unit 312, predicted image generation unit 101, subtraction unit 102, transform / quantization unit 103, entropy encoding unit 104, inverse quantization / inverse transform unit 105, loop filter 107, encoding parameter determination unit 110, The prediction parameter encoding unit 111 may be realized by a computer. In that case, the program for realizing the control function may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read by a computer system and executed. Here, the “computer system” is a computer system built in either the tile encoding unit 2012 or the tile decoding unit 2002, and includes an OS and hardware such as peripheral devices. The “computer-readable recording medium” refers to a storage device such as a portable medium such as a flexible disk, a magneto-optical disk, a ROM, a CD-ROM, or a hard disk built in a computer system. Furthermore, the “computer-readable recording medium” is a medium that dynamically holds a program for a short time, such as a communication line when transmitting a program via a network such as the Internet or a communication line such as a telephone line, In such a case, a volatile memory inside a computer system serving as a server or a client may be included and a program that holds a program for a certain period of time. The program may be a program for realizing a part of the functions described above, and may be a program capable of realizing the functions described above in combination with a program already recorded in a computer system.

Further, a part or all of the moving image encoding device 11 and the moving image decoding device 31 in the above-described embodiment may be realized as an integrated circuit such as an LSI (Large Scale Integration). Each functional block of the moving image encoding device 11 and the moving image decoding device 31 may be individually made into a processor, or a part or all of them may be integrated into a processor. Further, the method of circuit integration is not limited to LSI, and may be realized by a dedicated circuit or a general-purpose processor. Further, in the case where an integrated circuit technology that replaces LSI appears due to progress in semiconductor technology, an integrated circuit based on the technology may be used.

As described above, the embodiment of the present invention has been described in detail with reference to the drawings. However, the specific configuration is not limited to the above, and various design changes and the like can be made without departing from the scope of the present invention. It is possible to

[Application example]
The moving image encoding device 11 and the moving image decoding device 31 described above can be used by being mounted on various devices that perform transmission, reception, recording, and reproduction of moving images. The moving image may be a natural moving image captured by a camera or the like, or an artificial moving image (including CG and GUI) generated by a computer or the like.

First, it will be described with reference to FIG. 38 that the above-described moving image encoding device 11 and moving image decoding device 31 can be used for transmitting and receiving moving images.

FIG. 38 (a) is a block diagram showing a configuration of a transmission apparatus PROD_A in which the moving picture encoding apparatus 11 is mounted. As illustrated in (a) of FIG. 38, the transmission device PROD_A modulates a carrier wave with an encoding unit PROD_A1 that obtains encoded data by encoding a moving image, and with the encoded data obtained by the encoding unit PROD_A1. Thus, a modulation unit PROD_A2 that obtains a modulation signal and a transmission unit PROD_A3 that transmits the modulation signal obtained by the modulation unit PROD_A2 are provided. The moving image encoding device 11 described above is used as the encoding unit PROD_A1.

Transmission device PROD_A, as a source of moving images to be input to the encoding unit PROD_A1, a camera PROD_A4 that captures moving images, a recording medium PROD_A5 that records moving images, an input terminal PROD_A6 for inputting moving images from the outside, and An image processing unit PRED_A7 that generates or processes an image may be further provided. In FIG. 38A, a configuration in which all of these are provided in the transmission device PROD_A is illustrated, but a part may be omitted.

Note that the recording medium PROD_A5 may be a recording of a non-encoded moving image, or a recording of a moving image encoded by a recording encoding scheme different from the transmission encoding scheme. It may be a thing. In the latter case, a decoding unit (not shown) for decoding the encoded data read from the recording medium PROD_A5 in accordance with the recording encoding method may be interposed between the recording medium PROD_A5 and the encoding unit PROD_A1.

(B) of FIG. 38 is a block diagram illustrating a configuration of the receiving device PROD_B in which the moving image decoding device 31 is mounted. As shown in FIG. 38 (b), the receiving device PROD_B includes a receiving unit PROD_B1 that receives a modulated signal, a demodulating unit PROD_B2 that obtains encoded data by demodulating the modulated signal received by the receiving unit PROD_B1, and a demodulator A decoding unit PROD_B3 that obtains a moving image by decoding the encoded data obtained by the unit PROD_B2. The moving picture decoding apparatus 31 described above is used as the decoding unit PROD_B3.

The receiving device PROD_B is a display destination PROD_B4 for displaying a moving image, a recording medium PROD_B5 for recording a moving image, and an output terminal for outputting the moving image to the outside as a supply destination of the moving image output by the decoding unit PROD_B3 PROD_B6 may be further provided. In FIG. 38 (b), the configuration in which all of these are provided in the receiving device PROD_B is illustrated, but a part may be omitted.

Note that the recording medium PROD_B5 may be used for recording a non-encoded moving image, or is encoded using a recording encoding method different from the transmission encoding method. May be. In the latter case, an encoding unit (not shown) for encoding the moving image acquired from the decoding unit PROD_B3 according to the recording encoding method may be interposed between the decoding unit PROD_B3 and the recording medium PROD_B5.

Note that the transmission medium for transmitting the modulation signal may be wireless or wired. Further, the transmission mode for transmitting the modulated signal may be broadcasting (here, a transmission mode in which the transmission destination is not specified in advance) or communication (here, transmission in which the transmission destination is specified in advance). Refers to the embodiment). That is, the transmission of the modulation signal may be realized by any of wireless broadcasting, wired broadcasting, wireless communication, and wired communication.

For example, a terrestrial digital broadcast broadcasting station (broadcasting equipment, etc.) / Receiving station (such as a television receiver) is an example of a transmitting device PROD_A / receiving device PROD_B that transmits and receives modulated signals by wireless broadcasting. A broadcasting station (such as broadcasting equipment) / receiving station (such as a television receiver) of cable television broadcasting is an example of a transmitting device PROD_A / receiving device PROD_B that transmits and receives a modulated signal by cable broadcasting.

In addition, a server (workstation, etc.) / Client (television receiver, personal computer, smartphone, etc.) such as a VOD (Video On Demand) service or a video sharing service using the Internet is a transmission device that transmits and receives modulated signals via communication. This is an example of PROD_A / receiving device PROD_B (normally, either a wireless or wired transmission medium is used in a LAN, and a wired transmission medium is used in a WAN). Here, the personal computer includes a desktop PC, a laptop PC, and a tablet PC. The smartphone also includes a multi-function mobile phone terminal.

In addition to the function of decoding the encoded data downloaded from the server and displaying it on the display, the video sharing service client has a function of encoding a moving image captured by the camera and uploading it to the server. That is, the client of the video sharing service functions as both the transmission device PROD_A and the reception device PROD_B.

Next, it will be described with reference to FIG. 39 that the above-described moving image encoding device 11 and moving image decoding device 31 can be used for recording and reproduction of moving images.

FIG. 39 (a) is a block diagram showing a configuration of a recording apparatus PROD_C equipped with the moving picture encoding apparatus 11 described above. As shown in (a) of FIG. 39, the recording device PROD_C includes an encoding unit PROD_C1 that obtains encoded data by encoding a moving image, and the encoded data obtained by the encoding unit PROD_C1 on the recording medium PROD_M. A writing unit PROD_C2 for writing. The moving image encoding device 11 described above is used as the encoding unit PROD_C1.

The recording medium PROD_M may be of a type built into the recording device PROD_C, such as (1) HDD (Hard Disk Drive) or SSD (Solid State Drive), or (2) SD memory. It may be of the type connected to the recording device PROD_C, such as a card or USB (Universal Serial Bus) flash memory, or (3) DVD (Digital Versatile Disc) or BD (Blu-ray (registered trademark)) ) Disc: registered trademark) or the like, it may be loaded into a drive device (not shown) built in the recording device PROD_C.

In addition, the recording device PROD_C is a camera PROD_C3 that captures moving images as a source of moving images to be input to the encoding unit PROD_C1, an input terminal PROD_C4 for inputting moving images from the outside, and a reception for receiving moving images A unit PROD_C5 and an image processing unit PROD_C6 for generating or processing an image may be further provided. FIG. 39A illustrates a configuration in which the recording apparatus PROD_C includes all of these, but some of them may be omitted.

The receiving unit PROD_C5 may receive a non-encoded moving image, or may receive encoded data encoded by a transmission encoding scheme different from the recording encoding scheme. You may do. In the latter case, a transmission decoding unit (not shown) that decodes encoded data encoded by the transmission encoding method may be interposed between the reception unit PROD_C5 and the encoding unit PROD_C1.

Examples of such a recording device PROD_C include a DVD recorder, a BD recorder, an HDD (Hard Disk Drive) recorder, and the like (in this case, the input terminal PROD_C4 or the receiver PROD_C5 is a main source of moving images). . In addition, a camcorder (in this case, the camera PROD_C3 is a main source of moving images), a personal computer (in this case, the receiving unit PROD_C5 or the image processing unit C6 is a main source of moving images), a smartphone (this In this case, the camera PROD_C3 or the reception unit PROD_C5 is a main source of moving images), and the like is also an example of such a recording apparatus PROD_C.

(B) of FIG. 39 is a block showing a configuration of a playback device PROD_D equipped with the above-described video decoding device 31. As shown in (b) of FIG. 39, the playback device PROD_D reads a moving image by decoding a read unit PROD_D1 that reads encoded data written on the recording medium PROD_M and a read unit PROD_D1 that reads the encoded data. And a decoding unit PROD_D2 to obtain. The moving picture decoding apparatus 31 described above is used as the decoding unit PROD_D2.

The recording medium PROD_M may be of the type built into the playback device PROD_D, such as (1) HDD or SSD, or (2) such as an SD memory card or USB flash memory. It may be of the type connected to the playback device PROD_D, or (3) may be loaded into a drive device (not shown) built in the playback device PROD_D, such as a DVD or BD. Good.

In addition, the playback device PROD_D has a display unit PROD_D3 that displays a moving image as a supply destination of the moving image output by the decoding unit PROD_D2, an output terminal PROD_D4 that outputs the moving image to the outside, and a transmission unit that transmits the moving image. PROD_D5 may be further provided. In FIG. 39 (b), a configuration in which the playback apparatus PROD_D includes all of these is illustrated, but a part thereof may be omitted.

The transmission unit PROD_D5 may transmit a non-encoded moving image, or transmits encoded data encoded by a transmission encoding scheme different from the recording encoding scheme. You may do. In the latter case, it is preferable to interpose an encoding unit (not shown) that encodes a moving image using a transmission encoding method between the decoding unit PROD_D2 and the transmission unit PROD_D5.

Examples of such a playback device PROD_D include a DVD player, a BD player, and an HDD player (in this case, an output terminal PROD_D4 to which a television receiver or the like is connected is a main moving image supply destination). . In addition, a television receiver (in this case, the display PROD_D3 is a main supply destination of moving images), a digital signage (also referred to as an electronic signboard or an electronic bulletin board), and the display PROD_D3 or the transmission unit PROD_D5 is the main supply of moving images. Desktop PC (in this case, output terminal PROD_D4 or transmission unit PROD_D5 is the main video source), laptop or tablet PC (in this case, display PROD_D3 or transmission unit PROD_D5 is video) A smartphone (which is a main image supply destination), a smartphone (in this case, the display PROD_D3 or the transmission unit PROD_D5 is a main moving image supply destination), and the like are also examples of such a playback device PROD_D.

(Hardware implementation and software implementation)
Each block of the moving picture decoding device 31 and the moving picture encoding device 11 described above may be realized in hardware by a logic circuit formed on an integrated circuit (IC chip), or may be a CPU (Central Processing). Unit) may be implemented in software.

In the latter case, each of the above devices includes a CPU that executes instructions of a program that realizes each function, a ROM (Read Memory) that stores the program, a RAM (RandomAccess Memory) that develops the program, the program, and various data. A storage device (recording medium) such as a memory for storing the. The object of the embodiment of the present invention is a record in which the program code (execution format program, intermediate code program, source program) of the control program for each of the above devices, which is software that realizes the above-described functions, is recorded in a computer-readable manner This can also be achieved by supplying a medium to each of the above devices, and reading and executing the program code recorded on the recording medium by the computer (or CPU or MPU).

Examples of the recording medium include tapes such as magnetic tapes and cassette tapes, magnetic disks such as floppy (registered trademark) disks / hard disks, CD-ROMs (Compact Disc Read-Only Memory) / MO discs (Magneto-Optical discs). ) / MD (Mini Disc) / DVD (Digital Versatile Disc) / CD-R (CD Recordable) / Blu-ray Disc (Blu-ray (registered trademark) Disc), etc. Cards), optical cards, etc., mask ROM / EPROM (Erasable Programmable Read-Only Memory) / EEPROM (Electrically-Erasable and Programmable Read-Only Memory: registered trademark) / semiconductor memory such as flash ROM, or PLD Logic circuits such as (Programmable logic device) and FPGA (Field Programmable Gate Array) can be used.

Further, each of the above devices may be configured to be connectable to a communication network, and the program code may be supplied via the communication network. The communication network is not particularly limited as long as it can transmit the program code. For example, the Internet, intranet, extranet, LAN (Local Area Network), ISDN (Integrated Services Digital Network), VAN (Value-Added Network), CATV (Community Area Antenna / television / CableTelevision) communication network, Virtual Private Network (Virtual Private Network) A telephone network, a mobile communication network, a satellite communication network, etc. can be used. The transmission medium constituting the communication network may be any medium that can transmit the program code, and is not limited to a specific configuration or type. For example, IEEE (Institute of Electrical and Electronic Engineers) 1394, USB, power line carrier, cable TV line, telephone line, ADSL (Asymmetric Digital SubscriberLine) line, etc. wired such as IrDA (Infrared Data Association) or remote control BlueTooth (registered trademark), IEEE802.11 wireless, HDR (High Data Rate), NFC (Near Field Communication), DLNA (registered trademark) (Digital Living Network Alliance: registered trademark), mobile phone network, satellite line, terrestrial digital broadcasting It can also be used by radio such as a network. The embodiment of the present invention can also be realized in the form of a computer data signal embedded in a carrier wave in which the program code is embodied by electronic transmission.

The embodiments of the present invention are not limited to the above-described embodiments, and various modifications are possible within the scope of the claims. That is, embodiments obtained by combining technical means appropriately modified within the scope of the claims are also included in the technical scope of the present invention.

Embodiments of the present invention are preferably applied to a moving image decoding apparatus that decodes encoded data in which image data is encoded, and a moving image encoding apparatus that generates encoded data in which image data is encoded. be able to. Further, the present invention can be suitably applied to the data structure of encoded data that is generated by a video encoding device and referenced by the video decoding device.

11 Video encoding device
31 Video decoding device
41 Video display device
2002 Tile decoding unit
2012 Tile Coding Department

Claims

In a video decoding device that divides an image into tiles and decodes the video in tile units,
A header information decoding unit that decodes header information from the encoded stream and calculates tile information;
A tile decoding unit that decodes encoded data for each tile and generates a decoded image of the tile;
A combining unit that generates a display image by combining the decoded image of the tile with reference to the tile information,
The tile is composed of a tile active area which is a unit for dividing a picture without overlapping, and a hidden area (tile extension area).
A moving picture decoding apparatus, wherein an area obtained by adding the tile extension area to the tile active area is decoded in units of CTUs.
The tile extension area overlaps with a tile active area of an adjacent tile, and includes an overlap area used for reference and decoding, and a crop offset area (tile invalid area) that is not referenced or decoded. The moving picture decoding apparatus according to claim 1.
The moving image according to claim 2, wherein a size obtained by adding the tile active area and the overlap area is not an integral multiple of a CTU size, and an upper left coordinate of the tile is not limited to an integer multiple of a CTU. Decoding device.
In a video decoding device that divides an image into tiles and decodes the video in tile units,
A header information decoding unit that decodes header information from the encoded stream and calculates tile information;
A tile decoding unit that decodes encoded data for each tile and generates a decoded image of the tile;
A combining unit that generates a display image by combining the decoded image of the tile with reference to the tile information,
The tile is composed of a tile effective area used for decoding / output and a crop offset area (tile invalid area) not used for decoding / output,
The tile effective area is composed of a tile active area that is a unit for dividing a picture and an overlap area that overlaps with a tile active area of an adjacent tile and is used for reference and decoding.
A moving picture decoding apparatus, wherein the tile effective area is decoded in CTU units.
5. The moving image according to claim 4, wherein a size obtained by adding the tile effective area and the crop offset area is not an integer multiple of a CTU size, and an upper left coordinate of the tile is not limited to an integer multiple of a CTU. Decoding device.
6. The moving picture decoding apparatus according to claim 1, wherein the tile decoding unit decodes the target tile with reference to only information on the target tile and information on a collocated tile of the target tile.
In a video encoding device that divides an image into tiles and encodes a video in tile units,
A tile information calculation unit for calculating tile information;
A dividing unit for dividing the image into tiles;
A tile encoding unit that encodes a tile and generates an encoded stream;
The tile is composed of a tile active area which is a unit for dividing a picture without overlapping, and a hidden area (tile extension area).
A moving picture encoding apparatus, wherein an area obtained by adding the tile extension area to the tile active area is encoded in units of CTUs.
The tile extension area overlaps with a tile active area of an adjacent tile, and includes an overlap area used for reference and encoding, and a crop offset area (tile invalid area) that is not referenced or encoded. The moving picture encoding apparatus according to claim 7.
9. The moving image according to claim 8, wherein a size obtained by adding the tile active area and the overlap area is not an integer multiple of a CTU size, and an upper left coordinate of the tile is not limited to an integer multiple of a CTU. Encoding device.
In a video encoding device that divides an image into tiles and encodes a video in tile units,
A tile information calculation unit for calculating tile information;
A dividing unit for dividing the image into tiles;
A tile encoding unit that encodes a tile and generates an encoded stream;
The tile is composed of a tile effective area used for encoding / output and a crop offset area (tile invalid area) not used for encoding / output,
The tile effective area is composed of a tile active area that is a unit for dividing a picture, and an overlap area that overlaps with a tile active area of an adjacent tile and is used for reference and encoding.
A moving picture coding apparatus, wherein the tile effective area is coded by CTU coding.
The moving image according to claim 10, wherein the size of the tile effective area and the crop offset area is not an integral multiple of the CTU size, and the upper left coordinate of the tile is not limited to an integer multiple of the CTU. Encoding device.
The moving image coding according to claim 7, wherein the tile coding unit codes the target tile with reference to only information on the target tile and information on a collocated tile of the target tile. apparatus.