WO2019159820A1 - Moving image encoding device and moving image decoding device - Google Patents

Moving image encoding device and moving image decoding device

Info

Publication number
WO2019159820A1
Authority
WO
WIPO (PCT)
Prior art keywords
tile
area
unit
tiles
image
Prior art date
Application number
PCT/JP2019/004497
Other languages
French (fr)
Japanese (ja)
Inventor
将伸 八杉
知宏 猪飼
友子 青野
Original Assignee
シャープ株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from JP2018023894A (published as JP2021064817A)
Priority claimed from JP2018054270A (published as JP2021064819A)
Application filed by シャープ株式会社
Publication of WO2019159820A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/86 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness

Definitions

  • One embodiment of the present invention relates to a video decoding device and a video encoding device.
  • A moving image encoding device that generates encoded data by encoding a moving image, and a moving image decoding device that generates a decoded image by decoding the encoded data, are used.
  • Specific examples of the moving image encoding method include the methods proposed in H.264/AVC and HEVC (High-Efficiency Video Coding).
  • In such encoding methods, an image (picture) constituting a moving image is managed by a hierarchical structure consisting of slices obtained by dividing the image, coding tree units (CTU: Coding Tree Unit) obtained by dividing a slice, coding units (sometimes called CU: Coding Unit) obtained by dividing a coding tree unit, prediction units (PU: Prediction Unit) that are blocks obtained by dividing a coding unit, and transform units (TU: Transform Unit), and is encoded/decoded for each CU.
  • In such a moving image encoding method, a predicted image is usually generated based on a locally decoded image obtained by encoding/decoding an input image, and the prediction residual obtained by subtracting the predicted image from the input image (original image) (sometimes referred to as a "difference image" or "residual image") is encoded.
  • Examples of the method for generating a predicted image include inter-screen prediction (inter prediction) and intra-screen prediction (intra prediction) (Non-Patent Document 1).
  • a screen (picture) division unit called a tile is introduced. Unlike a slice, a tile is obtained by dividing a picture into rectangular areas, and can be encoded and decoded independently for each tile (Patent Document 1, Non-Patent Document 2).
  • Joint Exploration Test Model 7 (JEM7), JVET-G1001, Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 13-21 July 2017
  • ITU-T H.265 (04/2015), SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS, Infrastructure of audiovisual services - Coding of moving video - High efficiency video coding
  • Algorithm descriptions of projection format conversion and video quality metrics in 360Lib (Version 5), JVET-H1004
  • As described above, a tile is obtained by dividing a picture into rectangular areas, and can be decoded without referring to information outside the tile (prediction mode, MV, pixel values) in the spatial and temporal directions.
  • However, since the adjacent-tile information of the target tile and the adjacent-tile information of the collocated tile are not referred to at all, distortion caused by the discontinuity of the tile boundary (hereinafter referred to as tile distortion) occurs, and this tile distortion is very easy to perceive visually. Coding efficiency is also reduced.
  • Furthermore, since the tile size is an integer multiple of the CTU, it is difficult to divide a picture into tiles of equal size for load balancing, or to configure tiles that match the face size of 360-degree video.
  • The present invention has been made in view of the above problems, and its purpose is to provide a mechanism for removing or suppressing tile distortion while encoding and decoding each tile independently in the spatial and temporal directions and suppressing a decrease in coding efficiency. It also provides tile partitioning that is not limited to integer multiples of the CTU.
  • In order to solve the above problems, a moving image decoding apparatus according to one aspect of the present invention is a moving image decoding apparatus that divides an image into tiles and decodes a moving image in units of tiles, and includes a header information decoding unit that decodes header information from an encoded stream and calculates tile information, tile decoding units that decode the encoded data of each tile and generate a decoded image of the tile, and a tile synthesis unit that synthesizes the decoded images of the tiles with reference to the tile information to generate a display image.
  • The tile is composed of a tile active area, which is a unit that divides the picture without overlap, and a hidden area (tile extension area), and the area obtained by adding the hidden area to the tile active area is decoded in units of CTUs.
  • According to the above configuration, a mechanism for ensuring the independence of decoding of each tile and a mechanism for removing or suppressing tile distortion in a moving image are provided.
  • As a result, the amount of processing when selecting and decoding only the area necessary for display or the like can be greatly reduced, and an image without distortion at tile boundaries can be displayed.
  • A diagram showing the configurations of a transmitting apparatus equipped with the moving image encoding device according to the present embodiment and a receiving apparatus equipped with the moving image decoding device; (a) shows the transmitting apparatus and (b) shows the receiving apparatus.
  • A diagram showing the configurations of a recording apparatus equipped with the moving image encoding device according to the present embodiment and a reproducing apparatus equipped with the moving image decoding device; (a) shows the recording apparatus and (b) shows the reproducing apparatus.
  • FIG. 1 is a schematic diagram showing a configuration of an image transmission system 1 according to the present embodiment.
  • the image transmission system 1 is a system that transmits a code obtained by encoding an encoding target image, decodes the transmitted code, and displays an image.
  • the image transmission system 1 includes a moving image encoding device (image encoding device) 11, a network 21, a moving image decoding device (image decoding device) 31, and a moving image display device (image display device) 41.
  • the image T is input to the moving image encoding device 11.
  • the network 21 transmits the encoded stream Te generated by the video encoding device 11 to the video decoding device 31.
  • The network 21 is the Internet, a wide area network (WAN: Wide Area Network), a local area network (LAN: Local Area Network), or a combination of these.
  • the network 21 is not necessarily limited to a bidirectional communication network, and may be a unidirectional communication network that transmits broadcast waves such as terrestrial digital broadcasting and satellite broadcasting.
  • The network 21 may be replaced by a storage medium that records the encoded stream Te, such as a DVD (Digital Versatile Disc) or a BD (Blu-ray Disc).
  • The video decoding device 31 decodes each of the encoded streams Te transmitted by the network 21, and generates one or a plurality of decoded images Td.
  • the moving image display device 41 displays all or a part of one or a plurality of decoded images Td generated by the moving image decoding device 31.
  • the moving image display device 41 includes, for example, a display device such as a liquid crystal display or an organic EL (Electro-luminescence) display. Examples of the display form include stationary, mobile, and HMD.
  • x ? y : z is a ternary operator that takes y when x is true (non-zero) and takes z when x is false (0).
  • Abs (a) is a function that returns the absolute value of a.
  • Int (a) is a function that returns an integer value of a.
  • Floor (a) is a function that returns the largest integer less than or equal to a.
  • Ceil (a) is a function that returns the smallest integer greater than or equal to a.
  • a / d represents the division of a by d.
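  • For reference, the operators above can be written as the following minimal Python sketch (Clip3, which appears in formulas later in this document, is included under its conventional definition of clamping a value to a range):

import math

def cond(x, y, z):
    # x ? y : z -- returns y when x is true (non-zero), z when x is false (0)
    return y if x else z

def Abs(a):
    return a if a >= 0 else -a

def Int(a):
    return int(a)         # integer value of a

def Floor(a):
    return math.floor(a)  # largest integer less than or equal to a

def Ceil(a):
    return math.ceil(a)   # smallest integer greater than or equal to a

def Clip3(lo, hi, x):
    # clamp x to the range [lo, hi]; used by the tile boundary padding and
    # motion vector restriction formulas later in this document
    return max(lo, min(hi, x))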
  • FIG. 2 is a diagram showing a hierarchical structure of data in the encoded stream Te.
  • the encoded stream Te illustratively includes a sequence and a plurality of pictures constituting the sequence.
  • (a) to (f) of FIG. 2 respectively show an encoded video sequence defining the sequence SEQ, an encoded picture defining a picture PICT, an encoded slice defining a slice S, encoded slice data defining slice data SDATA, a coding tree unit included in the slice data, and coding units included in the coding tree unit.
  • The sequence SEQ includes a video parameter set VPS (Video Parameter Set), a sequence parameter set SPS (Sequence Parameter Set), a picture parameter set PPS (Picture Parameter Set), pictures PICT, and supplemental enhancement information (SEI).
  • In the video parameter set VPS, for a moving image composed of a plurality of layers, a set of encoding parameters common to a plurality of moving images, and a set of encoding parameters related to the plurality of layers and the individual layers included in the moving image, are defined.
  • In the sequence parameter set SPS, a set of encoding parameters referred to by the video decoding device 31 for decoding the target sequence is defined. For example, the width and height of the picture are defined. A plurality of SPSs may exist; in that case, one of the plurality of SPSs is selected from the PPS.
  • In the picture parameter set PPS, a set of encoding parameters referred to by the video decoding device 31 for decoding each picture in the target sequence is defined. For example, it includes a quantization width reference value (pic_init_qp_minus26) used for decoding the picture and a flag (weighted_pred_flag) indicating the application of weighted prediction.
  • As shown in FIG. 2(b), the picture PICT includes slices S0 to S(NS-1), where NS is the total number of slices included in the picture PICT.
  • In the coded slice, a set of data referred to by the video decoding device 31 for decoding the slice S to be processed is defined. As shown in FIG. 2(c), the slice S includes a slice header SH and slice data SDATA.
  • the slice header SH includes a coding parameter group that is referred to by the video decoding device 31 in order to determine a decoding method of the target slice.
  • Slice type designation information (slice_type) for designating a slice type is one example of an encoding parameter included in the slice header SH. Slice types that can be designated include (1) an I slice that uses only intra prediction at the time of encoding, (2) a P slice that uses unidirectional prediction or intra prediction at the time of encoding, and (3) a B slice that uses unidirectional prediction, bidirectional prediction, or intra prediction at the time of encoding.
  • Note that inter prediction is not limited to uni-prediction and bi-prediction; a predicted image may be generated using more reference pictures.
  • the P and B slices refer to slices including blocks that can use inter prediction.
  • the slice header SH may include a reference (pic_parameter_set_id) to the picture parameter set PPS included in the encoded video sequence.
  • As shown in FIG. 2(d), the slice data SDATA includes coding tree units (CTU).
  • a CTU is a block of a fixed size (for example, 64x64) that constitutes a slice, and is sometimes called a maximum coding unit (LCU: Large Coding Unit).
  • a set of data referred to by the video decoding device 31 for decoding the CTU to be processed is defined.
  • the CTU is divided into coding units CU which are basic units of the coding process by recursive quadtree division (QT division) or binary tree division (BT division).
  • a tree structure obtained by recursive quadtree partitioning or binary tree partitioning is called a coding tree (CT), and a node of the tree structure is called a coding node (CN).
  • An intermediate node of the quadtree and the binary tree is a CN, and the CTU itself is also defined as the highest CN.
  • The CT includes, as CT information, a QT split flag (cu_split_flag) indicating whether or not to perform QT splitting, and a BT split mode (split_bt_mode) indicating the splitting method of BT splitting. cu_split_flag and/or split_bt_mode are transmitted for each CN.
  • When cu_split_flag is 1, the CN is divided into four CNs. When cu_split_flag is 0: if split_bt_mode is 1, the CN is horizontally divided into two CNs; if split_bt_mode is 2, the CN is vertically divided into two CNs; if split_bt_mode is 0, the CN is not divided and has one CU as its node (a decoding sketch of these rules follows the list of CU sizes below).
  • CU is a terminal node (leaf node) of CN and is not further divided.
  • The CU size can be 64x64, 64x32, 32x64, 32x32, 64x16, 16x64, 32x16, 16x32, 16x16, 64x8, 8x64, 32x8, 8x32, 16x8, 8x16, 8x8, 64x4, 4x64, 32x4, 4x32, 16x4, 4x16, 8x4, 4x8, or 4x4 pixels.
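  • As an illustration of the split rules above, the following minimal Python sketch recursively expands a coding node into CUs. The callbacks cu_split_flag and split_bt_mode are hypothetical stand-ins for the entropy decoder, not syntax defined in this document.

def decode_coding_tree(x, y, w, h, cu_split_flag, split_bt_mode, leaves):
    # Expand one coding node (CN) at (x, y) of size (w, h) into CUs.
    if cu_split_flag(x, y, w, h):      # cu_split_flag == 1: QT split into four CNs
        hw, hh = w // 2, h // 2
        for cx, cy in ((x, y), (x + hw, y), (x, y + hh), (x + hw, y + hh)):
            decode_coding_tree(cx, cy, hw, hh, cu_split_flag, split_bt_mode, leaves)
        return
    bt = split_bt_mode(x, y, w, h)
    if bt == 1:                        # horizontal split into two CNs
        decode_coding_tree(x, y, w, h // 2, cu_split_flag, split_bt_mode, leaves)
        decode_coding_tree(x, y + h // 2, w, h // 2, cu_split_flag, split_bt_mode, leaves)
    elif bt == 2:                      # vertical split into two CNs
        decode_coding_tree(x, y, w // 2, h, cu_split_flag, split_bt_mode, leaves)
        decode_coding_tree(x + w // 2, y, w // 2, h, cu_split_flag, split_bt_mode, leaves)
    else:                              # split_bt_mode == 0: leaf node, one CU
        leaves.append((x, y, w, h))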
  • the CU includes a prediction tree (PT: Prediction Tree), a transform tree (TT: Transform Tree), and a CU header CUH.
  • In the CU header, a prediction mode, a division method (PU division mode), and the like are defined.
  • In the prediction tree, each prediction unit (PU: Prediction Unit) obtained by dividing the CU into one or a plurality of parts is defined.
  • the PU is one or more non-overlapping areas constituting the CU.
  • the PT includes one or more PUs obtained by the above division.
  • a prediction unit obtained by further dividing the PU is referred to as a “sub-block”.
  • the sub block is composed of a plurality of pixels.
  • When the sub-block size is smaller than the PU, the PU is divided into sub-blocks. For example, when the PU is 8x8 and the sub-block is 4x4, the PU is divided into four sub-blocks, obtained by dividing it into two horizontally and two vertically.
  • the prediction process may be performed for each PU (or sub block).
  • Intra prediction is prediction within the same picture, while inter prediction refers to prediction processing performed between mutually different pictures (for example, between display times or between layer images).
  • In the case of inter prediction, the division method is encoded by the PU division mode (part_mode) of the encoded data, and includes 2Nx2N (the same size as the coding unit), 2NxN, 2NxnU, 2NxnD, Nx2N, nLx2N, nRx2N, NxN, and so on. 2NxN and Nx2N indicate 1:1 symmetric division, while 2NxnU, 2NxnD and nLx2N, nRx2N indicate 1:3 and 3:1 asymmetric divisions.
  • the PUs included in the CU are expressed as PU0, PU1, PU2, and PU3 in this order.
  • a CU is divided into one or a plurality of transform units (TU: Transform Unit), and the position and size of each TU are defined.
  • the TU is one or more non-overlapping areas constituting the CU.
  • TT includes one or a plurality of TUs obtained by the above division.
  • There are two types of partitioning in the TT: one that allocates an area of the same size as the CU as a TU, and one that uses recursive quadtree partitioning, similar to the CU partitioning described above.
  • the predicted image of the PU is derived by a prediction parameter associated with the PU.
  • the prediction parameters include a prediction parameter for intra prediction or a prediction parameter for inter prediction.
  • Prediction parameters for inter prediction (inter prediction parameters) will be described.
  • the inter prediction parameter includes prediction list use flags predFlagL0 and predFlagL1, reference picture indexes refIdxL0 and refIdxL1, and motion vectors mvL0 and mvL1.
  • the prediction list use flags predFlagL0 and predFlagL1 are flags indicating whether or not reference picture lists called L0 list and L1 list are used, respectively. When the value is 1, a corresponding reference picture list is used.
  • In this specification, when "a flag indicating whether or not XX" is described, a flag value other than 0 (for example, 1) means XX and a value of 0 means not XX; in logical negation, logical product, and the like, 1 is treated as true and 0 as false (the same applies hereinafter).
  • However, other values can be used as the true and false values in an actual apparatus or method.
  • the reference picture list is a list including reference pictures stored in the reference picture memory 306.
  • the prediction parameter decoding (encoding) method includes a merge prediction (merge) mode and an AMVP (Adaptive Motion Vector Prediction) mode.
  • the merge flag merge_flag is a flag for identifying these.
  • the merge mode is a mode in which the prediction list use flag predFlagLX (or inter prediction identifier inter_pred_idc), the reference picture index refIdxLX, and the motion vector mvLX are not included in the encoded data and are derived from the prediction parameters of already processed neighboring PUs.
  • the AMVP mode is a mode in which the inter prediction identifier inter_pred_idc, the reference picture index refIdxLX, and the motion vector mvLX are included in the encoded data.
  • the motion vector mvLX is encoded as a prediction vector index mvp_lX_idx for identifying the prediction vector mvpLX and a difference vector mvdLX.
  • the motion vector mvLX indicates a shift amount between blocks on two different pictures.
  • a prediction vector and a difference vector related to the motion vector mvLX are referred to as a prediction vector mvpLX and a difference vector mvdLX, respectively.
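  • In AMVP mode, the decoder therefore reconstructs the motion vector by adding the decoded difference vector to the prediction vector selected by mvp_lX_idx. A minimal sketch, where mvp_candidates is a hypothetical candidate list built from the prediction parameters of neighboring PUs:

def reconstruct_mv(mvp_candidates, mvp_lX_idx, mvdLX):
    # mvLX = mvpLX + mvdLX, per component
    mvpLX = mvp_candidates[mvp_lX_idx]
    return (mvpLX[0] + mvdLX[0], mvpLX[1] + mvdLX[1])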
  • The intra prediction parameter is a parameter used in the process of predicting a CU from information within the picture, for example, the intra prediction mode IntraPredMode; the luminance intra prediction mode IntraPredModeY and the color difference intra prediction mode IntraPredModeC may be different.
  • Intra prediction modes include, for example, planar prediction, DC prediction, and Angular (directional) prediction.
  • For the color difference prediction mode IntraPredModeC, for example, any one of planar prediction, DC prediction, Angular prediction, direct mode (a mode that uses the luminance prediction mode), and LM prediction (a mode that performs linear prediction from luminance pixels) is used.
  • The luminance intra prediction mode IntraPredModeY may be derived using an MPM (Most Probable Mode) candidate list composed of intra prediction modes estimated to have a high probability of being applied to the target block, or a prediction mode not included in the MPM candidate list may be derived as an REM. Which method is used is signaled by the flag prev_intra_luma_pred_flag; in the former case, IntraPredModeY is derived using the index mpm_idx and the MPM candidate list derived from the intra prediction modes of adjacent blocks, and in the latter case, the intra prediction mode is derived using the flag rem_selected_mode_flag and the modes rem_selected_mode and rem_non_selected_mode.
  • The color difference intra prediction mode IntraPredModeC may be derived using a flag not_lm_chroma_flag indicating whether or not to use LM prediction, using a flag not_dm_chroma_flag indicating whether or not to use the direct mode, or using an index chroma_intra_mode_idx that directly specifies the prediction mode.
  • the loop filter is a filter provided in the encoding loop, which removes block distortion and ringing distortion and improves image quality.
  • the loop filter mainly includes a deblocking filter, a sample adaptive offset (SAO), and an adaptive loop filter (ALF).
  • The deblocking filter performs deblocking processing on the pixels of the luminance and color difference components adjacent to a block boundary when the difference between the pre-deblocking pixel values of luminance-component pixels adjacent to each other across the block boundary is smaller than a predetermined threshold, thereby smoothing the image near the block boundary.
  • SAO is a filter that is applied after the deblocking filter and has the effect of removing ringing distortion and quantization distortion.
  • SAO is a process in units of CTUs, and is a filter that classifies pixel values into several categories and adds or subtracts offsets in units of pixels for each category.
  • In the edge offset (EO) mode, an offset value to be added to the pixel value is determined according to the magnitude relationship between the target pixel and the adjacent pixels (reference pixels).
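  • A sketch of this edge-offset decision is shown below; the category numbering follows the common HEVC convention and is an assumption here, not text from this document.

def sign(v):
    return (v > 0) - (v < 0)

def sao_edge_offset(p, n0, n1, offsets):
    # p: target pixel; n0, n1: the two neighbors along the EO direction;
    # offsets: hypothetical per-category offsets decoded from the stream.
    edge_idx = 2 + sign(p - n0) + sign(p - n1)
    if edge_idx == 2:                     # monotonic/flat: no offset applied
        return p
    # 0: local minimum, 1: concave corner, 3: convex corner, 4: local maximum
    category = {0: 0, 1: 1, 3: 2, 4: 3}[edge_idx]
    return p + offsets[category]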
  • The ALF generates an ALF-filtered decoded image by applying adaptive filter processing, using ALF parameters (filter coefficients) ALFP decoded from the encoded stream Te, to the decoded image before ALF.
  • Entropy coding includes a method of variable-length coding syntax using a context (probability model) adaptively selected according to the type of the syntax and the surrounding situation, and a method of variable-length coding syntax using a predetermined table or calculation formula.
  • In the former, CABAC (Context Adaptive Binary Arithmetic Coding), an updated probability model is stored in memory for each encoded or decoded picture. As the initial state of the context of the target picture, a probability model of a picture that uses the same slice type and the same slice-level quantization parameter is selected from the probability models stored in the memory, and is used for encoding and decoding.
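  • A minimal sketch of this probability-model reuse, with a hypothetical store keyed by slice type and slice-level quantization parameter:

class ContextStore:
    def __init__(self):
        self.models = {}  # (slice_type, slice_qp) -> probability model

    def save(self, slice_type, slice_qp, prob_model):
        # store the updated probability model of a coded/decoded picture
        self.models[(slice_type, slice_qp)] = prob_model

    def init_for_picture(self, slice_type, slice_qp, default_model):
        # initial context state: reuse the stored model of a picture with the
        # same slice type and the same slice-level QP, else fall back to defaults
        return self.models.get((slice_type, slice_qp), default_model)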
  • the unit of xTs, yTs, wT, hT, wPict, and hPict is a pixel.
  • the picture width and height are set in pic_width_in_luma_samples and pic_height_in_luma_samples, which are notified by sequence_parameter_set_rbsp () (referred to as SPS) shown in FIG.
  • FIG. 3B is a diagram showing the CTU encoding and decoding order when a picture is divided into tiles.
  • the number described in each tile is TileId (the identifier of the tile in the picture), and the number TileId may be assigned to the tile in the picture from the upper left to the lower right in the raster scan order. Further, the CTU is processed in the order of raster scanning from the upper left to the lower right in each tile, and when the processing in one tile is completed, the CTU in the next tile is processed.
  • FIG. 3 (c) is a diagram showing tiles continuous in the time direction.
  • the video sequence is composed of a plurality of pictures that are continuous in the time direction.
  • the tile sequence is composed of tiles at one or more times that are continuous in the time direction.
  • CVS: Coded Video Sequence.
  • FIG. 4 shows an example of syntax related to tile information and the like.
  • the parameter tile_parameters () related to the tile is notified by PPS (pic_parameter_set_rbsp ()) shown in FIG. 4 (b).
  • to notify the parameter means to include the parameter in the encoded data (bitstream).
  • the moving image encoding apparatus encodes the parameter, and the moving image decoding apparatus decodes the parameter.
  • When tile_enabled_flag, which indicates whether or not tiles are used, is 1, tile information tile_info() is notified in tile_parameters().
  • Next, independent_tiles_flag, which indicates whether or not tiles can be decoded independently over a plurality of temporally continuous pictures, is notified.
  • When independent_tiles_flag is 0, tiles are decoded with reference to adjacent tiles in the reference picture (they cannot be decoded independently). When independent_tiles_flag is 1, tiles are decoded without referring to adjacent tiles in the reference picture. Note that when tiles are used, decoding is performed without referring to adjacent tiles in the target picture regardless of the value of independent_tiles_flag, so a plurality of tiles can be decoded in parallel. As shown in FIG. 4(c), when independent_tiles_flag is 0, loop_filter_across_tiles_enabled_flag, which indicates on/off of the loop filter at tile boundaries applied to the reference picture, is transmitted (present). When independent_tiles_flag is 1, loop_filter_across_tiles_enabled_flag need not be transmitted (present) and may always be 0.
  • the independent tile flag independent_tiles_flag may be notified by SPS as shown in FIG. 4 (a).
  • the independent_tiles_flag will be described later.
  • the tile information tile_info () is, for example, num_tile_columns_minus1, num_tile_rows_minus1, uniform_spacing_flag, column_width_minus1 [i], row_height_minus1 [i] as shown in FIG. 4 (d), and may include overlap_tiles_flag and overlap information.
  • num_tile_columns_minus1 and num_tile_rows_minus1 are values obtained by subtracting 1 from the number of horizontal and vertical tiles M and N in the picture, respectively.
  • uniform_spacing_flag is a flag indicating whether or not the picture is divided into tiles equally. When the value of uniform_spacing_flag is 1, the width and height of each tile of the picture are set to be equal, so the tile width and height can be derived from the numbers of tiles in the horizontal and vertical directions of the picture.
  • In this case, wT[m] and hT[n] may be expressed by the following equations:
wT[m] = floor((m + 1) * wPict / M) - floor(m * wPict / M)
hT[n] = floor((n + 1) * hPict / N) - floor(n * hPict / N)
  • Alternatively, the tile size may be a multiple of a tile unit size (minimum tile size) wUnitTile and hUnitTile. In this case, the derivation rounds the values so that wT[m] is expressed with wUnitTile as a unit and hT[n] with hUnitTile as a unit for each tile.
  • When uniform_spacing_flag is 0, the tile sizes wT[m] and hT[n] are decoded for each tile from the encoded column_width_minus1[] and row_height_minus1[] as follows (a sketch of the whole derivation follows below):
wT[m] = (column_width_minus1[m] + 1) * wUnitTile
hT[n] = (row_height_minus1[n] + 1) * hUnitTile
  • Here, wUnitTile and hUnitTile are the unit size (minimum size) of the tile.
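  • Putting the above together, a hedged Python sketch of the tile size derivation; the floor-based uniform split mirrors the equations reconstructed above and should be read as an assumption where the original formulas were garbled.

def derive_tile_sizes(wPict, hPict, M, N, uniform_spacing_flag,
                      column_width_minus1=None, row_height_minus1=None,
                      wUnitTile=1, hUnitTile=1):
    if uniform_spacing_flag:
        # equal division of the picture into M x N tiles
        wT = [(m + 1) * wPict // M - m * wPict // M for m in range(M)]
        hT = [(n + 1) * hPict // N - n * hPict // N for n in range(N)]
    else:
        # per-tile sizes signaled in tile units (wUnitTile, hUnitTile)
        wT = [(column_width_minus1[m] + 1) * wUnitTile for m in range(M)]
        hT = [(row_height_minus1[n] + 1) * hUnitTile for n in range(N)]
    return wT, hT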
  • overlap_tiles_flag indicates whether or not an area near a tile boundary overlaps with an adjacent tile.
  • When overlap_tiles_flag is 1, it indicates that the tile overlaps adjacent tiles, and the overlap information overlap_tiles_info() shown in FIG. 5(f) is notified.
  • When overlap_tiles_flag is 0, the tile does not overlap adjacent tiles.
  • Here, overlap means that two or more tiles include a region of the same image, and an overlap region is a region included in two or more tiles.
  • the overlap information overlap_tiles_info () includes uniform_overlap_flag and information indicating the width and height of the overlap area.
  • uniform_overlap_flag is a flag indicating whether the width or height of the overlap area of each tile is equal. When all the widths or all the heights of the overlap area of each tile are equal, uniform_overlap_flag is set to 1, and syntaxes tile_overlap_width_div2 and tile_overlap_height_div2 indicating the width and height of the overlap area are notified.
  • Otherwise, uniform_overlap_flag is set to 0, and the syntaxes tile_overlap_width_div2[m] and tile_overlap_height_div2[n] indicating the width and height of the overlap area of each tile are notified.
  • The relationship between these syntax elements and the actual overlap area width wOVLP and height hOVLP is shown by the following equations (the units are pixels):
wOVLP[m] = tile_overlap_width_div2[m] * 2
hOVLP[n] = tile_overlap_height_div2[n] * 2
  • In the above, the width and height of the overlap area are multiples of 2; however, in the case of YUV 4:2:2 the height of the overlap area, and in the case of YUV 4:4:4 the width and height of the overlap area, may be notified in units of one pixel without being restricted to multiples of 2.
  • In other words, whether the parameters represented by "_div2" are expressed in units of 2 pixels or in units of 1 pixel may be switched depending on the color difference format (4:2:0, 4:2:2, 4:4:4).
  • For example, the tile identifier TileId of the tile at position (m, n) may be calculated as follows:
TileId = n * M + m
  • Conversely, the position (m, n) of the tile may be calculated from TileId:
m = TileId % M
n = TileId / M
  • Note that (xTsmn, yTsmn) takes values from (xTs00, yTs00) to (xTs(M-1)(N-1), yTs(M-1)(N-1)).
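  • A direct transcription of this mapping:

def tile_id(m, n, M):
    # raster-scan tile identifier: TileId = n * M + m
    return n * M + m

def tile_pos(tile_id_value, M):
    # inverse mapping: m = TileId % M, n = TileId / M (integer division)
    return tile_id_value % M, tile_id_value // M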
  • tile_info () shown in FIG. 25 may be notified instead of tile_info () shown in FIG. 4 (d).
  • The difference between tile_info() in FIG. 4(d) and tile_info() in FIG. 25(a) is that in FIG. 4(d) column_width_minus1[i] and row_height_minus1[i], which represent the tile width and height in minimum tile units or CTU units, are notified, whereas in FIG. 25(a) column_width_in_luma_samples_div2_minus1[i] and row_height_in_luma_samples_div2_minus1[i], which represent the tile width and height in pixel units, are notified.
  • column_width_in_luma_samples_div2_minus1[i] and row_height_in_luma_samples_div2_minus1[i] are values obtained by dividing the width and height of the tile in pixel units by 2 and subtracting 1.
  • The width wT[m] and height hT[n] of the tile in pixel units are expressed by the following equations:
wT[m] = (column_width_in_luma_samples_div2_minus1[m] + 1) * 2 (Formula TSP-10)
hT[n] = (row_height_in_luma_samples_div2_minus1[n] + 1) * 2
  • Whether column_width_in_luma_samples_div2_minus1[m] and row_height_in_luma_samples_div2_minus1[n] are expressed in units of 2 pixels or in units of 1 pixel may be switched depending on the color difference format (4:2:0, 4:2:2, 4:4:4).
  • column_width_in_luma_samples_div2_minus1[i] and row_height_in_luma_samples_div2_minus1[i] may be coded with fixed-length coding (f(n)) instead of variable-length coding (ue(v)). Since these values are expressed in units of pixels, they tend to be large, and fixed-length coding then yields a smaller code amount than variable-length coding.
  • Alternatively, the unit in which the width and height of the tile are expressed may be switched depending on whether or not there is an overlap area.
  • wOVLP[m] = tile_overlap_width_div2 * 2 (Formula OVLP-1)
  • hOVLP[n] = tile_overlap_height_div2 * 2
  • Alternatively, values obtained by subtracting 1 from the width and height of the overlap in pixel units may be notified:
wOVLP[m] = tile_overlap_width_minus1 + 1
hOVLP[n] = tile_overlap_height_minus1 + 1
  • (Tile boundary restriction) Since the tile information is notified by the PPS, the position and size of tiles can be changed for each picture. On the other hand, when a tile sequence is decoded independently, that is, when tiles having the same TileId can be decoded without referring to information of tiles having different TileId, the tile position and size are not changed for each picture. That is, when each tile refers to a picture (reference picture) at a different time, the same tile division may be applied to all pictures in the CVS. In this case, tiles having the same TileId are set to have the same upper-left coordinates, width, and height throughout all pictures of the CVS.
  • That the tile information does not change throughout the CVS is notified by setting the value of tiles_fixed_structure_flag of vui_parameters() shown in FIG. 4(e) to 1. That is, when the value of tiles_fixed_structure_flag is 1, num_tile_columns_minus1, num_tile_rows_minus1, uniform_spacing_flag, column_width_minus1[i], row_height_minus1[i], overlap_tiles_flag, the overlap information, and loop_filter_across_tiles_enabled_flag each have a unique value throughout the CVS.
  • When the value of tiles_fixed_structure_flag is 1, tiles having the same TileId in the CVS keep the same tile position on the picture (tile upper-left coordinates, width, height) and the same overlap information, even in pictures at different times (POC: Picture Order Count). When the value of tiles_fixed_structure_flag is 0, the size of the tile sequence may differ depending on the time.
  • FIG. 4A is a syntax table excerpted from a part of the sequence parameter set SPS.
  • the independent tile flag independent_tiles_flag is a flag indicating whether the tile sequence can be independently encoded and decoded not only in the target picture (in the spatial direction) but also in the temporally continuous sequence (in the temporal direction).
  • When the value of independent_tiles_flag is 1, it means that the tile sequence can be encoded and decoded independently, and the following restrictions may be imposed on the encoding/decoding of tiles and on the syntax of the encoded data.
  • Constraint 1: In the CVS, tiles do not refer to information of tiles having different TileIds.
  • FIG. 6 is a diagram for explaining reference to tiles in the temporal direction (between different pictures).
  • FIG. 6A shows an example in which the intra picture Pict (t0) at time t0 is divided into N tiles.
  • Pict (t1) refers to Pict (t0).
  • CU1, CU2, and CU3 in tile Tile (n, t1) refer to blocks BLK1, BLK2, and BLK3 in FIG. 6 (a).
  • BLK1 and BLK3 are blocks included in tiles outside the tile Tile (n, t0).
  • the reference pixel in the reference picture to be referred to when deriving the motion compensated image of the CU in the tile is included in the collocated tile (the tile at the same position on the reference picture).
  • When independent_tiles_flag is 1, the tiles adjacent to the target tile and the tiles adjacent to the collocated tile are not referred to, so the pixel value becomes discontinuous at the tile boundary and tile distortion occurs.
  • a technique that does not cause tile distortion while encoding and decoding individual tiles independently will be described.
  • In Embodiment 1 of the present application, when a picture is divided into tiles, the tiles are generated by dividing the area of the picture while allowing overlap, as shown in FIG. 7.
  • FIG. 7 (a) is a diagram in which a picture (width wPict, height hPict) is divided into M * N tiles.
  • the tile at position (m, n) is represented by Tile [m] [n].
  • the width and height of the tile Tile [m] [n] are represented as wT [m] and hT [n]
  • the upper left coordinates are represented as (xTsmn, yTsmn).
  • the shaded area in the figure is an area where a plurality of tiles overlap (overlap).
  • the units of wPict, hPict, wT [m], hT [n], xTsmn, and yTsmn are pixels.
  • FIG. 7B is a diagram showing a relationship between two adjacent tiles Tile [0] [0] and Tile [1] [0].
  • the hatched area at the right end of Tile [0] [0] is an area that overlaps Tile [1] [0]
  • the shaded area at the bottom is an area that overlaps Tile [0] [1].
  • The width wT[0] and height hT[0] of Tile[0][0] indicate the width and height of the tile including the areas that overlap Tile[1][0] and Tile[0][1].
  • the left hatched area of Tile [1] [0] is an area that overlaps Tile [0] [0]
  • the right hatched area is an area that overlaps Tile [2] [0].
  • The hatched area at the bottom is an area that overlaps Tile[1][1]. The width wT[1] and height hT[0] of Tile[1][0] include the areas that overlap Tile[0][0], Tile[2][0], and Tile[1][1], respectively.
  • In other words, the hatched portion on the right side of Tile[0][0] is an area that is encoded redundantly (overlapped) in both Tile[0][0] and Tile[1][0].
  • For example, the width and height of each tile may be integral multiples of the width and height of the CTU: wT[m] = a * wCTU, hT[n] = b * hCTU, where wCTU and hCTU are the width and height of the CTU and a and b are positive integers. However, even if the size of each tile is in CTU units, the width of the tile at the right end of the picture and the height of the tile at the bottom end are not necessarily integral multiples of the CTU. Therefore, crop offset areas are provided at the right and bottom edges of the picture (the horizontal-line area in FIG. 7(a)), and the width and height obtained by adding the crop offset area to the tile are set to integral multiples of the CTU.
  • the crop offset area is not intended to be displayed, and is an area used for increasing the size of the area to be processed for the sake of convenience so as to facilitate processing in units of CTUs.
  • In the crop offset area, for example, gray values (Y, Cb, Cr) = (1 << (bitDepthY - 1), 1 << (bitDepthCb - 1), 1 << (bitDepthCr - 1)) are set as pixel values for convenience, or values obtained by padding the pixel values at the right and bottom edges of the picture are set.
  • the upper left coordinates (xTsmn, yTsmn) of each tile at the (m, n) position in tile units are not necessarily a position that is an integer multiple of the CTU.
  • The net display area obtained by subtracting the overlap area indicated by (wOVLP, hOVLP) from the tile effective area indicated by the size (wT, hT) may be called the tile active area.
  • a crop offset area may be provided and the tile size may be an integer multiple of the CTU size.
  • For example, the width wCRP[2] and the height hCRP[1] of the crop offset areas in FIG. 7 are set as follows (the units of wCRP[] and hCRP[] are pixels):
wCRP[2] = ceil(wT[2] / wCTU) * wCTU - wT[2]
hCRP[1] = ceil(hT[1] / hCTU) * hCTU - hT[1]
  • the tile size is not limited to the CTU size, and may be a tile unit size (wUnitTile, hUnitTile), an integer multiple of the minimum CU size MIN_CU_SIZE, or the like.
  • the size of the crop offset area can be derived based on the size of the tile from the constraint that the added value of the size of the tile and the crop offset area is an integer multiple of the CTU.
  • The upper-left coordinates (xTsmn, yTsmn) of each tile in the picture, indicated by the tile unit position (m, n) set in raster order, are calculated from the tile sizes and the overlap area sizes; a sketch of this calculation is given below.
  • The upper-left coordinate of each tile is also the upper-left coordinate of the CTU at the beginning of the tile.
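  • The exact coordinate formula did not survive extraction; under the overlap model of FIG. 7, a natural reading is that each tile starts where the active area of the preceding tile ends. A hedged sketch under that assumption:

def tile_top_left(m, n, wT, hT, wOVLP, hOVLP):
    # upper-left coordinates of tile (m, n): cumulative sum of the preceding
    # tiles' active sizes (tile size minus the overlap shared with the next tile)
    xTs = sum(wT[i] - wOVLP[i] for i in range(m))
    yTs = sum(hT[j] - hOVLP[j] for j in range(n))
    return xTs, yTs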
  • the overlap region of the tile is encoded / decoded for each tile, and a plurality of decoded images are generated.
  • the overlap region of Tile [0] [0] and Tile [1] [0] is encoded and decoded once for each tile, so that two decoded images are generated.
  • the overlap region of Tile [0] [0] and Tile [0] [1] is encoded and decoded once for each tile, two decoded images are generated.
  • the overlap area of Tile [0] [0], Tile [1] [0], Tile [0] [1], and Tile [1] [1] is encoded and decoded once for each tile. Therefore, four decoded images are generated.
  • a composite image (display image) without tile distortion can be generated by performing a composite process (filtering of tile boundaries) after decoding.
  • An example is shown in FIG. In FIG. 8A, a composite image is generated by calculating a weighted sum of two decoded images. A method for synthesizing images will be described later.
  • FIG. 9 (a) shows a video decoding device (image decoding device) 31 of the present invention.
  • the moving picture decoding apparatus 31 includes a header information decoding unit 2001, tile decoding units 2002a to 2002n, and a tile synthesis unit 2003.
  • The header information decoding unit 2001 decodes header information from the encoded stream Te that is input from the outside and encoded in units of NAL (Network Abstraction Layer) units.
  • the header information decoding unit 2001 derives a tile (TileId) necessary for display from control information indicating an image area to be displayed on a display or the like input from the outside.
  • the header information decoding unit 2001 extracts an encoded tile necessary for display from the encoded stream Te, and transmits the encoded tile to the tile decoding units 2002a to 2002n.
  • the header information decoding unit 2001 transmits tile information (information related to tile division) obtained by decoding the PPS and TileId of the tile decoded by the tile decoding unit 2002 to the tile synthesis unit 2003.
  • The tile information includes num_tile_columns_minus1, num_tile_rows_minus1, uniform_spacing_flag, column_width_minus1[i], row_height_minus1[i], overlap_tiles_flag, the syntax of the overlap information, and the like, as well as values calculated from them, such as the numbers M and N of tiles in the horizontal and vertical directions, the tile width wT[m] and height hT[n], and the overlap area width wOVLP[m] and height hOVLP[n]. The width wCRP[m] and height hCRP[n] of the crop offset area are also derived from these pieces of information.
  • the tile decoding units 2002a to 2002n decode the encoded tiles and transmit the decoded tiles to the tile synthesis unit 2003.
  • The tile decoding units 2002a to 2002n perform decoding processing treating each tile sequence as one independent video sequence, and do not refer to prediction information between tile sequences either temporally or spatially. That is, when decoding a tile in a certain picture, the tile decoding units 2002a to 2002n do not refer to tiles of other tile sequences (tiles having different TileId).
  • Since the tile decoding units 2002a to 2002n each decode a tile, it is possible to decode a plurality of tiles in parallel or to decode only one tile independently. As a result, the decoding process can be executed efficiently, for example by decoding only the image necessary for display with the minimum necessary decoding processing.
  • FIG. 10 is a block diagram showing the configuration of the tile decoding unit 2002, which is representative of the tile decoding units 2002a to 2002n.
  • The tile decoding unit 2002 includes an entropy decoding unit 301, a prediction parameter decoding unit (prediction image decoding device) 302, a loop filter 305, a reference picture memory 306, a prediction parameter memory 307, a predicted image generation unit (predicted image generation device) 308, an inverse quantization/inverse transform unit 311, and an addition unit 312. Note that there is also a configuration in which the tile decoding unit 2002 does not include the loop filter 305, in accordance with the tile encoding unit 2012 described later.
  • the prediction parameter decoding unit 302 includes an inter prediction parameter decoding unit 303 and an intra prediction parameter decoding unit 304.
  • the predicted image generation unit 308 includes an inter predicted image generation unit 309 and an intra predicted image generation unit 310.
  • In the following, an example using CTU, CU, PU, and TU as processing units is described. However, the present invention is not limited to this example, and processing may be performed in units of CUs instead of in units of TUs or PUs. Alternatively, CTU, CU, PU, and TU may be read as blocks, and processing may be performed in units of blocks.
  • the entropy decoding unit 301 performs entropy decoding on the coded stream Te input from the outside, and separates and decodes individual codes (syntax elements).
  • the separated code includes a prediction parameter for generating a prediction image and residual information for generating a difference image.
  • the entropy decoding unit 301 outputs a part of the separated code to the prediction parameter decoding unit 302.
  • Some of the separated codes are, for example, a prediction mode predMode, a PU partition mode part_mode, a reference picture index ref_idx_lX, a prediction vector index mvp_lX_idx, and a difference vector mvdLX. Control of which code is decoded is performed based on an instruction from the prediction parameter decoding unit 302.
  • the entropy decoding unit 301 outputs the quantized transform coefficient to the inverse quantization / inverse transform unit 311.
  • The quantized transform coefficients are coefficients obtained, in the encoding process, by performing a frequency transform such as DCT (Discrete Cosine Transform), DST (Discrete Sine Transform), or KLT (Karhunen-Loève Transform) on the prediction residual signal and quantizing the result.
  • the inter prediction parameter decoding unit 303 decodes the inter prediction parameter with reference to the prediction parameter stored in the prediction parameter memory 307 based on the code input from the entropy decoding unit 301. Also, the inter prediction parameter decoding unit 303 outputs the decoded inter prediction parameters to the prediction image generation unit 308 and stores them in the prediction parameter memory 307.
  • the intra prediction parameter decoding unit 304 refers to the prediction parameter stored in the prediction parameter memory 307 on the basis of the code input from the entropy decoding unit 301 and decodes the intra prediction parameter.
  • the intra prediction parameter decoding unit 304 outputs the decoded intra prediction parameter to the prediction image generation unit 308 and stores it in the prediction parameter memory 307.
  • The loop filter 305 applies filters such as a deblocking filter, a sample adaptive offset (SAO), and an adaptive loop filter (ALF) to the decoded image of the CU generated by the addition unit 312.
  • the reference picture memory 306 stores the decoded image of the CU generated by the adding unit 312 at a predetermined position for each decoding target picture and CTU or CU.
  • the prediction parameter memory 307 stores the prediction parameter at a predetermined position for each decoding target picture and PU (or sub-block, fixed-size block, pixel). Specifically, the prediction parameter memory 307 stores the inter prediction parameter decoded by the inter prediction parameter decoding unit 303, the intra prediction parameter decoded by the intra prediction parameter decoding unit 304, and the prediction mode predMode separated by the entropy decoding unit 301. .
  • the prediction image generation unit 308 receives the prediction mode predMode input from the entropy decoding unit 301 and the prediction parameter from the prediction parameter decoding unit 302. Further, the predicted image generation unit 308 reads a reference picture from the reference picture memory 306. The prediction image generation unit 308 generates a prediction image of a PU (block) or a sub-block using the input prediction parameter and the read reference picture (reference picture block) in the prediction mode indicated by the prediction mode predMode.
  • When the prediction mode predMode indicates the inter prediction mode, the inter prediction image generation unit 309 generates a predicted image of the PU (block) or sub-block using the inter prediction parameters input from the inter prediction parameter decoding unit 303 and the read reference picture (reference picture block).
  • The inter prediction image generation unit 309 reads, from the reference picture memory 306, the reference picture block at the position indicated by the motion vector mvLX with respect to the decoding target PU, from the reference picture indicated by the reference picture index refIdxLX, for each reference picture list (L0 list or L1 list) whose prediction list use flag predFlagLX is 1.
  • the inter predicted image generation unit 309 performs interpolation based on the read reference picture block, and generates a PU predicted image (interpolated image, motion compensated image).
  • the inter prediction image generation unit 309 outputs the generated prediction image of the PU to the addition unit 312.
  • a reference picture block is a set of pixels on a reference picture (usually called a block because it is a rectangle), and is an area that is referred to in order to generate a predicted image of a PU or sub-block.
  • The pixels of the reference block are located within the tile on the reference picture having the same TileId as the target tile (the collocated tile). Therefore, as one example, the reference block can be read without referring to pixel values outside the collocated tile by padding the outside of each tile in the reference picture (complementing with the pixel values at the tile boundary), as illustrated in the figure.
  • Tile boundary padding (out-of-tile padding) uses, as the pixel value of the reference pixel position (xIntL + i, yIntL + j) in motion compensation by the inter prediction image generation unit 309, the pixel value refImg[xRef + i][yRef + j] at the following position (xRef + i, yRef + j). That is, when a reference pixel is referred to, the reference position is clipped to the positions of the top, bottom, left, and right boundary pixels of the tile:
xRef + i = Clip3(xTs, xTs + wT - 1, xIntL + i)
yRef + j = Clip3(yTs, yTs + hT - 1, yIntL + j)
  • (xTs, yTs) is the upper left coordinates of the target tile where the target block is located
  • wT and hT are the width and height of the target tile.
  • xIntL and yIntL are derived, with (xb, yb) being the upper-left coordinates of the target block relative to the upper-left coordinates of the picture and (mvLX[0], mvLX[1]) being the motion vector, as follows:
xIntL = xb + (mvLX[0] >> log2(MVUNIT))
yIntL = yb + (mvLX[1] >> log2(MVUNIT))
  • MVUNIT indicates that the accuracy of the motion vector is 1 / MVUNIT pel.
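  • A sketch of this padding rule: the reference position is clipped to the tile's boundary pixels before the pixel is read. refImg is a hypothetical 2-D array holding the reference picture.

def padded_ref_pixel(refImg, xIntL, yIntL, i, j, xTs, yTs, wT, hT):
    # xRef + i = Clip3(xTs, xTs + wT - 1, xIntL + i)
    # yRef + j = Clip3(yTs, yTs + hT - 1, yIntL + j)
    x = max(xTs, min(xTs + wT - 1, xIntL + i))
    y = max(yTs, min(yTs + hT - 1, yIntL + j))
    return refImg[y][x]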
  • (Tile boundary motion vector restriction) As another method instead of tile boundary padding, there is a tile boundary motion vector restriction.
  • the motion vector is limited (clipped) so that the position of the reference pixel (xIntL + i, yIntL + j) falls within the collocated tile.
  • Specifically, given the upper-left coordinates (xb, yb) of the target block (target sub-block or target block), the block size (BW, BH), the upper-left coordinates (xTs, yTs) of the target tile, and the width wT and height hT of the target tile, the motion vector mvLX of the block is taken as input and a restricted motion vector mvLX is output.
  • The left end posL, right end posR, top end posU, and bottom end posD of the reference pixels in the interpolation image generation of the target block are as follows:
posL = xb + (mvLX[0] >> log2(MVUNIT)) - NTAP/2 + 1
posR = xb + BW - 1 + (mvLX[0] >> log2(MVUNIT)) + NTAP/2
posU = yb + (mvLX[1] >> log2(MVUNIT)) - NTAP/2 + 1
posD = yb + BH - 1 + (mvLX[1] >> log2(MVUNIT)) + NTAP/2
  • NTAP is the number of filter taps used for generating the interpolation image.
  • MVUNIT indicates that the accuracy of the motion vector is 1 / MVUNIT pel.
  • The restrictions for the reference pixels to fall within the collocated tile are as follows:
mvLX[0] = Clip3(vxmin, vxmax, mvLX[0])
mvLX[1] = Clip3(vymin, vymax, mvLX[1])
vxmin = (xTs - xb + NTAP/2 - 1) << log2(MVUNIT)
vxmax = (xTs + wT - xb - BW - NTAP/2) << log2(MVUNIT)
vymin = (yTs - yb + NTAP/2 - 1) << log2(MVUNIT)
vymax = (yTs + hT - yb - BH - NTAP/2) << log2(MVUNIT)
  • When independent_tiles_flag is 1, restricting the motion vector in this way ensures that the motion vector always points within the collocated tile in inter prediction. With this configuration, tile sequences can be decoded independently even when inter prediction is used.
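  • A sketch of this restriction as a function, mirroring the vxmin/vxmax/vymin/vymax formulas above:

import math

def restrict_mv(mvLX, xb, yb, BW, BH, xTs, yTs, wT, hT, NTAP, MVUNIT):
    # clip mvLX so that every interpolation reference pixel of the BW x BH
    # block at (xb, yb) stays inside the collocated tile
    shift = int(math.log2(MVUNIT))
    vxmin = (xTs - xb + NTAP // 2 - 1) << shift
    vxmax = (xTs + wT - xb - BW - NTAP // 2) << shift
    vymin = (yTs - yb + NTAP // 2 - 1) << shift
    vymax = (yTs + hT - yb - BH - NTAP // 2) << shift
    mvLX[0] = max(vxmin, min(vxmax, mvLX[0]))
    mvLX[1] = max(vymin, min(vymax, mvLX[1]))
    return mvLX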
  • When the prediction mode predMode indicates the intra prediction mode, the intra predicted image generation unit 310 performs intra prediction using the intra prediction parameters input from the intra prediction parameter decoding unit 304 and the read reference picture. Specifically, the intra predicted image generation unit 310 reads, from the reference picture memory 306, adjacent PUs that are in the decoding target picture and within a predetermined range from the decoding target PU, among the PUs that have already been decoded.
  • the predetermined range is, for example, one of the left, upper left, upper, and upper right adjacent PUs when the decoding target PU sequentially moves in the so-called raster scan order, and differs depending on the intra prediction mode.
  • the raster scan order is an order in which each row is sequentially moved from the left end to the right end in each picture from the upper end to the lower end.
  • the intra-predicted image generation unit 310 performs prediction in the prediction mode indicated by the intra-prediction mode IntraPredMode based on the read adjacent PU, and generates a predicted image of the PU.
  • the intra predicted image generation unit 310 outputs the generated predicted image of the PU to the adding unit 312.
  • the inverse quantization / inverse transform unit 311 performs inverse quantization on the quantized transform coefficient input from the entropy decoding unit 301 to obtain a transform coefficient.
  • the inverse quantization / inverse transform unit 311 performs inverse frequency transform such as inverse DCT, inverse DST, inverse KLT on the obtained transform coefficient, and calculates a prediction residual signal.
  • the inverse quantization / inverse transform unit 311 outputs the calculated residual signal to the adder 312.
  • The addition unit 312 adds, for each pixel, the predicted image of the PU input from the inter prediction image generation unit 309 or the intra predicted image generation unit 310 and the residual signal input from the inverse quantization/inverse transform unit 311, and generates a decoded image of the PU.
  • the adding unit 312 outputs the generated decoded image of the block to at least one of a deblocking filter, a SAO (sample adaptive offset) unit, and an ALF.
  • The tile synthesis unit 2003 generates a decoded image Td by referring to the tile information transmitted from the header information decoding unit 2001, the TileIds of the tiles necessary for display, and the tiles decoded by the tile decoding units 2002a to 2002n, and outputs a synthesized image (display image).
  • the tile composition unit 2003 includes a smoothing processing unit 20031 and a composition unit 20032.
  • The smoothing processing unit 20031 may perform filter processing (averaging processing, weighted averaging processing) using the overlap areas of the tiles decoded by the tile decoding units 2002. That is, one pixel may be derived using the pixels of two or more tiles corresponding to the overlap area. For example, the pixel value tmp of the overlap area after filtering of two tiles Tile[m-1][n] and Tile[m][n] adjacent in the horizontal direction is calculated by the following equation.
tmp[m][n][x][y] = (Tile[m][n][x][y] + Tile[m-1][n][wT[m-1] - wOVLP[m-1] + x][y] + 1) >> 1 (Formula FLT-1)
  • Here, wT[m-1] - wOVLP[m-1] + x indicates the position x pixels to the right, starting from the position wT[m-1] - wOVLP[m-1] in the tile.
  • tmp[m][n][x][y] represents the filtered pixel value of the overlap area located at (x, y) in the tile at position (m, n), with the upper-left coordinate of the tile taken as (0, 0).
  • Tile[m][n][x][y] represents the pixel value at (x, y) in the tile at position (m, n), with the upper-left coordinate of the tile taken as (0, 0).
  • Similarly, the pixel value tmp of the overlap area after filtering of two tiles Tile[m][n-1] and Tile[m][n] adjacent in the vertical direction is calculated by the following equation:
tmp[m][n][x][y] = (Tile[m][n][x][y] + Tile[m][n-1][x][hT[n-1] - hOVLP[n-1] + y] + 1) >> 1 (Formula FLT-2)
  • In an area where four tiles overlap, the filtered pixel value is the average of the four decoded copies:
tmp[m][n][x][y] = (Tile[m][n][x][y] + Tile[m][n-1][x][hT[n-1] - hOVLP[n-1] + y] + Tile[m-1][n][wT[m-1] - wOVLP[m-1] + x][y] + Tile[m-1][n-1][wT[m-1] - wOVLP[m-1] + x][hT[n-1] - hOVLP[n-1] + y] + 2) >> 2 (Formula FLT-3)
  • The smoothing processing unit 20031 (filter processing unit, averaging processing unit, weighted averaging processing unit) outputs the pixel values of the tiles and the filtered pixel values of the overlap areas (here, tmp) to the synthesis unit 20032.
  • The synthesis unit 20032 generates the picture, or a predetermined area specified by the control information (TileId), from the pixel values of the tiles and the pixel values of the overlap areas.
  • The entire synthesized image, or the predetermined area, Rec[x][y] is represented by, for example, the simple averages described above.
  • In this way, tile distortion can be removed by averaging the tile boundary areas that have been decoded redundantly, while still decoding each tile independently.
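  • A sketch of the horizontal blend of Formula (FLT-1); the vertical and four-tile cases follow the same pattern with the corresponding offsets. The tile arrays are hypothetical 2-D arrays indexed [y][x] with (0, 0) at each tile's upper-left corner.

def blend_horizontal(tile_left, tile_right, wT_left, wOVLP_left, x, y):
    # rounded average of the two decoded copies of an overlap pixel
    left_pix = tile_left[y][wT_left - wOVLP_left + x]  # copy in Tile[m-1][n]
    right_pix = tile_right[y][x]                       # copy in Tile[m][n]
    return (left_pix + right_pix + 1) >> 1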
  • FIG. 11 (a) shows the moving picture encoding apparatus 11 of the present invention.
  • the moving image encoding apparatus 11 includes a picture dividing unit 2010, a header information generating unit 2011, tile encoding units 2012a to 2012n, and an encoded stream generating unit 2013.
• the picture dividing unit 2010 divides the picture into a plurality of tiles and transmits the tiles to the tile encoding units 2012a to 2012n.
  • the header information generation unit 2011 generates tile information (TileId, number of tile divisions, size, overlap information) from the divided tiles, and transmits the generated tile information to the encoded stream generation unit 2013 as header information.
  • the tile encoders 2012a to 2012n encode each tile. Further, the tile encoding units 2012a to 2012n encode tiles in units of tile sequences. Thus, according to the tile encoding units 2012a to 2012n, tiles can be encoded in parallel.
• the tile encoders 2012a to 2012n perform the encoding process on each tile sequence as if it were one independent video sequence; when the encoding process is performed, the prediction information of tile sequences having different TileIds is referred to neither temporally nor spatially. That is, when encoding a tile in a certain picture, the tile encoding units 2012a to 2012n do not refer to another tile either spatially or temporally.
• the encoded stream generation unit 2013 combines the header information including the tile information transmitted from the header information generation unit 2011 with the encoded data of the tiles generated by the tile encoding units 2012a to 2012n, and generates an encoded stream Te in units of NAL units.
• since the tile encoding units 2012a to 2012n encode each tile independently, a plurality of tiles can be encoded in parallel, a plurality of tiles can be decoded in parallel on the decoding device side, or only one tile can be decoded independently.
  • the picture dividing unit 2010 in FIG. 11 (a) includes a tile information calculating unit 20101 and a picture dividing unit A 20102 shown in FIG. 11 (b).
• the tile information calculation unit 20101 derives the tile width wT[m] and height hT[n] and the crop offset area width wCRP[m] and height hCRP[n] from the picture width wPict and height hPict, the tile unit size width wUnitTile and height hUnitTile, the number M of horizontal tile divisions, the number N of vertical divisions, and the overlap area width wOVLP[m] and height hOVLP[n].
  • an example is shown in which the width and height of the overlap region are set to fixed values wOVLP and hOVLP.
• wCRP[M-1] = ceil(wT[M-1] / wUnitTile) * wUnitTile - wT[M-1]
• hCRP[N-1] = ceil(hT[N-1] / hUnitTile) * hUnitTile - hT[N-1]
• the width PicWidthInCtbsY and the height PicHeightInCtbsY of the picture in CTU units are expressed by the following equations.
• PicWidthInCtbsY = ceil(wPict / wCTU)
• PicHeightInCtbsY = ceil(hPict / hCTU)
• TileWidthinCtbs[m] and TileHeightinCtbs[n] are parameters representing the width and height of the tile in CTU units.
• TileWidthinCtbs[m] = ceil(wT[m] / wCTU)
• TileHeightinCtbs[n] = ceil(hT[n] / hCTU)
• a suitable overlap region width wOVLP[m] and height hOVLP[n] may be 2 to 6 pixels.
• the following is an example of the tile information calculation formulas of FIG. 7 when the overlap area width wOVLP[m] and height hOVLP[n] are all set to sOVLP.
• wCRP[M-1] = ceil(wT[M-1] / wCTU) * wCTU - wT[M-1]
• hCRP[N-1] = ceil(hT[N-1] / hCTU) * hCTU - hT[N-1]
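• The crop offset derivation above amounts to rounding the tile size up to the next CTU multiple; a minimal C++ sketch, with illustrative names, is shown below.

```cpp
// ceil(a / b) for positive integers, as used in the wCRP / hCRP formulas.
int ceilDiv(int a, int b) { return (a + b - 1) / b; }

// Width of the crop offset area that pads the rightmost tile to a multiple
// of the CTU width wCTU; the height case is symmetric with hCTU.
int cropOffsetWidth(int wT, int wCTU) {
    return ceilDiv(wT, wCTU) * wCTU - wT;   // wCRP[M-1]
}
// Example: wT = 500, wCTU = 128 -> padded width 512, so wCRP = 12.
```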
  • the picture dividing unit A 20102 divides a picture into tiles using the tile information calculated by the tile information calculating unit 20101.
• Tile[m][n] is extracted as the area spanning xTsmn..(xTsmn + wT[m] - 1) horizontally and yTsmn..(yTsmn + hT[n] - 1) vertically on the picture, and is output to the tile encoding unit 2012.
  • a crop offset area of wCRP [M-1] and hCRP [N-1] is added to the right and bottom tiles of the picture, and then output to the tile encoding unit 2012.
• the header information generation unit 2011 converts the parameter sets and the tile information into a syntax representation and outputs it to the encoded stream generation unit 2013.
  • the syntax expression of tile information is shown below.
• FIG. 12 is a block diagram illustrating the configuration of the tile encoding unit 2012, which is one of the tile encoding units 2012a to 2012n.
• the tile encoding unit 2012 includes a prediction image generation unit 101, a subtraction unit 102, a transform / quantization unit 103, an entropy encoding unit 104, an inverse quantization / inverse transform unit 105, an addition unit 106, a loop filter 107, a prediction parameter memory 108, a reference picture memory 109, an encoding parameter determination unit 110, and a prediction parameter encoding unit 111.
  • the prediction parameter encoding unit 111 includes an inter prediction parameter encoding unit 112 and an intra prediction parameter encoding unit 113.
  • the tile encoding unit 2012 may be configured not to include the loop filter 107.
• the predicted image generation unit 101 generates a predicted image of the PU for each CU, a CU being an area obtained by dividing each picture of the image T.
  • the predicted image generation unit 101 reads a decoded block from the reference picture memory 109 based on the prediction parameter input from the prediction parameter encoding unit 111.
  • the predicted image generation unit 101 reads out a block at a position on a reference picture indicated by a motion vector with the target PU as a starting point.
• in intra prediction, the pixel values of the adjacent PUs used in the intra prediction mode are read from the reference picture memory 109 to generate a predicted image of the PU.
  • the prediction image generation unit 101 generates a prediction image of the PU using one prediction method among a plurality of prediction methods for the read reference picture block.
  • the predicted image generation unit 101 outputs the generated predicted image of the PU to the subtraction unit 102.
• since the predicted image generation unit 101 includes the padding process at the tile boundary and operates in the same way as the predicted image generation unit 308 already described, a description thereof is omitted.
  • the subtraction unit 102 subtracts the signal value of the prediction image of the PU input from the prediction image generation unit 101 from the pixel value at the corresponding PU position of the image T to generate a residual signal.
  • the subtraction unit 102 outputs the generated residual signal to the transform / quantization unit 103.
  • the transform / quantization unit 103 performs frequency transform on the prediction residual signal input from the subtraction unit 102, and calculates a transform coefficient.
  • the transform / quantization unit 103 quantizes the calculated transform coefficient to obtain a quantized transform coefficient.
  • the transform / quantization unit 103 outputs the obtained quantized transform coefficient to the entropy coding unit 104 and the inverse quantization / inverse transform unit 105.
  • the entropy encoding unit 104 receives the quantized transform coefficient from the transform / quantization unit 103 and the prediction parameter from the prediction parameter encoding unit 111.
  • the entropy encoding unit 104 entropy-encodes the input division information, prediction parameters, quantization transform coefficients, and the like to generate an encoded stream Te, and outputs the generated encoded stream Te to the outside.
• the inverse quantization / inverse transform unit 105 is the same as the inverse quantization / inverse transform unit 311 (FIG. 10) in the tile decoding unit 2002, and inversely quantizes the quantized transform coefficient input from the transform / quantization unit 103 to obtain the transform coefficient.
  • the inverse quantization / inverse transform unit 105 performs inverse transform on the obtained transform coefficient to calculate a residual signal.
  • the inverse quantization / inverse transform unit 105 outputs the calculated residual signal to the addition unit 106.
• the addition unit 106 adds, for each pixel, the signal value of the prediction image of the PU input from the prediction image generation unit 101 and the signal value of the residual signal input from the inverse quantization / inverse transform unit 105 to generate a decoded image.
  • the adding unit 106 stores the generated decoded image in the reference picture memory 109.
  • the loop filter 107 performs a deblocking filter, a sample adaptive offset (SAO), and an adaptive loop filter (ALF) on the decoded image generated by the adding unit 106.
  • the loop filter 107 does not necessarily include the above three types of filters, and may have a configuration including only a deblocking filter, for example.
  • the prediction parameter memory 108 stores the prediction parameter generated by the encoding parameter determination unit 110 at a predetermined position for each encoding target picture and CU.
  • the reference picture memory 109 stores the decoded image generated by the loop filter 107 at a predetermined position for each picture to be encoded and each CU.
  • the encoding parameter determination unit 110 selects one set from among a plurality of sets of encoding parameters.
  • the encoding parameter is the above-described QT or BT partition parameter, prediction parameter, or parameter to be encoded that is generated in association with these parameters.
  • the predicted image generation unit 101 generates a predicted image of the PU using each of these encoding parameter sets.
  • the encoding parameter determination unit 110 calculates an RD cost value indicating the amount of information and the encoding error for each of a plurality of sets.
• the RD cost value is, for example, the sum of the code amount and the value obtained by multiplying the square error by a coefficient λ.
  • the code amount is the information amount of the encoded stream Te obtained by entropy encoding the residual signal and the encoding parameter.
• the square error is the sum over pixels of the squared values of the residual signal calculated by the subtraction unit 102.
• the coefficient λ is a preset real number larger than zero.
  • the encoding parameter determination unit 110 selects a set of encoding parameters that minimizes the calculated RD cost value.
  • the entropy encoding unit 104 outputs the selected set of encoding parameters to the outside as the encoded stream Te, and does not output the set of unselected encoding parameters.
  • the encoding parameter determination unit 110 stores the determined encoding parameter in the prediction parameter memory 108.
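• A minimal sketch of the RD selection performed by the encoding parameter determination unit 110, assuming illustrative candidate and measurement types (cost = code amount + λ × square error):

```cpp
#include <cstddef>
#include <limits>
#include <vector>

struct CandidateSet { /* partition and prediction parameters (illustrative) */ };

// Returns the index of the candidate with minimum RD cost.
// codeAmount and squaredError stand in for the actual measurements.
std::size_t selectByRdCost(const std::vector<CandidateSet>& candidates,
                           double lambda,
                           double (*codeAmount)(const CandidateSet&),
                           double (*squaredError)(const CandidateSet&)) {
    std::size_t best = 0;
    double bestCost = std::numeric_limits<double>::max();
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        double cost = codeAmount(candidates[i]) +
                      lambda * squaredError(candidates[i]);
        if (cost < bestCost) { bestCost = cost; best = i; }
    }
    return best;
}
```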
  • the prediction parameter encoding unit 111 derives a format for encoding from the parameters input from the encoding parameter determination unit 110 and outputs the format to the entropy encoding unit 104. Deriving the format for encoding is, for example, deriving a difference vector from a motion vector and a prediction vector. Also, the prediction parameter encoding unit 111 derives parameters necessary for generating a prediction image from the parameters input from the encoding parameter determination unit 110 and outputs the parameters to the prediction image generation unit 101.
  • the parameter necessary for generating the predicted image is, for example, a motion vector in units of sub-blocks.
  • the inter prediction parameter encoding unit 112 derives an inter prediction parameter such as a difference vector based on the prediction parameter input from the encoding parameter determination unit 110.
• as a configuration for deriving the parameters necessary for generating the prediction image output to the prediction image generation unit 101, the inter prediction parameter encoding unit 112 partially includes the same configuration as that with which the inter prediction parameter decoding unit 303 derives inter prediction parameters.
• as a configuration for deriving the prediction parameters necessary for generating the prediction image output to the prediction image generation unit 101, the intra prediction parameter encoding unit 113 partially includes the same configuration as that with which the intra prediction parameter decoding unit 304 derives intra prediction parameters.
  • tile distortion can be removed by filtering a plurality of overlapping tile boundaries on the video decoding device side while encoding the tiles independently.
• In Modification 1 of the present application, the method of dividing a picture into tiles is changed from the dividing method shown in FIG. 7 to the dividing method shown in FIG. 13. FIG. 7 differs from FIG. 13 in that in FIG. 7 a tile includes only an overlap area, whereas in FIG. 13 a tile includes, in addition to the overlap area, a crop offset area that is an unused area. That is, all the tiles, including the tiles at the screen edge, may include the crop offset area.
• FIG. 13 (b) is a diagram showing Tile[0][0] and Tile[1][0], which are adjacent in the horizontal direction. Each tile includes the overlap area (shaded area) and the crop offset area (horizontal-line area). Further, the width wT[m] and height hT[n] of the tile and the width wCRP[m] and height hCRP[n] of the crop offset area have the following relationship.
• wTile[m] = wT[m] + wCRP[m], hTile[n] = hT[n] + hCRP[n]
• wTile[m] and hTile[n] are the width and height of the tile to be encoded. The rest is the same as in the second embodiment.
  • the upper left coordinate of the tile can be set at a position that is an integral multiple of the CTU. Therefore, in addition to the effect of the second embodiment, there is an effect that access to individual tiles is simplified.
  • FIG. 21 is a diagram illustrating picture division in which the tile size is limited to an integral multiple of the CTU except for picture boundaries.
  • FIG. 21A is a diagram in which a tile size is an integral multiple of a CTU and a 1920 ⁇ 1080 HD image is divided into 4 ⁇ 3 tiles.
  • FIG. 21 (b) is a diagram showing CTU partitioning of each tile. Tiles that do not reach picture boundaries are divided into an integer number of CTUs. When dividing a picture boundary tile into CTU units, the area outside the picture is treated as a crop offset area.
  • FIG. 22 (a) is a technique of the present embodiment, in which a 1920x1080 HD image is divided into 4x3 tiles. When dividing into 4x3 tiles, all the tiles can be divided into equal sizes (divided into 480x360), which has the effect of being able to equally load balance multiple processors and hardware.
  • the tile size can take a size other than an integer multiple of the CTU regardless of the picture boundary.
  • FIG. 22 (b) is a diagram showing CTU division of each tile. When dividing into CTUs, if the tile size is not an integral multiple of the CTU size, a crop offset area is provided outside the tile. In particular, as shown in TILE B, the CTU is divided based on the upper left of each tile. Therefore, the upper left coordinate of the CTU is not limited to an integer multiple of the CTU size.
  • Fig. 23 shows an example of the syntax of slice data at a tile size that is an integral multiple of the CTU.
• the syntax coding_tree_unit() of CTU data, which is encoded data in CTU units, is called for the number of CTUs in the slice data.
  • the upper left coordinates (xCtb, yCtb) of the CTU can be uniquely derived from the CTU address CtbAddrInRs in the picture because the picture is divided in CTU units.
• the upper left coordinates (xCtb, yCtb) of the CTU are obtained by multiplying coordinates derived from the intra-picture CTU address CtbAddrInRs by 1 << CtbLog2SizeY so that they are an integer multiple of the CTU size.
  • CtbAddrInTs is a tile scan address for performing raster scan of the CTU in tile units.
  • CtbAddrInRs represents the raster scan address of the CTU in units of pictures, and is 0 to PicSizeInCtbsY-1.
  • FIG. 24 shows a syntax example of slice data in the present embodiment.
  • CTU data syntax coding_tree_unit which is encoded data in CTU units, is called for the number of CTUs in the slice data.
  • the upper left coordinates (xCtb, yCtb) of the CTU cannot be uniquely derived from the intra-picture CTU address CtbAddrInRs. Therefore, CTU coordinates are derived based on the upper left coordinates of the tile. Specifically, when the ID of the target tile is TileId and the upper left coordinates of the target tile are indicated by (TileAddrX [TileId], TileAddrY [TileId]), the CTU coordinates are derived using the following formula.
  • CtbAddrInTile is the raster scan position within the tile of the CTU, where the top of the tile is 0. If the CTU address at the top of the tile is firstCtbAddrInTs, CtbAddrInTile is expressed by the following equation.
  • CtbAddrInTs is a tile scan address through a picture.
• CtbAddrInTile = CtbAddrInTs - firstCtbAddrInTs
• That is, in this embodiment, the in-tile coordinates of the CTU, ((CtbAddrInTile % TileWidthinCtbs[TileId]) << CtbLog2SizeY, (CtbAddrInTile / TileWidthinCtbs[TileId]) << CtbLog2SizeY), are derived from the in-tile CTU address CtbAddrInTile. Then, using the in-picture coordinates (TileAddrX[TileId], TileAddrY[TileId]) of the upper left position of the tile, the in-picture coordinates of the CTU position are derived. That is, the upper left coordinates (xCtb, yCtb) of the CTU may be derived as the sum of the in-tile coordinates of the CTU and the in-picture coordinates of the head of the tile.
• the upper left coordinates (TileAddrX[TileId], TileAddrY[TileId]) of the tile with identifier TileId may be expressed as below using the upper left coordinates (xTsmn, yTsmn) of the tile at position (m, n) already described.
  • CTU coordinates may be derived using the syntax of column_width_minus1 and row_height_minus1.
• xCtb = ((CtbAddrInTile % (column_width_minus1[m] + 1)) << CtbLog2SizeY) + xTsmn
• yCtb = ((CtbAddrInTile / (column_width_minus1[m] + 1)) << CtbLog2SizeY) + yTsmn
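• The derivation above can be sketched as a small C++ helper; the parameter names mirror the syntax elements, and the struct is an illustrative assumption:

```cpp
struct CtbPos { int x; int y; };

// Derive the upper-left picture coordinates (xCtb, yCtb) of a CTU from its
// tile-scan address, per the formulas above. tileWidthInCtbs corresponds to
// TileWidthinCtbs[TileId] (or column_width_minus1[m] + 1).
CtbPos ctbUpperLeft(int ctbAddrInTs, int firstCtbAddrInTs,
                    int tileWidthInCtbs, int ctbLog2SizeY,
                    int tileAddrX, int tileAddrY) {
    int ctbAddrInTile = ctbAddrInTs - firstCtbAddrInTs;  // raster pos in tile
    int xCtb = ((ctbAddrInTile % tileWidthInCtbs) << ctbLog2SizeY) + tileAddrX;
    int yCtb = ((ctbAddrInTile / tileWidthInCtbs) << ctbLog2SizeY) + tileAddrY;
    return {xCtb, yCtb};
}
```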
  • FIG. 14 (a) shows the flow of processing of the video encoding device 11.
  • the tile information calculation unit 20101 sets the number of tiles and the overlap area, and calculates information about the tile (width, height, upper left coordinates, and crop offset area if any) (S1500).
  • the picture dividing unit A 20102 divides the picture into tiles allowing overlap as shown in FIG. 7 or FIG. 13 (S1502).
  • the header information generation unit 2011 generates the tile information syntax and generates header information such as SPS, PPS, and slice header (S1504).
  • the tile encoding unit 2012 encodes each tile (S1506).
  • the encoded stream generation unit 2013 generates an encoded stream Te from the header information and the encoded stream of each tile (S1508).
  • FIG. 14 (b) shows the processing flow of the video decoding device 31.
• the header information decoding unit 2001 decodes the header, and sets or calculates the tile information (number of tiles, width, height, upper left coordinates, overlap width and height, and, if any, crop offset area).
  • a tile identifier necessary for covering the display area designated from the outside is derived (S1520).
  • the tile decoding unit 2002 decodes each tile (S1522).
  • the smoothing processing unit 20031 performs a filtering process on the overlap area of each tile (S1524).
  • the synthesizing unit 20032 synthesizes each tile including the filtered area to generate a picture (S1526).
  • the pixel values of the areas adjacent to the tile boundary are calculated by simply averaging the pixel values of the plurality of overlapping areas.
  • filtering is performed by a weighted sum that changes the weight depending on the distance from the tile boundary.
• this weighted filtering is performed by the smoothing processing unit 20031 of the tile composition unit 2003 shown in FIG. Operations other than those of the tile composition unit 2003 are the same as those described in the first embodiment, and a description thereof is omitted.
  • the smoothing processing unit 20031 sets a weighting coefficient ww [x] according to the distance from the tile boundary as shown in FIG.
  • FIG. 8A is a diagram for explaining the filter processing of the overlapping region of two tiles Tile [m ⁇ 1] [n] and Tile [m] [n] adjacent in the horizontal direction in FIG.
  • the weighting coefficient of Tile [m] [n] is ww [x]
  • the weighting coefficient of Tile [m-1] [n] is 1-ww [x].
• 0 <= ww[x] <= 1.
• the weight coefficient ww[x] is set to 0 or 1 for the pixels outside the overlap area, and the weight coefficients inside the overlap area are derived by linear interpolation.
• FIG. 16 (a) is a diagram in which Tile[m][n-1] and Tile[m][n] are extracted from the tiles shown in FIG. If the weighting factor for Tile[m][n] is wh[y] and the weighting factor for Tile[m][n-1] is 1 - wh[y] (0 <= wh[y] <= 1), then, for Tile[m][n] and Tile[m][n-1] as well, the weighting factor wh[y] is set to 0 or 1 for the pixels outside the overlap region, and the weighting factors for the overlap region are derived by linear interpolation.
  • the synthesizing unit 20032 synthesizes the non-overlap area of each tile and the overlap area filtered by the smoothing processing unit 20031, and generates a synthesized image (display image) Rec [] [].
• the pixel values of the overlap area (OVLP_RIGHT in FIG. 8, OVLP_BOTTOM in FIG. 16) of the tile to the left of or above Tile[m][n] are not replaced with the filtered pixel values; instead, the pixel values of the overlap region (OVLP_LEFT in FIG. 8, OVLP_ABOVE in FIG. 16) on the left side or the upper side of Tile[m][n] may be replaced with the filtered pixel values.
• that is, the overlap area on the left side or the upper side of Tile[m][n] is used, and the overlap area (OVLP_RIGHT in FIG. 8, OVLP_BOTTOM in FIG. 16) of the tile to the left of or above Tile[m][n] is not used.
  • the pixel value after the filter processing may be directly stored in Rec [] [] instead of the image of each tile.
• although the weighting factors ww[] and wh[] are calculated above, the weighting factors may instead be obtained by referring to a table prepared in advance.
• FIG. 15 (b) shows an example of a table in which the weighting factors are represented by integer weights WGT[] and a shift WSHT.
  • the weight may be obtained by a method other than linear interpolation, and the interpolation formula or table may be changed based on the coordinates.
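• As an illustration, the distance-dependent weighting can be realized with fixed-point weights (the WGT[] / WSHT representation); the linear ramp below is one possible interpolation, and all names and constants are assumptions:

```cpp
#include <cstdint>

constexpr int WSHT = 6;           // weight shift (illustrative)
constexpr int ONE  = 1 << WSHT;   // fixed-point representation of 1.0

// Blend one overlap pixel: a = Tile[m][n] sample, b = Tile[m-1][n] sample,
// x = position inside the overlap (0..wOVLP-1). The weight ww[x] ramps
// linearly from near 0 to near 1 across the overlap; outside the overlap
// the weight is 0 or 1, i.e., the pixel is taken from a single tile.
uint8_t blendWeighted(int a, int b, int x, int wOVLP) {
    int wgt = ((2 * x + 1) * ONE) / (2 * wOVLP);          // linear ramp
    return static_cast<uint8_t>(
        (a * wgt + b * (ONE - wgt) + (ONE >> 1)) >> WSHT);
}
```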
• FIGS. 8 (b) and 16 (b) are diagrams for explaining the filtering process of the overlap area in FIG. 13, which shows an example in which the width or height of the crop offset area is included in the width or height of the tile. Since the crop offset area is not subject to filter processing or picture composition / display, the tile filter processing in FIG. 13 is performed only on the overlap area, as shown in FIGS. 8 (b) and 16 (b), and the processing is the same as the processing for the overlap region in FIGS. 8 (a) and 16 (a). Therefore, the description of the second embodiment can be used as it is.
  • the tile division method of a picture and the CTU division method of a tile will be described again by using another representation method for the tiles described in the first and second embodiments.
  • the tile has been described as an area including a tile, an overlap area, and a crop offset area.
  • the tile will be described as an area including a tile active area and a tile extension area.
  • the tile active area is a net display area that does not include an overlap area.
  • the tile extension area is an area composed of an overlap area and a crop offset area.
• a cropoffset_flag that replaces the overlap_tiles_flag notified by tile_info() in FIG. 25 (a) may be used.
• if cropoffset_flag is 0, the tile extension area does not exist; otherwise, the tile extension area exists.
  • FIG. 26 shows an example of dividing a picture into tiles regardless of multiples of CTUs.
  • a picture is divided into tiles (tile active areas) that do not depend on multiples of CTUs.
• the tile active area is an area that constitutes the picture without overlapping. In other words, a picture is divided into "tile active areas" without overlapping. If the width and height of the tile active area are wAT[m] and hAT[n], and the width and height of the picture are wPict and hPict, they can be expressed by the following equations.
• wAT[m] = ((m + 1) * wPict) / M - (m * wPict) / M (Formula TAS-1)
• hAT[n] = ((n + 1) * hPict) / N - (n * hPict) / N
  • the tile active area may be represented by the following expression as a multiple of tile unit size (minimum tile size) wUnitTile and hUnitTile.
• wAT[m] = (column_width_in_luma_samples_div2_minus1[m] + 1) * 2 (Formula TAS-5)
• hAT[n] = (row_height_in_luma_samples_div2_minus1[n] + 1) * 2
  • the “tile extension area” corresponds to the areas named the overlap area and the crop offset area in the first and second embodiments.
  • the tile extension area is not necessarily used for decoding and output, and may be treated as an area discarded after decoding.
  • tile extension area may be used for reference (decoding) of a subsequent picture, or may be used for generation of an output image.
  • the “tile active area” and the “tile extension area” are collectively referred to as a “tile coding area”.
  • the “tile encoding area” is an area that is actually encoded.
• of the tile extension areas, areas used for reference and decoding are called overlap areas, and areas not referenced or decoded are called crop offset areas (tile invalid areas).
  • the first embodiment describes the case where all the tile extension areas are referred to and decoded, and the tile extension areas are overlap areas.
  • an example has been described in which a part of the tile extension area is referred to as an overlap area and used for decoding, and the remaining part is referred to as a crop offset area and is not used for decoding.
  • the “tile coding area” may be rephrased to be composed of a “tile effective area” used for decoding / output and a tile crop area (tile invalid area) not used for decoding / output.
  • the tile effective area is composed of a tile active area which is a unit for dividing a picture and an overlap area.
  • FIG. 26 (b) is a diagram illustrating tiles that are actually encoded (also referred to as tile encoding areas).
• the tile (tile coding area) is a rectangle having upper left coordinates (xTsmn, yTsmn), width wTile[m], and height hTile[n], and is composed of the tile active area Tile[m][n] (a rectangle with width wAT[m] and height hAT[n]) and the tile extension area (the area of the tile other than the tile active area, with width wCRP[m] and height hCRP[n]).
  • the tile coding area may be expressed by the following expression using the width TileWidthinCtbs [m] and the height TileHeightinCtbs [m] of the tile active area in CTU units.
• FIG. 26 (c) is an example of dividing a tile into CTUs. The tile is divided into CTUs starting from its upper left coordinate. As shown in FIG. 26 (c), the size of the tile active area may or may not be an integer multiple of the CTU size.
• the upper left coordinates (xTsmn, yTsmn) of the tile at position (m, n) in tile units match the sums of the sizes (wAT[i], hAT[j]) of the preceding tile active areas.
  • the size of the tile effective area obtained by adding the tile active area and the overlap area may be an integer multiple of the CTU size or may not be an integer multiple of the CTU size.
  • FIG. 27 shows an example in which the tile extension area is composed of an overlap area and a crop offset area.
  • the overlap area is a hatched area outside the tile active area.
  • the overlap area overlaps the tile active area of the adjacent tile.
• the width wOVLP[m] and height hOVLP[n] of the overlap area and the width wCRP[m] and height hCRP[n] of the tile extension area have the following relationship: since the overlap area is contained in the tile extension area, wOVLP[m] <= wCRP[m] and hOVLP[n] <= hCRP[n].
  • the tile coding area includes a tile active area (wAT, hAT) that is a unit for dividing a picture and a hidden area (tile extension area).
• in other words, the tile coding area (wTile, hTile) may be configured from the tile effective area (wT, hT) used for decoding / output and the crop offset area, that is, the tile invalid area (wCRP, hCRP), which is not used for decoding / output.
  • the overlap area is outside the tile active area (wAT, hAT), which is a unit for dividing a picture, but is included in the tile effective area (wT, hT) used for decoding / output.
  • FIG. 28 (a) shows an example of the syntax of slice data slice_segment_data (). The operations of the video encoding device 11 and the video decoding device 31 will be described below with reference to the syntax.
  • coding_tree_unit () indicates the CTU syntax.
  • CtbAddrInTs, CtbAddrInRs, and CtbAddrInTile are CTU addresses
  • CtbAddrInTs is a CTU address in the tile scan order in the picture
  • CtbAddrInRs is a CTU address in the raster scan order in the picture
  • CtbAddrInTile is a CTU address in the tile scan order in the tile.
  • end_of_subset_one_bit is set to 1, and the encoded data is byte aligned.
  • FIG. 28 (b) is an example of CTU syntax coding_tree_unit ().
  • the upper left coordinate (xCtb, yCtb) of the CTU is derived for each tile.
• the in-tile coordinates of the CTU derived from the in-tile address CtbAddrInTile, ((CtbAddrInTile % TileWidthinCtbs[TileId]) << CtbLog2SizeY, (CtbAddrInTile / TileWidthinCtbs[TileId]) << CtbLog2SizeY), are added to the in-picture coordinates (TileAddrX[TileId], TileAddrY[TileId]) of the upper left of the tile to derive the in-picture CTU coordinates.
• xCtb = ((CtbAddrInTile % TileWidthinCtbs[TileId]) << CtbLog2SizeY) + TileAddrX[TileId]
• yCtb = ((CtbAddrInTile / TileWidthinCtbs[TileId]) << CtbLog2SizeY) + TileAddrY[TileId]
  • FIG. 29 shows an example of syntax coding_quadtree () for dividing a block (CU or CTU) into quadtrees
  • FIG. 30 shows an example of syntax coding_binarytree () for dividing a block into binary trees.
• the upper left coordinate of the tile does not necessarily correspond to a position that is an integral multiple of the CTU. Therefore, when tiles are used, split_cu_flag indicating whether or not to perform quadtree partitioning is notified in consideration of the upper left coordinates (xCtb, yCtb) of the CTU and the tile size, as shown in the following formula.
• the target block exists in the tile effective area, and
• the block size is larger than the minimum value (log2CbSize > MinCbLog2SizeY).
• when both conditions hold, split_cu_flag indicating whether or not the block is further divided is notified. If the block is further divided into quadtrees, split_cu_flag is set to 1; if the block is not divided into quadtrees, split_cu_flag is set to 0.
• if split_cu_flag is 1, coding_quadtree() is recursively called to notify whether or not to perform further quadtree partitioning. If split_cu_flag is 0, coding_binarytree() is called to notify (decode) whether or not to perform binary tree splitting.
• coding_quadtree(x1, y0, log2CbSize - 1, cqtDepth + 1, wT, hT, xTile, yTile), that is, the block located at (x1, y0) obtained by quadtree partitioning, is encoded or decoded if x1 is located within the tile.
• split_bt_mode, indicating whether or not to perform further binary tree division, is notified (decoded) in consideration of the upper left coordinates (xCtb, yCtb) of the CTU and the tile size. Specifically, split_bt_mode indicating whether or not to perform binary tree division may be notified by the following equation.
  • the block size is larger than the minimum size minBTSize that can be divided into binary trees and less than the maximum size maxBTSize that can be divided into binary trees, and the upper left coordinate of the lower or right block when the binary tree is divided is within the tile.
• split_bt_mode, indicating whether or not to perform binary tree division and the direction of the division, is notified. If the block is further divided into binary trees, split_bt_mode is set to 1; if the block is not divided into binary trees, split_bt_mode is set to 0. When split_bt_mode is 1, coding_binarytree() is recursively called to notify whether or not to perform binary tree division. When split_bt_mode is 0, coding_unit(x0, y0, log2CbWidth, log2CbHeight) is called to actually encode or decode the block.
• if either of the two blocks obtained by binary tree partitioning is outside the tile effective area (or outside the tile encoding area), that block is not encoded.
• coding_binarytree(x0, y1, log2CbWidth, log2CbHeight - 1, cqtDepth, cbtDepth + 1, wT, hT, xTile, yTile), that is, the block located at (x0, y1), is encoded or decoded when y1 is located within the tile.
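• One reading of the boundary test used above, sketched as a C++ predicate (names are illustrative; (xTile, yTile) is the tile origin and (wT, hT) the tile effective area size):

```cpp
// A sub-block produced by quadtree/binary-tree splitting is encoded or
// decoded only when its upper-left corner lies inside the tile effective
// area; otherwise it is skipped, as described above.
bool subBlockCoded(int x, int y, int xTile, int yTile, int wT, int hT) {
    return (x - xTile) < wT && (y - yTile) < hT;
}
```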
  • the picture can be divided into tiles having a size that does not depend on a multiple of the CTU.
• In Embodiment 3, processing will be described in which, when the display (projection) surface is spherical, as in 360-degree video or VR video, an image mapped onto a two-dimensional image is encoded for transmission / storage.
• FIG. 17 (a) shows the ERP (Equi-Rectangular Projection) format, in which the sphere is expressed as a rectangle by laterally enlarging the regions away from the equator.
  • FIG. 17 (c) shows a cube format.
  • the vertical line area in FIG. 17 (c) is an area where no image data exists.
  • Mapping and packing into a two-dimensional image as shown in FIG. 17 (a) are performed on the image as preprocessing before being input to the moving image encoding device 11.
• the picture dividing unit 2010 in FIG. 11 assigns tiles to the rectangles 1 to 11 in FIG. 17A and the rectangles 0 to 5 in FIG. 17C, and each tile is encoded by the tile encoding unit 2012.
  • FIG. 18 is a cubic-like ERP format, and the equator region is divided into 5 and 6, as shown in FIG. 18 (a). Then, packing is performed together with a rectangle corresponding to the polar area generated by rotation, and a rectangular area as shown in FIG. 18B is generated in the preprocessing.
• the picture dividing unit 2010 in FIG. 11 assigns tiles to, for example, rectangle 6, a rectangle composed of triangular regions 1 to 4, rectangle 5, and a rectangle composed of triangular regions 7 to 10, and each tile is encoded by the tile encoding unit 2012.
  • Fig. 19 shows SPP (Segmented Sphere Projection) Format, where the polar region is represented by circle regions 1 and 2 in Fig. 19 (a) and the equator region is represented by rectangles 3-6 in Fig. 19 (a).
  • the vertical line area outside the circle is an invalid area without image data.
  • the picture dividing unit 2010 in FIG. 11 assigns tiles to the rectangles 1 and 2 and the rectangles 3 to 6 in which the circular area is expanded, and each tile is encoded by the tile encoding unit 2012.
  • the number of tiles included in each tile row may be equal.
• alternatively, the number of tiles included in each tile row may not be equal. In such a case, the syntax shown in FIGS. 5 (i) and 5 (j) below is used.
  • the header information generation unit 2011 generates the syntax shown in FIGS. 5 (i) and 5 (j) and outputs it to the tile encoding unit 2012 and the encoded stream generation unit 2013.
  • the header information decoding unit 2001 decodes the syntaxes shown in FIGS. 5 (i) and 5 (j) and outputs the decoded syntaxes to the tile decoding unit 2002 and the tile synthesis unit 2003.
• 360-degree video and VR video can be encoded / decoded without changing the encoding method of the two-dimensional image at the tool level.
• In Embodiment 3, a picture is directly divided into tiles.
• In Embodiment 4 of the present application, a method of dividing a picture into regions and dividing the regions into tiles will be described.
• a picture is hierarchically divided in two stages: into regions, which can be arranged in the picture at a designated position and size, and into tiles, which are obtained by dividing a region into rectangles.
  • a region is a collection of continuous regions in a projection image or regions using the same mapping method.
  • FIG. 17 (b) is an example in which the picture is divided into tiles shown in FIG. 17 (a) by dividing the picture into three regions and further dividing each region into tiles.
  • FIG. 17 (d) is an example in which the picture is divided into tiles shown in FIG. 17 (c) by dividing the picture into three regions and further dividing each region into tiles.
  • FIG. 17 (e) is another example in which each region is divided into tiles.
  • Region 0 is divided into tiles Tile [0] [0] and invalid region tiles Tile [1] [0] to Tile [3] [0].
  • Region 1 is divided into tile Tile [0] [0] and tile Tile [1] [0].
  • Region 2 is divided into tile Tile [0] [0] and invalid area tiles Tile [1] [0], Tile [2] [0], and Tile [3] [0].
  • the region 1 may be processed as one tile Tile [0] [0].
• FIG. 18 (c) shows the regions corresponding to FIG. 18 (b).
  • Region 0 in FIG. 18 (c) corresponds to rectangle 6 in FIG. 18 (b)
  • region 1 corresponds to triangle regions 1 to 4, rectangle 5, and triangle regions 7 to 10 in FIG. 18 (b).
  • Triangular areas 1 to 4, rectangular 5, rectangular 6, and triangular areas 7 to 10 are continuous areas in the projection image.
  • FIG. 18 (d) shows an example in which each region is divided into tiles. Region 0 is divided into tiles Tile [0] [0], tiles Tile [1] [0], and Tile [2] [0].
  • Region 1 includes tile Tile [0] [0] that includes triangular areas 1 to 4, tile Tile [1] [0] that is rectangular 5, and tile Tile [2] [0] that includes triangular areas 7 to 10. Divided. Region 0 may be processed as one tile Tile [0] [0].
  • FIG. 19 (b) shows the region corresponding to Fig. 19 (a).
  • Region 0 in FIG. 19B corresponds to the circular regions 1 and 2 in FIG. 19A and the surrounding invalid regions
  • region 1 corresponds to rectangles 3 to 6 in FIG. 19A.
  • the rectangles 3 to 6 are continuous regions in the projection image, and the circular regions 1 and 2 are not continuous regions in the projection image, but both are polar regions and the mapping method is the same.
  • FIG. 19 (c) is an example in which each region is divided into tiles.
• region 0 is divided into tile Tile[0][0], consisting of circular area 1 and the surrounding invalid area, and tile Tile[1][0], consisting of circular area 2 and the surrounding invalid area.
  • Region 1 is divided into tiles Tile [0] [0] to Tile [3] [0] assigned to rectangles 3-6.
  • FIG. 31 shows a hierarchical structure of pictures, regions, tiles, and CTUs.
  • FIG. 31 (a) is a diagram showing one picture.
  • FIG. 31 (b) is a diagram of regions (Region 0, Region 1, and Region 2) obtained by dividing this picture into three.
  • FIG. 31 (c) is a diagram of tiles obtained by further dividing each region.
  • FIG. 31 (d) is a diagram of a CTU obtained by further dividing the tile obtained by dividing Region0 in FIG. 31 (c).
  • the upper left coordinates (xRs0, yRs0), width wReg [0], and height hReg [0] of the region Region [0] may not be an integer multiple of the CTU.
• the upper left coordinates (xTsmn, yTsmn), width wAT[m], and height hAT[n] of the tile active area Tile[m][n] of a tile divided from the region Region[0] may not be an integral multiple of the CTU.
  • Fig. 20 (k) shows the syntax for dividing a picture into regions and dividing the region into tiles.
  • region_parameters () is a syntax indicating region information, and is called from PPS.
• tile_parameters() was notified by the PPS, but in this embodiment, region_parameters() is notified by the PPS, and tile_parameters() is notified within region_parameters().
  • num_region_minus1 indicates the value obtained by subtracting 1 from the number of regions.
• if num_region_minus1 is 0, there is one region, and the syntax notified thereafter is the same as when the picture is directly divided into tiles.
• if num_region_minus1 is larger than 0, the upper left coordinates (region_topleft_x[i], region_topleft_y[i]), the width region_width_div2_minus1, and the height region_height_div2_minus1 are notified for each region.
• region_width_div2_minus1 and region_height_div2_minus1 convey the width and height of the region in units of 2 pixels, and the actual region width wReg and height hReg are expressed as follows.
• wReg[p] = (region_width_div2_minus1[p] + 1) * 2
• hReg[p] = (region_height_div2_minus1[p] + 1) * 2
• the tile sizes may be derived by replacing wPict and hPict with the region width wReg[p] and height hReg[p]. If uniform_spacing_flag is not 0, they may be derived using (Formula TAS-5). Formulas in which wPict and hPict in (Formula TAS-1) are replaced with wReg[p] and hReg[p] are shown below.
  • M and N indicate the number of tiles in the region in the horizontal direction and the number in the vertical direction.
  • the upper left coordinates (xRsp, yRsp) of the region Region [p] are set as follows.
• xRsp = region_topleft_x[p] (Formula REG-1)
• yRsp = region_topleft_y[p]
• whether region_width_div2_minus1[p] and region_height_div2_minus1[p] are expressed in units of 2 pixels or 1 pixel may be switched depending on the chroma format (4:2:0, 4:2:2, 4:4:4).
  • CABAC initialization is performed at the beginning of the region as well as at the beginning of the slice and tile.
• fill_color_present_flag is a flag indicating whether to notify the value to be set as the pixel value of a tile area that is not encoded in the picture or region (hereinafter, an invalid tile). When fill_color_present_flag is 1, the pixel values (fill_color_y, fill_color_cb, fill_color_cr) of the invalid area are notified.
• when fill_color_present_flag is 0, the pixel value of the invalid area is set to black (0, 1 << (bitdepth - 1), 1 << (bitdepth - 1)), gray (1 << (bitdepth - 1), 1 << (bitdepth - 1), 1 << (bitdepth - 1)), or the like.
  • bitdepth is the bit depth of the pixel value.
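• A small sketch of the default fill color selection when fill_color_present_flag is 0; the struct and function names are illustrative:

```cpp
struct YCbCr { int y; int cb; int cr; };

// Black keeps luma at 0 with neutral chroma; gray sets all components to
// half the sample range, both derived from the bit depth as described above.
YCbCr defaultFillColor(int bitdepth, bool gray) {
    int mid = 1 << (bitdepth - 1);
    return gray ? YCbCr{mid, mid, mid} : YCbCr{0, mid, mid};
}
```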
  • tile_parameters () and tile information tile_info () included in the tile_parameters () may be expressed by the syntax shown in FIGS. 4 (c) and 4 (d).
  • the tile divides the region uniformly with the upper left coordinates of the region (region_topleft_x [i], region_topleft_y [i]) as (0,0).
  • FIG. 11 (c) is an example of the picture dividing unit 2010 of FIG. 11 (a) that implements the fourth embodiment.
  • the picture dividing unit 2010 includes a region information calculating unit 20103, a tile information calculating unit 20101, and a picture dividing unit B20104.
• the region information calculation unit 20103 calculates region information (number of regions, upper left coordinates, width and height, pixel values set in the invalid area, etc.) for dividing the input image into regions as shown, for example, in FIGS. 17 (d), 18 (c), and 19 (b).
  • the tile information calculation unit 20101 refers to the region information calculated by the region information calculation unit 20103, replaces the picture with the region, and divides the region into tiles by the method described in the third embodiment (for example, FIG. 17 (e), FIG. 18 (d), FIG. 19 (c), FIG. 31 (c), etc.) are calculated.
  • the picture division unit B20104 divides the picture into regions by referring to the region information, and divides the region into tiles with reference to the tile information.
  • the header information generation unit 2011 generates the syntax shown in FIG. 20 (k) and outputs it to the tile encoding unit 2012 and the encoded stream generation unit 2013.
  • the tile encoding unit 2012 encodes the divided tiles, and the encoded stream generation unit 2013 generates an encoded stream Te from the encoded stream of each tile.
  • the header information decoding unit 2001 decodes the syntax shown in FIG. 20 (k) and outputs the decoded syntax to the tile decoding unit 2002 and the tile composition unit 2003.
  • the tile decoding unit 2002 decodes the encoded stream of the designated tile and outputs it to the tile synthesis unit 2003.
• the smoothing processing unit 20031 of the tile composition unit 2003 outputs a tile obtained by filtering the overlap region to the composition unit 20032 if the tile has an overlap region; if the tile has no overlap region, the output tile of the tile decoding unit 2002 is output to the composition unit 20032 as it is.
• the composition unit 20032 synthesizes the decoded image of the designated area using the region information and tile information decoded by the header information decoding unit 2001.
  • the size of the tiles in the region can be set almost uniformly. Therefore, the tile information notified by the header can be reduced as compared with the third embodiment.
• since the projection image is generally discontinuous at region boundaries, there is no need to provide an overlap region there; at tile boundaries within a region, however, the projection image is often continuous, so an overlap region is necessary. Therefore, redundant encoded streams can be reduced by not providing overlap regions at region boundaries.
  • Figure 32 shows the syntax for the region.
  • the CTU syntax coding_tree_unit () and end_of_region_flag are notified.
  • the end position of the tile is determined by the following formula.
  • CtbAddrInTs indicates the CTU address through the picture
  • NumCtbInTile [] indicates the number of CTUs in the tile
  • CtbAddrInTile indicates the CTU address in the tile. If CtbAddrInTile is greater than or equal to NumCtbInTile [], it represents the outside of the target tile, so it can be seen that it is the end of the target tile.
  • the tile identifier TileId is incremented by 1 at the end of the tile. That is, TileId is unique within a region and is reset to 0 at the beginning of a different region.
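• The tile-end test above reduces to a single comparison; a minimal C++ sketch with illustrative names:

```cpp
// True when the in-tile CTU address has reached the number of CTUs in the
// tile, i.e., the current CTU lies outside the target tile (tile end).
bool atEndOfTile(int ctbAddrInTs, int firstCtbAddrInTs, int numCtbInTile) {
    int ctbAddrInTile = ctbAddrInTs - firstCtbAddrInTs;
    return ctbAddrInTile >= numCtbInTile;
}
```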
  • FIG. 33 shows the syntax coding_tree_unit () of the CTU when the tile is divided regardless of the multiple of the CTU.
  • the upper left coordinate (xCtb, yCtb) of the CTU is derived for each tile.
• the upper left coordinates (TileAddrX, TileAddrY) of the tile in pixels and the upper left coordinates (RegionAddrX[RegId], RegionAddrY[RegId]) of the region may be set to (xTsmn, yTsmn) derived by (Expression TLA-1) and (Expression TLA-2) and to (xRsp, yRsp) derived by (Expression REG-1), respectively.
  • the width and height (wTile [], hTile []) of the tile coding area may be used instead of the width and height (wT [], hT []) of the tile effective area.
  • Fig. 34 shows another syntax indicating a region.
  • the slice is divided into regions and the regions are divided into tiles.
  • the regions may be divided into slices and tiles.
• the region information (region shape and size) is notified by the PPS.
• when TileId becomes equal to or greater than the predetermined value NumTilesInRegion[RegId], the processing of the target region ends, RegId is incremented, TileId and CtbAddrInTs are reset, and processing of the next region starts.
  • TileId and CtbAddrInTs are reset in units of regions.
• coding_tree_unit(TileId) called in FIG. 34 is the same as in FIG. 33; in order to process regions and tiles whose sizes are not necessarily a multiple of the CTU, the upper left coordinate of the CTU is calculated using the upper left coordinates of the tile or region.
  • a region having a size that is not necessarily a multiple of CTU can be divided into tiles for encoding and decoding.
  • FIG. 17 (e) is a diagram in which FIG. 17 (c) is divided into regions and then divided into tiles. Regions 0 and 2 are divided into 4 tiles, and region 1 is divided into 2 tiles. In regions 0 and 2, tile Tile [0] [0] is an effective area having an area corresponding to the projection image, but tiles Tile [1] [0], Tile [2] [0], and Tile [3 ] [0] is an invalid area. Therefore, Tile [1] [0], Tile [2] [0], and Tile [3] [0] do not need to be encoded / decoded.
• a flag tile_valid_flag for signaling a tile in an invalid area is included in the tile information; a tile whose tile_valid_flag is 1 is decoded, and a tile whose tile_valid_flag is 0 is not decoded.
• the other syntax is the same as that shown in FIG.
• as the information on the tile width and height, the number of tiles in the vertical direction (num_tile_rows_minus1), the tile height for each tile row (row_height_minus1[i]), the number of tiles in the horizontal direction (num_tile_columns_minus1), and the tile width for each tile column (column_width_minus1[i]) are notified.
• alternatively, the tile height information (row_height_minus1[i]) and the tile width information may be notified for the number of tiles in the vertical direction and the number in the horizontal direction, respectively.
• the pixel value of the invalid area may be notified by setting fill_color_present_flag to 1 in FIG. 20 (k) and notifying fill_color_y, fill_color_cb, and fill_color_cr.
• as another example of the invalid area, there is the Right-angled Triangular region-wise packing for cube map projection format shown in FIG. 35.
• as shown in FIG. 35 (a), the Right-angled Triangular region-wise packing for cube map projection format packs and encodes only the cube surfaces that can be seen from the right front (Front, Left, and half of Top and Bottom).
  • the form of this packing is shown in FIG. 35 (b).
  • the picture in FIG. 35 (b) consists of three regions. Region [0] consists of Front and Left in FIG. 35 (a). Region [1] consists of a half area (triangle area) of each of Top and Bottom in FIG. 35 (a) and a padding area between two triangles.
  • Region [2] is an invalid region that does not exist in FIG. 35 (a), and is generated when the heights of region [0] and region [1] are different.
• region[0] has upper left coordinates (xRs[0], yRs[0]), width wReg[0], and height hReg[0]; region[1] has upper left coordinates (xRs[1], yRs[1]), width wReg[1], and height hReg[1]; and region[2] has upper left coordinates (xRs[2], yRs[2]), width wReg[2], and height hReg[2].
  • the tile encoding unit 2012 encodes only valid tiles.
  • the header information decoding unit 2001 decodes the syntax shown in FIG. 20 (l) and outputs the decoded syntax to the tile decoding unit 2002 and the tile synthesis unit 2003.
  • the tile decoding unit 2002 decodes an encoded stream of valid tiles and outputs the decoded stream to the tile synthesis unit 2003.
• by notifying the flag indicating the validity / invalidity of each tile, the moving image encoding device and the moving image decoding device perform only the necessary encoding / decoding processing, so that useless processing can be reduced.
  • encoding / decoding is completed in units of regions (Region ()).
• the region of width wReg[i] and height hReg[i], with the upper left coordinates (region_topleft_x[i], region_topleft_y[i]) regarded as (0, 0), is treated as one picture, and in the Region() shown in FIG. 20 (n), the syntax of Tile() shown in FIG. 5 (h) may be notified in raster scan order.
  • the initial value of the quantization parameter defined by the slice may be used as the first quantization parameter of each region.
  • the picture may be processed as one slice.
• the encoding process or the decoding process may be performed independently for each region using the syntax shown in FIG. 32 or FIG. 34.
• in the embodiments above, the area including the overlap area and the crop offset area is defined in CTU units based on the upper left coordinates of the net display area (tile active area), which is not limited to an integer multiple of the CTU, and the encoding and decoding processes are performed on it. The upper left coordinate of the tile active area is not limited to an integer multiple of the CTU.
• in this embodiment, a picture in which the tiles are rearranged is generated, and this picture is used as the input picture to the moving picture encoding apparatus 11.
  • the upper left coordinate of the tile coding area is set at a position that is an integral multiple of the CTU, and the size of the tile coding area is an integral multiple of the CTU.
• the picture size set by the following formula is not the net size of the picture (first picture size) but a size that includes the overlap areas and crop offset areas (second picture size).
  • the picture width wPict and height hPict do not include the crop offset areas (wCRP [M-1] and hCRP [N-1]) at the right and bottom edges of the picture.
• the image decoding device 31 decodes the tile encoding areas, filters the overlap areas with the adjacent tile active areas, and discards the crop offset areas, thereby outputting a picture of the original picture size (first picture size).
• the conventional tile encoding unit 2012 and tile decoding unit 2002 can be used for the encoding processing and decoding processing, and the complexity of the encoding processing and decoding processing can also be reduced.
  • FIG. 36 (a) is a diagram in which a picture is divided into tiles that are allowed to overlap and are not limited to an integral multiple of the CTU, as in the first embodiment.
  • the shaded area is an overlap area, which is an area overlapping with an adjacent tile active area.
  • FIG. 36 (b) is a diagram in which one tile of FIG. 36 (a) is taken out.
• the tile (tile effective area) Tile[m][n] has width wT[m] and height hT[n], and the width wOVLP[m] and height hOVLP[n] of the overlap area shown by diagonal lines are included in wT[m] and hT[n], respectively.
  • FIG. 36 (c) is a picture generated by setting the upper left coordinate of the tile effective area at a position that is an integral multiple of the CTU so that adjacent tile effective areas do not overlap. This picture is an input image to the moving image encoding device 11.
• the encoding process or the decoding process is performed on the tile encoding area whose upper left position (xTsmn, yTsmn) is an integral multiple of the CTU and whose size (width wTile[m] and height hTile[n]) is an integral multiple of the CTU.
  • the tile coding area is an area obtained by combining the tile effective area and the crop offset area (tile invalid area) as shown in (Formula TCS-1) or Formula (TCS-3).
  • the upper left coordinates (xTsmn, yTsmn) of the tile coding area shown in FIG. 36 (c) are expressed by the following equations.
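• The referenced equations are not reproduced here; under the assumption that the tile coding areas tile the picture without overlap, one natural reading is a cumulative sum of the coding-area sizes, sketched below:

```cpp
#include <cstddef>
#include <vector>

// xTs[m] = wTile[0] + ... + wTile[m-1]; since each wTile[m] is a CTU
// multiple, every xTs[m] is also a CTU multiple. The vertical coordinates
// yTs[n] follow the same pattern with hTile[n].
std::vector<int> tileLeftCoords(const std::vector<int>& wTile) {
    std::vector<int> xTs(wTile.size(), 0);
    for (std::size_t m = 1; m < wTile.size(); ++m)
        xTs[m] = xTs[m - 1] + wTile[m - 1];
    return xTs;
}
```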
  • FIG. 37 shows syntax other than the picture width pic_width_in_luma_samples and the height pic_height_in_luma_samples.
  • the tile_info () in FIG. 37 differs from the tile_info () in FIG. 25A in that the total_cropoffset_width and total_cropoffset_height are notified when the uniform_spacing_flag is not 0.
• wT[m] = ((m + 1) * wPict1) / M - (m * wPict1) / M
• hT[n] = ((n + 1) * hPict1) / N - (n * hPict1) / N
  • wPict and hPict are the width and height (second picture size) of the input image calculated by (Formula TCS-2).
• the width wT[m] and height hT[n] of the tile effective area are calculated by substituting column_width_in_luma_samples_div2_minus1[m] and row_height_in_luma_samples_div2_minus1[n] into (Formula TSP-10); otherwise, they are calculated by substituting column_width_minus1[m] and row_height_minus1[n] into any one of (Formula TSP-7) to (Formula TSP-9).
  • overlap_tiles_flag is a flag indicating the presence or absence of a crop offset area including an overlap area. The other syntax is the same as that in FIG.
• uniform_overlap_flag, tile_overlap_width_minus1[], and tile_overlap_height_minus1[] are notified by overlap_tiles_info() in FIG. If 0 is allowed for the size (width or height) of the overlap, the overlap width (tile_overlap_width[]) and height (tile_overlap_height[]) may be notified without subtracting 1. Further, if the overlap size is always the same, uniform_overlap_flag may not be sent, and only one set of tile_overlap_width_minus1 and tile_overlap_height_minus1 may be sent.
  • the width wOVLP [m] and the height hOVLP [n] of the overlap region may be calculated by (Expression OVLP-1) or (Expression OVLP-2). Further, for example, the width wCRP [m] and the height hCRP [n] of the crop offset area may be calculated by (Expression CRP-1).
• since the tile (tile coding area) processed by the tile encoding unit 2012 or the tile decoding unit 2002 has a size that is an integer multiple of the CTU and the top of the tile is set at a position that is an integer multiple of the CTU, the conventional slice_segment_data() and coding_tree_unit() shown in FIG. 23 may be used.
  • Processing following slice data is the same as the conventional tile encoding unit 2012 and tile decoding unit 2002 that process tiles independently.
• in the encoding processing, the processing content of the picture dividing unit 2010 differs from the processing described in Embodiments 1 to 6.
• in the decoding processing, the processing content of the tile composition unit 2003 differs from the processing described in the first to sixth embodiments.
• the tile information calculation unit 20101 of the picture dividing unit 2010 calculates, from the picture size (first picture size), tile information including the width wAT[m] and height hAT[n] of the tile active areas having no overlap as shown in FIG. 26 (a), the overlap area width wOVLP and height hOVLP, the crop offset area width wCRP and height hCRP, the tile effective area width wT[m] and height hT[n], and the tile coding area width wTile[m] and height hTile[n].
  • the picture dividing unit A20102 of the picture dividing unit 2010 divides the picture into tile active areas according to the tile information calculated by the tile information calculation unit 20101, and copies each tile effective area Tile[m][n] to a memory of a size (second picture size) that can store the (wPict, hPict) area calculated by (Formula TCS-2).
  • the memory size may be set to a size (wPict + wCRP[M-1], hPict + hCRP[N-1]) obtained by expanding (wPict, hPict) to an integer multiple of the CTU. As shown in the figure, each tile effective area Tile[m][n] is arranged such that its upper left coordinate is an integer multiple of the CTU and the tile effective areas do not overlap.
  • the picture dividing unit 2010 sets pixel values in an area outside the tile effective area where no pixel value is set (crop offset area).
  • the pixel value to be set may be a pixel value of the tile effective area that is in contact with the crop offset area.
  • the pixel value vPic (x, y) at the pixel position (x, y) in the crop offset area is derived from the pixel value in the tile effective area by the following equation.
  • vPic[x][y] = Tile[m][n][wT[m]-1][hT[n]-1]  (wT[m] <= x < wTile[m], hT[n] <= y < hTile[n])
  • here, NBIT is the number of bits of a pixel value of the picture.
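A sketch of this padding for one tile, using numpy. The corner rule follows the equation above; extending the last effective column/row into the right and bottom strips is the usual edge replication and is an assumption here (a fixed value such as 1 << (NBIT - 1) would be an alternative way to fill the area).

```python
import numpy as np

def fill_crop_offset(vPic, xTs, yTs, wT, hT, wTile, hTile):
    """Fill the crop offset area of the tile whose coding area starts at
    (xTs, yTs) on the second-picture buffer vPic (a 2-D numpy array
    indexed [y, x]); (wT, hT) is the effective size and (wTile, hTile)
    the CTU-aligned coding size."""
    # right strip: replicate the last effective column
    vPic[yTs:yTs + hT, xTs + wT:xTs + wTile] = \
        vPic[yTs:yTs + hT, xTs + wT - 1:xTs + wT]
    # bottom strip (including the corner): replicate the last effective row,
    # so the corner region ends up holding Tile[m][n][wT[m]-1][hT[n]-1]
    vPic[yTs + hT:yTs + hTile, xTs:xTs + wTile] = \
        vPic[yTs + hT - 1:yTs + hT, xTs:xTs + wTile]
```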
  • the picture dividing unit A20102 outputs the input image having the second picture size generated in this way to the tile encoding unit 2012 for each tile encoding region.
  • the tile encoding unit 2012 encodes each tile encoding area and generates an encoded stream of each tile encoding area.
  • the encoded stream generation unit 2013 generates an encoded stream of an input image from the encoded stream of each tile encoding area.
  • the header information decoding unit 2001 decodes header information including tile information from the input encoded stream, and outputs an input stream of each tile encoding area to the tile decoding unit 2002.
  • the tile decoding unit 2002 decodes each tile coding area from the input stream and outputs the decoded tile coding area to the tile synthesis unit 2003.
  • the smoothing processing unit 20031 performs, on the overlap regions of the tiles decoded by the tile decoding unit 2002, the filter processing shown in, for example, (Expression FLT-1) to (Expression FLT-3) (averaging processing, weighted averaging processing), and overwrites the filtered pixel values (tmp in this case) of the overlap region into the memory shown in FIG.
  • for example, the filter processing result of the overlap area at the right edge of Tile[0][0] and the tile active area at the left edge of Tile[1][0] is overwritten into the tile active area at the left edge of Tile[1][0].
  • similarly, the filter processing result of the overlap area at the bottom edge of Tile[0][0] and the tile active area at the top edge of Tile[0][1] is overwritten into the tile active area at the top edge of Tile[0][1].
  • the synthesis unit 20032 extracts the tile active areas (wAT[m], hAT[n]) from the memory of the second picture size wPict * hPict, or from the memory of size (wPict + wCRP[M-1]) * (hPict + hCRP[N-1]), and arranges them so as not to overlap, thereby synthesizing a decoded image having the original picture size (first picture size).
  • the original picture size is the sum of the widths and heights of the tile active areas (ΣwAT[m], ΣhAT[n]), which is the size of the display image.
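The smoothing-then-composition step can be sketched as follows for one horizontal boundary; the array layout (each buffer holding only the tile effective area) and the rounded simple average are illustrative assumptions corresponding to the averaging variant of the filter processing, not the weighted variants.

```python
import numpy as np

def blend_horizontal(left_tile, right_tile, ovlp):
    """Average the last `ovlp` columns of the left tile (its overlap area)
    with the leftmost `ovlp` active columns of the right tile, then
    overwrite the result into the right tile, as described above."""
    tmp = (left_tile[:, -ovlp:].astype(np.int32)
           + right_tile[:, :ovlp].astype(np.int32) + 1) >> 1  # rounded mean
    right_tile[:, :ovlp] = tmp.astype(right_tile.dtype)
    return right_tile

def compose(actives):
    """Concatenate the smoothed tile active areas into the display image;
    actives[n][m] is the active-area array of Tile[m][n]."""
    return np.concatenate([np.concatenate(row, axis=1) for row in actives],
                          axis=0)
```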
  • conventional tile encoding processing and tile decoding processing can be used for the encoding processing and the decoding processing, and the complexity of the encoding and decoding processing can also be reduced.
  • a moving image decoding apparatus according to an aspect of the present invention is a moving image decoding apparatus that divides an image into tiles and decodes a moving image in tile units, and includes a header information decoding unit that decodes header information from an encoded stream and calculates tile information, a tile decoding unit that decodes encoded data for each tile and generates a decoded image of the tile, and a synthesis unit that synthesizes the decoded images of the tiles with reference to the tile information to generate a display image.
  • the tile includes an area that overlaps an adjacent tile, and the synthesis unit filters a plurality of pixel values of each pixel in the overlap area of the tile and generates a display image using the pixel values of the decoded image of the tile and the filtered pixel values.
  • the tile decoding unit decodes the target tile with reference to only the information on the target tile and the information on the collocated tile of the target tile.
  • the tile information includes the number of tiles, their widths and heights, the presence or absence of overlap between adjacent tiles, and, when the tiles overlap, the width and height of the overlap region.
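One way to hold the fields enumerated above is a small record type, sketched below; the field names are illustrative and are not the syntax element names of the encoded stream.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TileInfo:
    """Tile information as enumerated above (illustrative field names)."""
    num_tile_cols: int                    # number of tile columns (M)
    num_tile_rows: int                    # number of tile rows (N)
    tile_widths: List[int]                # wT[m], in luma samples
    tile_heights: List[int]               # hT[n]
    overlap_enabled: bool                 # overlap between adjacent tiles?
    overlap_widths: List[int] = field(default_factory=list)   # wOVLP[m]
    overlap_heights: List[int] = field(default_factory=list)  # hOVLP[n]
```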
  • the upper left coordinate of the tile is not limited to an integer multiple of the CTU.
  • the tile includes an area that overlaps an adjacent tile and a crop offset area (tile invalid area); the size of the tile including the overlapping area and the crop offset area is an integer multiple of the CTU, and the upper left coordinate of the tile is limited to a position that is an integer multiple of the CTU.
  • the filtering process of the synthesis unit is a simple average of the pixel values of the plurality of overlap regions.
  • the filtering process of the synthesis unit is a weighted sum of the pixel values of the plurality of overlap regions in which the weight changes with the distance from the tile boundary.
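A sketch of such distance-dependent weighting for a one-row overlap; the linear ramp is one plausible choice (the weight table actually used by the embodiment is the one shown in FIG. 15), so the coefficients here are assumptions.

```python
def ramp_weights(ovlp):
    """Weight of the 'own' tile at distance d from the tile boundary; it
    falls linearly so the two decoded versions cross-fade (hypothetical)."""
    return [(ovlp - d) / (ovlp + 1.0) for d in range(ovlp)]

def weighted_blend(own, other, ovlp):
    """Blend two length-`ovlp` pixel runs covering the same picture area."""
    w = ramp_weights(ovlp)
    return [int(round(w[d] * own[d] + (1.0 - w[d]) * other[d]))
            for d in range(ovlp)]
```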
  • a moving image encoding apparatus according to an aspect of the present invention is a moving image encoding apparatus that divides an image into tiles and encodes a moving image in tile units, and includes a tile information calculation unit that calculates tile information, a division unit that divides an image into tiles, and a tile encoding unit that encodes the tiles and generates an encoded stream; the division unit divides the image into tiles allowing overlap.
  • the tile encoding unit encodes the target tile with reference to only the information on the target tile and the information on the collocated tile of the target tile.
  • the tile information includes the number of tiles, their widths and heights, the presence or absence of overlap between adjacent tiles, and, when the tiles overlap, the width and height of the overlap area.
  • the dividing unit divides an image into tiles without limiting the upper left coordinates of the tiles to a position that is an integral multiple of a CTU.
  • when the width of the tile at the right edge of the image or the height of the tile at the bottom edge is not an integer multiple of the CTU, the division unit provides a crop offset area for the tiles at the right and bottom edges of the image, and divides the image so that the width and height obtained by adding the tile and the crop offset area are an integer multiple of the CTU.
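A minimal sketch of this edge padding; the CTU size is a parameter and 128 is only a placeholder.

```python
def pad_edge_tile(width, height, ctu=128):
    """Round a right-edge tile width and a bottom-edge tile height up to the
    next CTU multiple; the differences are the crop offset sizes."""
    wTile = ((width + ctu - 1) // ctu) * ctu
    hTile = ((height + ctu - 1) // ctu) * ctu
    return wTile, hTile, wTile - width, hTile - height

# Hypothetical example: with 128x128 CTUs, a bottom tile row of height
# 1080 - 8*128 = 56 would be padded to 128, giving a 72-line crop offset.
```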
  • the dividing unit divides the image into tiles that include an area overlapping an adjacent tile and a crop offset region; the size of the tile including the overlapping region and the crop offset region is an integer multiple of the CTU, and the upper left coordinate of the tile is set at a position that is an integer multiple of the CTU.
  • a moving image decoding apparatus according to another aspect of the present invention is a moving image decoding apparatus that divides an image into tiles and decodes a moving image in tile units, and includes a header information decoding unit that decodes header information from an encoded stream and calculates tile information, a tile decoding unit that decodes encoded data for each tile and generates a decoded image of the tile, and a combining unit that combines the decoded images of the tiles with reference to the tile information to generate a display image.
  • the tile information includes information on the number and widths of the tiles included in each tile row, the number of tiles included in each tile row may differ, and the combining unit generates a display image using at least pixel values of the decoded images of the tiles.
  • a moving image encoding apparatus according to another aspect of the present invention is a moving image encoding apparatus that divides an image into tiles and encodes a moving image in tile units, and includes a tile information calculation unit that calculates tile information, a header information generation unit that encodes header information including the tile information, a division unit that divides an image into tiles, and a tile encoding unit that encodes the tiles and generates an encoded stream.
  • the image is divided into tiles such that the number of tiles included in each tile row may differ; the tile information calculation unit calculates tile information on the number and widths of the tiles included in each tile row, and the header information generation unit generates the syntax of the tile information.
  • a moving image decoding apparatus according to another aspect of the present invention is a moving image decoding apparatus that divides an image into regions each including one or more tiles and decodes a moving image in region units, and includes a header information decoding unit that decodes header information from an encoded stream and calculates region information and tile information, a tile decoding unit that decodes encoded data for each tile and generates a decoded image of the tile, and a synthesis unit that synthesizes the decoded images of the tiles with reference to the region information and the tile information to generate a display image; the synthesis unit generates the display image using at least pixel values of the decoded images of the tiles.
  • a moving image encoding apparatus according to another aspect of the present invention is a moving image encoding apparatus that divides an image into regions each including one or more tiles and encodes a moving image in region units, and includes a region information calculation unit that calculates region information such as the number of regions, their upper left coordinates, widths and heights, and the pixel values to be set in invalid areas, a tile information calculation unit that calculates tile information, a header information generation unit that generates the syntax of header information including the region information and the tile information, a division unit that divides an image into regions and divides each region into tiles starting from the upper left coordinates of the region, and a tile encoding unit that encodes the tiles and generates an encoded stream.
  • the region information includes a flag for notifying whether or not each tile is included in an invalid area.
  • when the flag included in the region information indicates that the target tile is included in an invalid area, the tile decoding unit does not decode the target tile.
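A sketch of this behavior (names such as tile_is_invalid are illustrative, not actual syntax elements):

```python
def decode_region_tiles(tiles, tile_is_invalid, decode_tile):
    """Decode only the tiles not flagged as belonging to an invalid area;
    flagged tiles are skipped entirely and never displayed."""
    decoded = {}
    for idx, tile_data in enumerate(tiles):
        if tile_is_invalid[idx]:
            continue  # no decoding work is spent on invalid-area tiles
        decoded[idx] = decode_tile(tile_data)
    return decoded
```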
  • the tile decoding unit decodes the target tile with reference to only information on the target tile, the collocated tile of the target tile, and tile information included in the same region.
  • the tile encoding unit encodes the target tile with reference to only information on the target tile, the collocated tile of the target tile, and tile information included in the same region.
  • a moving image decoding apparatus according to another aspect of the present invention is a moving image decoding apparatus that divides an image into tiles and decodes a moving image in tile units, and includes a header information decoding unit that decodes header information from an encoded stream and calculates tile information, a tile decoding unit that decodes encoded data for each tile and generates a decoded image of the tile, and a synthesis unit that synthesizes the decoded images of the tiles with reference to the tile information to generate a display image.
  • the tile is composed of a tile active area, which is a unit for dividing a picture without overlap, and a hidden area (tile extension area), and an area obtained by adding the tile extension area to the tile active area is decoded in units of CTUs.
  • the tile extension area is composed of an overlap area, which overlaps the tile active area of an adjacent tile and is used for reference and decoding, and a crop offset area (tile invalid area), which is not used for reference or decoding.
  • the sizes of the tile active area and the overlap area are not limited to integer multiples of the CTU size, and the upper left coordinate of the tile is not limited to an integer multiple of the CTU.
  • a moving image decoding apparatus according to another aspect of the present invention is a moving image decoding apparatus that divides an image into tiles and decodes a moving image in tile units, and includes a header information decoding unit that decodes header information from an encoded stream and calculates tile information, a tile decoding unit that decodes encoded data for each tile and generates a decoded image of the tile, and a synthesis unit that synthesizes the decoded images of the tiles with reference to the tile information to generate a display image.
  • the tile is composed of a tile effective area used for decoding and output and a crop offset area (tile invalid area) not used for decoding or output; the tile effective area is composed of a tile active area, which is a unit for dividing a picture, and an overlap area, which overlaps the tile active area of an adjacent tile and is used for reference and decoding; and the tile effective area is decoded in units of CTUs.
  • the sizes of the tile effective area and the crop offset area are not limited to integer multiples of the CTU size, and the upper left coordinate of the tile is not limited to an integer multiple of the CTU.
  • the tile decoding unit decodes the target tile with reference to only the information on the target tile and the information on the collocated tile of the target tile.
  • the tile information includes the number of tiles, their widths and heights, the presence or absence of overlap between adjacent tiles, and, when the tiles overlap, the width and height of the overlap region.
  • the synthesis unit performs filtering using a simple average of the pixel values of a plurality of overlap regions.
  • the synthesis unit performs filtering using a weighted sum of the pixel values of a plurality of overlap regions in which the weight changes with the distance from the tile boundary.
  • a moving image encoding apparatus according to another aspect of the present invention is a moving image encoding apparatus that divides an image into tiles and encodes a moving image in tile units, and includes a tile information calculation unit that calculates tile information, a division unit that divides an image into tiles, and a tile encoding unit that encodes the tiles and generates an encoded stream.
  • the tile is composed of a tile active area, which is a unit for dividing a picture without overlap, and a hidden area (tile extension area), and an area obtained by adding the tile extension area to the tile active area is encoded in units of CTUs.
  • the tile extension area is composed of an overlap area, which overlaps the tile active area of an adjacent tile and is used for reference and encoding, and a crop offset area (tile invalid area), which is not used for reference or encoding.
  • the sizes of the tile active area and the overlap area are not limited to integer multiples of the CTU size, and the upper left coordinate of the tile is not limited to an integer multiple of the CTU.
  • a moving image encoding apparatus according to another aspect of the present invention is a moving image encoding apparatus that divides an image into tiles and encodes a moving image in tile units, and includes a tile information calculation unit that calculates tile information, a division unit that divides an image into tiles, and a tile encoding unit that encodes the tiles and generates an encoded stream.
  • the tile is composed of a tile effective area used for encoding and output and a crop offset area (tile invalid area) not used for encoding or output; the tile effective area is composed of a tile active area, which is a unit for dividing a picture, and an overlap area, which overlaps the tile active area of an adjacent tile and is used for reference and encoding; and the tile effective area is encoded in units of CTUs.
  • the size obtained by adding the tile effective area and the crop offset area is not limited to an integer multiple of the CTU size, and the upper left coordinate of the tile is not limited to a position that is an integer multiple of the CTU.
  • the tile encoding unit encodes the target tile with reference to only the information on the target tile and the information on the collocated tile of the target tile.
  • the tile information includes the number of tiles, their widths and heights, the presence or absence of overlap between adjacent tiles, and, when the tiles overlap, the width and height of the overlap area.
  • a video decoding device according to another aspect of the present invention is a video decoding device that divides an image into regions each including one or more tiles and decodes the video in region units, and includes a header information decoding unit that decodes header information from an encoded stream and calculates region information and tile information, a tile decoding unit that decodes encoded data for each tile and generates a decoded image of the tile, and a combining unit that combines the decoded images of the tiles with reference to the region information and the tile information to generate a display image; the size of the region is not limited to an integer multiple of the CTU size, and its upper left coordinate is not limited to a position that is an integer multiple of the CTU.
  • the tile is an area obtained by dividing a rectangular area composed of the region and an area (guard band) that is not displayed outside the region.
  • a moving image encoding apparatus according to another aspect of the present invention is a moving image encoding apparatus that divides an image into regions each including one or more tiles and encodes a moving image in region units, and includes a region information calculation unit that calculates region information such as the number of regions, their upper left coordinates, widths and heights, and the pixel values to be set in invalid areas, a tile information calculation unit that calculates tile information, a header information generation unit that generates the syntax of header information including the region information and the tile information, a division unit that divides an image into regions and divides each region into tiles starting from the upper left coordinates of the region, and a tile encoding unit that encodes the tiles and generates an encoded stream; the size of the region is not limited to an integer multiple of the CTU size, and its upper left coordinate is not limited to a position that is an integer multiple of the CTU.
  • the dividing unit divides a rectangular area, composed of the region and a non-displayed area (guard band) outside the region, into tiles.
  • the region information includes a flag for notifying whether or not each tile is included in an invalid area.
  • when the flag included in the region information indicates that the target tile is included in an invalid area, the tile decoding unit does not decode the target tile.
  • the tile decoding unit decodes the target tile with reference to only information on the target tile, the collocated tile of the target tile, and tile information included in the same region.
  • the tile encoding unit encodes the target tile with reference to only information on the target tile, the collocated tile of the target tile, and tile information included in the same region.
  • a moving picture decoding apparatus according to another aspect of the present invention is a moving picture decoding apparatus that divides an image into tiles (tile coding areas) and decodes a moving picture in units of tile coding areas, and includes a header information decoding unit that decodes header information and calculates tile information, a tile decoding unit that decodes encoded data for each tile and generates a decoded image of the tile coding area, and a synthesizing unit that synthesizes the decoded images of the tile coding areas with reference to the tile information to generate a display image; the tile coding area is composed of a tile active area, an overlap area, and a crop offset area, the tile active area is a unit for dividing the first picture without overlap, and the crop offset area is an invalid area, unrelated to the encoding process, for setting the size of the tile coding area to an integer multiple of the CTU.
  • a moving picture coding apparatus according to another aspect of the present invention generates, from a first picture, a second picture in which tiles (tile coding areas) are arranged without overlapping, and includes a tile information calculation unit that calculates the size of the second picture (second picture size) and the tile information (the sizes of the tile active area, the overlap area, and the crop offset area) of each tile coding area, a picture dividing unit that generates the second picture composed of the tile active areas obtained by dividing the first picture according to the tile information and, outside the tile active areas, the overlap areas and the crop offset areas, and a tile encoding unit that encodes the tile coding areas and generates an encoded stream.
  • the tile active area is a unit for dividing the first picture without overlapping, and the crop offset area is an invalid area, unrelated to the encoding process, for setting the size of the tile coding area to an integer multiple of the CTU.
  • the size of the second picture is calculated by adding the tile active areas, the overlap areas, and the crop offset areas, the upper left coordinate of each tile coding area is an integer multiple of the CTU on the second picture, and the tile coding area size is set to an integer multiple of the CTU.
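The CTU-aligned layout of the second picture can be sketched as follows; since every wTile[m] and hTile[n] is a CTU multiple, the running sums (and hence every tile origin) are CTU multiples too. This is an illustrative rendering, not the normative derivation.

```python
def second_picture_layout(wTile, hTile):
    """Upper-left coordinates (xTs[m], yTs[n]) of each tile coding area on
    the second picture, plus the second picture size."""
    xTs, yTs = [0], [0]
    for w in wTile[:-1]:
        xTs.append(xTs[-1] + w)
    for h in hTile[:-1]:
        yTs.append(yTs[-1] + h)
    wPict2, hPict2 = sum(wTile), sum(hTile)  # second picture size
    return xTs, yTs, wPict2, hPict2
```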
  • a part of the tile encoding unit 2012 and the tile decoding unit 2002 in the above-described embodiment, for example, the entropy decoding unit 301, the prediction parameter decoding unit 302, the loop filter 305, the predicted image generation unit 308, the inverse quantization / inverse transformation unit, and the prediction parameter encoding unit 111, may be realized by a computer.
  • the program for realizing the control function may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read by a computer system and executed.
  • the “computer system” is a computer system built in either the tile encoding unit 2012 or the tile decoding unit 2002, and includes an OS and hardware such as peripheral devices.
  • the “computer-readable recording medium” refers to a storage device such as a portable medium such as a flexible disk, a magneto-optical disk, a ROM, a CD-ROM, or a hard disk built in a computer system.
  • the “computer-readable recording medium” may also include a medium that dynamically holds a program for a short time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds a program for a certain period of time, such as a volatile memory inside a computer system serving as a server or a client.
  • the program may be a program for realizing a part of the functions described above, or may be a program capable of realizing the functions described above in combination with a program already recorded in the computer system.
  • a part or all of the moving image encoding device 11 and the moving image decoding device 31 in the above-described embodiment may be realized as an integrated circuit such as an LSI (Large Scale Integration).
  • Each functional block of the moving image encoding device 11 and the moving image decoding device 31 may be individually implemented as a processor, or a part or all of them may be integrated into a single processor.
  • the method of circuit integration is not limited to LSI, and the blocks may be realized by a dedicated circuit or a general-purpose processor. Further, if an integrated circuit technology replacing LSI emerges with the progress of semiconductor technology, an integrated circuit based on that technology may be used.
  • the moving image encoding device 11 and the moving image decoding device 31 described above can be used by being mounted on various devices that perform transmission, reception, recording, and reproduction of moving images.
  • the moving image may be a natural moving image captured by a camera or the like, or an artificial moving image (including CG and GUI) generated by a computer or the like.
  • moving image encoding device 11 and moving image decoding device 31 can be used for transmitting and receiving moving images.
  • FIG. 38 (a) is a block diagram showing a configuration of a transmission apparatus PROD_A in which the moving picture encoding apparatus 11 is mounted.
  • the transmission device PROD_A includes an encoding unit PROD_A1 that obtains encoded data by encoding a moving image, a modulation unit PROD_A2 that obtains a modulated signal by modulating a carrier wave with the encoded data obtained by the encoding unit PROD_A1, and a transmission unit PROD_A3 that transmits the modulated signal obtained by the modulation unit PROD_A2.
  • the moving image encoding device 11 described above is used as the encoding unit PROD_A1.
  • the transmission device PROD_A may further include, as sources of moving images to be input to the encoding unit PROD_A1, a camera PROD_A4 that captures moving images, a recording medium PROD_A5 on which moving images are recorded, an input terminal PROD_A6 for inputting moving images from the outside, and an image processing unit PROD_A7 that generates or processes images. FIG. 38 (a) illustrates a configuration in which the transmission device PROD_A includes all of these, but some of them may be omitted.
  • the recording medium PROD_A5 may record a non-encoded moving image, or may record a moving image encoded by a recording encoding scheme different from the transmission encoding scheme. In the latter case, a decoding unit (not shown) that decodes the encoded data read from the recording medium PROD_A5 in accordance with the recording encoding scheme may be interposed between the recording medium PROD_A5 and the encoding unit PROD_A1.
  • FIG. 38 (b) is a block diagram illustrating a configuration of the receiving device PROD_B in which the moving image decoding device 31 is mounted.
  • the receiving device PROD_B includes a receiving unit PROD_B1 that receives a modulated signal, a demodulation unit PROD_B2 that obtains encoded data by demodulating the modulated signal received by the receiving unit PROD_B1, and a decoding unit PROD_B3 that obtains a moving image by decoding the encoded data obtained by the demodulation unit PROD_B2.
  • the moving picture decoding apparatus 31 described above is used as the decoding unit PROD_B3.
  • the receiving device PROD_B may further include, as supply destinations of the moving image output by the decoding unit PROD_B3, a display PROD_B4 that displays the moving image, a recording medium PROD_B5 for recording the moving image, and an output terminal PROD_B6 for outputting the moving image to the outside.
  • FIG. 38 (b) illustrates a configuration in which the receiving device PROD_B includes all of these, but some of them may be omitted.
  • the recording medium PROD_B5 may be used for recording a non-encoded moving image, or may record a moving image encoded by a recording encoding scheme different from the transmission encoding scheme. In the latter case, an encoding unit (not shown) that encodes, in accordance with the recording encoding scheme, the moving image acquired from the decoding unit PROD_B3 may be interposed between the decoding unit PROD_B3 and the recording medium PROD_B5.
  • the transmission medium for transmitting the modulation signal may be wireless or wired.
  • the transmission mode for transmitting the modulated signal may be broadcasting (here, a transmission mode in which the transmission destination is not specified in advance) or communication (here, a transmission mode in which the transmission destination is specified in advance). That is, the transmission of the modulated signal may be realized by any of wireless broadcasting, wired broadcasting, wireless communication, and wired communication.
  • a broadcasting station (broadcasting equipment, etc.) / receiving station (television receiver, etc.) of terrestrial digital broadcasting is an example of a transmission device PROD_A / reception device PROD_B that transmits and receives modulated signals by wireless broadcasting.
  • a broadcasting station (broadcasting equipment, etc.) / receiving station (television receiver, etc.) of cable television broadcasting is an example of a transmission device PROD_A / reception device PROD_B that transmits and receives modulated signals by wired broadcasting.
  • a server (workstation, etc.) / client (television receiver, personal computer, smartphone, etc.) of a VOD (Video On Demand) service or a video sharing service using the Internet is an example of a transmission device PROD_A / reception device PROD_B that transmits and receives modulated signals by communication (normally, either a wireless or wired transmission medium is used in a LAN, and a wired transmission medium is used in a WAN).
  • the personal computer includes a desktop PC, a laptop PC, and a tablet PC.
  • the smartphone also includes a multi-function mobile phone terminal.
  • the client of the video sharing service has a function of encoding a moving image captured by a camera and uploading it to the server; that is, the client of the video sharing service functions as both the transmission device PROD_A and the reception device PROD_B.
  • moving image encoding device 11 and moving image decoding device 31 can be used for recording and reproduction of moving images.
  • FIG. 39 (a) is a block diagram showing a configuration of a recording apparatus PROD_C equipped with the moving picture encoding apparatus 11 described above.
  • the recording device PROD_C includes an encoding unit PROD_C1 that obtains encoded data by encoding a moving image, and a writing unit PROD_C2 that writes the encoded data obtained by the encoding unit PROD_C1 on a recording medium PROD_M.
  • the moving image encoding device 11 described above is used as the encoding unit PROD_C1.
  • the recording medium PROD_M may be (1) of a type built into the recording device PROD_C, such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive), (2) of a type connected to the recording device PROD_C, such as an SD memory card or a USB (Universal Serial Bus) flash memory, or (3) of a type loaded into a drive device (not shown) built into the recording device PROD_C, such as a DVD (Digital Versatile Disc) or a BD (Blu-ray (registered trademark) Disc).
  • the recording device PROD_C may further include, as sources of moving images to be input to the encoding unit PROD_C1, a camera PROD_C3 that captures moving images, an input terminal PROD_C4 for inputting moving images from the outside, a receiving unit PROD_C5 for receiving moving images, and an image processing unit PROD_C6 that generates or processes images. FIG. 39 (a) illustrates a configuration in which the recording device PROD_C includes all of these, but some of them may be omitted.
  • the receiving unit PROD_C5 may receive a non-encoded moving image, or may receive encoded data encoded by a transmission encoding scheme different from the recording encoding scheme. In the latter case, a transmission decoding unit (not shown) that decodes the encoded data encoded by the transmission encoding scheme may be interposed between the receiving unit PROD_C5 and the encoding unit PROD_C1.
  • examples of such a recording device PROD_C include a DVD recorder, a BD recorder, and an HDD (Hard Disk Drive) recorder (in this case, the input terminal PROD_C4 or the receiving unit PROD_C5 is the main source of moving images). A camcorder (in this case, the camera PROD_C3 is the main source of moving images), a personal computer (in this case, the receiving unit PROD_C5 or the image processing unit PROD_C6 is the main source of moving images), and a smartphone (in this case, the camera PROD_C3 or the receiving unit PROD_C5 is the main source of moving images) are also examples of such a recording device PROD_C.
  • FIG. 39 (b) is a block diagram showing a configuration of a playback device PROD_D equipped with the above-described moving image decoding device 31.
  • the playback device PROD_D includes a reading unit PROD_D1 that reads encoded data written on the recording medium PROD_M, and a decoding unit PROD_D2 that obtains a moving image by decoding the encoded data read by the reading unit PROD_D1.
  • the moving picture decoding apparatus 31 described above is used as the decoding unit PROD_D2.
  • the recording medium PROD_M may be (1) of a type built into the playback device PROD_D, such as an HDD or an SSD, (2) of a type connected to the playback device PROD_D, such as an SD memory card or a USB flash memory, or (3) of a type loaded into a drive device (not shown) built into the playback device PROD_D, such as a DVD or a BD.
  • the playback device PROD_D may further include, as supply destinations of the moving image output by the decoding unit PROD_D2, a display PROD_D3 that displays the moving image, an output terminal PROD_D4 that outputs the moving image to the outside, and a transmission unit PROD_D5 that transmits the moving image.
  • FIG. 39 (b) illustrates a configuration in which the playback device PROD_D includes all of these, but some of them may be omitted.
  • the transmission unit PROD_D5 may transmit a non-encoded moving image, or may transmit encoded data encoded by a transmission encoding scheme different from the recording encoding scheme. In the latter case, an encoding unit (not shown) that encodes the moving image by the transmission encoding scheme may be interposed between the decoding unit PROD_D2 and the transmission unit PROD_D5.
  • examples of such a playback device PROD_D include a DVD player, a BD player, and an HDD player (in this case, the output terminal PROD_D4 to which a television receiver or the like is connected is the main supply destination of moving images). A television receiver (in this case, the display PROD_D3 is the main supply destination of moving images), digital signage (also referred to as an electronic signboard or an electronic bulletin board; the display PROD_D3 or the transmission unit PROD_D5 is the main supply destination of moving images), a desktop PC (in this case, the output terminal PROD_D4 or the transmission unit PROD_D5 is the main supply destination of moving images), a laptop or tablet PC (in this case, the display PROD_D3 or the transmission unit PROD_D5 is the main supply destination of moving images), and a smartphone (in this case, the display PROD_D3 or the transmission unit PROD_D5 is the main supply destination of moving images) are also examples of such a playback device PROD_D.
  • each block of the moving image decoding device 31 and the moving image encoding device 11 described above may be realized in hardware by a logic circuit formed on an integrated circuit (IC chip), or may be realized in software using a CPU (Central Processing Unit).
  • in the latter case, each of the above devices includes a CPU that executes instructions of a program realizing each function, a ROM (Read Only Memory) that stores the program, a RAM (Random Access Memory) into which the program is loaded, and a storage device (recording medium) such as a memory that stores the program and various data.
  • the object of the embodiment of the present invention can also be achieved by supplying, to each of the above devices, a recording medium on which the program code (an executable program, an intermediate code program, or a source program) of the control program of each device, which is software realizing the above-described functions, is recorded in a computer-readable manner, and by having the computer (or CPU or MPU) read and execute the program code recorded on the recording medium.
  • examples of the recording medium include tapes such as magnetic tapes and cassette tapes; magnetic disks such as floppy (registered trademark) disks and hard disks; optical discs such as CD-ROM (Compact Disc Read-Only Memory) / MO disc (Magneto-Optical disc) / MD (Mini Disc) / DVD (Digital Versatile Disc) / CD-R (CD Recordable) / Blu-ray Disc (Blu-ray (registered trademark) Disc); semiconductor memories such as EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable and Programmable Read-Only Memory: registered trademark), and flash ROM; and logic circuits such as PLD (Programmable Logic Device) and FPGA (Field Programmable Gate Array).
  • each of the above devices may be configured to be connectable to a communication network, and the program code may be supplied via the communication network.
  • the communication network is not particularly limited as long as it can transmit the program code.
  • for example, the Internet, an intranet, an extranet, a LAN (Local Area Network), an ISDN (Integrated Services Digital Network), a VAN (Value-Added Network), a CATV (Community Antenna Television / Cable Television) communication network, a virtual private network, a telephone network, a mobile communication network, a satellite communication network, and the like can be used.
  • the transmission medium constituting the communication network may be any medium that can transmit the program code, and is not limited to a specific configuration or type.
  • for example, wired media such as IEEE (Institute of Electrical and Electronic Engineers) 1394, USB, power line carrier, cable TV lines, telephone lines, and ADSL (Asymmetric Digital Subscriber Line) lines, or wireless media such as infrared links such as IrDA (Infrared Data Association) or remote control, BlueTooth (registered trademark), IEEE 802.11 wireless, HDR (High Data Rate), NFC (Near Field Communication), DLNA (Digital Living Network Alliance: registered trademark), mobile phone networks, satellite links, and terrestrial digital broadcasting networks can also be used.
  • the embodiment of the present invention can also be realized in the form of a computer data signal embedded in a carrier wave, in which the above program code is embodied by electronic transmission.
  • embodiments of the present invention can be preferably applied to a moving image decoding apparatus that decodes encoded data obtained by encoding image data, and to a moving image encoding apparatus that generates encoded data obtained by encoding image data. They can also be preferably applied to the data structure of encoded data generated by the moving image encoding apparatus and referenced by the moving image decoding apparatus.
  • 11 Video encoding device, 31 Video decoding device, 41 Video display device, 2002 Tile decoding unit, 2012 Tile encoding unit

Abstract

When each tile is encoded/decoded independently while suppressing a reduction in encoding efficiency, distortion occurs at tile boundaries. This moving image decoding device, which divides an image into tiles and decodes a moving image in tile units, is provided with: a header information decoding unit which decodes header information from an encoded stream and calculates tile information; a tile decoding unit which decodes encoded data for each tile and generates a decoded image of the tile; and a synthesis unit which synthesizes the decoded images of the tiles with reference to the tile information to generate a display image, wherein the tile includes an area overlapping an adjacent tile, and the synthesis unit filter-processes a plurality of pixel values of each pixel in the overlapping area and generates the display image using pixel values of the decoded image of the tile and the filter-processed pixel values.

Description

Video encoding apparatus and video decoding apparatus
One embodiment of the present invention relates to a video decoding device and a video encoding device.
In order to efficiently transmit or record a moving image, a moving image encoding device that generates encoded data by encoding the moving image, and a moving image decoding device that generates a decoded image by decoding the encoded data, are used.
Specific examples of the moving image encoding method include the methods proposed in H.264/AVC and HEVC (High-Efficiency Video Coding).
In such a moving image coding scheme, an image (picture) constituting a moving image is managed by a hierarchical structure composed of slices obtained by dividing the image, coding tree units (CTU: Coding Tree Unit) obtained by dividing a slice, coding units (sometimes called CU: Coding Unit) obtained by dividing a coding tree unit, and prediction units (PU: Prediction Unit) and transform units (TU: Transform Unit), which are blocks obtained by dividing a coding unit, and is encoded/decoded for each CU.
In such a moving image coding scheme, a predicted image is usually generated based on a locally decoded image obtained by encoding/decoding an input image, and the prediction residual (sometimes referred to as a "difference image" or "residual image") obtained by subtracting the predicted image from the input image (original image) is encoded. Methods for generating a predicted image include inter-picture prediction (inter prediction) and intra-picture prediction (intra prediction) (Non-Patent Document 1).
In recent years, with the evolution of processors such as multi-core CPUs and GPUs, configurations and algorithms that facilitate parallel processing have been adopted in video encoding and decoding. As an example of a configuration that facilitates parallelization, a picture division unit called a tile has been introduced. Unlike a slice, a tile is obtained by dividing a picture into rectangular areas, and each tile can be encoded and decoded independently (Patent Document 1, Non-Patent Document 2).
Furthermore, in recent years, the resolution of moving images has been increasing, as represented by 4K, 8K, VR, and 360-degree video capturing all directions, and the standardization of projection formats has been progressing (Non-Patent Document 3). When such video is viewed on a smartphone or an HMD (Head Mount Display), a part of the high-resolution video is cut out and displayed on the display. Since the battery capacity of smartphones and HMDs is not large, a mechanism is expected that extracts only the partial area required for display and allows the video to be viewed with a minimum of decoding processing.
Japanese Patent Gazette "Patent No. 6241504"
As described above, a tile is obtained by dividing a picture into rectangular areas, and can be decoded in the spatial and temporal directions without referring to information outside the tile (prediction modes, MVs, pixel values). However, since the information of the tiles adjacent to the target tile, and of the tiles adjacent to the collocated tile (the tile at the same position on a picture different from that of the target tile), is not referred to at all, distortion caused by discontinuity at tile boundaries (hereinafter referred to as tile distortion) occurs, and tile distortion is very easy to visually recognize. The coding efficiency is also reduced.
In addition, there is a restriction that the tile size must be an integer multiple of the CTU, which makes it difficult to divide a picture into equal-sized tiles for load balancing or to configure tiles matching the face size of a 360-degree video.
The present invention has been made in view of the above problems, and an object thereof is to provide a mechanism for removing or suppressing tile distortion when each tile is encoded and decoded independently in the spatial and temporal directions while suppressing a decrease in coding efficiency. Another object is to provide tile partitioning that is not restricted to integer multiples of the CTU.
A moving image decoding apparatus according to an aspect of the present invention is a moving image decoding apparatus that divides an image into tiles and decodes a moving image in tile units, and includes: a header information decoding unit that decodes header information from an encoded stream and calculates tile information; a tile decoding unit that decodes encoded data for each tile and generates a decoded image of the tile; and a synthesis unit that synthesizes the decoded images of the tiles with reference to the tile information to generate a display image, wherein the tile is composed of a tile active area, which is a unit for dividing a picture without overlap, and a hidden area (tile extension area), and an area obtained by adding the tile extension area to the tile active area is decoded in units of CTUs.
According to one aspect of the present invention, a mechanism that guarantees the decoding independence of each tile and a mechanism that removes or suppresses tile distortion are provided for moving images. As a result, the amount of processing when selecting and decoding only the area necessary for display can be greatly reduced, and an image without distortion at tile boundaries can be displayed.
FIG. 1 is a schematic diagram showing the configuration of an image transmission system according to the present embodiment.
FIG. 2 is a diagram showing the hierarchical structure of data in an encoded stream according to the present embodiment.
FIG. 3 is a diagram explaining tiles.
FIG. 4 is a syntax table regarding tile information and the like.
FIG. 5 is another syntax table regarding tile information and the like.
FIG. 6 is a diagram explaining reference of tiles in the temporal direction.
FIG. 7 is an example of dividing a picture into M*N tiles allowing overlap.
FIG. 8 is a diagram explaining filter processing of the overlap area of horizontally adjacent tiles.
FIG. 9 is a block diagram showing the configuration of a video decoding device according to the present invention.
FIG. 10 is a diagram showing the configuration of a tile decoding unit according to the present embodiment.
FIG. 11 is a block diagram showing the configuration of a video encoding device according to the present invention.
FIG. 12 is a block diagram showing the configuration of a tile encoding unit according to the present embodiment.
FIG. 13 is another example of dividing a picture into M*N tiles allowing overlap.
FIG. 14 is a flowchart explaining the operation of the video encoding device and the video decoding device.
FIG. 15 is an example of a table of weight coefficients.
FIG. 16 is a diagram explaining filter processing of the overlap area of vertically adjacent tiles.
FIG. 17 is an example of packing projection images to generate a two-dimensional image.
FIG. 18 is another example of packing projection images to generate a two-dimensional image.
FIG. 19 is another example of packing projection images to generate a two-dimensional image.
FIG. 20 is another syntax table regarding tile information and the like.
FIG. 21 is a diagram showing tile division of a picture and CTU division of a tile when the tile size is an integer multiple of the CTU.
FIG. 22 is a diagram showing tile division of a picture and CTU division of a tile according to the present embodiment.
FIG. 23 is a syntax example of slice data and CTU data when the tile size is an integer multiple of the CTU.
FIG. 24 is a syntax example of slice data and CTU data according to the present embodiment.
FIG. 25 is a syntax explaining an example of dividing a picture into tiles regardless of multiples of the CTU.
FIG. 26 is a diagram explaining an example of dividing a picture into tiles regardless of multiples of the CTU.
FIG. 27 is a diagram explaining another example of dividing a picture into tiles regardless of multiples of the CTU.
FIG. 28 is a syntax explaining another example of dividing a picture into tiles regardless of multiples of the CTU.
FIG. 29 is a syntax example of quadtree division of a CTU when a picture is divided into tiles regardless of multiples of the CTU.
FIG. 30 is a syntax example of binary tree division of a CTU when a picture is divided into tiles regardless of multiples of the CTU.
FIG. 31 is a diagram explaining another example of dividing a picture into regions and tiles regardless of multiples of the CTU.
FIG. 32 is a syntax explaining an example of dividing a region into tiles regardless of multiples of the CTU.
FIG. 33 is an example of the syntax of a CTU when a picture is divided into tiles regardless of multiples of the CTU.
FIG. 34 is a syntax explaining another example of dividing a region into tiles regardless of multiples of the CTU.
FIG. 35 is an example explaining a method of signalling tiles in an invalid area.
FIG. 36 is a diagram explaining another example of dividing a picture into tiles regardless of multiples of the CTU.
FIG. 37 is a syntax explaining another example of dividing a picture into tiles regardless of multiples of the CTU.
FIG. 38 is a diagram showing the configurations of a transmission device equipped with the video encoding device and a receiving device equipped with the video decoding device according to the present embodiment. (a) shows the transmission device equipped with the video encoding device, and (b) shows the receiving device equipped with the video decoding device.
FIG. 39 is a diagram showing the configurations of a recording device equipped with the video encoding device and a playback device equipped with the video decoding device according to the present embodiment. (a) shows the recording device equipped with the video encoding device, and (b) shows the playback device equipped with the video decoding device.
(Embodiment 1)
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
FIG. 1 is a schematic diagram showing the configuration of an image transmission system 1 according to the present embodiment.
The image transmission system 1 is a system that transmits a code obtained by encoding an image to be encoded, decodes the transmitted code, and displays the image. The image transmission system 1 includes a video encoding device (image encoding device) 11, a network 21, a video decoding device (image decoding device) 31, and a video display device (image display device) 41.
An image T is input to the video encoding device 11.
The network 21 transmits the encoded stream Te generated by the video encoding device 11 to the video decoding device 31. The network 21 is the Internet, a wide area network (WAN: Wide Area Network), a local area network (LAN: Local Area Network), or a combination thereof. The network 21 is not necessarily limited to a bidirectional communication network, and may be a unidirectional communication network that transmits broadcast waves such as terrestrial digital broadcasting and satellite broadcasting. The network 21 may also be replaced by a storage medium on which the encoded stream Te is recorded, such as a DVD (Digital Versatile Disc) or a BD (Blu-ray Disc).
The video decoding device 31 decodes each encoded stream Te transmitted over the network 21 and generates one or more decoded images Td.
The video display device 41 displays all or part of the one or more decoded images Td generated by the video decoding device 31. The video display device 41 includes a display device such as a liquid crystal display or an organic EL (Electro-luminescence) display. Display forms include stationary, mobile, and HMD.
<Operators>
The operators used in this specification are described below.
>> is a right bit shift, << is a left bit shift, & is a bitwise AND, | is a bitwise OR, and |= is an OR assignment operator.
 x ? y : zは、xが真(0以外)の場合にy、xが偽(0)の場合にzをとる3項演算子である。 X? Y: z is a ternary operator that takes y when x is true (non-zero) and takes z when x is false (0).
 Clip3(a,b,c)は、cをa以上b以下の値にクリップする関数であり、c<aの場合にはaを返し、c>bの場合にはbを返し、その他の場合にはcを返す関数である(ただし、a<=b)。 Clip3 (a, b, c) is a function that clips c to a value between a and b, and returns a if c <a, b if c> b, otherwise Is a function that returns c (where a <= b).
 abs(a)はaの絶対値を返す関数である。 Abs (a) is a function that returns the absolute value of a.
 Int(a)はaの整数値を返す関数である。 Int (a) is a function that returns an integer value of a.
 floor(a)はa以下の最大の整数を返す関数である。 Floor (a) is a function that returns the largest integer less than or equal to a.
 ceil(a)はa以上の最小の整数を返す関数である。 Ceil (a) is a function that returns the smallest integer greater than or equal to a.
 a/dはdによるaの除算を表す。 A / d represents the division of a by d.
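As a point of reference, the operators above can be implemented directly in C. The following is a minimal sketch (the helper names FloorDiv and CeilDiv are illustrative additions introduced here, not syntax elements of this specification); C integer division of non-negative values truncates, matching floor for the division forms used below.
  /* Clip3(a,b,c): clip c into the range [a,b] (assumes a <= b). */
  static int Clip3(int a, int b, int c)
  {
    return (c < a) ? a : ((c > b) ? b : c);
  }
  /* floor(a/d) for non-negative integers (C division truncates). */
  static int FloorDiv(int a, int d)
  {
    return a / d;
  }
  /* ceil(a/d) for non-negative integers. */
  static int CeilDiv(int a, int d)
  {
    return (a + d - 1) / d;
  }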
<Structure of the encoded stream Te>
Prior to a detailed description of the moving image encoding device 11 and the moving image decoding device 31 according to the present embodiment, the data structure of the encoded stream Te that is generated by the moving image encoding device 11 and decoded by the moving image decoding device 31 will be described.
FIG. 2 is a diagram showing the hierarchical structure of data in the encoded stream Te. The encoded stream Te illustratively includes a sequence and a plurality of pictures constituting the sequence. FIGS. 2(a) to 2(f) respectively show an encoded video sequence defining a sequence SEQ, an encoded picture defining a picture PICT, an encoded slice defining a slice S, encoded slice data defining slice data, a coding tree unit (CTU) included in the encoded slice data, and a coding unit (CU) included in the CTU.
(Encoded video sequence)
The encoded video sequence defines a set of data that the moving image decoding device 31 refers to in order to decode the sequence SEQ to be processed. As shown in FIG. 2(a), the sequence SEQ includes a video parameter set (VPS), a sequence parameter set (SPS), a picture parameter set (PPS), pictures PICT, and supplemental enhancement information (SEI).
The video parameter set VPS defines, for a moving image composed of a plurality of layers, a set of coding parameters common to a plurality of moving images as well as sets of coding parameters related to the plurality of layers and to the individual layers included in the moving image.
The sequence parameter set SPS defines a set of coding parameters that the moving image decoding device 31 refers to in order to decode the target sequence. For example, the width and height of the picture are defined. A plurality of SPSs may exist; in that case, one of the SPSs is selected from the PPS.
The picture parameter set PPS defines a set of coding parameters that the moving image decoding device 31 refers to in order to decode each picture in the target sequence. For example, it includes a reference value of the quantization step size used for picture decoding (pic_init_qp_minus26) and a flag indicating the application of weighted prediction (weighted_pred_flag). A plurality of PPSs may exist; in that case, one of the PPSs is selected from each slice header in the target sequence.
(Encoded picture)
The encoded picture defines a set of data that the moving image decoding device 31 refers to in order to decode the picture PICT to be processed. As shown in FIG. 2(b), the picture PICT includes slices S0 to S(NS-1), where NS is the total number of slices included in the picture PICT.
Hereinafter, when it is not necessary to distinguish the slices S0 to S(NS-1) from one another, the subscripts may be omitted. The same applies to other subscripted data included in the encoded stream Te described below.
(Encoded slice)
The encoded slice defines a set of data that the moving image decoding device 31 refers to in order to decode the slice S to be processed. As shown in FIG. 2(c), the slice S includes a slice header SH and slice data SDATA.
The slice header SH includes a group of coding parameters that the moving image decoding device 31 refers to in order to determine the decoding method for the target slice. Slice type designation information (slice_type) designating a slice type is an example of a coding parameter included in the slice header SH.
Slice types that can be designated by the slice type designation information include (1) an I slice that uses only intra prediction for coding, (2) a P slice that uses unidirectional prediction or intra prediction for coding, and (3) a B slice that uses unidirectional prediction, bidirectional prediction, or intra prediction for coding. Note that inter prediction is not limited to uni-prediction and bi-prediction; a predicted image may be generated using more reference pictures. Hereinafter, P and B slices refer to slices that include blocks for which inter prediction can be used.
Note that the slice header SH may include a reference (pic_parameter_set_id) to the picture parameter set PPS included in the encoded video sequence.
(Encoded slice data)
The encoded slice data defines a set of data that the moving image decoding device 31 refers to in order to decode the slice data SDATA to be processed. As shown in FIG. 2(d), the slice data SDATA includes coding tree units (CTUs, CTU blocks). A CTU is a fixed-size block (for example, 64x64) constituting a slice, and is sometimes called a largest coding unit (LCU).
(Coding tree unit)
FIG. 2(e) defines a set of data that the moving image decoding device 31 refers to in order to decode the CTU to be processed. The CTU is split by recursive quadtree splitting (QT splitting) or binary tree splitting (BT splitting) into coding units (CUs), which are the basic units of the coding process. The tree structure obtained by recursive quadtree or binary tree splitting is called a coding tree (CT), and a node of the tree structure is called a coding node (CN). The intermediate nodes of the quadtree and the binary tree are CNs, and the CTU itself is also defined as the topmost CN.
The CT includes, as CT information, a QT split flag (cu_split_flag) indicating whether to perform QT splitting, and a BT split mode (split_bt_mode) indicating the splitting method of BT splitting. cu_split_flag and/or split_bt_mode are transmitted for each CN. When cu_split_flag is 1, the CN is split into four CNs. When cu_split_flag is 0: if split_bt_mode is 1, the CN is split horizontally into two CNs; if split_bt_mode is 2, the CN is split vertically into two CNs; and if split_bt_mode is 0, the CN is not split and holds one CU as a node. The CU is a terminal node (leaf node) of the coding tree and is not split any further.
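To make the recursion concrete, the following is a minimal C sketch of this QT/BT traversal (the functions read_cu_split_flag, read_split_bt_mode, and decode_cu are hypothetical placeholders for the entropy decoder and the CU decoding process, not functions defined in this specification):
  int read_cu_split_flag(void);
  int read_split_bt_mode(void);
  void decode_cu(int x, int y, int w, int h);
  /* Recursively decode one coding node of size (w,h) at position (x,y). */
  void decode_coding_node(int x, int y, int w, int h)
  {
    if (read_cu_split_flag()) {        /* cu_split_flag == 1: QT split into 4 CNs */
      int hw = w/2, hh = h/2;
      decode_coding_node(x,    y,    hw, hh);
      decode_coding_node(x+hw, y,    hw, hh);
      decode_coding_node(x,    y+hh, hw, hh);
      decode_coding_node(x+hw, y+hh, hw, hh);
    } else {
      int bt = read_split_bt_mode();
      if (bt == 1) {                   /* horizontal split into 2 CNs */
        decode_coding_node(x, y,     w, h/2);
        decode_coding_node(x, y+h/2, w, h/2);
      } else if (bt == 2) {            /* vertical split into 2 CNs */
        decode_coding_node(x,     y, w/2, h);
        decode_coding_node(x+w/2, y, w/2, h);
      } else {                         /* bt == 0: leaf; this CN holds one CU */
        decode_cu(x, y, w, h);
      }
    }
  }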
When the CTU size is 64x64 pixels, the CU size can be any of 64x64, 64x32, 32x64, 32x32, 64x16, 16x64, 32x16, 16x32, 16x16, 64x8, 8x64, 32x8, 8x32, 16x8, 8x16, 8x8, 64x4, 4x64, 32x4, 4x32, 16x4, 4x16, 8x4, 4x8, and 4x4 pixels.
(Coding unit)
FIG. 2(f) defines a set of data that the moving image decoding device 31 refers to in order to decode the CU to be processed. Specifically, the CU is composed of a prediction tree (PT), a transform tree (TT), and a CU header CUH. The CU header defines the prediction mode, the splitting method (PU split mode), and the like.
The PT defines the prediction parameters (reference picture index, motion vector, etc.) of each prediction unit (PU) obtained by splitting the CU into one or more parts. In other words, a PU is one or more non-overlapping areas constituting the CU, and the PT includes the one or more PUs obtained by the above splitting. A prediction unit obtained by further splitting a PU is hereinafter referred to as a "sub-block". A sub-block is composed of a plurality of pixels. When the PU and the sub-block are of equal size, there is one sub-block in the PU. When the PU is larger than the sub-block size, the PU is split into sub-blocks. For example, when the PU is 8x8 and the sub-block is 4x4, the PU is split into four sub-blocks, two horizontally and two vertically.
The prediction process may be performed for each PU (or sub-block).
Broadly speaking, there are two types of prediction in the PT: intra prediction and inter prediction. Intra prediction is prediction within the same picture, and inter prediction refers to prediction processing performed between mutually different pictures (for example, between display times or between layer images).
In the case of intra prediction, the splitting methods are 2Nx2N (the same size as the coding unit) and NxN.
In the case of inter prediction, the splitting method is coded by the PU split mode (part_mode) of the encoded data, and includes 2Nx2N (the same size as the coding unit), 2NxN, 2NxnU, 2NxnD, Nx2N, nLx2N, nRx2N, and NxN. Note that 2NxN and Nx2N indicate 1:1 symmetric splits, while 2NxnU, 2NxnD and nLx2N, nRx2N indicate 1:3 and 3:1 asymmetric splits. The PUs included in a CU are expressed as PU0, PU1, PU2, and PU3 in that order.
In the TT, the CU is split into one or more transform units (TUs), and the position and size of each TU are defined. In other words, a TU is one or more non-overlapping areas constituting the CU, and the TT includes the one or more TUs obtained by the above splitting.
Splitting in the TT includes allocating an area of the same size as the CU as a TU, and, as with the CU splitting described above, recursive quadtree splitting.
The transform process is performed for each TU.
(Prediction parameters)
The predicted image of a PU is derived from the prediction parameters associated with the PU. The prediction parameters are either intra prediction parameters or inter prediction parameters. The prediction parameters of inter prediction (inter prediction parameters) will be described below. The inter prediction parameters are composed of prediction list utilization flags predFlagL0 and predFlagL1, reference picture indexes refIdxL0 and refIdxL1, and motion vectors mvL0 and mvL1. The prediction list utilization flags predFlagL0 and predFlagL1 indicate whether the reference picture lists called the L0 list and the L1 list, respectively, are used; when the value is 1, the corresponding reference picture list is used. In this specification, where "a flag indicating whether XX" is written, a flag value other than 0 (for example, 1) means that XX holds and 0 means that XX does not hold, and in logical negation, logical product, and the like, 1 is treated as true and 0 as false (the same applies hereinafter). However, other values can be used as the true and false values in actual devices and methods.
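For illustration, the inter prediction parameters of one PU can be grouped into a structure such as the following minimal C sketch (the struct and field names are hypothetical, chosen to mirror the names above):
  typedef struct {
    int x, y;
  } MotionVector;
  /* Inter prediction parameters attached to one PU. */
  typedef struct {
    int predFlagL0;        /* 1: the L0 reference picture list is used */
    int predFlagL1;        /* 1: the L1 reference picture list is used */
    int refIdxL0;          /* reference picture index into the L0 list */
    int refIdxL1;          /* reference picture index into the L1 list */
    MotionVector mvL0;     /* motion vector for L0 */
    MotionVector mvL1;     /* motion vector for L1 */
  } InterPredParams;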
(Reference picture list)
The reference picture list is a list of reference pictures stored in the reference picture memory 306.
(Merge prediction and AMVP prediction)
Methods of decoding (encoding) the prediction parameters include a merge prediction (merge) mode and an AMVP (Adaptive Motion Vector Prediction) mode; the merge flag merge_flag is a flag for identifying which is used. The merge mode is a mode in which the prediction list utilization flag predFlagLX (or the inter prediction identifier inter_pred_idc), the reference picture index refIdxLX, and the motion vector mvLX are not included in the encoded data but are derived from the prediction parameters of already-processed neighboring PUs. The AMVP mode is a mode in which the inter prediction identifier inter_pred_idc, the reference picture index refIdxLX, and the motion vector mvLX are included in the encoded data. Note that the motion vector mvLX is coded as a prediction vector index mvp_lX_idx identifying the prediction vector mvpLX, together with a difference vector mvdLX.
(Motion vector)
The motion vector mvLX indicates the amount of shift between blocks on two different pictures. The prediction vector and difference vector related to the motion vector mvLX are called the prediction vector mvpLX and the difference vector mvdLX, respectively.
(Intra prediction)
The intra prediction parameters are parameters used in the process of predicting a CU from information within the picture, for example the intra prediction mode IntraPredMode; the luma intra prediction mode IntraPredModeY and the chroma intra prediction mode IntraPredModeC may differ. There are, for example, 67 intra prediction modes, consisting of planar prediction, DC prediction, and angular (directional) prediction. For the chroma prediction mode IntraPredModeC, for example, any of planar prediction, DC prediction, angular prediction, direct mode (a mode that uses the luma prediction mode), and LM prediction (a mode that performs linear prediction from luma pixels) is used.
The luma intra prediction mode IntraPredModeY may be derived using an MPM (Most Probable Mode) candidate list consisting of the intra prediction modes estimated to have a high probability of applying to the target block, or derived from REM, the prediction modes not included in the MPM candidate list. Which method is used is signaled by the flag prev_intra_luma_pred_flag; in the former case, IntraPredModeY is derived using the index mpm_idx and the MPM candidate list derived from the intra prediction modes of neighboring blocks. In the latter case, the intra prediction mode is derived using the flag rem_selected_mode_flag and the modes rem_selected_mode and rem_non_selected_mode.
The chroma intra prediction mode IntraPredModeC may be derived using the flag not_lm_chroma_flag indicating whether to use LM prediction, derived using the flag not_dm_chroma_flag indicating whether to use the direct mode, or derived using the index chroma_intra_mode_idx that directly designates the intra prediction mode applied to the chroma pixels.
(Loop filter)
A loop filter is a filter provided within the coding loop that removes block distortion and ringing distortion and improves image quality. The main loop filters are the deblocking filter, sample adaptive offset (SAO), and adaptive loop filter (ALF).
When the difference between the pre-deblocking pixel values of luma-component pixels adjacent to each other across a block boundary is smaller than a predetermined threshold, the deblocking filter applies deblocking processing to the luma and chroma component pixels at that block boundary, thereby filtering the image near the block boundary.
SAO is a filter applied after the deblocking filter, and has the effect of removing ringing distortion and quantization distortion. SAO is a per-CTU process: it classifies pixel values into several categories and adds or subtracts an offset per pixel for each category. The edge offset (EO) processing of SAO determines the offset value to be added to a pixel value according to the magnitude relation between the target pixel and its neighboring pixels (reference pixels).
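As an illustration of this magnitude-relation classification, the following is a minimal C sketch of a typical edge-offset category decision for one pixel against its two neighbors along the chosen EO direction (the category numbering and the offset table are illustrative assumptions, not values defined in this specification):
  /* Sign of the difference between two pixel values: -1, 0, or +1. */
  static int sign3(int a, int b)
  {
    return (a > b) - (a < b);
  }
  /* Classify pixel c against its neighbors n0, n1 along the EO direction
     (valley, edge, flat, edge, peak) and apply that category's offset. */
  static int sao_edge_offset(int c, int n0, int n1, const int offset[5])
  {
    int category = sign3(c, n0) + sign3(c, n1) + 2;  /* range 0..4 */
    return c + offset[category];
  }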
The ALF generates an ALF-filtered decoded image by applying adaptive filter processing to the pre-ALF decoded image using the ALF parameters (filter coefficients) ALFP decoded from the encoded stream Te.
(Entropy coding)
Entropy coding includes a scheme that performs variable-length coding of syntax using a context (probability model) adaptively selected according to the type of syntax and the surrounding circumstances, and a scheme that performs variable-length coding of syntax using a predetermined table or calculation formula. In the former, CABAC (Context Adaptive Binary Arithmetic Coding), a probability model updated for each encoded or decoded picture is stored in memory. Then, in a subsequent P picture or B picture using inter prediction, for the initial state of the context of the target picture, the probability model of a picture that used the same slice type and the same slice-level quantization parameter is selected from the probability models stored in memory, and used for the encoding and decoding processes.
(Tiles)
FIG. 3(a) is a diagram showing an example in which a picture is divided into N tiles (solid rectangles; the figure shows an example with N = 9). A tile is further divided into a plurality of CTUs (dashed rectangles). As shown at the center of FIG. 3(a), the upper-left coordinates of a tile are denoted (xTs, yTs), its width wT, and its height hT. The width and height of the picture are denoted wPict and hPict. Information on the number of tile divisions and their sizes is called tile information, and is described in detail later. The units of xTs, yTs, wT, hT, wPict, and hPict are pixels. The picture width and height are set from pic_width_in_luma_samples and pic_height_in_luma_samples, which are signaled in sequence_parameter_set_rbsp() (referred to as the SPS) shown in FIG. 4(a).
  wPict = pic_width_in_luma_samples
  hPict = pic_height_in_luma_samples
FIG. 3(b) is a diagram showing the encoding and decoding order of CTUs when a picture is divided into tiles. The number written in each tile is its TileId (the identifier of the tile within the picture); the numbers TileId may be assigned to the tiles in the picture in raster scan order from the upper left to the lower right. The CTUs within each tile are processed in raster scan order from the upper left to the lower right, and when the processing within one tile is finished, the CTUs in the next tile are processed.
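A minimal C sketch of this scan order is shown below, assuming the per-tile sizes in CTU units are already known (the array names and the process_ctu function are hypothetical):
  void process_ctu(int tileId, int cx, int cy);  /* hypothetical CTU decoder */
  /* Process all CTUs of a picture divided into M x N tiles.
     tileWInCtus[m], tileHInCtus[n]: tile sizes in CTU units. */
  void scan_tiles(int M, int N, const int tileWInCtus[], const int tileHInCtus[])
  {
    for (int n = 0; n < N; n++) {                     /* tiles in raster order */
      for (int m = 0; m < M; m++) {
        int tileId = n*M + m;                         /* TileId = n*M+m */
        for (int cy = 0; cy < tileHInCtus[n]; cy++)   /* CTUs in raster order */
          for (int cx = 0; cx < tileWInCtus[m]; cx++)
            process_ctu(tileId, cx, cy);
      }
    }
  }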
FIG. 3(c) is a diagram showing tiles that are continuous in the time direction. As shown in FIG. 3(c), a video sequence is composed of a plurality of pictures that are continuous in the time direction. A tile sequence is composed of tiles at one or more times that are continuous in the time direction. In the figure, Tile(n,tk) represents the tile with TileId = n at time tk. A CVS (Coded Video Sequence) in the figure is a group of pictures from one intra picture up to the picture immediately preceding, in decoding order, another intra picture.
FIG. 4 shows examples of syntax related to tile information and the like.
The parameters tile_parameters() related to tiles are signaled in the PPS (pic_parameter_set_rbsp()) shown in FIG. 4(b). Hereinafter, signaling a parameter means including the parameter in the encoded data (bitstream): the moving image encoding device encodes the parameter, and the moving image decoding device decodes it. In tile_parameters(), as shown in FIG. 4(c), when tile_enabled_flag, which indicates whether tiles are present, is 1, the tile information tile_info() is signaled. Also, when tile_enabled_flag is 1, independent_tiles_flag is signaled, indicating whether a tile can be decoded independently across a plurality of temporally continuous pictures. When independent_tiles_flag is 0, a tile is decoded with reference to adjacent tiles in the reference picture (it cannot be decoded independently). When independent_tiles_flag is 1, decoding is performed without referring to adjacent tiles in the reference picture. When tiles are used, decoding is performed without referring to adjacent tiles in the target picture regardless of the value of independent_tiles_flag, so a plurality of tiles can be decoded in parallel. As shown in FIG. 4(c), when independent_tiles_flag is 0, loop_filter_across_tiles_enable_flag is transmitted (present), indicating whether the loop filter applied to the reference picture is on or off at tile boundaries. When independent_tiles_flag is 1, loop_filter_across_tiles_enable_flag may not be transmitted (not present) and may always be set to 0.
Note that when tiles are processed independently throughout the sequence, the independent tile flag independent_tiles_flag may be signaled in the SPS as shown in FIG. 4(a). independent_tiles_flag is described later.
The tile information tile_info() consists of, for example, num_tile_columns_minus1, num_tile_rows_minus1, uniform_spacing_flag, column_width_minus1[i], and row_height_minus1[i] as shown in FIG. 4(d), and may also include overlap_tiles_flag, overlap information, and the like. Here, num_tile_columns_minus1 and num_tile_rows_minus1 are the numbers of tiles M and N in the horizontal and vertical directions of the picture, each minus 1. uniform_spacing_flag is a flag indicating whether the picture is divided into tiles evenly. When the value of uniform_spacing_flag is 1, the width and height of each tile of the picture are set equal, so the moving image encoding device and the moving image decoding device can derive the tile width and height from the numbers of tiles in the horizontal and vertical directions of the picture.
  M = num_tile_columns_minus1 + 1
  N = num_tile_rows_minus1 + 1
  wT[m] = floor(wPict/M) (0 <= m < M-1) (Formula TSP-1)
  wT[M-1] = wPict - Σ(wT[m]) (0 <= m < M-1)
  hT[n] = floor(hPict/N) (0 <= n < N-1)
  hT[N-1] = hPict - Σ(hT[n]) (0 <= n < N-1)
Alternatively, wT[m] and hT[n] may be expressed by the following equations.
  wT[m] = ceil(wPict/M) (0 <= m < M-1) (Formula TSP-2)
  hT[n] = ceil(hPict/N) (0 <= n < N-1)
Alternatively, wT[m] and hT[n] may be expressed by the following equations.
  for(m=0; m<M; m++)
   wT[m] = ((m+1)*wPict)/M - (m*wPict)/M (Formula TSP-3)
  for(n=0; n<N; n++)
   hT[n] = ((n+1)*hPict)/N - (n*hPict)/N
The tile size may also be a multiple of the tile unit size (minimum tile size) wUnitTile, hUnitTile. In this case, it is derived as follows.
  wT[m] = floor(wPict/M/wUnitTile)*wUnitTile (0 <= m < M) (Formula TSP-4)
  hT[n] = floor(hPict/N/hUnitTile)*hUnitTile (0 <= n < N)
Alternatively, it may be expressed by the following equations.
  wT[m] = ceil(wPict/M/wUnitTile)*wUnitTile (0 <= m < M) (Formula TSP-5)
  hT[n] = ceil(hPict/N/hUnitTile)*hUnitTile (0 <= n < N)
  for(m=0; m<M; m++)
   wT[m] = ((m+1)*wPict/M/wUnitTile - m*wPict/M/wUnitTile)*wUnitTile (Formula TSP-6)
  for(n=0; n<N; n++)
   hT[n] = ((n+1)*hPict/N/hUnitTile - n*hPict/N/hUnitTile)*hUnitTile
When wPict and hPict are not integer multiples of M and N, respectively, the remaining pixels may be allocated to some of the wT[m] or hT[n]. For example, when wPict = 500 and M = 3, two pixels remain, so wT[0] and wT[1] are each increased by 1. Alternatively, in reverse order from M-1, wT[M-1] and wT[M-2] are each increased by 1. Or a specific element, such as wT[0] or wT[M-1], may be increased by 2.
When the value of uniform_spacing_flag is 0, the width and height of each tile of the picture need not be set equal. The moving image encoding device encodes, for each tile, the tile width column_width_minus1[i] (the value of wT in FIG. 3 expressed in units of wUnitTile) and the tile height row_height_minus1[i] (the value of hT in FIG. 3 expressed in units of hUnitTile). The moving image decoding device decodes the tile sizes wT[m] and hT[n] for each tile from the encoded (column_width_minus1[], row_height_minus1[]) as follows.
  wT[m] = (column_width_minus1[m]+1)*wUnitTile (0 <= m < M) (Formula TSP-7)
  hT[n] = (row_height_minus1[n]+1)*hUnitTile (0 <= n < N)
Here, wUnitTile and hUnitTile are the unit size (minimum size) of a tile. Alternatively, the tile size may be an integer multiple (wUnitTile = hUnitTile = MIN_CU_SIZE) of the minimum CU size MIN_CU_SIZE (= 1 << log2CUSize), and the tile sizes wT[m] and hT[n] may be decoded as follows.
  wT[m] = ((column_width_minus1[m]+1)<<log2CUSize) (0 <= m < M) (Formula TSP-8)
  hT[n] = ((row_height_minus1[n]+1)<<log2CUSize) (0 <= n < N)
Furthermore, the tile size may be an integer multiple of the CTU size (wCTU, hCTU) (wUnitTile = wCTU, hUnitTile = hCTU), and the tile sizes wT[m] and hT[n] may be decoded as follows.
  wT[m] = (column_width_minus1[m]+1)*wCTU (0 <= m < M) (Formula TSP-9)
  hT[n] = (row_height_minus1[n]+1)*hCTU (0 <= n < N)
overlap_tiles_flag indicates whether the area near a tile boundary overlaps the adjacent tile. When overlap_tiles_flag is 1, it indicates overlap with the adjacent tile, and the overlap information overlap_tiles_info() shown in FIG. 5(f) is signaled. When overlap_tiles_flag is 0, there is no overlap with the adjacent tile. Here, "overlap" means that two or more tiles contain the same image area, and an "overlap area" is an area contained in two or more tiles.
The overlap information overlap_tiles_info() includes uniform_overlap_flag and information indicating the width and height of the overlap area. uniform_overlap_flag is a flag indicating whether the widths or heights of the overlap areas of the tiles are equal. When all widths or all heights of the overlap areas of the tiles are equal, uniform_overlap_flag is set to 1, and the syntax elements tile_overlap_width_div2 and tile_overlap_height_div2 indicating the width and height of the overlap area are signaled. When the widths or heights of the overlap areas differ from tile to tile, uniform_overlap_flag is set to 0, and the syntax elements tile_overlap_width_div2[m] and tile_overlap_height_div2[n] indicating the width and height of the overlap area of each tile are signaled. When uniform_overlap_flag is 1, the following relations hold.
  tile_overlap_width_div2[m] = tile_overlap_width_div2 (0 <= m < M-1)
  tile_overlap_height_div2[n] = tile_overlap_height_div2 (0 <= n < N-1)
The relation to the actual overlap area width wOVLP and height hOVLP is shown by the following equations. These units are pixels.
  wOVLP = tile_overlap_width_div2[m]*2
  hOVLP = tile_overlap_height_div2[n]*2
When there is no overlap, overlap_tiles_flag is set to 0, and the width and height of the overlap area are set to 0. When overlap_tiles_flag is 0, tile_overlap_width_div2 and tile_overlap_height_div2 are not included in the encoded data, and tile_overlap_width_div2 = 0 and tile_overlap_height_div2 = 0 are derived.
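A minimal C sketch of the decoder-side overlap-width derivation might look as follows (read_ue is a hypothetical entropy-decoding helper; the height derivation is analogous):
  int read_ue(void);  /* hypothetical entropy-decoding helper */
  /* Derive per-tile overlap widths (in pixels) from the syntax.
     wOVLP[] must hold at least M entries. */
  void derive_overlap_widths(int overlap_tiles_flag, int uniform_overlap_flag,
                             int M, int wOVLP[])
  {
    if (!overlap_tiles_flag) {                /* no overlap: all zero */
      for (int m = 0; m < M; m++)
        wOVLP[m] = 0;
    } else if (uniform_overlap_flag) {        /* one value for all tiles */
      int w = read_ue()*2;                    /* tile_overlap_width_div2 */
      for (int m = 0; m < M; m++)
        wOVLP[m] = w;
    } else {                                  /* per-tile values */
      for (int m = 0; m < M; m++)
        wOVLP[m] = read_ue()*2;               /* tile_overlap_width_div2[m] */
    }
  }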
In the above, considering the case of YUV 4:2:0, the width and height of the overlap area are multiples of 2; however, the height of the overlap area in the case of YUV 4:2:2, and the width and height of the overlap area in the case of YUV 4:4:4, may be signaled in units of one pixel rather than in multiples of 2. For the parameters denoted by "_div2" hereafter as well, whether the size is expressed in units of 2 pixels or 1 pixel may be switched according to the chroma format (4:2:0, 4:2:2, 4:4:4).
The identifier TileId of the tile at position (m,n) may be calculated as follows.
  TileId = n*M + m
Alternatively, when TileId is known, (m,n) indicating the position of the tile may be calculated from TileId.
  m = TileId % M
  n = TileId / M
In the slice data (slice_segment_data()) of FIG. 5(g), using the tile information signaled in the PPS shown in FIG. 4(b), the tile syntax Tile(m,n) is signaled for each of the M*N tiles starting at position (xTsmn, yTsmn) on the picture. Specifically, with (xTsmn, yTsmn) on the picture as the upper-left coordinates (0,0) of each tile, the tile may be divided into CTUs (width wCTU, height hCTU) as shown in FIG. 5(h), and the encoded data coding_quadtree() of each CTU may be signaled. Here, (xTsmn, yTsmn) ranges from (xTs00, yTs00) to (xTs(M-1)(N-1), yTs(M-1)(N-1)).
Note that tile_info() shown in FIG. 25 may be signaled instead of tile_info() in FIG. 4(d). The difference between tile_info() in FIG. 4(d) and tile_info() in FIG. 25(a) is as follows. In FIG. 4(d), the tile width and height are signaled as column_width_minus1[i] and row_height_minus1[i], expressed in the minimum tile unit or in CTU units. In FIG. 25(a), when overlap_tiles_flag is not 0, that is, when tiles overlap, the tile width and height are signaled as column_width_in_luma_samples_div2_minus1[i] and row_height_in_luma_samples_div2_minus1[i], expressed in pixel units; when overlap_tiles_flag is 0, column_width_minus1[i] and row_height_minus1[i], expressed in the minimum tile unit or in CTU units, are signaled as in FIG. 4(d). column_width_in_luma_samples_div2_minus1[i] and row_height_in_luma_samples_div2_minus1[i] are the pixel-unit width and height of the tile divided by 2. In this case, the pixel-unit tile width wT[m] and height hT[n] are expressed by the following equations.
  wT[m] = column_width_in_luma_samples_div2_minus1[m]*2+1 (Formula TSP-10)
  hT[n] = row_height_in_luma_samples_div2_minus1[n]*2+1
Note that for column_width_in_luma_samples_div2_minus1[m] and row_height_in_luma_samples_div2_minus1[n], whether the size is expressed in units of 2 pixels or 1 pixel may be switched according to the chroma format (4:2:0, 4:2:2, 4:4:4).
Also, column_width_in_luma_samples_div2_minus1[i] and row_height_in_luma_samples_div2_minus1[i] may be coded with fixed-length coding (f(n)) rather than variable-length coding (ue(v)). Because they are expressed in pixel units, these syntax elements tend to take large values, for which fixed-length coding yields a smaller code amount than variable-length coding.
In FIG. 25(a), the unit of the tile width and height is switched according to the presence or absence of an overlap area; however, this is not limiting, and the unit of the tile width and height may instead be switched according to the presence or absence of a tile invalid area, described later.
In FIG. 5(f), as in the following equations, values obtained by dividing the pixel-unit overlap width and height by 2 are signaled.
  wOVLP[m] = tile_overlap_width_div2*2 (Formula OVLP-1)
  hOVLP[n] = tile_overlap_height_div2*2
Alternatively, as shown in FIG. 25(b), values obtained by subtracting 1 from the pixel-unit overlap width and height may be signaled.
  wOVLP[m] = tile_overlap_width_minus1+1 (Formula OVLP-2)
  hOVLP[n] = tile_overlap_height_minus1+1
(Tile boundary limit)
Since the tile information is signaled in the PPS, the tile positions and sizes can be changed for each picture. On the other hand, when tile sequences are decoded independently, that is, when tiles with the same TileId can be decoded without referring to information of tiles with different TileIds, the tile positions and sizes are not changed per picture. That is, when each tile refers to pictures (reference pictures) at different times, the same tile division may be applied to all pictures in the CVS. In this case, tiles with the same TileId are set to have the same upper-left coordinates, width, and height throughout all pictures of the CVS.
That the tile information does not change throughout the CVS is signaled by setting the value of tiles_fixed_structure_flag in vui_parameter() shown in FIG. 4(e) to 1. That is, when the value of tiles_fixed_structure_flag is 1, the values of num_tile_columns_minus1, num_tile_rows_minus1, uniform_spacing_flag, column_width_minus1[i], row_height_minus1[i], overlap_tiles_flag, and loop_filter_across_tiles_enabled_flag signaled in the PPS are unique throughout the CVS. When the value of tiles_fixed_structure_flag is 1, within the CVS, for tiles with the same TileId, the tile position on the picture (the tile's upper-left coordinates, width, and height) and the overlap information are not changed even in pictures at different times (POC: Picture Order Count). When the value of tiles_fixed_structure_flag is 0, the tile sequence may differ in size depending on the time.
FIG. 4(a) is a syntax table excerpting part of the sequence parameter set SPS. The independent tile flag independent_tiles_flag indicates whether a tile sequence can be encoded and decoded independently not only within the target picture (spatial direction) but also within a temporally continuous sequence (temporal direction). When the value of independent_tiles_flag is 1, it means that the tile sequence can be encoded and decoded independently, and the following constraints may be imposed on tile encoding/decoding and on the syntax of the encoded data.
(Constraint 1) Within the CVS, a tile does not refer to information of tiles with different TileIds.
(Constraint 2) Throughout the CVS, the numbers of tiles in the horizontal and vertical directions of the picture, the tile widths, the tile heights, and the widths and heights of the overlap areas signaled in the PPS are equal. Within the CVS, for tiles with the same TileId, the tile position on the picture (the tile's upper-left coordinates, width, and height) is not changed even in pictures at different times (POC). The value of tiles_fixed_structure_flag in vui_parameter() is set to 1.
The above (Constraint 1), "a tile does not refer to information of tiles with different TileIds", will now be described in detail.
FIG. 6 is a diagram for explaining tile reference in the temporal direction (between different pictures). FIG. 6(a) is an example in which the intra picture Pict(t0) at time t0 is divided into N tiles. FIG. 6(b) is an example in which the inter picture Pict(t1) at time t1 = t0+1 is divided into N tiles. Pict(t1) refers to Pict(t0). In the figure, Tile(n,t) represents the tile with TileId = n (n = 0..N-1) at time t. From (Constraint 2) above, the upper-left coordinates, width, and height of the tile with TileId = n are equal at any time.
In FIG. 6(b), CU1, CU2, and CU3 in the tile Tile(n,t1) refer to the blocks BLK1, BLK2, and BLK3 in FIG. 6(a). In this case, BLK1 and BLK3 are blocks contained in tiles other than the tile Tile(n,t0); to refer to them, not only Tile(n,t0) but the whole of Pict(t0) must be decoded at time t0. In other words, the tile Tile(n,t1) cannot be decoded merely by decoding the tile sequence corresponding to TileId = n at times t0 and t1; in addition to TileId = n, tile sequences other than TileId = n must also be decoded. Therefore, in order to decode a tile sequence independently, the reference pixels in the reference picture that are referred to when deriving the motion-compensated image of a CU in the tile must be contained within the collocated tile (the tile at the same position in the reference picture).
When the value of independent_tiles_flag is 0, it means that the tile sequence need not be independently decodable.
As described above, by turning off the intra prediction and loop filtering that refer to pixels outside the tile at tile boundaries, information of tiles adjacent to the target tile is not referred to, and by restricting the range referred to by inter prediction to the collocated tile, tiles at arbitrary positions can be encoded and decoded on their own; a sketch of such a reference-range restriction is shown below.
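For illustration, one way to keep the motion-compensation reference inside the collocated tile is to clamp the reference block coordinates, as in the following minimal C sketch (the function and parameter names are hypothetical, Clip3 is the operator defined above, and interpolation-filter margins are ignored for simplicity; the description above only requires that the referenced pixels stay within the collocated tile, and other realizations are possible):
  /* Clamp a reference block of size (wBlk,hBlk) so that it stays inside the
     collocated tile with top-left (xTs,yTs) and size (wT,hT). */
  void clamp_ref_to_collocated_tile(int xTs, int yTs, int wT, int hT,
                                    int wBlk, int hBlk, int *xRef, int *yRef)
  {
    *xRef = Clip3(xTs, xTs + wT - wBlk, *xRef);
    *yRef = Clip3(yTs, yTs + hT - hBlk, *yRef);
  }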
In particular, when viewing a high-resolution image such as 8K on a mobile terminal, or when viewing VR or 360-degree video on an HMD, it is common to extract and view only a specific area of the screen. When viewing only a specific area of the screen, only part of the moving image needs to be decoded, so the power consumption required for processing can be suppressed and the viewing time can be extended.
On the other hand, because tiles adjacent to the target tile and tiles adjacent to the collocated tile are not referred to, pixel values become discontinuous at tile boundaries, and tile distortion occurs. In the following, a technique is described that does not cause tile distortion while encoding and decoding individual tiles independently.
In Embodiment 1 of the present application, when dividing a picture into tiles, as shown in FIG. 7, tiles are generated by dividing the areas of the picture while permitting overlap.
FIG. 7(a) is a diagram in which a picture (width wPict, height hPict) is divided into M*N tiles. The tile at position (m,n) is denoted Tile[m][n], where 0 <= m < M and 0 <= n < N. In FIG. 7(a), M = 3 and N = 2. The width and height of the tile Tile[m][n] are denoted wT[m] and hT[n], and its upper-left coordinates (the positions indicated by black circles in FIG. 7(a)) are denoted (xTsmn, yTsmn). The shaded portions in the figure are areas where a plurality of tiles overlap. The units of wPict, hPict, wT[m], hT[n], xTsmn, and yTsmn are pixels.
FIG. 7(b) is a diagram showing the relation between two adjacent tiles Tile[0][0] and Tile[1][0]. The shaded portion at the right edge of Tile[0][0] is the area overlapping Tile[1][0], and the shaded portion at its bottom edge is the area overlapping Tile[0][1]. The width wT[0] and height hT[0] of Tile[0][0] indicate the tile width and height including the areas overlapping Tile[1][0] and Tile[0][1]. Similarly, the shaded portion at the left edge of Tile[1][0] is the area overlapping Tile[0][0], the shaded portion at its right edge is the area overlapping Tile[2][0], and the shaded portion at its bottom edge is the area overlapping Tile[1][1]. The width wT[1] and height hT[0] of Tile[1][0] include the areas overlapping Tile[0][0], Tile[2][0], and Tile[1][1], respectively.
In other words, the shaded portion on the right side of Tile[0][0] is an area that is encoded (redundantly) in both Tile[0][0] and Tile[1][0].
In a configuration in which the size of each tile is in CTU units, the width and height of each tile are integer multiples of the CTU width and height, so the following constraints may be imposed.
  wT[m] = wCTU*a
  hT[n] = hCTU*b
Here, wCTU and hCTU are the width and height of the CTU, and a and b are positive integers. Even in a configuration in which the size of each tile is in CTU units, the width of the tile at the right edge of the picture and the height of the tile at the bottom edge may not be integer multiples of the CTU; therefore, as shown in FIG. 7(a), crop offset areas are provided at the right and bottom edges of the picture (the horizontally hatched areas in FIG. 7(a)), and the width and height obtained by adding the tile and the crop offset area are set to integer multiples of the CTU. The crop offset area is not intended to be displayed; it is an area used to enlarge, for convenience, the size of the processed area so that processing in CTU units is easy. When output, for example, gray values (Y,Cb,Cr) = (1<<(bitDepthY-1), 1<<(bitDepthCb-1), 1<<(bitDepthCr-1)) are set as pixel values for convenience, or values obtained by padding the pixel values at the right/bottom edge of the picture are set. Also, the upper-left coordinates (xTsmn, yTsmn) of each tile at tile-unit position (m,n) are not necessarily at positions that are integer multiples of the CTU. As described later, the net display area obtained by subtracting the overlap area indicated by (wOVLP, hOVLP) from the tile effective area indicated by the size (wT, hT) may be called the tile active area.
 For example, if the picture is (wPict,hPict) = (1920,1080), (wCTU,hCTU) = (128,128), the overlap region width is wOVLP = 4, and the overlap region height is hOVLP = 4, the tile information may be set as follows:
  M = 3
  N = 2
  uniform_spacing_flag = 0
  wT[0] = 768
  wT[1] = 640
  wT[2] = 520
  hT[0] = 640
  hT[1] = 444
  overlap_tiles_flag = 1
  uniform_overlap_flag = 1
  tile_overlap_width_div2 = 2
  tile_overlap_height_div2 = 2
 Since column_width_minus1[2] and row_height_minus1[1] must correspond to integer multiples of the CTU, a crop offset region may be provided and the tile size made an integer multiple of the CTU size. In this case, the width wCRP[2] and height hCRP[1] of the crop offset region are set as follows. The units of wCRP[] and hCRP[] are pixels.
  wCRP[2] = 120
  hCRP[1] = 68
 Adding the crop offset region width wCRP[2] and height hCRP[1] to the tile width wT[2] and height hT[1] yields integer multiples of the CTU size:
  wT[2]+wCRP[2] = 520+120 = 640 = 128*5
  hT[1]+hCRP[1] = 444+68 = 512 = 128*4
 Note that the tile size is not limited to the CTU size; it may be, for example, an integer multiple of a tile unit size (wUnitTile, hUnitTile) or of the minimum CU size MIN_CU_SIZE.
 The size of the crop offset region can be derived from the tile size, using the constraint that the sum of the tile size and the crop offset size is an integer multiple of the CTU.
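 As a concrete illustration, the following C sketch derives the crop offset from the tile size and the CTU size; the function name is ours, and both sizes are assumed to be given in pixels.
  /* Crop offset needed so that tile_size + offset is a multiple of ctu_size. */
  static int crop_offset(int tile_size, int ctu_size)
  {
      int rem = tile_size % ctu_size;
      return (rem == 0) ? 0 : ctu_size - rem;
  }
  /* Example values from above: crop_offset(520,128) = 120, crop_offset(444,128) = 68. */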
 The upper left coordinates (xTsmn,yTsmn) of each tile within the picture, indicated by the tile-unit position (m,n) set in raster order, are calculated by the following formulas. The upper left coordinates of each tile are also the upper left coordinates of the CTU at the head of the tile.
  xTsmn = ΣwT[m-1]-wOVLP*m (for 1<=m<M, where Σ is the sum over 1..m) (Formula TLA-1)
      = 0          (for m=0)
  yTsmn = ΣhT[n-1]-hOVLP*n (for 1<=n<N, where Σ is the sum over 1..n)
      = 0          (for n=0)
 More specifically, for the example above:
  (xTs00,yTs00) = (0,0)
  (xTs10,yTs10) = (764,0)
  (xTs20,yTs20) = (1400,0)
  (xTs01,yTs01) = (0,636)
  (xTs11,yTs11) = (764,636)
  (xTs21,yTs21) = (1400,636)
 As shown above, the upper left coordinates of each tile in the picture (the upper left coordinates of the CTU at the head of the tile) are not necessarily at positions that are integer multiples of the CTU within the picture.
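 As a check of (Formula TLA-1), the following C sketch (variable names are illustrative) reproduces the tile origins listed above by accumulating tile widths and subtracting the overlap once per seam:
  /* Compute the tile origins for the example above (M=3, N=2, wOVLP=hOVLP=4). */
  static void tile_origins_example(int xTs[3], int yTs[2])
  {
      const int M = 3, N = 2, wOVLP = 4, hOVLP = 4;
      const int wT[3] = { 768, 640, 520 };
      const int hT[2] = { 640, 444 };
      xTs[0] = yTs[0] = 0;
      for (int m = 1; m < M; m++)
          xTs[m] = xTs[m-1] + wT[m-1] - wOVLP;  /* (Formula TLA-1) as a recurrence */
      for (int n = 1; n < N; n++)
          yTs[n] = yTs[n-1] + hT[n-1] - hOVLP;
      /* Yields xTs = {0, 764, 1400} and yTs = {0, 636}, matching the list above. */
  }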
 When each tile is encoded and decoded, the overlap region of the tile is encoded and decoded once per tile, so multiple decoded images are generated for it. For example, in FIG. 7(a), the overlap region of Tile[0][0] and Tile[1][0] is encoded and decoded once in each tile, so two decoded images are generated for it. Likewise, the overlap region of Tile[0][0] and Tile[0][1] is encoded and decoded once in each tile, producing two decoded images. The overlap region of Tile[0][0], Tile[1][0], Tile[0][1], and Tile[1][1] is encoded and decoded once in each of the four tiles, producing four decoded images. By applying a synthesis process (filtering at the tile boundaries) to these regions after decoding, a composite image (display image) free of tile distortion can be generated. An example is shown in FIG. 8(a), where the composite image is generated by computing a weighted sum of two decoded images. The image synthesis method is described later.
  (Configuration of the video decoding device)
 FIG. 9(a) shows a video decoding device (image decoding device) 31 of the present invention. The video decoding device 31 includes a header information decoding unit 2001, tile decoding units 2002a to 2002n, and a tile synthesis unit 2003.
 The header information decoding unit 2001 decodes header information from an encoded stream Te that is input from the outside and encoded in NAL (network abstraction layer) units. The header information decoding unit 2001 also derives the tiles (TileId) necessary for display from externally input control information indicating the image region to be displayed on a display or the like. Furthermore, it extracts the encoded tiles necessary for display from the encoded stream Te and transmits them to the tile decoding units 2002a to 2002n. The header information decoding unit 2001 also transmits the tile information (information on tile division) obtained by decoding the PPS, together with the TileId of each tile decoded by the tile decoding units 2002, to the tile synthesis unit 2003. Specifically, the tile information consists of the horizontal number of tiles M, the vertical number N, the tile widths wT[m] and heights hT[n], the overlap region widths wOVLP[m] and heights hOVLP[n], and so on, calculated from syntax elements such as num_tile_columns_minus1, num_tile_rows_minus1, uniform_spacing_flag, column_width_minus1[i], row_height_minus1[i], overlap_tiles_flag, and the overlap information. The crop offset region widths wCRP[m] and heights hCRP[n] are also derived from this information.
 The tile decoding units 2002a to 2002n decode the respective encoded tiles and transmit the decoded tiles to the tile synthesis unit 2003.
 Here, the tile decoding units 2002a to 2002n decode each tile sequence as one independent video sequence, and therefore do not refer to prediction information between tile sequences, either temporally or spatially, when decoding. That is, when decoding a tile in a certain picture, the tile decoding units 2002a to 2002n do not refer to tiles of another tile sequence (with a different TileId).
 Since the tile decoding units 2002a to 2002n each decode a tile in this way, multiple tiles can be decoded in parallel, or a single tile can be decoded independently. As a result, the tile decoding units 2002a to 2002n can execute decoding efficiently, for example decoding only the images necessary for display by performing only the minimum necessary decoding.
  (Configuration of the tile decoding unit)
 The configuration of the tile decoding units 2002a to 2002n will now be described. FIG. 10 is a block diagram showing the configuration of the unit 2002, one of the tile decoding units 2002a to 2002n. The tile decoding unit 2002 includes an entropy decoding unit 301, a prediction parameter decoding unit (prediction image decoding device) 302, a loop filter 305, a reference picture memory 306, a prediction parameter memory 307, a prediction image generation unit (prediction image generation device) 308, an inverse quantization/inverse transform unit 311, and an addition unit 312. In a configuration matched to the tile encoding unit 2012 described later, the tile decoding unit 2002 may omit the loop filter 305.
 The prediction parameter decoding unit 302 includes an inter prediction parameter decoding unit 303 and an intra prediction parameter decoding unit 304. The prediction image generation unit 308 includes an inter prediction image generation unit 309 and an intra prediction image generation unit 310.
 In the following, examples are described using CTU, CU, PU, and TU as processing units, but processing is not limited to these examples; it may be performed in CU units instead of TU or PU units. Alternatively, CTU, CU, PU, and TU may be read as blocks and the processing performed in block units.
 The entropy decoding unit 301 performs entropy decoding on the externally input encoded stream Te, separating and decoding individual codes (syntax elements). The separated codes include prediction parameters for generating a prediction image and residual information for generating a difference image.
 The entropy decoding unit 301 outputs some of the separated codes to the prediction parameter decoding unit 302, for example the prediction mode predMode, the PU partition mode part_mode, the reference picture index ref_idx_lX, the prediction vector index mvp_lX_idx, and the difference vector mvdLX. Which codes are decoded is controlled based on instructions from the prediction parameter decoding unit 302. The entropy decoding unit 301 outputs the quantized transform coefficients to the inverse quantization/inverse transform unit 311. These quantized transform coefficients are obtained in the encoding process by applying a frequency transform, such as the DCT (Discrete Cosine Transform), DST (Discrete Sine Transform), or KLT (Karhunen-Loeve Transform), to the residual signal and quantizing the result.
 The inter prediction parameter decoding unit 303 decodes the inter prediction parameters with reference to the prediction parameters stored in the prediction parameter memory 307, based on the codes input from the entropy decoding unit 301. It outputs the decoded inter prediction parameters to the prediction image generation unit 308 and also stores them in the prediction parameter memory 307.
 The intra prediction parameter decoding unit 304 decodes the intra prediction parameters with reference to the prediction parameters stored in the prediction parameter memory 307, based on the codes input from the entropy decoding unit 301. It outputs the decoded intra prediction parameters to the prediction image generation unit 308 and also stores them in the prediction parameter memory 307.
 The loop filter 305 applies filters such as a deblocking filter, a sample adaptive offset (SAO), and an adaptive loop filter (ALF) to the decoded CU image generated by the addition unit 312. As long as the loop filter 305 is paired with the tile encoding unit 2012, it need not include all three types of filters; for example, it may consist of only a deblocking filter.
 The reference picture memory 306 stores the decoded CU image generated by the addition unit 312 at a position predetermined for each decoding target picture and each CTU or CU.
 The prediction parameter memory 307 stores the prediction parameters at positions predetermined for each decoding target picture and PU (or sub-block, fixed-size block, or pixel). Specifically, the prediction parameter memory 307 stores the inter prediction parameters decoded by the inter prediction parameter decoding unit 303, the intra prediction parameters decoded by the intra prediction parameter decoding unit 304, and the prediction mode predMode separated by the entropy decoding unit 301.
 The prediction image generation unit 308 receives the prediction mode predMode input from the entropy decoding unit 301 and the prediction parameters from the prediction parameter decoding unit 302. It also reads a reference picture from the reference picture memory 306. In the prediction mode indicated by predMode, the prediction image generation unit 308 generates the prediction image of a PU (block) or sub-block using the input prediction parameters and the read reference picture (reference picture block).
 When the prediction mode predMode indicates inter prediction, the inter prediction image generation unit 309 generates the prediction image of a PU (block) or sub-block by inter prediction, using the inter prediction parameters input from the inter prediction parameter decoding unit 303 and the read reference picture (reference picture block).
 For a reference picture list (L0 list or L1 list) whose prediction list use flag predFlagLX is 1, the inter prediction image generation unit 309 reads from the reference picture memory 306 the reference picture block located, relative to the decoding target PU, at the position indicated by the motion vector mvLX in the reference picture indicated by the reference picture index refIdxLX. The inter prediction image generation unit 309 performs interpolation based on the read reference picture block to generate the prediction image of the PU (interpolated image, motion-compensated image) and outputs it to the addition unit 312. Here, a reference picture block is a set of pixels on a reference picture (called a block because it is usually rectangular) and is the region referred to in order to generate the prediction image of a PU or sub-block.
  (Tile boundary padding)
 The reference picture block (reference block) is located on the reference picture indicated by the reference picture index refIdxLX, for the reference picture list with prediction list use flag predFlagLX = 1, at the position indicated by the motion vector mvLX relative to the position of the target CU (block). As already described, there is no guarantee that the pixels of the reference block lie within the tile on the reference picture having the same TileId as the target tile (the collocated tile). Therefore, as one example, by padding the outside of each tile in the reference picture (filling it with the pixel values at the tile boundary) as shown in FIG. 6(c), the reference block can be read without referring to pixel values outside the collocated tile.
 Tile boundary padding (padding outside the tile) is realized in motion compensation by the inter prediction image generation unit 309 by using, as the pixel value at reference pixel position (xIntL+i,yIntL+j), the pixel value refImg[xRef+i][yRef+j] at the following position (xRef+i,yRef+j). That is, when a reference pixel is accessed, the reference position is clipped to the positions of the boundary pixels at the top, bottom, left, and right of the tile.
 xRef+i = Clip3(xTs, xTs+wT-1, xIntL+i)
 yRef+j = Clip3(yTs, yTs+hT-1, yIntL+j)
Here, (xTs,yTs) are the upper left coordinates of the target tile in which the target block is located, and wT and hT are the width and height of the target tile, all in units of pixels.
 Note that xIntL and yIntL may be derived as
 xIntL = xb+(mvLX[0]>>log2(MVUNIT))
 yIntL = yb+(mvLX[1]>>log2(MVUNIT))
where (xb,yb) are the upper left coordinates of the target block relative to the upper left coordinates of the picture and (mvLX[0],mvLX[1]) is the motion vector. Here, MVUNIT indicates that the motion vector precision is 1/MVUNIT pel.
 By reading the pixel value at the coordinates (xRef+i,yRef+j), the padding of FIG. 6(c) is realized.
 By padding the tile boundary when independent_tiles_flag = 1, even if a motion vector points outside the collocated tile in inter prediction, the reference pixel is replaced using pixel values inside the collocated tile, so the tile sequence can be decoded independently using inter prediction.
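 A minimal C sketch of this clipping-based padding follows; the image layout and function name are assumptions made for illustration.
  /* Clamp v to [lo, hi], as Clip3 in the text. */
  static int Clip3(int lo, int hi, int v) { return v < lo ? lo : (v > hi ? hi : v); }

  /* Read a reference pixel with tile boundary padding: the reference position
   * is clamped to the collocated tile, which is equivalent to padding the
   * tile outward with its edge pixel values. refImg is assumed row-major. */
  static int ref_pixel_padded(const unsigned char *refImg, int stride,
                              int xTs, int yTs, int wT, int hT,
                              int xIntL, int yIntL, int i, int j)
  {
      int xRef = Clip3(xTs, xTs + wT - 1, xIntL + i);
      int yRef = Clip3(yTs, yTs + hT - 1, yIntL + j);
      return refImg[yRef * stride + xRef];
  }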
  (Tile boundary motion vector restriction)
 Another restriction method besides tile boundary padding is the tile boundary motion vector restriction. In this process, in motion compensation by the inter prediction image generation unit 309, the motion vector is restricted (clipped) so that the reference pixel positions (xIntL+i,yIntL+j) fall within the collocated tile.
 In this process, given the upper left coordinates (xb,yb) of the target block (target sub-block or target block), the block size (BW,BH), the upper left coordinates (xTs,yTs) of the target tile, and the target tile width wT and height hT, the motion vector mvLX of the block is taken as input and the restricted motion vector mvLX is output.
 The left end posL, right end posR, top end posU, and bottom end posD of the reference pixels used in generating the interpolated image of the target block are as follows, where NTAP is the number of taps of the filter used for interpolation and MVUNIT indicates that the motion vector precision is 1/MVUNIT pel.
 posL = xb+(mvLX[0]>>log2(MVUNIT))-NTAP/2+1
 posR = xb+BW-1+(mvLX[0]>>log2(MVUNIT))+NTAP/2
 posU = yb+(mvLX[1]>>log2(MVUNIT))-NTAP/2+1
 posD = yb+BH-1+(mvLX[1]>>log2(MVUNIT))+NTAP/2
 The constraints for these reference pixels to fall within the collocated tile are as follows:
 posL >= xTs
 posR <= xTs+wT-1
 posU >= yTs
 posD <= yTs+hT-1
 These can be rearranged as follows:
 posL = xb+(mvLX[0]>>log2(MVUNIT))-NTAP/2+1 >= xTs
  (mvLX[0]>>log2(MVUNIT)) >= xTs-xb+NTAP/2-1
 posR = xb+BW-1+(mvLX[0]>>log2(MVUNIT))+NTAP/2 <= xTs+wT-1
  (mvLX[0]>>log2(MVUNIT)) <= xTs+wT-1-xb-BW+1-NTAP/2
 posU = yb+(mvLX[1]>>log2(MVUNIT))-NTAP/2+1 >= yTs
  (mvLX[1]>>log2(MVUNIT)) >= yTs-yb+NTAP/2-1
 posD = yb+BH-1+(mvLX[1]>>log2(MVUNIT))+NTAP/2 <= yTs+hT-1
  (mvLX[1]>>log2(MVUNIT)) <= yTs+hT-1-yb-BH+1-NTAP/2
Therefore, the motion vector restriction can be derived by the following equations:
 mvLX[0] = Clip3(vxmin, vxmax, mvLX[0]) 
 mvLX[1] = Clip3(vymin, vymax, mvLX[1])
where
 vxmin = (xTs-xb+NTAP/2-1)<<log2(MVUNIT) 
 vxmax = (xTs+wT-xb-BW-NTAP/2)<<log2(MVUNIT)
 vymin = (yTs-yb+NTAP/2-1)<<log2(MVUNIT)
 vymax = (yTs+hT-yb-BH-NTAP/2)<<log2(MVUNIT)
 When independent_tiles_flag = 1, restricting the motion vector in this way ensures that the motion vector always points inside the collocated tile in inter prediction. With this configuration, the tile sequence can be decoded independently using inter prediction.
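 For illustration, a C sketch of this restriction follows, reusing the Clip3 helper from the padding sketch above; the function name and signature are ours.
  /* Clip mvLX so that NTAP-tap interpolation of a BWxBH block at (xb,yb)
   * reads only inside the collocated tile at (xTs,yTs) of size wT x hT.
   * log2MVUNIT = log2(MVUNIT), e.g. 2 for quarter-pel motion vectors. */
  static void clip_mv_to_tile(int mvLX[2], int xb, int yb, int BW, int BH,
                              int xTs, int yTs, int wT, int hT,
                              int NTAP, int log2MVUNIT)
  {
      int vxmin = (xTs - xb + NTAP/2 - 1) << log2MVUNIT;
      int vxmax = (xTs + wT - xb - BW - NTAP/2) << log2MVUNIT;
      int vymin = (yTs - yb + NTAP/2 - 1) << log2MVUNIT;
      int vymax = (yTs + hT - yb - BH - NTAP/2) << log2MVUNIT;
      mvLX[0] = Clip3(vxmin, vxmax, mvLX[0]);
      mvLX[1] = Clip3(vymin, vymax, mvLX[1]);
  }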
 When the prediction mode predMode indicates intra prediction, the intra prediction image generation unit 310 performs intra prediction using the intra prediction parameters input from the intra prediction parameter decoding unit 304 and the read reference picture. Specifically, the intra prediction image generation unit 310 reads from the reference picture memory 306 the neighboring PUs in the decoding target picture that, among the already decoded PUs, lie within a predetermined range of the decoding target PU. When the decoding target PU moves sequentially in so-called raster scan order, the predetermined range is, for example, one of the left, upper left, upper, and upper right neighboring PUs, and it differs depending on the intra prediction mode. Raster scan order is the order that moves sequentially from the left end to the right end of each row of a picture, from the top row to the bottom row.
 The intra prediction image generation unit 310 performs prediction in the prediction mode indicated by the intra prediction mode IntraPredMode, based on the read neighboring PUs, to generate the prediction image of the PU, and outputs the generated prediction image to the addition unit 312.
 The inverse quantization/inverse transform unit 311 inverse quantizes the quantized transform coefficients input from the entropy decoding unit 301 to obtain transform coefficients, applies an inverse frequency transform such as the inverse DCT, inverse DST, or inverse KLT to the obtained transform coefficients to calculate the prediction residual signal, and outputs the calculated residual signal to the addition unit 312.
 The addition unit 312 adds, pixel by pixel, the prediction image of the PU input from the inter prediction image generation unit 309 or the intra prediction image generation unit 310 and the residual signal input from the inverse quantization/inverse transform unit 311, generating the decoded image of the PU. The addition unit 312 outputs the generated decoded block image to at least one of the deblocking filter, the SAO (sample adaptive offset) unit, and the ALF.
  (Configuration of the tile synthesis unit)
 The tile synthesis unit 2003 generates the decoded image Td and outputs a composite image (display image) by referring to the tile information transmitted from the header information decoding unit 2001, the TileId of the tiles needed for display, and the tiles decoded by the tile decoding units 2002a to 2002n. As shown in FIG. 9(b), the tile synthesis unit 2003 consists of a smoothing processing unit 20031 and a synthesis unit 20032. When overlap_tiles_flag is 1, the smoothing processing unit 20031 may apply filter processing (averaging, weighted averaging) using the overlap region of each tile decoded by the tile decoding units 2002. That is, one pixel may be derived from the pixels of two or more tiles corresponding to the overlap region. For example, the filtered pixel value tmp of the overlap region of two horizontally adjacent tiles Tile[m-1][n] and Tile[m][n] is calculated by the following equation:
  tmp[m][n][x][y] = (Tile[m][n][x][y]+Tile[m-1][n][wT[m-1]-wOVLP[m-1]+x][y]+1)>>1 (Formula FLT-1)
 Here, wT[m-1]-wOVLP[m-1]+x denotes the position x pixels to the right of tile position wT[m-1]-wOVLP[m-1]. tmp[m][n][x][y] is the filtered pixel value at position (x,y) of the overlap region in the tile at position (m,n), with the tile's upper left coordinates taken as (0,0). Tile[m][n][x][y] is the pixel value at position (x,y) of the tile at position (m,n), with the tile's upper left coordinates taken as (0,0). Similarly, the filtered pixel value tmp of the overlap region of two vertically adjacent tiles Tile[m][n-1] and Tile[m][n] is calculated by the following equation:
  tmp[m][n][x][y] = (Tile[m][n][x][y]+Tile[m][n-1][x][hT[n-1]-hOVLP[n-1]+y]+1)>>1 (Formula FLT-2)
 The pixel value of the region where the four tiles Tile[m-1][n-1], Tile[m-1][n], Tile[m][n-1], and Tile[m][n] overlap is calculated by the following equation:
  tmp[m][n][x][y] = (Tile[m][n][x][y]+Tile[m][n-1][x][hT[n-1]-hOVLP[n-1]+y]+Tile[m-1][n][wT[m-1]-wOVLP[m-1]+x][y]+Tile[m-1][n-1][wT[m-1]-wOVLP[m-1]+x][hT[n-1]-hOVLP[n-1]+y]+2)>>2 (Formula FLT-3)
 The smoothing processing unit 20031 (filter processing unit, averaging processing unit, weighted averaging processing unit) outputs the tile pixel values and the filtered pixel values of the overlap region (here, tmp) to the synthesis unit 20032.
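 As an illustration of the horizontal case (Formula FLT-1), a C sketch follows; storing each tile as a row-major array with its own stride is an assumption made for illustration.
  /* Average the wOVLP-pixel-wide horizontal overlap between Tile[m-1][n]
   * (tileL, active width wTL) and Tile[m][n] (tileR); tmp receives the result. */
  static void smooth_horizontal_overlap(const unsigned char *tileL, int strideL, int wTL,
                                        const unsigned char *tileR, int strideR,
                                        int wOVLP, int hT, unsigned char *tmp)
  {
      for (int y = 0; y < hT; y++)
          for (int x = 0; x < wOVLP; x++) {
              int left  = tileL[y * strideL + (wTL - wOVLP + x)]; /* right edge of Tile[m-1][n] */
              int right = tileR[y * strideR + x];                 /* left edge of Tile[m][n] */
              tmp[y * wOVLP + x] = (unsigned char)((left + right + 1) >> 1); /* rounded average */
          }
  }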
 The synthesis unit 20032 generates the picture, or the predetermined region specified by the control information (TileId), from the tile pixel values and the overlap region pixel values. The whole composite image or a predetermined region Rec[x][y] is expressed, for example, as follows, using the simple averages tmp computed above:
  Rec[xTsmn+x][yTsmn+y] = tmp[m][n][x][y]   (if m!=0 && 0<=x<wOVLP[m-1], or n!=0 && 0<=y<hOVLP[n-1])
  Rec[xTsmn+x][yTsmn+y] = Tile[m][n][x][y]  (otherwise, where m=0 or n=0, 0<=x<wT[0]-wOVLP[0], 0<=y<hT[0]-hOVLP[0])
  Rec[xTsmn+x][yTsmn+y] = Tile[m][n][x][y]  (otherwise)
 When the output of the synthesis unit is a picture, the corresponding tiles are all the tiles (0<=m<M, 0<=n<N); when the output of the synthesis unit is a predetermined region, the tiles corresponding to the (m,n) indicated by TileId are synthesized. Since these processes are performed outside the tile decoding units, the synthesized image is not used for decoding tiles.
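 A C sketch combining the per-tile copy with the selection of the filtered overlap values above (the memory layout and names are assumptions; tmp is assumed stored tile-locally with stride wT and valid in the overlap):
  /* Place one decoded tile into the composite image Rec, writing the filtered
   * values tmp over its leading overlap columns/rows, per the rules above.
   * wOVLPprev/hOVLPprev are wOVLP[m-1]/hOVLP[n-1]. */
  static void compose_tile(unsigned char *Rec, int recStride,
                           const unsigned char *tile, int tileStride,
                           const unsigned char *tmp,
                           int xTs, int yTs, int wT, int hT,
                           int wOVLPprev, int hOVLPprev, int m, int n)
  {
      for (int y = 0; y < hT; y++)
          for (int x = 0; x < wT; x++) {
              int inOvlp = (m != 0 && x < wOVLPprev) || (n != 0 && y < hOVLPprev);
              Rec[(yTs + y) * recStride + (xTs + x)] =
                  inOvlp ? tmp[y * wT + x] : tile[y * tileStride + x];
          }
  }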
 By the above processing, tile distortion can be removed by averaging the redundantly decoded tile boundaries while still decoding the tiles independently.
  (Configuration of the video encoding device)
 FIG. 11(a) shows the video encoding device 11 of the present invention. The video encoding device 11 includes a picture division unit 2010, a header information generation unit 2011, tile encoding units 2012a to 2012n, and an encoded stream generation unit 2013.
 The picture division unit 2010 divides the picture into multiple tiles and transmits the tiles to the tile encoding units 2012a to 2012n. The header information generation unit 2011 generates tile information (TileId, the number of tile divisions, sizes, and overlap information) from the divided tiles and transmits it as header information to the encoded stream generation unit 2013. Tile division in the case of overlapping tiles is described later.
 The tile encoding units 2012a to 2012n encode the respective tiles, in tile sequence units. In this way, the tile encoding units 2012a to 2012n can encode tiles in parallel.
 Here, the tile encoding units 2012a to 2012n encode each tile sequence in the same way as one independent video sequence, and the prediction information of tile sequences with different TileIds is not referred to, either temporally or spatially, during encoding. That is, when encoding a tile in a certain picture, the tile encoding units 2012a to 2012n do not refer to another tile, either spatially or temporally.
 The encoded stream generation unit 2013 generates an encoded stream Te in NAL unit form from the header information, including the tile information transmitted from the header information generation unit 2011, and the tiles encoded by the tile encoding units 2012a to 2012n.
 Since the tile encoding units 2012a to 2012n can encode each tile independently in this way, multiple tiles can be encoded in parallel; on the decoding device side, multiple tiles can likewise be decoded in parallel, or a single tile can be decoded independently.
  (Picture division unit)
 The picture division unit 2010 in FIG. 11(a) consists of a tile information calculation unit 20101 and a picture division unit A 20102, shown in FIG. 11(b).
 The tile information calculation unit 20101 derives the tile widths wT[m] and heights hT[n] and the crop offset region widths wCRP[m] and heights hCRP[n] from the picture width wPict and height hPict, the tile unit-size width wUnitTile and height hUnitTile, the horizontal number M and vertical number N of tiles to divide into, and the overlap region widths wOVLP[m] and heights hOVLP[n]. Here, an example is shown in which the overlap region width and height are set to the fixed values wOVLP and hOVLP.
  wT[m] = ceil((wPict+1)/wUnitTile/M)*wUnitTile  (0<=m<=M-2)
  wT[M-1] = wPict-ΣwT[m]+(M-1)*wOVLP (where Σ is the sum over m=0..M-2)
  hT[n] = ceil((hPict+1)/hUnitTile/N)*hUnitTile  (0<=n<=N-2)
  hT[N-1] = hPict-ΣhT[n]+(N-1)*hOVLP (where Σ is the sum over n=0..N-2)
  wCRP[M-1] = ceil(wT[M-1]/wUnitTile)*wUnitTile-wT[M-1]
  hCRP[N-1] = ceil(hT[N-1]/hUnitTile)*hUnitTile-hT[N-1]
 Note that the formulas for calculating wT[m] and hT[n] may be any of (Formula TSP-1) to (Formula TSP-10).
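 A C sketch of the derivation above for the horizontal direction (the vertical direction is analogous) is given below; integer ceiling division replaces ceil(), and the names are illustrative.
  static int ceil_div(int a, int b) { return (a + b - 1) / b; }

  /* Derive wT[0..M-1] and the right-edge crop offset from wPict, wUnitTile,
   * M and wOVLP, following the equations above. */
  static void derive_tile_widths(int wPict, int wUnitTile, int M, int wOVLP,
                                 int wT[], int *wCRP_last)
  {
      int sum = 0;
      for (int m = 0; m <= M - 2; m++) {
          wT[m] = ceil_div(wPict + 1, wUnitTile * M) * wUnitTile;
          sum += wT[m];
      }
      wT[M-1] = wPict - sum + (M - 1) * wOVLP;
      *wCRP_last = ceil_div(wT[M-1], wUnitTile) * wUnitTile - wT[M-1];
  }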
 The picture width PicWidthInCtbsY and height PicHeightInCtbsY in CTU units are expressed by the following equations:
  PicWidthInCtbsY = ΣTileWidthinCtbs[m] (where Σ is the sum over m=0..M-1)
  PicHeightInCtbsY = ΣTileHeightinCtbs[n] (where Σ is the sum over n=0..N-1)
 Here, TileWidthinCtbs[m] and TileHeightinCtbs[n] are parameters expressing the tile width and height in CTU units:
  TileWidthinCtbs[m] = ceil(wT[m]/wCTU)
  TileHeightinCtbs[n] = ceil(hT[n]/hCTU)
 The larger the width and height of the overlap region, the greater the effect of removing tile distortion, but the code amount increases and coding efficiency is sacrificed. Suitable overlap region widths wOVLP[m] and heights hOVLP[n] may be 2 to 6. The tile unit size may be the CTU size (wUnitTile=wCTU, hUnitTile=hCTU), and the overlap region widths wOVLP[m] and heights hOVLP[n] may all be the same (for example, wOVLP=hOVLP=sOVLP). The following is an example of the formulas for calculating the tile information of FIG. 7 when the overlap region widths wOVLP[m] and heights hOVLP[n] are all set to sOVLP:
  wT[m] = ceil((wPict+1)/wCTU/M)*wCTU  (0<=m<=M-2)
  wT[M-1] = wPict-ΣwT[m]+(M-1)*sOVLP (where Σ is the sum over m=0..M-2)
  hT[n] = ceil((hPict+1)/hCTU/N)*hCTU  (0<=n<=N-2)
  hT[N-1] = hPict-ΣhT[n]+(N-1)*sOVLP (where Σ is the sum over n=0..N-2)
  wCRP[M-1] = ceil(wT[M-1]/wCTU)*wCTU-wT[M-1]
  hCRP[N-1] = ceil(hT[N-1]/hCTU)*hCTU-hT[N-1]
 The tile information calculation unit 20101 outputs the calculated tile information to the picture division unit A 20102 and the header information generation unit 2011.
 The picture division unit A 20102 divides the picture into tiles using the tile information calculated by the tile information calculation unit 20101. That is, for Tile[m][n], it extracts the region of the picture whose x coordinates are xTsmn..(xTsmn+wT[m]-1) and whose y coordinates are yTsmn..(yTsmn+hT[n]-1), and outputs it to the tile encoding unit 2012. For the tiles at the right and bottom edges of the picture, the crop offset regions of wCRP[M-1] and hCRP[N-1] are appended before output to the tile encoding unit 2012.
  (Header information generation unit)
 The header information generation unit 2011 converts the parameter sets and the tile information into syntax form and outputs them to the encoded stream generation unit 2013. The syntax representation of the tile information is shown below:
  num_tile_columns_minus1 = M-1
  num_tile_rows_minus1 = N-1
  uniform_spacing_flag = 1      (when all wT[m] are equal and all hT[n] are equal)
   column_width_minus1 = ceil(wT[0]/wUnitTile)-1
   row_height_minus1 = ceil(hT[0]/hUnitTile)-1
  uniform_spacing_flag = 0     (otherwise)
   column_width_minus1[m] = ceil(wT[m]/wUnitTile)-1
   row_height_minus1[n] = ceil(hT[n]/hUnitTile)-1
  overlap_tiles_flag = 1
  uniform_overlap_flag = 1
  tile_overlap_width_div2 = sOVLP/2
  tile_overlap_height_div2 = sOVLP/2
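 As an illustration of how these values follow from the derived tile sizes, a small C sketch reusing ceil_div from the sketch above (the names mirror the syntax elements; the uniform case is analogous):
  /* Fill the per-column/per-row size syntax from the tile sizes in pixels. */
  static void fill_tile_size_syntax(const int wT[], const int hT[], int M, int N,
                                    int wUnitTile, int hUnitTile,
                                    int column_width_minus1[], int row_height_minus1[])
  {
      for (int m = 0; m < M; m++)
          column_width_minus1[m] = ceil_div(wT[m], wUnitTile) - 1;
      for (int n = 0; n < N; n++)
          row_height_minus1[n] = ceil_div(hT[n], hUnitTile) - 1;
  }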
  (Configuration of the tile encoding unit)
 Next, the configuration of the tile encoding units 2012a to 2012n will be described. FIG. 12 is a block diagram showing the configuration of the unit 2012, one of the tile encoding units 2012a to 2012n. The tile encoding unit 2012 includes a prediction image generation unit 101, a subtraction unit 102, a transform/quantization unit 103, an entropy encoding unit 104, an inverse quantization/inverse transform unit 105, an addition unit 106, a loop filter 107, a prediction parameter memory (prediction parameter storage unit, frame memory) 108, a reference picture memory (reference image storage unit, frame memory) 109, an encoding parameter determination unit 110, and a prediction parameter encoding unit 111. The prediction parameter encoding unit 111 includes an inter prediction parameter encoding unit 112 and an intra prediction parameter encoding unit 113. The tile encoding unit 2012 may also be configured without the loop filter 107.
 The prediction image generation unit 101 generates, for each picture of the image T, a prediction image of the PU for each CU, which is a region obtained by dividing the picture. Here, the prediction image generation unit 101 reads decoded blocks from the reference picture memory 109 based on the prediction parameters input from the prediction parameter encoding unit 111. For inter prediction, for example, the prediction image generation unit 101 reads the block at the position on the reference picture indicated by the motion vector, with the target PU as the starting point. For intra prediction, it reads from the reference picture memory 109 the pixel values of the neighboring PUs used in the intra prediction mode and generates the prediction image of the PU. The prediction image generation unit 101 generates the prediction image of the PU from the read reference picture block using one of multiple prediction schemes, and outputs the generated prediction image of the PU to the subtraction unit 102.
 Note that the prediction image generation unit 101 operates in the same way as the prediction image generation unit 308 already described, including the padding process at tile boundaries, so its description is omitted.
 The subtraction unit 102 subtracts the signal values of the prediction image of the PU input from the prediction image generation unit 101 from the pixel values at the corresponding PU position of the image T to generate a residual signal, which it outputs to the transform/quantization unit 103.
 The transform/quantization unit 103 applies a frequency transform to the prediction residual signal input from the subtraction unit 102 to calculate transform coefficients, quantizes the calculated transform coefficients to obtain quantized transform coefficients, and outputs the obtained quantized transform coefficients to the entropy encoding unit 104 and the inverse quantization/inverse transform unit 105.
 The entropy encoding unit 104 receives the quantized transform coefficients from the transform/quantization unit 103 and the prediction parameters from the prediction parameter encoding unit 111.
 The entropy encoding unit 104 entropy-encodes the input partition information, prediction parameters, quantized transform coefficients, and so on, to generate an encoded stream Te, and outputs the generated encoded stream Te to the outside.
 The inverse quantization/inverse transform unit 105 is the same as the inverse quantization/inverse transform unit 311 (FIG. 10) in the tile decoding unit 2002: it inverse quantizes the quantized transform coefficients input from the transform/quantization unit 103 to obtain transform coefficients, applies an inverse transform to the obtained transform coefficients to calculate a residual signal, and outputs the calculated residual signal to the addition unit 106.
 The addition unit 106 adds, pixel by pixel, the signal values of the prediction image of the PU input from the prediction image generation unit 101 and the signal values of the residual signal input from the inverse quantization/inverse transform unit 105 to generate a decoded image, which it stores in the reference picture memory 109.
 The loop filter 107 applies a deblocking filter, a sample adaptive offset (SAO), and an adaptive loop filter (ALF) to the decoded image generated by the addition unit 106. The loop filter 107 need not include all three types of filters; for example, it may consist of only a deblocking filter.
 The prediction parameter memory 108 stores the prediction parameters generated by the encoding parameter determination unit 110 at positions predetermined for each encoding target picture and CU.
 The reference picture memory 109 stores the decoded images generated by the loop filter 107 at positions predetermined for each encoding target picture and CU.
 The encoding parameter determination unit 110 selects one set from multiple sets of encoding parameters. The encoding parameters are the QT or BT partition parameters and prediction parameters described above, and the parameters to be encoded that are generated in relation to them. The prediction image generation unit 101 generates a prediction image of the PU using each of these sets of encoding parameters.
 The encoding parameter determination unit 110 calculates, for each of the multiple sets, an RD cost value indicating the amount of information and the coding error. The RD cost value is, for example, the sum of the code amount and the squared error multiplied by a coefficient λ. The code amount is the information amount of the encoded stream Te obtained by entropy-encoding the residual signal and the encoding parameters. The squared error is the sum over the pixels of the squared residual values of the residual signal calculated by the subtraction unit 102. The coefficient λ is a preset real number greater than zero. The encoding parameter determination unit 110 selects the set of encoding parameters that minimizes the calculated RD cost value. The entropy encoding unit 104 thereby outputs the selected set of encoding parameters to the outside as the encoded stream Te and does not output the unselected sets. The encoding parameter determination unit 110 stores the determined encoding parameters in the prediction parameter memory 108.
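 A minimal C sketch of this selection follows; the candidate structure and names are ours, and the rate and squared error are assumed already measured per candidate.
  typedef struct {
      double rate;  /* bits of the entropy-coded stream for this candidate set */
      double sse;   /* sum of squared residual values over the block */
  } RDCandidate;

  /* Return the index of the candidate minimizing rate + lambda * sse. */
  static int select_min_rd_cost(const RDCandidate *cand, int num, double lambda)
  {
      int best = 0;
      for (int i = 1; i < num; i++)
          if (cand[i].rate + lambda * cand[i].sse <
              cand[best].rate + lambda * cand[best].sse)
              best = i;
      return best;
  }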
 The prediction parameter encoding unit 111 derives the format for encoding from the parameters input from the encoding parameter determination unit 110 and outputs it to the entropy encoding unit 104. Deriving the format for encoding means, for example, deriving a difference vector from a motion vector and a prediction vector. The prediction parameter encoding unit 111 also derives the parameters necessary for generating a prediction image from the parameters input from the encoding parameter determination unit 110 and outputs them to the prediction image generation unit 101. The parameters necessary for generating a prediction image are, for example, motion vectors in sub-block units.
 The inter prediction parameter encoding unit 112 derives inter prediction parameters such as the difference vector based on the prediction parameters input from the encoding parameter determination unit 110. As a configuration for deriving the parameters necessary for generating the prediction image output to the prediction image generation unit 101, the inter prediction parameter encoding unit 112 includes a configuration partly identical to that with which the inter prediction parameter decoding unit 303 derives inter prediction parameters.
 Similarly, as a configuration for deriving the prediction parameters necessary for generating the prediction image output to the prediction image generation unit 101, the intra prediction parameter encoding unit 113 includes a configuration partly identical to that with which the intra prediction parameter decoding unit 304 derives intra prediction parameters.
 By the above processing, tile distortion can be removed by filtering the redundantly encoded tile boundaries on the video decoding device side while still encoding the tiles independently.
  (Modification 1)
 Modification 1 of the present application changes the method of dividing a picture into tiles from the division method shown in FIG. 7 to the division method shown in FIG. 13. The difference between FIG. 7 and FIG. 13 is that in FIG. 7 a tile contains an overlap region, whereas in FIG. 13 a tile contains, in addition to the overlap region, a crop offset region that is an unused region. That is, in FIG. 13, every tile, including the tiles at the picture edges, may contain a crop offset region. FIG. 13(b) shows the horizontally adjacent Tile[0][0] and Tile[1][0]; each tile contains an overlap region (hatched portion) and a crop offset region (horizontal-line portion). The tile width wT[m] and height hT[n] and the crop offset region width wCRP[m] and height hCRP[n] have the following relationships:
  wT[m]+wCRP[m] = wTile[m] = wCTU*a  (Formula TCS-1)
  hT[n]+hCRP[n] = hTile[n] = hCTU*b
  wTile[m] = TileWidthinCtbs[m]<<CtbLog2SizeY
  hTile[n] = TileHeightinCtbs[n]<<CtbLog2SizeY
  wCRP[m] = wTile[m]-wT[m]  (Formula CRP-1)
  hCRP[n] = hTile[n]-hT[n]
Here, wTile[m] and hTile[n] are the width and height of the tile to be encoded. Everything else is the same as in Embodiment 2.
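As a minimal sketch of (Formula TCS-1) and (Formula CRP-1), the following C fragment derives the coded tile size and the crop offset area from a tile's width and height and the CTU size; the function name and the example values are assumptions introduced here for illustration only.
  #include <stdio.h>

  /* Assumed helper: round len up to the next multiple of the CTU size
     (1<<ctbLog2SizeY), mirroring wTile[m] = TileWidthinCtbs[m]<<CtbLog2SizeY. */
  static int round_up_to_ctu(int len, int ctbLog2SizeY) {
      int ctu = 1 << ctbLog2SizeY;
      return ((len + ctu - 1) / ctu) * ctu;
  }

  int main(void) {
      int ctbLog2SizeY = 7;             /* CTU size 128x128 */
      int wT = 480, hT = 360;           /* example tile width and height */
      int wTile = round_up_to_ctu(wT, ctbLog2SizeY);
      int hTile = round_up_to_ctu(hT, ctbLog2SizeY);
      int wCRP = wTile - wT;            /* (Formula CRP-1) */
      int hCRP = hTile - hT;
      printf("coded tile %dx%d, crop offset %dx%d\n", wTile, hTile, wCRP, hCRP);
      return 0;
  }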
By dividing the picture into tiles as shown in FIG. 13, the upper left coordinates of each tile can be set at positions that are integer multiples of the CTU size. Therefore, in addition to the effects of Embodiment 2, there is the further effect that access to individual tiles is simplified.
 (Details of tile division not restricted to integer multiples of the CTU)
The operation and effect of tile division that is not restricted to integer multiples of the CTU will now be described with reference to FIGS. 21 to 24. FIG. 21 illustrates picture division in which the tile size is limited to an integer multiple of the CTU except at picture boundaries. FIG. 21(a) shows a 1920x1080 HD image divided into 4x3 tiles whose sizes are integer multiples of the CTU. As shown in the figure, if the CTU size is, for example, 128x128 and the tile size must be an integer multiple of the CTU, the image cannot be divided into 4x3 tiles of equal size (it is divided into tiles of 512x384, 384x384, 512x312, and 384x312), so even if the processing is distributed over multiple processors or hardware units, the load cannot be balanced equally. FIG. 21(b) shows the CTU partitioning of each tile. Tiles that do not touch a picture boundary are divided into an integer number of CTUs. When a tile at a picture boundary is divided into CTU units, the area outside the picture is treated as a crop offset area.
FIG. 22(a) shows the technique of the present embodiment, in which a 1920x1080 HD image is divided into 4x3 tiles. When dividing into 4x3 tiles, all tiles can be made equal in size (480x360 each), which has the effect that the load can be balanced equally over multiple processors or hardware units. The tile size may be any size, not only an integer multiple of the CTU, regardless of picture boundaries. FIG. 22(b) shows the CTU partitioning of each tile. When dividing into CTUs, if the tile size is not an integer multiple of the CTU size, a crop offset area is provided outside the tile. In particular, as shown for TILE B, the CTUs are laid out relative to the upper left corner of each tile. Therefore, the upper left coordinates of a CTU are not limited to integer multiples of the CTU size.
FIG. 23 shows an example of the slice data syntax when the tile size is an integer multiple of the CTU. The syntax coding_tree_unit of the CTU data, which is the encoded data in CTU units, is invoked as many times as there are CTUs in the slice data. In coding_tree_unit, when the tile size is an integer multiple of the CTU, the picture is divided in CTU units, so the upper left coordinates (xCtb,yCtb) of a CTU can be derived uniquely from the intra-picture CTU address CtbAddrInRs. That is, in coding_tree_unit, the upper left coordinates (xCtb,yCtb) of the CTU are obtained by multiplying values derived from the intra-picture CTU address CtbAddrInRs by 1<<CtbLog2SizeY, so that they are integer multiples of the CTU size. Here, CtbAddrInTs is the tile scan address for raster scanning the CTUs in tile units. CtbAddrInRs is the raster scan address of the CTU in picture units and ranges from 0 to PicSizeInCtbsY-1.
  PicSizeInCtbsY = PicWidthInCtbsY*PicHeightInCtbsY
FIG. 24 shows an example of the slice data syntax in the present embodiment. In this embodiment as well, the syntax coding_tree_unit of the CTU data, which is the encoded data in CTU units, is invoked as many times as there are CTUs in the slice data. In this embodiment, since the picture is not divided in CTU units, the upper left coordinates (xCtb,yCtb) of a CTU cannot be derived uniquely from the intra-picture CTU address CtbAddrInRs. Therefore, the CTU coordinates are derived based on the upper left coordinates of the tile. Specifically, when the ID of the target tile is TileId and the upper left coordinates of the target tile are (TileAddrX[TileId],TileAddrY[TileId]), the CTU coordinates are derived using the following equations.
  xCtb = ((CtbAddrInTile%TileWidthinCtbs[TileId])<<CtbLog2SizeY)+TileAddrX[TileId]
  yCtb = ((CtbAddrInTile/TileWidthinCtbs[TileId])<<CtbLog2SizeY)+TileAddrY[TileId]
Here, CtbAddrInTile is the raster scan position of the CTU within the tile, with the head of the tile being 0. Letting the CTU address at the head of the tile be firstCtbAddrInTs, CtbAddrInTile is expressed by the following equation, where CtbAddrInTs is the tile scan address over the whole picture.
  CtbAddrInTile = CtbAddrInTs-firstCtbAddrInTs
That is, in this embodiment, the intra-picture coordinates of the CTU position are derived using the intra-tile coordinates of the CTU, ((CtbAddrInTile%TileWidthinCtbs[TileId])<<CtbLog2SizeY, (CtbAddrInTile/TileWidthinCtbs[TileId])<<CtbLog2SizeY), which are derived from the intra-tile CTU address CtbAddrInTile, together with the intra-picture coordinates of the upper left corner of the tile, (TileAddrX[TileId],TileAddrY[TileId]). In other words, the upper left coordinates (xCtb,yCtb) of the CTU may be derived as the sum of the intra-tile coordinates of the CTU and the intra-picture coordinates of the head of the tile.
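A minimal C sketch of this derivation follows, assuming the tile width in CTUs and the tile origin have already been derived from the header information; the function name is introduced here for illustration only.
  /* Derive the picture coordinates (xCtb,yCtb) of a CTU from its raster
     scan position within the tile (CtbAddrInTile), the tile width in
     CTUs, the log2 CTU size, and the tile's upper left picture
     coordinates, per the equations above. */
  static void derive_ctb_origin(int ctbAddrInTile, int tileWidthInCtbs,
                                int ctbLog2SizeY, int tileAddrX, int tileAddrY,
                                int *xCtb, int *yCtb)
  {
      *xCtb = ((ctbAddrInTile % tileWidthInCtbs) << ctbLog2SizeY) + tileAddrX;
      *yCtb = ((ctbAddrInTile / tileWidthInCtbs) << ctbLog2SizeY) + tileAddrY;
  }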
Here, the upper left coordinates (TileAddrX[TileId],TileAddrY[TileId]) of the tile with ID TileId may be expressed as follows, using the upper left coordinates (xTsmn,yTsmn) of the tile at position (m,n) described earlier.
 TileId=n*M+m
 TileAddrX[TileId]=xTsmn
 TileAddrY[TileId]=yTsmn
 TileWidthinCtbs[TileId]=ceil(wT[m]/wCTU)
 TileHeightinCtbs[TileId]=ceil(hT[n]/hCTU)
That is, the CTU coordinates may be derived using the following equations.
  xCtb = ((CtbAddrInTile%ceil(wT[m]/wCTU))<<CtbLog2SizeY)+xTsmn
  yCtb = ((CtbAddrInTile/ceil(wT[m]/wCTU))<<CtbLog2SizeY)+yTsmn
Alternatively, the CTU coordinates may be derived using the syntax elements column_width_minus1 and row_height_minus1.
  xCtb = ((CtbAddrInTile%(column_width_minus1[m]+1))<<CtbLog2SizeY)+xTsmn
  yCtb = ((CtbAddrInTile/(column_width_minus1[m]+1))<<CtbLog2SizeY)+yTsmn
In the configuration of the above embodiment, the CTU coordinates are derived based on the upper left coordinates of the tile, so processing in CTU units can be performed even when the tiles are positioned independently of the units into which the picture is divided. In the case of Embodiment 4 and later, which introduce regions as described below, this embodiment, in which the upper left coordinates of a tile can be located at an arbitrary position, is particularly effective.
The operations of the video encoding device 11 and the video decoding device 31 described above will be explained with reference to the flowcharts of FIG. 14.
FIG. 14(a) shows the processing flow of the video encoding device 11.
The tile information calculation unit 20101 sets the number of tiles and the overlap areas, and calculates the tile information (width, height, upper left coordinates, and crop offset area if any) (S1500).
The picture division unit A 20102 divides the picture into tiles, allowing overlap, as shown in FIG. 7 or FIG. 13 (S1502).
The header information generation unit 2011 generates the tile information syntax and generates header information such as the SPS, PPS, and slice headers (S1504).
The tile encoding unit 2012 encodes each tile (S1506).
The encoded stream generation unit 2013 generates an encoded stream Te from the header information and the encoded stream of each tile (S1508).
FIG. 14(b) shows the processing flow of the video decoding device 31.
The header information decoding unit 2001 decodes the headers and sets or calculates the tile information (number of tiles, width, height, upper left coordinates, overlap width and height, and crop offset area if any). It also derives the identifiers of the tiles needed to cover the display area designated from outside (S1520).
The tile decoding unit 2002 decodes each tile (S1522).
The smoothing processing unit 20031 applies filter processing to the overlap area of each tile (S1524).
The synthesis unit 20032 synthesizes the tiles, including the filtered areas, to generate a picture (S1526).
 (Embodiment 2)
In Embodiment 2 of the present application, the filter processing will be described.
In the filter processing of Embodiment 1, the pixel values of the area adjacent to a tile boundary were calculated by simply averaging the pixel values of the multiple overlap areas. In Embodiment 2, an example will be described in which the filter processing is performed using a weighted sum whose weights vary with the distance from the tile boundary.
The smoothing processing unit 20031 of the tile synthesis unit 2003 shown in FIG. 9 performs the following. The operations other than those of the tile synthesis unit 2003 are the same as those described in Embodiment 1, and their description is omitted.
As shown in FIG. 8, the smoothing processing unit 20031 sets a weight coefficient ww[x] according to the distance from the tile boundary. FIG. 8(a) illustrates the filter processing of the overlap area of the two horizontally adjacent tiles Tile[m-1][n] and Tile[m][n] in FIG. 7. The weight coefficient for Tile[m][n] is ww[x], and the weight coefficient for Tile[m-1][n] is 1-ww[x], where 0<ww[x]<1. In Tile[m][n] and Tile[m-1][n], the weight coefficient ww[x] is set to 0 or 1 for pixels outside the overlap area, and the weight coefficients within the overlap area are derived by linear interpolation.
  ww[x] = 1/(wOVLP+1)*(x+1)       (0<=x<wOVLP)
Using this weight coefficient, the pixel values of the overlap area of the tiles Tile[m-1][n] and Tile[m][n] are calculated by the following equations.
  xx = wT[m-1]-wOVLP+x
  Tile[m-1][n][xx][y] = Tile[m-1][n][xx][y]*(1-ww[x])+Tile[m][n][x][y]*ww[x]
  (1<m<M-1)
In the above equations, the pixel values of the overlap area on the right side of Tile[m-1][n] (OVLP_RIGHT in FIG. 8(a)) are replaced with the filtered pixel values.
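A rough C sketch of this horizontal blending follows; the row-based pixel layout and the use of floating-point weights are assumptions made for readability, not part of the described method.
  /* Blend one row of the wOVLP-pixel-wide horizontal overlap between a
     left tile and a right tile, per the equations above; the filtered
     values overwrite the left tile's right-side overlap (OVLP_RIGHT).
     leftRow/rightRow hold the pixels of the same row y of each tile,
     indexed by x within that tile. */
  static void blend_overlap_row(double *leftRow, const double *rightRow,
                                int wT_left, int wOVLP)
  {
      for (int x = 0; x < wOVLP; x++) {
          double ww = (double)(x + 1) / (wOVLP + 1);  /* ww[x], linear weight */
          int xx = wT_left - wOVLP + x;               /* position in left tile */
          leftRow[xx] = leftRow[xx] * (1.0 - ww) + rightRow[x] * ww;
      }
  }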
Similarly, the filter processing of the overlap area of the two vertically adjacent tiles shown in FIG. 16(a) will be described. FIG. 16(a) shows Tile[m][n-1] and Tile[m][n] extracted from the tiles shown in FIG. 13. Let the weight coefficient for Tile[m][n] be wh[y] and the weight coefficient for Tile[m][n-1] be 1-wh[y] (0<wh[y]<1). In Tile[m][n] and Tile[m][n-1], the weight coefficient wh[y] is set to 0 or 1 for pixels outside the overlap area, and the weight coefficients within the overlap area are derived by linear interpolation.
  wh[y] = 1/(hOVLP+1)*(y+1)       (0<=y<hOVLP)
Using this weight coefficient, the pixel values of the overlap area of the tiles Tile[m][n-1] and Tile[m][n] are calculated by the following equations.
  yy = hT[n-1]-hOVLP+y
  Tile[m][n-1][x][yy] = Tile[m][n-1][x][yy]*(1-wh[y])+Tile[m][n][x][y]*wh[y]
  (1<n<N-1)
In the above equations, the pixel values of the overlap area on the lower side of Tile[m][n-1] (OVLP_BOTTOM) are replaced with the filtered pixel values.
The synthesis unit 20032 synthesizes the non-overlap areas of the tiles and the overlap areas filtered by the smoothing processing unit 20031 to generate a synthesized image (display image) Rec[][].
  Rec[xTsmn+x][yTsmn+y] = Tile[0][0][x][y]  (m=n=0, 0<=x<wT[0], 0<=y<hT[0])
  Rec[xTsmn+x][yTsmn+y] = Tile[m][0][x][y]  (m!=0, n=0, wOVLP<=x<wT[m], 0<=y<hT[n])
  Rec[xTsmn+x][yTsmn+y] = Tile[0][n][x][y]  (m=0, n!=0, 0<=x<wT[m], hOVLP<=y<hT[n])
  Rec[xTsmn+x][yTsmn+y] = Tile[m][n][x][y]  (m!=0, n!=0, wOVLP<=x<wT[m], hOVLP<=y<hT[n])
Since the filtered pixel values were set in the overlap area of the tile to the left of or above Tile[m][n] (OVLP_RIGHT in FIG. 8, OVLP_BOTTOM in FIG. 16), those pixel values are used when synthesizing the picture, and the overlap area on the left or upper side of Tile[m][n] itself (OVLP_LEFT in FIG. 8, OVLP_ABOVE in FIG. 16) is not used.
Note that, instead of replacing the pixel values of the overlap area of the tile to the left of or above Tile[m][n] (OVLP_RIGHT in FIG. 8, OVLP_BOTTOM in FIG. 16) with the filtered pixel values, the pixel values of the overlap area on the left or upper side of Tile[m][n] itself (OVLP_LEFT in FIG. 8, OVLP_ABOVE in FIG. 16) may be replaced with the filtered pixel values. In that case, when synthesizing the picture, the pixel values of the overlap area on the left or upper side of Tile[m][n] (OVLP_LEFT in FIG. 8, OVLP_ABOVE in FIG. 16) are used, and the overlap area of the tile to the left of or above Tile[m][n] (OVLP_RIGHT in FIG. 8, OVLP_BOTTOM in FIG. 16) is not used. Alternatively, the filtered pixel values may be stored directly in Rec[][] instead of in the image of each tile.
Although the weight coefficients ww[] and wh[] were calculated by the above equations, when the width and height of the overlap area are constant, the weight coefficients may instead be obtained by referring to a table prepared in advance. An example of a weight coefficient table is shown in FIG. 15(a). For example, when wOVLP=4, ww[]={0.2,0.4,0.6,0.8}.
Alternatively, the weight coefficients may be replaced with values approximated by multiplication and shift operations, without using division. An example of a table expressing the weight coefficients with integers WGT[] and a shift WSHT is shown in FIG. 15(b). For example, when hOVLP=4, wh[]={0.125,0.375,0.625,0.875}={1>>3,3>>3,5>>3,7>>3}, so WGT[]={1,3,5,7} and WSHT=3. That is, a weight coefficient can be expressed as WGT[]>>WSHT. In this example, WSHT=3.
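A minimal sketch of this integer approximation follows, assuming hOVLP=4 and the WGT/WSHT values above; the half-divisor rounding offset is an assumption of this example, not something mandated by the text.
  #include <stdint.h>

  static const int WGT[4] = {1, 3, 5, 7};  /* weights for hOVLP = 4 */
  static const int WSHT = 3;               /* wh[y] ~= WGT[y]/(1<<WSHT) */

  /* Blend pixel a (upper tile) with pixel b (lower tile) at overlap row y
     using multiplication and a shift instead of a division. */
  static uint8_t blend_pixel(uint8_t a, uint8_t b, int y)
  {
      int one = 1 << WSHT;
      int w = WGT[y];
      /* a*(1-wh) + b*wh, with an assumed rounding offset of one/2 */
      return (uint8_t)((a * (one - w) + b * w + (one >> 1)) >> WSHT);
  }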
Note that the weights may be obtained by methods other than linear interpolation, and the interpolation formula or the table may be changed based on the coordinates.
FIGS. 8(b) and 16(b) illustrate the filter processing of the overlap areas in FIG. 13, which shows an example in which the width or height of the crop offset area is included in the width or height of the tile. Since the crop offset area is not subject to filter processing or to picture synthesis and display, the filter processing of the tiles in FIG. 13 is performed only on the overlap areas, as shown in FIGS. 8(b) and 16(b), and is the same as the processing on the overlap areas in FIGS. 8(a) and 16(a). Therefore, the description of Embodiment 2 can be used as it is.
 (Additional explanation 1)
In Additional explanation 1, the method of dividing a picture into tiles and the method of dividing a tile into CTUs described in Embodiments 1 and 2 are explained again using a different representation. In Embodiments 1 and 2, a tile was described as an area composed of a tile, an overlap area, and a crop offset area. In Additional explanation 1, a tile is described as an area composed of a tile active area and a tile extension area. The tile active area is the net display area that does not include an overlap area. The tile extension area is the area composed of the overlap area and the crop offset area.
As the flag indicating the presence or absence of a tile extension area, cropoffset_flag may be used, rereading the overlap_tiles_flag signaled in tile_info() of FIG. 25(a). When cropoffset_flag is 0, no tile extension area exists; otherwise, a tile extension area exists.
FIG. 26 shows an example of dividing a picture into tiles without being restricted to multiples of the CTU. As shown in FIG. 26(a), the picture is divided into tiles (tile active areas) whose sizes are not restricted to multiples of the CTU. The tile active areas make up the picture without overlapping; in other words, the picture is divided into tile active areas without overlap. Letting the width and height of a tile active area be wAT[m] and hAT[n], and the width and height of the picture be wPict and hPict, the following holds.
  wPict = ΣwAT[m] (sum over m=0..M-1)
  hPict = ΣhAT[n] (sum over n=0..N-1)
When uniform_spacing_flag is not 0, that is, when the tile active areas are approximately equal in size, they can be expressed by the following equations, where M and N are the numbers of tiles in the horizontal and vertical directions.
  for(m=0; m<M; m++)
   wAT[m] = ((m+1)*wPict)/M-(m*wPict)/M   (Formula TAS-1)
  for(n=0; n<N; n++)
   hAT[n] = ((n+1)*hPict)/N-(n*hPict)/N
Alternatively, the tile active area may be expressed as a multiple of the tile unit size (the minimum tile size) wUnitTile, hUnitTile, as in any of the following.
  wAT[m] = floor(wPict/M/wUnitTile)*wUnitTile (0<=m<M)   (Formula TAS-2)
  hAT[n] = floor(hPict/N/hUnitTile)*hUnitTile (0<=n<N)
or
  wAT[m] = ceil(wPict/M/wUnitTile)*wUnitTile (0<=m<M)   (Formula TAS-3)
  hAT[n] = ceil(hPict/N/hUnitTile)*hUnitTile (0<=n<N)
or
  for(m=0; m<M; m++)
   wAT[m] = ((m+1)*wPict/M/wUnitTile-m*wPict/M/wUnitTile)*wUnitTile   (Formula TAS-4)
  for(n=0; n<N; n++)
   hAT[n] = ((n+1)*hPict/N/hUnitTile-n*hPict/N/hUnitTile)*hUnitTile
When uniform_spacing_flag is 0, the size of the tile active area can be expressed by the following equations.
  wAT[m] = column_width_in_luma_samples_div2_minus1[m]*2   (Formula TAS-5)
  hAT[n] = row_height_in_luma_samples_div2_minus1[n]*2
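As an illustration of (Formula TAS-1), the following C sketch computes near-equal tile active widths from the picture width; the function name is an assumption of this example. For wPict=1920 and M=4 it yields 480 for every column, matching FIG. 22(a).
  /* Split wPict into M near-equal tile active widths with integer
     arithmetic, per (Formula TAS-1); the widths differ by at most 1 and
     always sum exactly to wPict. */
  static void tile_active_widths(int wPict, int M, int wAT[])
  {
      for (int m = 0; m < M; m++)
          wAT[m] = ((m + 1) * wPict) / M - (m * wPict) / M;
  }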
When a tile is encoded, the tile is actually encoded in CTU units. At that time, an image obtained by adding an extension area to the tile active area may be encoded. The extension area added at this time is called the "tile extension area". The tile extension area corresponds to the areas named the overlap area and the crop offset area in Embodiments 1 and 2. The tile extension area is an area that is not necessarily used for decoding and output, and may be treated as an area discarded after decoding. Part or all of the tile extension area may also be used for reference (decoding) by subsequent pictures, or for generating the output image. The tile active area and the tile extension area together are called the "tile coding area". The tile coding area is the area that is actually encoded.
Within the tile extension area, the portion used for reference and decoding is called the overlap area, and the portion not referenced or decoded is called the crop offset area (tile invalid area). Embodiment 1 describes the case where the whole tile extension area is referenced and decoded, so the tile extension area is an overlap area. Modification 1 describes an example in which part of the tile extension area is used for reference and decoding as an overlap area, and the remaining part is a crop offset area that is not used for reference or decoding. The tile coding area may also be said to consist of a "tile effective area" used for decoding and output and a tile crop area (tile invalid area) not used for decoding and output. The tile effective area consists of the tile active area, which is the unit into which the picture is divided, and the overlap area.
FIG. 26(b) illustrates the tile that is actually encoded (also called the tile coding area). As shown in FIG. 26(b), the tile (tile coding area) is a rectangle with upper left coordinates (xTsmn,yTsmn), width wTile[m], and height hTile[n], and consists of the tile active area Tile[m][n] (a rectangle of width wAT[m] and height hAT[n]) and the tile extension area (the part of the tile other than the tile active area, of width wCRP[m] and height hCRP[n]).
  wTile[m] = wAT[m]+wCRP[m]
  hTile[n] = hAT[n]+hCRP[n]
Alternatively, the tile coding area may be expressed by the following equations, using the width TileWidthinCtbs[m] and height TileHeightinCtbs[n] of the tile active area in CTU units.
  TileWidthinCtbs[m] = ceil(wAT[m]/wCTU)
  TileHeightinCtbs[n] = ceil(hAT[n]/hCTU)
  wTile[m] = TileWidthinCtbs[m]<<CtbLog2SizeY
  hTile[n] = TileHeightinCtbs[n]<<CtbLog2SizeY
FIG. 26(c) shows an example of dividing a tile into CTUs. The tile is divided into CTUs starting from its upper left coordinates. As shown in FIG. 26(c), the size of the tile active area may or may not be an integer multiple of the CTU size. Since the picture is divided into tile active areas, the upper left coordinates (xTsmn,yTsmn) of the tile at position (m,n) in tile units coincide with the sums of the tile active area sizes (wAT[i],hAT[i]).
  xTsmn = ΣwAT[i] (sum over i=0..m-1)   (Formula TLA-2)
  yTsmn = ΣhAT[i] (sum over i=0..n-1)
The size of the tile effective area, obtained by adding the overlap area to the tile active area, likewise may or may not be an integer multiple of the CTU size.
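A small C sketch of (Formula TLA-2) follows: the tile origins are prefix sums of the tile active sizes, which is why they need not be CTU-aligned. The function name is an assumption of this example.
  /* Derive the upper left x coordinate of each tile column as the prefix
     sum of the tile active widths, per (Formula TLA-2); the y coordinates
     are obtained the same way from hAT[]. */
  static void tile_origins_x(const int wAT[], int M, int xTs[])
  {
      xTs[0] = 0;
      for (int m = 1; m < M; m++)
          xTs[m] = xTs[m - 1] + wAT[m - 1];
  }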
FIG. 27 shows an example in which the tile extension area consists of an overlap area and a crop offset area. In FIG. 27, the overlap area is the shaded area outside the tile active area. The overlap area overlaps the tile active areas of the adjacent tiles. The width wOVLP[m] and height hOVLP[n] of the overlap area and the width wCRP[m] and height hCRP[n] of the tile extension area have the following relationship.
  0<=wOVLP[m]<=wCRP[m]
  0<=hOVLP[n]<=hCRP[n]
 (Summary)
The tile coding area (wTile,hTile) consists of the tile active area (wAT,hAT), which is the unit into which the picture is divided, and a hidden area (the tile extension area).
Alternatively, it may be restated that the tile coding area (wTile,hTile) consists of the tile effective area (wT,hT), which is used for decoding and output, and the crop offset area, that is, the tile invalid area (wCRP,hCRP), which is not used for decoding and output.
The overlap area lies outside the tile active area (wAT,hAT), which is the unit into which the picture is divided, but is included in the tile effective area (wT,hT) used for decoding and output.
Therefore, the tile effective area is
  wT[m] = wAT[m]+wOVLP[m]
  hT[n] = hAT[n]+hOVLP[n]
and, further including the crop area, the tile coding area is
  wTile[m] = wT[m]+wCRP[m] = wAT[m]+wOVLP[m]+wCRP[m]
  hTile[n] = hT[n]+hCRP[n] = hAT[n]+hOVLP[n]+hCRP[n]
 (Example of processing in CTU units)
FIG. 28(a) shows an example of the syntax of the slice data slice_segment_data(). The operations of the video encoding device 11 and the video decoding device 31 are described below with reference to this syntax.
In the figure, coding_tree_unit() denotes the CTU syntax. CtbAddrInTs, CtbAddrInRs, and CtbAddrInTile are CTU addresses: CtbAddrInTs is the CTU address in tile scan order within the picture, CtbAddrInRs is the CTU address in raster scan order within the picture, and CtbAddrInTile is the CTU address in tile scan order within the tile. After the last CTU of each tile, end_of_subset_one_bit is set to 1 and the encoded data is byte-aligned.
FIG. 28(b) shows an example of the CTU syntax coding_tree_unit(). To handle the case where the upper left coordinates of the tile (tile coding area) are not at an integer multiple of the CTU size, the upper left coordinates (xCtb,yCtb) of the CTU are derived per tile. Specifically, the CTU coordinates within the picture are derived by adding the tile upper left coordinates (TileAddrX[TileId],TileAddrY[TileId]) to the intra-tile coordinates of the CTU, ((CtbAddrInTile%TileWidthinCtbs[TileId])<<CtbLog2SizeY, (CtbAddrInTile/TileWidthinCtbs[TileId])<<CtbLog2SizeY), derived from the intra-tile address CtbAddrInTile.
  xCtb = ((CtbAddrInTile%TileWidthinCtbs[TileId])<<CtbLog2SizeY)+TileAddrX[TileId]
  yCtb = ((CtbAddrInTile/TileWidthinCtbs[TileId])<<CtbLog2SizeY)+TileAddrY[TileId]
Here, TileWidthinCtbs[] is the width of the tile effective area in CTU units, wT[] and hT[] are the width and height of the tile effective area in pixel units, CtbLog2SizeY is the base-2 logarithm of the CTU size, and (TileAddrX,TileAddrY) are the upper left coordinates of the tile in pixel units. Note that the width and height of the tile coding area (wTile[],hTile[]) may be used instead of the width and height of the tile effective area (wT[],hT[]).
FIG. 29 shows an example of the syntax coding_quadtree() for quadtree splitting of a block (CU or CTU), and FIG. 30 shows an example of the syntax coding_binarytree() for binary tree splitting of a block. In FIG. 29, since the upper left coordinates of a tile do not necessarily fall at integer multiples of the CTU size, when tiles are used, split_cu_flag, which indicates whether to perform a further quadtree split, is signaled taking into account the upper left coordinates (xCtb,yCtb) of the CTU and the tile size, as in the following expression.
  if (x0+(1<<log2CbSize)-xTile<=wT && y0+(1<<log2CbSize)-yTile<=hT && log2CbSize>MinCbLog2SizeY)
   split_cu_flag[x0][y0]
Here, (x0,y0) are the upper left coordinates of the block, (xTile,yTile) are the upper left coordinates of the tile, log2CbSize is the base-2 logarithm of the block size, wT and hT are the width and height of the tile effective area (or the tile coding area), and MinCbLog2SizeY is the base-2 logarithm of the minimum block size.
If the right edge coordinate x0+(1<<log2CbSize) and the bottom edge coordinate y0+(1<<log2CbSize) of the block do not exceed the right edge coordinate xTile+wT and the bottom edge coordinate yTile+hT of the tile effective area, the target block lies within the tile effective area. If the block lies within the tile and the block size is larger than the minimum (log2CbSize>MinCbLog2SizeY), the flag split_cu_flag indicating whether to split the block further is signaled. When the block is to be further quadtree split, split_cu_flag is set to 1; when the block is not quadtree split, split_cu_flag is set to 0. When split_cu_flag is 1, coding_quadtree() is invoked recursively to signal whether to perform a further quadtree split. When split_cu_flag is 0, coding_binarytree() is invoked to signal (decode) whether to perform a binary tree split.
Also, as shown in FIG. 29, when any of the four blocks obtained by the quadtree split lies outside the tile effective area (or outside the tile coding area), that block is not encoded. Specifically, coding_quadtree(x1,y0,log2CbSize-1,cqtDepth+1,wT,hT,xTile,yTile), the block located at (x1,y0) obtained by the quadtree split, is encoded or decoded when x1 lies within the tile.
  if (x1-xTile<wT)
   coding_quadtree(x1,y0,log2CbSize-1,cqtDepth+1,wT,hT,xTile,yTile)
Similarly, coding_quadtree(x0,y1,log2CbSize-1,cqtDepth+1,wT,hT,xTile,yTile), the block located at (x0,y1), is encoded or decoded when y1 lies within the tile.
  if (y1-yTile<hT)
   coding_quadtree(x0,y1,log2CbSize-1,cqtDepth+1,wT,hT,xTile,yTile)
Similarly, coding_quadtree(x1,y1,log2CbSize-1,cqtDepth+1,wT,hT,xTile,yTile), the block located at (x1,y1), is encoded or decoded when both x1 and y1 lie within the tile.
  if (x1-xTile<wT && y1-yTile<hT)
   coding_quadtree(x1,y1,log2CbSize-1,cqtDepth+1,wT,hT,xTile,yTile)
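The recursive structure described above can be summarized in the following C sketch; read_flag() and the extern variables are placeholders assumed for this example, CU and binary tree coding are elided, and the inferred split for blocks that do not fit in the tile follows common HEVC practice rather than anything stated here.
  extern int MinCbLog2SizeY;    /* log2 of the minimum CU size      */
  extern int read_flag(void);   /* placeholder for a bitstream read */

  /* Sketch of coding_quadtree() from FIG. 29: split_cu_flag is read only
     when the block fits inside the tile effective area (wT x hT at
     (xTile,yTile)); child calls whose origin falls outside the tile are
     skipped. */
  void coding_quadtree(int x0, int y0, int log2CbSize, int cqtDepth,
                       int wT, int hT, int xTile, int yTile)
  {
      int split_cu_flag;
      if (x0 + (1 << log2CbSize) - xTile <= wT &&
          y0 + (1 << log2CbSize) - yTile <= hT &&
          log2CbSize > MinCbLog2SizeY)
          split_cu_flag = read_flag();
      else
          split_cu_flag = (log2CbSize > MinCbLog2SizeY); /* assumed inference */

      if (split_cu_flag) {
          int x1 = x0 + (1 << (log2CbSize - 1));
          int y1 = y0 + (1 << (log2CbSize - 1));
          coding_quadtree(x0, y0, log2CbSize - 1, cqtDepth + 1, wT, hT, xTile, yTile);
          if (x1 - xTile < wT)
              coding_quadtree(x1, y0, log2CbSize - 1, cqtDepth + 1, wT, hT, xTile, yTile);
          if (y1 - yTile < hT)
              coding_quadtree(x0, y1, log2CbSize - 1, cqtDepth + 1, wT, hT, xTile, yTile);
          if (x1 - xTile < wT && y1 - yTile < hT)
              coding_quadtree(x1, y1, log2CbSize - 1, cqtDepth + 1, wT, hT, xTile, yTile);
      }
      /* else: coding_binarytree() / coding_unit() would follow (omitted) */
  }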
Note that when tiles are not used, (xTile,yTile)=(0,0) and (wT,hT)=(pic_width_in_luma_samples, pic_height_in_luma_samples) may be set, and split_cu_flag, indicating whether to perform a quadtree split, may be signaled under the following condition.
  if (x0+(1<<log2CbSize)<=pic_width_in_luma_samples && y0+(1<<log2CbSize)<=pic_height_in_luma_samples && log2CbSize>MinCbLog2SizeY)
Similarly for the binary tree: when tiles are used, split_bt_mode, indicating whether to perform a further binary tree split, is signaled (decoded) taking into account the upper left coordinates (xCtb,yCtb) of the CTU and the tile size. Specifically, split_bt_mode indicating whether to perform a binary tree split may be signaled by the following expression.
  if (((1<<log2CbHeight)>minBTSize || (1<<log2CbWidth)>minBTSize) && ((1<<log2CbWidth)<=maxBTSize && (1<<log2CbHeight)<=maxBTSize) && (x0+(1<<log2CbWidth)-xTile<=wT && y0+(1<<log2CbHeight)-yTile<=hT) && cbtDepth<maxBTDepth)
   split_bt_mode
That is, split_bt_mode, which indicates whether to perform a binary tree split and the direction of the split, is signaled when the block size is larger than the minimum binary-tree-splittable size minBTSize and no larger than the maximum binary-tree-splittable size maxBTSize, the lower or right block resulting from the binary split would have its upper left coordinates within the tile, and the binary tree depth cbtDepth is smaller than the maximum splittable depth maxBTDepth. When the block is to be further binary tree split, split_bt_mode is set to 1; when the block is not binary tree split, split_bt_mode is set to 0. When split_bt_mode is 1, coding_binarytree() is invoked recursively to signal whether to perform a further binary tree split. When split_bt_mode is 0, coding_unit(x0,y0,log2CbWidth,log2CbHeight) is invoked to actually encode or decode the block.
Also, as shown in FIG. 30, when either of the two blocks obtained by the binary tree split lies outside the tile effective area (or outside the tile coding area), that block is not encoded. Specifically, of the blocks obtained by a horizontal (top-bottom) split, coding_binarytree(x0,y1,log2CbWidth,log2CbHeight-1,cqtDepth,cbtDepth+1,wT,hT,xTile,yTile), the block located at (x0,y1), is encoded or decoded when y1 lies within the tile.
  if (y1-yTile<hT)
   coding_binarytree(x0,y1,log2CbWidth,log2CbHeight-1,cqtDepth,cbtDepth+1,wT,hT,xTile,yTile)
Similarly, of the blocks obtained by a vertical (left-right) split, coding_binarytree(x1,y0,log2CbWidth-1,log2CbHeight,cqtDepth,cbtDepth+1,wT,hT,xTile,yTile), the block located at (x1,y0), is encoded or decoded when x1 lies within the tile.
  if (x1-xTile<wT)
   coding_binarytree(x1,y0,log2CbWidth-1,log2CbHeight,cqtDepth,cbtDepth+1,wT,hT,xTile,yTile)
Through the coordinate calculation processing and the splitting processing described in Embodiments 1 and 2 and above, a picture can be divided into tiles whose sizes are not restricted to multiples of the CTU.
 (Embodiment 3)
Embodiment 3 of the present application describes the processing of images that have been mapped onto a two-dimensional image so that the image data can be encoded for transmission and storage when the display (projection image) is spherical, as with 360-degree video or VR video.
FIGS. 17, 18, and 19 show examples of packing projection images to generate a two-dimensional image. FIG. 17(a) shows the ERP (Equirectangular Projection) format, which represents the sphere as a rectangle by stretching regions horizontally as they move away from the equator. FIG. 17(c) shows the cube format. The vertically lined areas in FIG. 17(c) are areas in which no image data exists. Mapping and packing into a two-dimensional image as in FIG. 17(a) are applied to the image as preprocessing before it is input to the video encoding device 11. The picture division unit 2010 in FIG. 11 assigns a tile to each of the rectangles 1 to 11 in FIG. 17(a) and the rectangles 0 to 5 in FIG. 17(c), and each tile is encoded by the tile encoding unit 2012.
Alternatively, for example, FIG. 18 shows the cubic-like ERP format; as shown in FIG. 18(a), the equator region is divided into 5 and 6. These are then packed together with the rectangles corresponding to the rotated polar regions, and a rectangular area as shown in FIG. 18(b) is generated in the preprocessing. In FIG. 18(b), the picture division unit 2010 in FIG. 11 assigns tiles to, for example, the rectangle 6, the rectangle composed of the triangular regions 1 to 4, the rectangle 5, and the rectangle composed of the triangular regions 7 to 10, and each tile is encoded by the tile encoding unit 2012.
FIG. 19 shows the SPP (Segmented Sphere Projection) format, in which the polar regions are represented by the circular regions 1 and 2 in FIG. 19(a) and the equator region is represented by the rectangles 3 to 6 in FIG. 19(a). The vertically lined area outside the circles is an invalid area with no image data. The picture division unit 2010 in FIG. 11 assigns tiles to the rectangles 1 and 2, which extend the circular regions, and to the rectangles 3 to 6, and each tile is encoded by the tile encoding unit 2012.
In encoding an image in which a sphere is mapped to two dimensions in this way, when the image is divided into tiles, each tile row may contain an equal number of tiles, as shown in FIG. 4(d). On the other hand, as shown in FIGS. 17(a), 17(c), and 18(b), when the image is divided into tiles, the tile rows may contain unequal numbers of tiles, or the tile columns may contain unequal numbers of tiles. In such a case, as shown in FIG. 5(i), the tile information syntax signals information on the number of tiles in the vertical direction (num_tile_rows_minus1), and, for each tile row, information on the tile height (row_height_minus1[i]), information on the number of tiles in the horizontal direction (num_tile_columns_minus1), and information on the tile width (column_width_minus1[i]). In addition, the overlap area information (overlap_tiles_info()) shown in FIG. 5(j) is signaled. In overlap_tiles_info(), when the overlap width or height of all tiles is uniform (uniform_overlap_flag=1), the same syntax as in FIG. 5(f) is encoded. Otherwise (uniform_overlap_flag=0), information on the overlap height (tile_overlap_height_div2[i]) and information on the overlap width of the individual tiles (tile_overlap_width_div2[i]) are signaled for each tile row.
The header information generation unit 2011 generates the syntax shown in FIGS. 5(i) and 5(j) and outputs it to the tile encoding unit 2012 and the encoded stream generation unit 2013.
In the video decoding device 31 of FIG. 9, the header information decoding unit 2001 decodes the syntax shown in FIGS. 5(i) and 5(j) and outputs it to the tile decoding unit 2002 and the tile synthesis unit 2003.
In this way, by signaling, for each tile row, the number of tiles in the horizontal direction, their widths, and the widths of the overlap areas, 360-degree video and VR video can be encoded and decoded without changing the two-dimensional image coding scheme at the tool level.
 (Embodiment 4)
In Embodiment 3, the picture was divided directly into tiles. In Embodiment 4 of the present application, a method of dividing the picture into regions and dividing the regions into tiles will be described. In this embodiment, the picture is divided hierarchically in two stages, using regions, which can be placed within the picture with a specified position and size, and tiles, which divide a region into rectangles. A region groups together, for example, areas that are continuous in the projection image or areas that use the same mapping method.
FIG. 17(b) shows an example in which the picture is divided into the tiles shown in FIG. 17(a) by dividing the picture into three regions and further dividing each region into tiles.
FIG. 17(d) shows an example in which the picture is divided into the tiles shown in FIG. 17(c) by dividing the picture into three regions and further dividing each region into tiles. FIG. 17(e) shows another example of dividing each region into tiles. Region 0 is divided into the tile Tile[0][0] and the invalid-area tiles Tile[1][0] to Tile[3][0]. Region 1 is divided into the tiles Tile[0][0] and Tile[1][0]. Region 2 is divided into the tile Tile[0][0] and the invalid-area tiles Tile[1][0], Tile[2][0], and Tile[3][0]. Note that Region 1 may be processed as the single tile Tile[0][0].
As in FIG. 17(d), FIG. 18(c) shows the regions corresponding to FIG. 18(b). Region 0 in FIG. 18(c) corresponds to the rectangle 6 in FIG. 18(b), and Region 1 corresponds to the triangular regions 1 to 4, the rectangle 5, and the triangular regions 7 to 10 in FIG. 18(b). The triangular regions 1 to 4, the rectangle 5, the rectangle 6, and the triangular regions 7 to 10 are each continuous areas in the projection image. FIG. 18(d) shows an example of dividing each region into tiles. Region 0 is divided into the tiles Tile[0][0], Tile[1][0], and Tile[2][0]. Region 1 is divided into the tile Tile[0][0] containing the triangular regions 1 to 4, the tile Tile[1][0] of the rectangle 5, and the tile Tile[2][0] containing the triangular regions 7 to 10. Note that Region 0 may be processed as the single tile Tile[0][0].
FIG. 19(b) shows the regions corresponding to FIG. 19(a). Region 0 in FIG. 19(b) corresponds to the circular regions 1 and 2 in FIG. 19(a) and the invalid areas around them, and Region 1 corresponds to the rectangles 3 to 6 in FIG. 19(a). The rectangles 3 to 6 are a continuous area in the projection image; the circular regions 1 and 2 are not continuous in the projection image, but both are polar regions and use the same mapping method. FIG. 19(c) shows an example of dividing each region into tiles. Region 0 is divided into the tile Tile[0][0] of the circular region 1 and the invalid area around it, and the tile Tile[1][0] of the circular region 2 and the invalid area around it. Region 1 is divided into the tiles Tile[0][0] to Tile[3][0] assigned to the rectangles 3 to 6.
FIG. 31 shows the hierarchical structure of pictures, regions, tiles, and CTUs. FIG. 31(a) shows one picture. FIG. 31(b) shows the regions (Region0, Region1, Region2) obtained by dividing this picture into three. FIG. 31(c) shows the tiles obtained by further dividing each region. FIG. 31(d) shows the CTUs obtained by further dividing the tiles into which Region0 in FIG. 31(c) was divided.
As shown in FIG. 31(d), the upper left coordinates (xRs0,yRs0), width wReg[0], and height hReg[0] of the region Region[0] need not be integer multiples of the CTU. Likewise, the upper left coordinates (xTsmn,yTsmn), width wAT[m], and height hAT[n] of the tile active area Tile[m][n] of a tile into which the region Region[0] is divided need not be integer multiples of the CTU.
 FIG. 20(k) shows the syntax for dividing a picture into regions and dividing the regions into tiles. region_parameters() is the syntax expressing the region information and is called from the PPS. In FIG. 4(b) described above, tile_parameters() was signaled in the PPS, but in this embodiment region_parameters() is signaled in the PPS, and tile_parameters() is signaled within region_parameters().
 In region_parameters() of FIG. 20(k), num_region_minus1 indicates the number of regions minus 1. When num_region_minus1 is 0, there is one region, and the syntax signaled thereafter is the same as when the picture is divided directly into tiles. When num_region_minus1 is greater than 0, the upper left coordinates (region_topleft_x[i], region_topleft_y[i]), width region_width_div2_minus1, and height region_height_div2_minus1 are signaled for each region. region_width_div2_minus1 and region_height_div2_minus1 are values obtained by dividing the region width and height by 2, and the actual region width wReg and height hReg are expressed as follows.
  wReg[p] = region_width_div2_minus1[p]*2+1
  hReg[p] = region_height_div2_minus1[p]*2+1
 When uniform_spacing_flag is 0, the width wAT[m] and height hAT[n] of the tile active area may be derived by any of (Formula TAS-1) to (Formula TAS-4) described above, with the picture width wPict and height hPict replaced by the region width wReg[p] and height hReg[p]. When uniform_spacing_flag is not 0, they may be derived using (Formula TAS-5). The formulas obtained by replacing wPict and hPict in (Formula TAS-1) with wReg[p] and hReg[p] are shown below.
  for(m=0; m<M; m++ )
   wAT[m] = ((m+1)*wReg[p])/M-(m*wReg[p])/M
  for(n=0; n<N; n++ )
   hAT[n] = ((n+1)*hReg[p])/N-(n*hReg[p])/N
 Here, M and N denote the numbers of tiles in the region in the horizontal and vertical directions, respectively. The upper left coordinates (xRsp, yRsp) of the region Region[p] are set as follows.
  xRsp = region_topleft_x[p] (Formula REG-1)
  yRsp = region_topleft_y[p]
 Note that region_width_div2_minus1[p] and region_height_div2_minus1[p] may express the size in units of 2 pixels or in units of 1 pixel, switched according to the chroma format (4:2:0, 4:2:2, 4:4:4).
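 As a consolidated illustration of the derivations above, the following C fragment is a minimal sketch that computes the region size from the div2_minus1 syntax elements, splits the region uniformly into M x N tile active areas, and sets the region origin per (Formula REG-1). The variable names mirror the text; the example values and the main() scaffolding are introduced here only to exercise the formulas.
  #include <stdio.h>
  /* Minimal sketch: derive region size, uniform tile active sizes, and the
   * region origin from already-parsed syntax values (example values only). */
  int main(void) {
      int region_width_div2_minus1 = 959, region_height_div2_minus1 = 539;
      int region_topleft_x = 0, region_topleft_y = 0;
      int M = 4, N = 2;                             /* tiles per region, horiz/vert */
      int wReg = region_width_div2_minus1 * 2 + 1;  /* region width  */
      int hReg = region_height_div2_minus1 * 2 + 1; /* region height */
      for (int m = 0; m < M; m++)                   /* uniform tile active widths */
          printf("wAT[%d]=%d\n", m, ((m + 1) * wReg) / M - (m * wReg) / M);
      for (int n = 0; n < N; n++)                   /* uniform tile active heights */
          printf("hAT[%d]=%d\n", n, ((n + 1) * hReg) / N - (n * hReg) / N);
      int xRs = region_topleft_x, yRs = region_topleft_y;  /* (Formula REG-1) */
      printf("region origin=(%d,%d)\n", xRs, yRs);
      return 0;
  }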
 Note that, in order to encode and decode regions in parallel, CABAC initialization is performed at the start of each region, as it is at the start of a slice or a tile.
 fill_color_present_flag is a flag indicating whether, for tile areas of a picture or region that are not coded (hereinafter, invalid tiles), the values to be set as the pixel values of the invalid tile areas (invalid areas) are signaled. When fill_color_present_flag is 1, the pixel values of the invalid areas (fill_color_y, fill_color_cb, fill_color_cr) are signaled. When fill_color_present_flag is 0, the pixel values of the invalid areas are set to, for example, black (0, 1<<(bitdepth-1), 1<<(bitdepth-1)) or gray (1<<(bitdepth-1), 1<<(bitdepth-1), 1<<(bitdepth-1)). Here, bitdepth is the bit depth of the pixel values.
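 As a small illustration, the following hedged C sketch chooses the fill color of an invalid area from the decoded flag and the bit depth; the struct, the function name, and the use_gray selector are introduced here for illustration only and are not syntax elements of this embodiment.
  typedef struct { int y, cb, cr; } YCbCr;
  /* Sketch: choose the fill color for invalid areas. When the flag is 0,
   * fall back to black or gray built from the chroma mid value (assumption:
   * use_gray selects between the two defaults named in the text above). */
  YCbCr invalid_fill_color(int fill_color_present_flag,
                           int fill_color_y, int fill_color_cb, int fill_color_cr,
                           int bitdepth, int use_gray) {
      YCbCr c;
      int mid = 1 << (bitdepth - 1);   /* e.g. 512 for 10-bit video */
      if (fill_color_present_flag) {   /* values signaled explicitly */
          c.y = fill_color_y; c.cb = fill_color_cb; c.cr = fill_color_cr;
      } else if (use_gray) {
          c.y = mid; c.cb = mid; c.cr = mid;
      } else {                         /* black: luma 0, chroma at mid */
          c.y = 0; c.cb = mid; c.cr = mid;
      }
      return c;
  }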
 In addition, tile_parameters() is signaled for each region. tile_parameters(), and the tile information tile_info() contained in it, may be expressed by the syntax of FIG. 4(c) and FIG. 4(d). The tiles divide the region uniformly, with the upper left coordinates of the region (region_topleft_x[i], region_topleft_y[i]) treated as (0,0).
 FIG. 11(c) is an example of the picture division unit 2010 of FIG. 11(a) that implements the fourth embodiment. In FIG. 11(c), the picture division unit 2010 consists of a region information calculation unit 20103, a tile information calculation unit 20101, and a picture division unit B20104. The region information calculation unit 20103 calculates region information (the number of regions, upper left coordinates, width and height, pixel values to be set in invalid areas, etc.) for dividing the input image into regions as shown, for example, in FIGS. 17(d), 18(c), and 19(b). The tile information calculation unit 20101 refers to the region information calculated by the region information calculation unit 20103, treats each region as a picture, and calculates tile information for dividing the region into tiles by the method described in the third embodiment (for example, FIGS. 17(e), 18(d), 19(c), and 31(c)). The picture division unit B20104 divides the picture into regions by referring to the region information, and divides the regions into tiles by referring to the tile information.
 The header information generation unit 2011 generates the syntax shown in FIG. 20(k) and outputs it to the tile encoding unit 2012 and the encoded stream generation unit 2013.
 The tile encoding unit 2012 encodes the divided tiles, and the encoded stream generation unit 2013 generates the encoded stream Te from the encoded streams of the tiles.
 In the video decoding device 31 of FIG. 9, the header information decoding unit 2001 decodes the syntax shown in FIG. 20(k) and outputs it to the tile decoding unit 2002 and the tile synthesis unit 2003. The tile decoding unit 2002 decodes the encoded streams of the designated tiles and outputs them to the tile synthesis unit 2003.
 The smoothing processing unit 20031 of the tile synthesis unit 2003 outputs, when a tile has an overlap area, the tile with the overlap area filtered to the synthesis unit 20032; when a tile has no overlap area, it outputs the tile output by the tile decoding unit 2002 to the synthesis unit 20032 as is. The synthesis unit 20032 synthesizes the decoded image of the designated area from the region information and tile information decoded by the header information decoding unit 2001.
 In this way, when a picture is first divided into regions and the regions are then divided into tiles, the sizes of the tiles within a region can be set almost uniformly. Therefore, compared with the third embodiment, the tile information signaled in the header can be reduced. Moreover, since the projection image is generally discontinuous at region boundaries, no overlap area needs to be provided there, whereas the projection image is often continuous at tile boundaries within a region, so overlap areas are needed there. Accordingly, by not providing overlap areas at region boundaries, redundant encoded data can be reduced.
 Furthermore, from the projection format (ERP, SSP, etc.) and the packing method as shown in FIGS. 17 to 19, the positions where adjacent tiles are continuous in the projection image can be identified. Therefore, at tile boundaries where the projection image is continuous, an overlap area is provided in the tile; otherwise, no overlap area is provided. For example, in FIG. 18(d), overlap areas are provided at the boundary between Tile[0][0] and Tile[1][0] and the boundary between Tile[1][0] and Tile[2][0] in region 0, and at the boundary between Tile[0][0] and Tile[1][0] and the boundary between Tile[1][0] and Tile[2][0] in region 1. Also, for example, in FIG. 19(c), no overlap area is provided at the boundary between Tile[0][0] and Tile[1][0] in region 0, while overlap areas are provided at the boundaries between Tile[0][0] and Tile[1][0], between Tile[1][0] and Tile[2][0], and between Tile[2][0] and Tile[3][0] in region 1. When no overlap area is provided, the width wOVLP and height hOVLP of the overlap area are set to 0, and overlap_tiles_flag is set to 0.
 In this way, when overlap areas are unnecessary, the information on the width and height of the overlap areas is not encoded, so the header information can be reduced. In addition, the redundant code amount caused by encoding the same area multiple times due to overlap is reduced, so a decrease in coding efficiency can be suppressed.
 FIG. 32 shows the syntax relating to regions. In FIG. 32, while the flag end_of_region_flag indicating whether the end of the region has been reached is 0 (while it is not the end of the region), the CTU syntax coding_tree_unit() and end_of_region_flag are signaled. At the end position of a tile, end_of_subset_one_bit (=1) indicating the end of the tile is signaled, and the stream is byte-aligned. The end position of a tile is determined by the following expression.
  if (tiles_enabled_flag && CtbAddrInTile >= NumCtbInTile[TileId])
 CtbAddrInTs is the address of the CTU within the picture, NumCtbInTile[] is the number of CTUs in a tile, and CtbAddrInTile is the address of the CTU within the tile. Since CtbAddrInTile being greater than or equal to NumCtbInTile[] indicates a position outside the target tile, the end of the target tile can be detected. In FIG. 32, the tile identifier TileId is incremented by 1 at the end of each tile. That is, TileId is unique within a region and is reset to 0 at the start of a different region.
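 A minimal sketch of this per-CTU loop is shown below, assuming NumCtbInTile[] has already been derived from the tile sizes; decode_ctu() and read_end_of_region_flag() are illustrative stubs standing in for the real parsing routines, and the loop scaffolding is an assumption made for illustration.
  #include <stdio.h>
  /* Stubs standing in for the real parsing routines (illustrative only). */
  static void decode_ctu(int tile_id, int addr) { printf("CTU %d of tile %d\n", addr, tile_id); }
  static int  read_end_of_region_flag(void)     { static int n = 0; return ++n >= 8; }
  /* Sketch of the loop of FIG. 32: CTUs are decoded until end_of_region_flag
   * is 1; TileId is incremented (and the in-tile CTU address reset) whenever
   * the current tile's CTU count is exhausted. */
  void decode_region(const int NumCtbInTile[], int tiles_enabled_flag) {
      int TileId = 0, CtbAddrInTile = 0;
      do {
          decode_ctu(TileId, CtbAddrInTile);
          CtbAddrInTile++;
          if (tiles_enabled_flag && CtbAddrInTile >= NumCtbInTile[TileId]) {
              /* end_of_subset_one_bit (=1) is read here, then byte alignment */
              TileId++;            /* unique within the region, reset per region */
              CtbAddrInTile = 0;
          }
      } while (!read_end_of_region_flag());
  }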
 Next, FIG. 33 shows the CTU syntax coding_tree_unit() when tiles are divided without being constrained to multiples of the CTU. To handle the case where the upper left coordinates of a tile (tile effective area) are not at a position that is an integer multiple of the CTU, the upper left coordinates (xCtb, yCtb) of the CTU are derived for each tile. Specifically, the tile upper left coordinates (TileAddrX[TileId], TileAddrY[TileId]) and the region upper left coordinates (RegionAddrX[RegId], RegionAddrY[RegId]) are added to the in-tile coordinates of the CTU derived from the in-tile address CtbAddrInTile, namely ((CtbAddrInTile%TileWidthinCtbs[TileId])<<CtbLog2SizeY, (CtbAddrInTile/TileWidthinCtbs[TileId])<<CtbLog2SizeY), to derive the coordinates of the CTU of the tile within the picture.
  xCtb = ((CtbAddrInTile%TileWidthinCtbs[TileId])<<CtbLog2SizeY)+TileAddrX[TileId]+RegionAddrX[RegId]
  yCtb = ((CtbAddrInTile/TileWidthinCtbs[TileId])<<CtbLog2SizeY)+TileAddrY[TileId]+RegionAddrY[RegId]
 Here, TileWidthinCtbs[] is the width of the tile effective area in CTU units, wT[] and hT[] are the width and height of the tile effective area in pixel units, CtbLog2SizeY is the logarithm of the CTU size, (TileAddrX, TileAddrY) are the upper left coordinates of the tile in pixel units, and (RegionAddrX[RegId], RegionAddrY[RegId]) are the upper left coordinates of the region in pixel units. The tile upper left coordinates (TileAddrX, TileAddrY) may be set to (xTsmn, yTsmn) derived by (Formula TLA-1) or (Formula TLA-2), and the region upper left coordinates (RegionAddrX[RegId], RegionAddrY[RegId]) may be set to (xRsp, yRsp) derived by (Formula REG-1). Note that the width and height of the tile coding area (wTile[], hTile[]) may be used instead of the width and height of the tile effective area (wT[], hT[]).
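 Under the definitions above, the derivation can be written as a small C function. This is a sketch only; the array names mirror the text, and all inputs are assumed to have been filled in from the decoded tile and region information.
  /* Sketch: derive the picture-space upper-left coordinates (xCtb, yCtb) of a
   * CTU from its in-tile address, following the two formulas above. */
  void derive_ctb_origin(int CtbAddrInTile, int TileId, int RegId,
                         const int TileWidthinCtbs[],
                         const int TileAddrX[], const int TileAddrY[],
                         const int RegionAddrX[], const int RegionAddrY[],
                         int CtbLog2SizeY, int *xCtb, int *yCtb) {
      int w = TileWidthinCtbs[TileId];              /* tile width in CTUs */
      *xCtb = ((CtbAddrInTile % w) << CtbLog2SizeY) /* in-tile x offset   */
              + TileAddrX[TileId] + RegionAddrX[RegId];
      *yCtb = ((CtbAddrInTile / w) << CtbLog2SizeY) /* in-tile y offset   */
              + TileAddrY[TileId] + RegionAddrY[RegId];
  }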
 FIG. 34 shows another syntax expressing regions. In FIG. 32, a slice is divided into regions and the regions are divided into tiles, whereas in FIG. 34 a region may be divided into slices and tiles. The region information (region shape and size) is signaled in the PPS as shown in FIG. 20(k). Then, when the end of a region or tile is detected in the course of decoding slice_segment_data(), end_of_region_flag (=1) or end_of_subset_one_bit (=1) is inserted and the stream is byte-aligned. The end condition of a tile is the following expression, as in FIG. 32.
  if (tiles_enabled_flag && CtbAddrInTile >= NumCtbInTile[RegId][TileId])
 When the CTU address within the tile reaches the predetermined value NumCtbInTile[RegId][TileId], processing of the target tile ends, TileId is incremented, and processing of the next tile starts. The region end condition is reached when the following expression no longer holds.
  while (TileId < NumTilesInRegion[RegId])
 When TileId reaches the predetermined value NumTilesInRegion[RegId], processing of the target region ends, RegId is incremented, TileId and CtbAddrInTs are reset, and processing of the next region starts. In this way, TileId and CtbAddrInTs are reset in units of regions.
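 The nesting of FIG. 34 can be sketched as two loops with per-region resets. This is an illustrative sketch only: MAX_TILES and decode_ctu2() are placeholders introduced here, and the bitstream flags are reduced to comments.
  #include <stdio.h>
  static void decode_ctu2(int reg, int tile, int addr) { printf("R%d T%d CTU%d\n", reg, tile, addr); }
  #define MAX_TILES 16  /* illustrative bound on tiles per region */
  /* Sketch of the slice_segment_data() structure of FIG. 34: a tile ends when
   * the in-tile CTU address reaches NumCtbInTile[RegId][TileId], a region ends
   * when TileId reaches NumTilesInRegion[RegId]; TileId and the CTU address
   * are reset at each region boundary. */
  void decode_slice_data(int NumRegions,
                         const int NumTilesInRegion[],
                         const int NumCtbInTile[][MAX_TILES],
                         int tiles_enabled_flag) {
      for (int RegId = 0; RegId < NumRegions; RegId++) {
          int TileId = 0, CtbAddrInTile = 0;   /* reset per region */
          while (TileId < NumTilesInRegion[RegId]) {
              decode_ctu2(RegId, TileId, CtbAddrInTile);
              CtbAddrInTile++;
              if (tiles_enabled_flag &&
                  CtbAddrInTile >= NumCtbInTile[RegId][TileId]) {
                  /* end_of_subset_one_bit / end_of_region_flag and byte
                   * alignment would be handled here */
                  TileId++;
                  CtbAddrInTile = 0;
              }
          }
      }
  }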
 Note that the syntax of coding_tree_unit(TileId) called in FIG. 34 is the same as in FIG. 33; in order to process regions and tiles whose sizes are not necessarily multiples of the CTU, the upper left coordinates of a CTU are calculated using the upper left coordinates of the tile and the region.
 As described above, a region whose size is not necessarily a multiple of the CTU can be divided into tiles and encoded and decoded.
  (Embodiment 5)
 In the fifth embodiment, an example in which the tiles of invalid areas are signaled in the third and fourth embodiments will be described.
 FIG. 17(e) is a diagram in which FIG. 17(c) is divided into regions and then into tiles. Regions 0 and 2 are divided into four tiles each, and region 1 is divided into two tiles. In regions 0 and 2, tile Tile[0][0] is an effective area having an area corresponding to the projection image, but tiles Tile[1][0], Tile[2][0], and Tile[3][0] are invalid areas. Therefore, Tile[1][0], Tile[2][0], and Tile[3][0] need not be encoded or decoded.
 In the syntax shown in FIG. 20(l), a flag tile_valid_flag signaling the tiles of invalid areas is included in the tile information; tiles whose tile_valid_flag is 1 are decoded, and tiles whose tile_valid_flag is 0 are not decoded. The rest of the syntax is the same as the syntax of FIG. 5(i), so its description is omitted. In FIG. 20(l), as the information on tile widths and heights, information on the number of tiles in the vertical direction (num_tile_rows_minus1) is signaled, and for each tile row, information on the tile height (row_height_minus1[i]), information on the number of tiles in the horizontal direction (num_tile_columns_minus1), and information on the tile widths (column_width_minus1[i]) are signaled. As in FIG. 4(d), the information on the tile heights (row_height_minus1[i]) and the information on the tile widths (column_width_minus1[i]) may instead be signaled once for the vertical count and once for the horizontal count, respectively.
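 A hedged sketch of how a decoder might use tile_valid_flag follows; decode_tile() is an illustrative stub for the actual tile decoding routine.
  static void decode_tile(int t) { (void)t; /* stands in for the real tile decoder */ }
  /* Sketch: only tiles whose tile_valid_flag is 1 carry coded data and are
   * decoded; invalid tiles are skipped, and their area is later filled with
   * the fill color (fill_color_y/cb/cr or the default). */
  void decode_valid_tiles(int num_tiles, const int tile_valid_flag[]) {
      for (int t = 0; t < num_tiles; t++) {
          if (tile_valid_flag[t])
              decode_tile(t);
          /* else: nothing to parse for this tile */
      }
  }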
 The pixel values of the invalid areas may also be signaled with fill_color_y, fill_color_cb, and fill_color_cr by setting fill_color_present_flag to 1 in FIG. 20(k).
 Another example of invalid areas is the right-angled triangular region-wise packing for cube map projection format shown in FIG. 35. As shown in FIG. 35(a), this packing packs and encodes only the surfaces of the cube visible from the diagonal front right (Front, Left, and half of each of Top and Bottom). The form of this packing is shown in FIG. 35(b). The picture of FIG. 35(b) consists of three regions. Region[0] consists of Front and Left of FIG. 35(a). Region[1] consists of the half areas (triangular areas) of each of Top and Bottom of FIG. 35(a) and a padding area between the two triangles. Region[2] is an invalid area that does not exist in FIG. 35(a) and arises because region[0] and region[1] differ in height. As shown in FIG. 35(c), region[0] has upper left coordinates (xRs[0], yRs[0]), width wReg[0], and height hReg[0]; region[1] has upper left coordinates (xRs[1], yRs[1]), width wReg[1], and height hReg[1]; and region[2] has upper left coordinates (xRs[2], yRs[2]), width wReg[2], and height hReg[2].
 The header information generation unit 2011 of FIG. 11 generates the syntax shown in FIG. 20(l) and outputs it to the tile encoding unit 2012 and the encoded stream generation unit 2013. The tile encoding unit 2012 then encodes only the valid tiles.
 In the video decoding device 31 of FIG. 9, the header information decoding unit 2001 decodes the syntax shown in FIG. 20(l) and outputs it to the tile decoding unit 2002 and the tile synthesis unit 2003. The tile decoding unit 2002 decodes the encoded streams of the valid tiles and outputs them to the tile synthesis unit 2003.
 The other encoding and decoding processes are the same as in the third and fourth embodiments.
 By signaling a flag indicating whether each tile is valid or invalid, the video encoding device and the video decoding device perform only the necessary encoding and decoding processes, so wasteful processing can be reduced.
  (Embodiment 6)
 In the first to fifth embodiments, techniques were described for dividing an image into tiles, encoding the tiles independently, and decoding only the necessary tiles in order to display a desired area. In the sixth embodiment, a technique for encoding and decoding independently in units of regions is described. In this case, the tiles into which a region is divided do not refer to adjacent tiles in the spatial direction, but can refer, in the temporal direction, to tiles of different times belonging to the same region. A loop filter may also be applied to tile boundaries. This is the same as the process of encoding and decoding a region regarded as a conventional picture. Accordingly, for each region signaled in the slice data (slice_segment_data()) shown in FIG. 20(m), encoding and decoding are completed in units of regions (Region()). A region of width wReg[i] and height hReg[i], with the upper left coordinates (region_topleft_x[i], region_topleft_y[i]) treated as (0,0), is regarded as one picture, and in Region() shown in FIG. 20(n), the Tile() syntax shown in FIG. 5(h) may be signaled in raster scan order. Note that the initial value of the quantization parameter specified in the slice may be used as the quantization parameter at the start of each region. When dividing a picture into regions, the picture may be processed as one slice. Also, instead of FIG. 20(m), FIG. 20(n), and FIG. 5(h), the syntax shown in FIG. 32 or FIG. 34 may be used to perform the encoding or decoding process independently in units of regions.
 By changing the independent processing in units of tiles to independent processing in units of regions, the information that each tile can refer to (the information of the tiles adjacent to the collocated tile within the region) increases. Therefore, only a part of the screen can be decoded while suppressing a decrease in coding efficiency.
  (Embodiment 7)
 Another embodiment of the method of dividing a picture into tiles will be described with reference to FIG. 36. In the first to sixth embodiments, the area including the overlap area and the crop offset area (tile invalid area) was encoded and decoded in CTU units, based on the upper left coordinates of the net display area (tile active area), which are not limited to integer multiples of the CTU. The upper left coordinates of the tile active area are not limited to positions that are integer multiples of the CTU.
 In the tile division method of the seventh embodiment, tiles (tile coding areas) obtained by adding a crop offset area to the tile effective area, which consists of the tile active area and the overlap area, are arranged without overlapping as shown in FIG. 36 to generate a picture, and this picture is used as the input image to the video encoding device 11. In this input image, the upper left coordinates of each tile coding area are set at positions that are integer multiples of the CTU, and the size of each tile coding area is an integer multiple of the CTU. Then, the picture width pic_width_in_luma_samples and height pic_height_in_luma_samples signaled in the SPS of FIG. 4(a) are set not to the net size of the picture (the first picture size) but to the following size (the second picture size), which includes the overlap areas and crop offset areas.
  wPict = pic_width_in_luma_samples = ΣwTile[m]-wCRP[M-1] = Σ(wAT[m]+wOVLP[m]+wCRP[m])-wCRP[M-1] (Formula TCS-2)
  hPict = pic_height_in_luma_samples = ΣhTile[n]-hCRP[N-1] = Σ(hAT[n]+hOVLP[n]+hCRP[n])-hCRP[N-1]
 The picture width wPict and height hPict do not include the crop offset areas at the right and bottom edges of the picture (wCRP[M-1] and hCRP[N-1]). The image decoding device 31 decodes the tile coding areas, filters the overlap areas with the adjacent tile active areas, and discards the crop offset areas, thereby outputting a picture of the same size as the original picture (the first picture size). By processing tiles in CTU units in this way, the conventional tile encoding unit 2012 and tile decoding unit 2002 can be used for the encoding and decoding processes, and the complexity of the encoding and decoding processes can also be reduced.
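 The following C fragment sketches (Formula TCS-2), assuming per-column and per-row arrays of the active, overlap, and crop offset sizes derived from the tile information; the function names are introduced here for illustration only.
  /* Sketch of (Formula TCS-2): the coded picture size is the sum of the tile
   * coding areas, minus the crop offset of the rightmost column / bottom row. */
  int second_picture_width(int M, const int wAT[], const int wOVLP[], const int wCRP[]) {
      int wPict = 0;
      for (int m = 0; m < M; m++)
          wPict += wAT[m] + wOVLP[m] + wCRP[m];   /* wTile[m] */
      return wPict - wCRP[M - 1];                 /* right-edge crop not included */
  }
  int second_picture_height(int N, const int hAT[], const int hOVLP[], const int hCRP[]) {
      int hPict = 0;
      for (int n = 0; n < N; n++)
          hPict += hAT[n] + hOVLP[n] + hCRP[n];   /* hTile[n] */
      return hPict - hCRP[N - 1];                 /* bottom-edge crop not included */
  }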
 FIG. 36(a) is a diagram in which, as in the first embodiment, a picture is divided into tiles that are allowed to overlap and are not limited to integer multiples of the CTU. The hatched parts are overlap areas, which overlap the adjacent tile active areas. FIG. 36(b) shows one tile extracted from FIG. 36(a). The tile (tile effective area) Tile[m][n] has width wT[m] and height hT[n], and the width wOVLP[m] and height hOVLP[n] of the hatched overlap areas are included in wT[m] and hT[n], respectively. The width wT[m], height hT[n], and upper left coordinates (xTsmn, yTsmn) of the tile effective area take values that are not limited to integer multiples of the CTU. FIG. 36(c) is a picture generated by setting the upper left coordinates of the tile effective areas at positions that are integer multiples of the CTU so that adjacent tile effective areas do not overlap. This picture is the input image to the video encoding device 11. When the tile effective areas are arranged in this way, the encoding or decoding process is performed on tile coding areas (width wTile[m], height hTile[n]) whose upper left coordinates (xTsmn, yTsmn) are at positions that are integer multiples of the CTU and whose sizes are integer multiples of the CTU. As shown in (Formula TCS-1) or (Formula TCS-3), a tile coding area is the combination of the tile effective area and the crop offset area (tile invalid area). The upper left coordinates (xTsmn, yTsmn) of a tile coding area shown in FIG. 36(c) are expressed by the following equations.
  xTsmn = ΣwTile[i] = Σceil(wT[i]) (where Σ is the sum over i=0..m-1)
  yTsmn = ΣhTile[j] = Σceil(hT[j]) (where Σ is the sum over j=0..n-1)
 FIG. 37 shows the syntax other than the picture width pic_width_in_luma_samples and height pic_height_in_luma_samples. The tile_info() of FIG. 37 differs from the tile_info() of FIG. 25(a) in that total_cropoffset_width and total_cropoffset_height are signaled when uniform_spacing_flag is not 0. total_cropoffset_width is the sum of the widths wCRP[m] (m=0..M-2) of the M-1 crop offset areas, and total_cropoffset_height is the sum of the heights hCRP[n] (n=0..N-2) of the N-1 crop offset areas; when uniform_spacing_flag is not 0, they are used to calculate the width wT[m] and height hT[n] of the tile effective areas.
  wPict1 = wPict-total_cropoffset_width
  for(m=0; m<M; m++ )
   wT[m] = ((m+1)*wPict1)/M-(m*wPict1)/M
  hPict1 = hPict-total_cropoffset_height
  for(n=0; n<N; n++ )
   hT[n] = ((n+1)*hPict1)/N-(n*hPict1)/N
 Here, wPict and hPict are the width and height of the input image (the second picture size) calculated by (Formula TCS-2). When uniform_spacing_flag is 0, the width wT[m] and height hT[n] of the tile effective area are calculated by substituting column_width_in_luma_samples_div2_minus1[m] and row_height_in_luma_samples_div2_minus1[n] into (Formula TSP-10) if expressed in pixel units, and otherwise by substituting column_width_minus1[m] and row_height_minus1[n] into any of (Formula TSP-7) to (Formula TSP-9). Note that overlap_tiles_flag is a flag indicating the presence or absence of a crop offset area including an overlap area. The other syntax is the same as in FIG. 25(a), so its description is omitted.
 As for the overlap information, uniform_overlap_flag, tile_overlap_width_minus1[], and tile_overlap_height_minus1[] are signaled in overlap_tiles_info() of FIG. 25(b). If an overlap size (width or height) of 0 is allowed, the overlap width (tile_overlap_width[]) and height (tile_overlap_height[]) may be signaled without subtracting 1. Furthermore, if the overlap size is always the same, uniform_overlap_flag need not be sent, and only one pair of tile_overlap_width_minus1 and tile_overlap_height_minus1 may be signaled. Using these values, for example, the width wOVLP[m] and height hOVLP[n] of the overlap areas may be calculated by (Formula OVLP-1) or (Formula OVLP-2). Also, for example, the width wCRP[m] and height hCRP[n] of the crop offset areas may be calculated by (Formula CRP-1).
 On the other hand, for the syntax at and below the slice data (slice_segment_data()), since the tiles (tile coding areas) processed by the tile encoding unit 2012 or the tile decoding unit 2002 are integer multiples of the CTU and the start of each tile is set at a position that is an integer multiple of the CTU, the conventional slice_segment_data() and coding_tree_unit() shown in FIG. 23 may be used.
 The processing at and below the slice data is the same as in the conventional tile encoding unit 2012 and tile decoding unit 2002, which process tiles independently. However, since the encoding target is an input image including the overlap areas and crop offset areas, the processing content of the picture division unit 2010 in the encoding process differs from the processing described in the first to sixth embodiments. In the decoding process, the processing content of the tile synthesis unit 2003 differs from the processing described in the first to sixth embodiments. These processes are described below.
 In the video encoding device 11, the tile information calculation unit 20101 of the picture division unit 2010 calculates, from the picture size (the first picture size), tile information including the width wAT[m] and height hAT[n] of the non-overlapping tile active areas as shown in FIG. 26(a), the width wOVLP and height hOVLP of the overlap areas, the width wCRP and height hCRP of the crop offset areas, the width wT[m] and height hT[n] of the tile effective areas, the width wTile[m] and height hTile[n] of the tile coding areas, and so on.
 The picture division unit A20102 of the picture division unit 2010 divides the picture into tile active areas according to the tile information calculated by the tile information calculation unit 20101, and copies the pixel values of the tile effective areas Tile[m][n], including the overlap areas outside the active areas, to a memory of a size (the second picture size) that can store the (wPict, hPict) area calculated by (Formula TCS-2). The memory size may be set to the size obtained by expanding (wPict, hPict) to integer multiples of the CTU, i.e. (wPict+wCRP[M-1], hPict+hCRP[N-1]). As shown in FIG. 36(c), the tile effective areas Tile[m][n] are arranged so that their upper left coordinates are at positions that are integer multiples of the CTU and the tile effective areas do not overlap. Next, the picture division unit 2010 sets pixel values in the areas outside the tile effective areas where no pixel values have been set (the crop offset areas). The pixel values to be set may be the pixel values of the tile effective area adjacent to the crop offset area. The pixel value vPic(x,y) at pixel position (x,y) in a crop offset area is derived from the pixel values of the tile effective area by the following equations.
  vPic[x][y] = Tile[m][n][wT[m]-1][y] (wT[m]<x<wTile[m], 0<=y<hT[n])
  vPic[x][y] = Tile[m][n][x][hT[n]-1] (0<=x<wT[m], hT[n]<y<hTile[n])
  vPic[x][y] = Tile[m][n][wT[m]-1][hT[n]-1] (wT[m]<x<wTile[m], hT[n]<y<hTile[n])
 Alternatively, a predetermined value, for example (Y,Cb,Cr) = (2^(NBIT-1), 2^(NBIT-1), 2^(NBIT-1)), may be used, where NBIT is the number of bits of the pixel values of the picture. The picture division unit A20102 outputs the input image having the second picture size generated in this way to the tile encoding unit 2012 for each tile coding area. The tile encoding unit 2012 encodes each tile coding area and generates an encoded stream of each tile coding area. The encoded stream generation unit 2013 generates the encoded stream of the input image from the encoded streams of the tile coding areas.
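 The replication above amounts to simple edge padding. The following C sketch fills the crop offset strips of one tile in a single luma plane; the buffer layout (tile stored at origin (0,0) with the given stride) is an assumption made for illustration.
  /* Sketch: fill the crop offset area of one tile by replicating the nearest
   * effective-area pixel, following the three vPic equations above. */
  void pad_crop_offset(unsigned char *vPic, int stride,
                       int wT, int hT, int wTile, int hTile) {
      for (int y = 0; y < hT; y++)                 /* right strip */
          for (int x = wT; x < wTile; x++)
              vPic[y * stride + x] = vPic[y * stride + (wT - 1)];
      for (int y = hT; y < hTile; y++)             /* bottom strip */
          for (int x = 0; x < wT; x++)
              vPic[y * stride + x] = vPic[(hT - 1) * stride + x];
      for (int y = hT; y < hTile; y++)             /* bottom-right corner */
          for (int x = wT; x < wTile; x++)
              vPic[y * stride + x] = vPic[(hT - 1) * stride + (wT - 1)];
  }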
 In the video decoding device 31, the header information decoding unit 2001 decodes the header information including the tile information from the input encoded stream and outputs the input stream of each tile coding area to the tile decoding unit 2002. The tile decoding unit 2002 decodes each tile coding area from the input stream and outputs it to the tile synthesis unit 2003.
 When overlap_tiles_flag is 1, the smoothing processing unit 20031 performs the filtering (averaging or weighted averaging) shown, for example, in (Formula FLT-1) to (Formula FLT-3) using the overlap areas of the tiles decoded by the tile decoding unit 2002, and overwrites the filtered pixel values of the overlap areas (here, tmp) in the memory shown in FIG. 36(c). For example, the filtering result of the overlap area at the right edge of Tile[0][0] and the tile active area at the left edge of Tile[1][0] is overwritten on the tile active area at the left edge of Tile[1][0], and the filtering result of the overlap area at the bottom edge of Tile[0][0] and the tile active area at the top edge of Tile[0][1] is overwritten on the tile active area at the top edge of Tile[0][1].
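 A hedged sketch of such a blend across a vertical tile boundary follows. The linear distance-dependent weights used here are an illustrative choice only; the actual weights are those defined by (Formula FLT-1) to (Formula FLT-3).
  /* Sketch: blend an overlap strip of width wOVLP between the right-edge
   * overlap columns of tile A and the left-edge active columns of tile B,
   * overwriting B's pixels with the filtered result (tmp), as in the text. */
  void blend_overlap_columns(const unsigned char *a, int strideA, /* A's overlap */
                             unsigned char *b, int strideB,       /* B's active  */
                             int wOVLP, int height) {
      for (int y = 0; y < height; y++) {
          for (int x = 0; x < wOVLP; x++) {
              int wB = x + 1;            /* weight grows with distance into B */
              int wA = wOVLP - x;        /* weight shrinks away from A's edge */
              int tmp = (wA * a[y * strideA + x] + wB * b[y * strideB + x]
                         + (wOVLP + 1) / 2) / (wOVLP + 1);  /* rounded average */
              b[y * strideB + x] = (unsigned char)tmp;
          }
      }
  }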
 The synthesis unit 20032 extracts the tile active areas (wAT[m], hAT[n]) from the memory of the second picture size wPict*hPict, or of (wPict+wCRP[M-1])*(hPict+hCRP[N-1]), and arranges them without overlap to synthesize a decoded image of the original picture size (the first picture size). Here, the original picture size is the sum of the widths and heights of the tile active areas (ΣwAT[m], ΣhAT[n]), which is the size of the display image.
 By making it possible to process tiles in CTU units in this way, the conventional tile encoding process and tile decoding process can be used for the encoding and decoding processes, and the complexity of the encoding and decoding processes can also be reduced.
 A video decoding device according to an aspect of the present invention is a video decoding device that divides an image into tiles and decodes a video in units of tiles, the device including: a header information decoding unit that decodes header information from an encoded stream and calculates tile information; a tile decoding unit that decodes the encoded data of each tile and generates a decoded image of the tile; and a synthesis unit that synthesizes the decoded images of the tiles with reference to the tile information to generate a display image, wherein each tile includes an area that overlaps an adjacent tile, and the synthesis unit filters the plural pixel values of each pixel in the overlap areas of the tiles and generates the display image using the pixel values of the decoded images of the tiles and the filtered pixel values.
 In the video decoding device according to an aspect of the present invention, the tile decoding unit decodes a target tile with reference only to information on the target tile and information on the collocated tile of the target tile.
 In the video decoding device according to an aspect of the present invention, the tile information includes the number, width, and height of the tiles, the presence or absence of overlap between adjacent tiles, and, when the tiles overlap, the width and height of the overlap areas.
 In the video decoding device according to an aspect of the present invention, the upper left coordinates of the tiles are not limited to positions that are integer multiples of the CTU.
 In the video decoding device according to an aspect of the present invention, each tile includes an area that overlaps an adjacent tile and a crop offset area (tile invalid area), the size of the tile including the overlapping area and the crop offset area is an integer multiple of the CTU, and the upper left coordinates of the tile are limited to positions that are integer multiples of the CTU.
 In the video decoding device according to an aspect of the present invention, the filtering by the synthesis unit is a simple average of the pixel values of a plurality of overlap areas.
 In the video decoding device according to an aspect of the present invention, the filtering by the synthesis unit is a weighted sum of the pixel values of a plurality of overlap areas, with the weights varied depending on the distance from the tile boundary.
 A video encoding device according to an aspect of the present invention is a video encoding device that divides an image into tiles and encodes a video in units of tiles, the device including: a tile information calculation unit that calculates tile information; a division unit that divides the image into tiles; and a tile encoding unit that encodes the tiles and generates an encoded stream, wherein the division unit divides the image into tiles while allowing overlap.
 In the video encoding device according to an aspect of the present invention, the tile encoding unit encodes a target tile with reference only to information on the target tile and information on the collocated tile of the target tile.
 In the video encoding device according to an aspect of the present invention, the tile information includes the number, width, and height of the tiles, the presence or absence of overlap between adjacent tiles, and, when the tiles overlap, the width and height of the overlap areas.
 In the video encoding device according to an aspect of the present invention, the division unit divides the image into tiles without limiting the upper left coordinates of the tiles to positions that are integer multiples of the CTU.
 In the video encoding device according to an aspect of the present invention, when the width of the tile at the right edge of the image and the height of the tile at the bottom edge are not integer multiples of the CTU, the division unit provides crop offset areas in the tiles at the right and bottom edges of the image and divides the image so that the width and height obtained by adding the tile and the crop offset area are integer multiples of the CTU.
 In the video encoding device according to an aspect of the present invention, the division unit divides the image into tiles each including an area that overlaps an adjacent tile and a crop offset area, the size of a tile including the overlapping area and the crop offset area is an integer multiple of the CTU, and the upper left coordinates of the tile are set at positions that are integer multiples of the CTU.
 A video decoding device according to an aspect of the present invention is a video decoding device that divides an image into tiles and decodes a video in units of tiles, the device including: a header information decoding unit that decodes header information from an encoded stream and calculates tile information; a tile decoding unit that decodes the encoded data of each tile and generates a decoded image of the tile; and a synthesis unit that synthesizes the decoded images of the tiles with reference to the tile information to generate a display image, wherein the tile information includes information on the number and width of the tiles included in each tile row, the numbers of tiles included in the tile rows differ, and the synthesis unit generates the display image using at least the pixel values of the decoded images of the tiles.
 A video encoding device according to an aspect of the present invention is a video encoding device that divides an image into tiles and encodes a video in units of tiles, the device including: a tile information calculation unit that calculates tile information; a header information generation unit that encodes header information including the tile information; a division unit that divides the image into tiles; and a tile encoding unit that encodes the tiles and generates an encoded stream, wherein the division unit divides the image into tiles so that the numbers of tiles included in the tile rows differ, the tile information calculation unit calculates tile information on the number and width of the tiles included in each tile row, and the header information generation unit generates the syntax of the tile information.
 A video decoding device according to an aspect of the present invention is a video decoding device that divides an image into regions each consisting of one or more tiles and decodes a video in units of regions, the device including: a header information decoding unit that decodes header information from an encoded stream and calculates region information and tile information; a tile decoding unit that decodes the encoded data of each tile and generates a decoded image of the tile; and a synthesis unit that synthesizes the decoded images of the tiles with reference to the region information and the tile information to generate a display image, wherein the synthesis unit generates the display image using at least the pixel values of the decoded images of the tiles.
 A video encoding device according to an aspect of the present invention is a video encoding device that divides an image into regions each consisting of one or more tiles and encodes a video in units of regions, the device including: a region information calculation unit that calculates region information (the number of regions, upper left coordinates, width and height, pixel values to be set in invalid areas, etc.); a tile information calculation unit that calculates tile information; a header information generation unit that generates the syntax of header information including the region information and the tile information; a division unit that divides the image into regions and divides each region into tiles starting from the upper left coordinates of the region; and a tile encoding unit that encodes the tiles and generates an encoded stream.
 In the video decoding device and the video encoding device according to an aspect of the present invention, the region information includes a flag indicating whether each tile is included in an invalid area.
 In the video decoding device according to an aspect of the present invention, when the flag included in the region information indicates that a target tile is included in an invalid area, the tile decoding unit does not decode the target tile.
 In the video decoding device according to an aspect of the present invention, the tile decoding unit decodes a target tile with reference only to information on the target tile, the collocated tile of the target tile, and the tiles included in the same region.
 In the video encoding device according to an aspect of the present invention, the tile encoding unit encodes a target tile with reference only to information on the target tile, the collocated tile of the target tile, and the tiles included in the same region.
 本発明の一態様に係る動画像復号装置は、画像をタイルに分割し、タイル単位に動画像を復号する動画像復号装置であって、符号化ストリームからヘッダ情報を復号し、タイル情報を算出するヘッダ情報復号部と、タイル毎の符号化データを復号し、タイルの復号画像を生成するタイル復号部と、前記タイル情報を参照して前記タイルの復号画像を合成し表示画像を生成する合成部とを備え、前記タイルは、ピクチャを重複することなく分割する単位であるタイルアクティブ領域と隠れている領域(タイル拡張領域)から構成され、前記タイルアクティブ領域に前記タイル拡張領域を加えた領域を、CTU単位で復号することを特徴とする。 A moving image decoding apparatus according to an aspect of the present invention is a moving image decoding apparatus that divides an image into tiles and decodes moving images in tile units, decodes header information from an encoded stream, and calculates tile information. A header information decoding unit that decodes encoded data for each tile, generates a decoded image of the tile, and combines the decoded image of the tile with reference to the tile information to generate a display image The tile is composed of a tile active area that is a unit for dividing a picture without overlapping and a hidden area (tile extension area), and the tile active area is added to the tile active area. Is decoded in units of CTUs.
 In the video decoding device according to one aspect of the present invention, the tile extension area consists of an overlap area, which overlaps the tile active areas of adjacent tiles and is used for reference and decoding, and a crop offset area (tile invalid area), which is neither referenced nor decoded.
 In the video decoding device according to one aspect of the present invention, the size obtained by adding the tile active area and the overlap area need not be an integer multiple of the CTU size, and the top-left coordinates of a tile are not restricted to positions at integer multiples of the CTU size.
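 To make the relationship between these areas concrete, the sketch below derives the crop offset that pads a tile whose active-plus-overlap size is not a CTU multiple up to a CTU-aligned coded size. The function and variable names are illustrative assumptions.

#include <cstdio>

// Pad (active + overlap) up to the next multiple of the CTU size; the
// difference is the crop offset area, which is neither referenced nor output.
int cropOffset(int activeSize, int overlapSize, int ctuSize) {
    int valid = activeSize + overlapSize;                     // referenced/decoded part
    int coded = ((valid + ctuSize - 1) / ctuSize) * ctuSize;  // round up to the CTU grid
    return coded - valid;                                     // remaining invalid samples
}

int main() {
    // e.g. a 500-sample active width, a 16-sample overlap, and 128-sample CTUs
    int off = cropOffset(500, 16, 128);
    std::printf("crop offset = %d samples (coded width = %d)\n", off, 500 + 16 + off);
    return 0;  // prints: crop offset = 124 samples (coded width = 640)
}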
 A video decoding device according to one aspect of the present invention is a video decoding device that divides an image into tiles and decodes a video in units of tiles, the device comprising: a header information decoding unit that decodes header information from an encoded stream and calculates tile information; a tile decoding unit that decodes the encoded data of each tile and generates a decoded image of the tile; and a synthesis unit that combines the decoded images of the tiles with reference to the tile information to generate a display image, wherein each tile consists of a tile valid area, which is used for decoding and output, and a crop offset area (tile invalid area), which is not used for decoding or output; the tile valid area consists of a tile active area, which is a unit into which a picture is partitioned, and an overlap area, which overlaps the tile active areas of adjacent tiles and is used for reference and decoding; and the tile valid area is decoded in units of CTUs.
 In the video decoding device according to one aspect of the present invention, the size obtained by adding the tile valid area and the crop offset area need not be an integer multiple of the CTU size, and the top-left coordinates of a tile are not restricted to positions at integer multiples of the CTU size.
 In the video decoding device according to one aspect of the present invention, the tile decoding unit decodes a target tile by referring only to information on the target tile and information on the collocated tile of the target tile.
 In the video decoding device according to one aspect of the present invention, the tile information includes the number, width, and height of the tiles, whether or not adjacent tiles overlap, and, when the tiles overlap, the width and height of the overlap area.
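 The tile information enumerated above could be held in a structure such as the following; the field names and types are assumptions chosen to mirror the wording of the description, not identifiers used by this application.

#include <cstdint>

// Hypothetical container for the signalled tile information.
struct TileInfo {
    uint16_t numTileCols, numTileRows;  // number of tiles horizontally and vertically
    uint16_t tileWidth, tileHeight;     // tile size in samples
    bool     overlapEnabled;            // whether adjacent tiles overlap
    uint16_t overlapWidth;              // present only when overlapEnabled is true
    uint16_t overlapHeight;             // present only when overlapEnabled is true
};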
 In the video decoding device according to one aspect of the present invention, the synthesis unit performs filtering using a simple average of the pixel values of a plurality of overlap areas.
 In the video decoding device according to one aspect of the present invention, the synthesis unit performs filtering using a weighted sum of the pixel values of a plurality of overlap areas, in which the weights vary depending on the distance from the tile boundary.
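 A sketch of the two blending rules on one row of samples across a vertical tile boundary: the simple average weights the two tiles' co-sited overlap samples equally, while the weighted sum ramps the weights with the distance from the boundary so that each tile dominates on its own side. The linear ramp is one plausible choice; the description only requires the weights to depend on the distance from the tile boundary.

#include <cstdint>
#include <vector>

// Blend two co-sited overlap strips of width W (the left tile's rightmost W
// samples and the right tile's leftmost W samples) into the output row.
void blendOverlap(const std::vector<uint8_t>& leftTile,
                  const std::vector<uint8_t>& rightTile,
                  std::vector<uint8_t>& out, int W, bool weighted) {
    for (int x = 0; x < W; ++x) {
        int a = leftTile[x], b = rightTile[x];
        if (!weighted) {
            out[x] = uint8_t((a + b + 1) >> 1);  // simple average with rounding
        } else {
            int wa = W - x, wb = x + 1;          // left tile dominates near x = 0
            out[x] = uint8_t((a * wa + b * wb + (W + 1) / 2) / (W + 1));
        }
    }
}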
 A video encoding device according to one aspect of the present invention is a video encoding device that divides an image into tiles and encodes a video in units of tiles, the device comprising: a tile information calculation unit that calculates tile information; a partition unit that divides an image into tiles; and a tile encoding unit that encodes the tiles and generates an encoded stream, wherein each tile consists of a tile active area, which is a unit into which a picture is partitioned without overlap, and a hidden area (tile extension area), and the area obtained by adding the tile extension area to the tile active area is encoded in units of CTUs.
 In the video encoding device according to one aspect of the present invention, the tile extension area consists of an overlap area, which overlaps the tile active areas of adjacent tiles and is used for reference and encoding, and a crop offset area (tile invalid area), which is neither referenced nor encoded.
 In the video encoding device according to one aspect of the present invention, the size obtained by adding the tile active area and the overlap area need not be an integer multiple of the CTU size, and the top-left coordinates of a tile are not restricted to positions at integer multiples of the CTU size.
 A video encoding device according to one aspect of the present invention is a video encoding device that divides an image into tiles and encodes a video in units of tiles, the device comprising: a tile information calculation unit that calculates tile information; a partition unit that divides an image into tiles; and a tile encoding unit that encodes the tiles and generates an encoded stream, wherein each tile consists of a tile valid area, which is used for encoding and output, and a crop offset area (tile invalid area), which is not used for encoding or output; the tile valid area consists of a tile active area, which is a unit into which a picture is partitioned, and an overlap area, which overlaps the tile active areas of adjacent tiles and is used for reference and encoding; and the tile valid area is encoded in units of CTUs.
 In the video encoding device according to one aspect of the present invention, the size obtained by adding the tile valid area and the crop offset area need not be an integer multiple of the CTU size, and the top-left coordinates of a tile are not restricted to positions at integer multiples of the CTU size.
 In the video encoding device according to one aspect of the present invention, the tile encoding unit encodes a target tile by referring only to information on the target tile and information on the collocated tile of the target tile.
 In the video encoding device according to one aspect of the present invention, the tile information includes the number, width, and height of the tiles, whether or not adjacent tiles overlap, and, when the tiles overlap, the width and height of the overlap area.
 A video decoding device according to one aspect of the present invention is a video decoding device that divides an image into regions each consisting of one or more tiles and decodes a video in units of regions, the device comprising: a header information decoding unit that decodes header information from an encoded stream and calculates region information and tile information; a tile decoding unit that decodes the encoded data of each tile and generates a decoded image of the tile; and a synthesis unit that combines the decoded images of the tiles with reference to the region information and the tile information to generate a display image, wherein the size of a region need not be an integer multiple of the CTU size, and the top-left coordinates of a region are not restricted to positions at integer multiples of the CTU size.
 In the video decoding device according to one aspect of the present invention, each tile is an area obtained by dividing a rectangular area consisting of a region together with a non-displayed area (guard band) outside the region.
 A video encoding device according to one aspect of the present invention is a video encoding device that divides an image into regions each consisting of one or more tiles and encodes a video in units of regions, the device comprising: a region information calculation unit that calculates region information (the number of regions, their top-left coordinates, their widths and heights, the pixel value to be set in invalid areas, and the like); a tile information calculation unit that calculates tile information; a header information generation unit that generates syntax for header information including the region information and the tile information; a partition unit that divides an image into regions and divides each region into tiles, starting from the top-left coordinates of the region; and a tile encoding unit that encodes the tiles and generates an encoded stream, wherein the size of a region need not be an integer multiple of the CTU size, and the top-left coordinates of a region are not restricted to positions at integer multiples of the CTU size.
 In the video encoding device according to one aspect of the present invention, the partition unit divides a rectangular area consisting of a region together with a non-displayed area (guard band) outside the region into tiles.
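 A minimal sketch, under assumed names, of how such a partition unit could lay the tile grid over the rectangle formed by a region plus its guard band; since the region size need not be a CTU multiple and the region's top-left corner need not be CTU-aligned, the grid is anchored at the region's own top-left coordinates.

#include <cstdio>

struct Rect { int x, y, w, h; };

// Tile the rectangle formed by the region plus its guard band, starting from
// the region's top-left corner (which need not be CTU-aligned).
void tileRegion(Rect region, int guardW, int guardH, int tileW, int tileH) {
    int totalW = region.w + guardW, totalH = region.h + guardH;
    for (int ty = 0; ty < totalH; ty += tileH) {
        for (int tx = 0; tx < totalW; tx += tileW) {
            int w = (totalW - tx < tileW) ? totalW - tx : tileW;  // clip last column
            int h = (totalH - ty < tileH) ? totalH - ty : tileH;  // clip last row
            std::printf("tile at (%d,%d), size %dx%d\n",
                        region.x + tx, region.y + ty, w, h);
        }
    }
}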
 In the video decoding device and the video encoding device according to one aspect of the present invention, the region information includes a flag indicating whether or not each tile is included in an invalid area.
 In the video decoding device according to one aspect of the present invention, when the flag included in the region information indicates that a target tile is included in an invalid area, the tile decoding unit does not decode the target tile.
 In the video decoding device according to one aspect of the present invention, the tile decoding unit decodes a target tile by referring only to information on the target tile, the collocated tile of the target tile, and the tiles included in the same region.
 In the video encoding device according to one aspect of the present invention, the tile encoding unit encodes a target tile by referring only to information on the target tile, the collocated tile of the target tile, and the tiles included in the same region.
 A video decoding device according to one aspect of the present invention is a video decoding device that divides an image into tiles (tile coding areas) and decodes a video in units of tile coding areas, the device comprising: a header information decoding unit that decodes header information from an encoded stream and calculates tile information; a tile decoding unit that decodes the encoded data of each tile and generates a decoded image of the tile coding area; and a synthesis unit that combines the decoded images of the tile coding areas with reference to the tile information to generate a display image, wherein the tile coding area consists of a tile active area, an overlap area, and a crop offset area; the tile active area is a unit into which a first picture is partitioned without overlap; the crop offset area is an invalid area that is not involved in the coding process and serves to set the size of the tile coding area to an integer multiple of the CTU size; and the top-left coordinates of the tile coding area are set at positions that are integer multiples of the CTU size, and the size of the tile coding area is set to an integer multiple of the CTU size.
 A video encoding device according to one aspect of the present invention is a video encoding device that generates, from a first picture, a second picture in which tiles (tile coding areas) are arranged without overlap, and encodes each tile coding area, the device comprising: a tile information calculation unit that calculates the size of the second picture (second picture size) and tile information (the sizes of the tile active area, the overlap area, and the crop offset area); a picture partition unit that, in accordance with the tile information, divides the second picture, which consists of the tile active areas obtained by partitioning the first picture together with the overlap areas and crop offset areas outside them, into tile coding areas; and a tile encoding unit that encodes the tile coding areas and generates an encoded stream, wherein the tile active area is a unit into which the first picture is partitioned without overlap; the crop offset area is an invalid area that is not involved in the coding process and serves to set the size of the tile coding area to an integer multiple of the CTU size; the size of the second picture is calculated by adding the tile active areas, the tile overlap areas, and the crop offset areas; and, on the second picture, the top-left coordinates of each tile coding area are set at positions that are integer multiples of the CTU size, and the size of each tile coding area is an integer multiple of the CTU size.
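 As a numeric illustration of the size derivation in this aspect, the sketch below sums active, overlap, and crop offset areas per tile, with the crop offset chosen so that every tile coding area becomes a CTU multiple and therefore starts at a CTU-aligned position in the second picture. The grid dimensions, sample counts, and names are assumptions for illustration.

#include <cstdio>

// One dimension of a tile coding area: active + overlap, padded to a CTU multiple.
int codedSize(int active, int overlap, int ctu) {
    int valid = active + overlap;
    return ((valid + ctu - 1) / ctu) * ctu;  // the crop offset fills the remainder
}

int main() {
    const int ctu = 128;
    // e.g. a 1920x1080 first picture split into a 2x2 grid of active areas
    int activeW[2] = {960, 960}, activeH[2] = {540, 540}, overlap = 16;
    int picW = 0, picH = 0;
    for (int i = 0; i < 2; ++i) picW += codedSize(activeW[i], overlap, ctu);
    for (int i = 0; i < 2; ++i) picH += codedSize(activeH[i], overlap, ctu);
    // Each coded size is a CTU multiple, so every tile coding area starts at a
    // CTU-aligned offset; the second picture is their concatenation.
    std::printf("second picture: %d x %d\n", picW, picH);  // 2048 x 1280
    return 0;
}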
  (Software implementation example)
 Part of the tile encoding unit 2012 and the tile decoding unit 2002 in the embodiments described above, for example, the entropy decoding unit 301, the prediction parameter decoding unit 302, the loop filter 305, the predicted image generation unit 308, the inverse quantization and inverse transform unit 311, the addition unit 312, the predicted image generation unit 101, the subtraction unit 102, the transform and quantization unit 103, the entropy encoding unit 104, the inverse quantization and inverse transform unit 105, the loop filter 107, the encoding parameter determination unit 110, and the prediction parameter encoding unit 111, may be realized by a computer. In that case, a program for realizing this control function may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into and executed by a computer system. The "computer system" here is a computer system built into either the tile encoding unit 2012 or the tile decoding unit 2002 and includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system. Furthermore, the "computer-readable recording medium" may also include a medium that dynamically holds the program for a short time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period of time, such as a volatile memory inside a computer system serving as a server or a client in that case. The program may be one for realizing part of the functions described above, or one that realizes the functions described above in combination with a program already recorded in the computer system.
 Part or all of the video encoding device 11 and the video decoding device 31 in the embodiments described above may be realized as an integrated circuit such as an LSI (Large Scale Integration). Each functional block of the video encoding device 11 and the video decoding device 31 may be implemented as an individual processor, or some or all of them may be integrated into a single processor. The method of circuit integration is not limited to LSI and may be realized by a dedicated circuit or a general-purpose processor. If a circuit integration technology that replaces LSI emerges as semiconductor technology advances, an integrated circuit based on that technology may also be used.
 Although one embodiment of the present invention has been described in detail above with reference to the drawings, the specific configuration is not limited to the above, and various design changes and the like can be made without departing from the gist of the present invention.
 〔Application examples〕
 The video encoding device 11 and the video decoding device 31 described above can be used by being mounted on various devices that transmit, receive, record, and reproduce video. The video may be natural video captured by a camera or the like, or artificial video (including CG and GUI) generated by a computer or the like.
 First, the fact that the video encoding device 11 and the video decoding device 31 described above can be used for transmitting and receiving video will be described with reference to FIG. 38.
 FIG. 38(a) is a block diagram showing the configuration of a transmission device PROD_A equipped with the video encoding device 11. As shown in FIG. 38(a), the transmission device PROD_A includes an encoding unit PROD_A1 that obtains encoded data by encoding video, a modulation unit PROD_A2 that obtains a modulated signal by modulating a carrier wave with the encoded data obtained by the encoding unit PROD_A1, and a transmission unit PROD_A3 that transmits the modulated signal obtained by the modulation unit PROD_A2. The video encoding device 11 described above is used as this encoding unit PROD_A1.
 The transmission device PROD_A may further include, as sources of the video to be input to the encoding unit PROD_A1, a camera PROD_A4 that captures video, a recording medium PROD_A5 on which video is recorded, an input terminal PROD_A6 for inputting video from the outside, and an image processing unit PROD_A7 that generates or processes images. FIG. 38(a) illustrates a configuration in which the transmission device PROD_A includes all of these, but some may be omitted.
 The recording medium PROD_A5 may be one on which unencoded video is recorded, or one on which video encoded by a recording encoding scheme different from the transmission encoding scheme is recorded. In the latter case, a decoding unit (not shown) that decodes the encoded data read from the recording medium PROD_A5 in accordance with the recording encoding scheme may be interposed between the recording medium PROD_A5 and the encoding unit PROD_A1.
 FIG. 38(b) is a block diagram showing the configuration of a reception device PROD_B equipped with the video decoding device 31. As shown in FIG. 38(b), the reception device PROD_B includes a reception unit PROD_B1 that receives a modulated signal, a demodulation unit PROD_B2 that obtains encoded data by demodulating the modulated signal received by the reception unit PROD_B1, and a decoding unit PROD_B3 that obtains video by decoding the encoded data obtained by the demodulation unit PROD_B2. The video decoding device 31 described above is used as this decoding unit PROD_B3.
 The reception device PROD_B may further include, as destinations of the video output by the decoding unit PROD_B3, a display PROD_B4 that displays the video, a recording medium PROD_B5 for recording the video, and an output terminal PROD_B6 for outputting the video to the outside. FIG. 38(b) illustrates a configuration in which the reception device PROD_B includes all of these, but some may be omitted.
 The recording medium PROD_B5 may be one for recording unencoded video, or one on which video is recorded after being encoded by a recording encoding scheme different from the transmission encoding scheme. In the latter case, an encoding unit (not shown) that encodes the video acquired from the decoding unit PROD_B3 in accordance with the recording encoding scheme may be interposed between the decoding unit PROD_B3 and the recording medium PROD_B5.
 The transmission medium for transmitting the modulated signal may be wireless or wired. The transmission mode for transmitting the modulated signal may be broadcasting (here, a transmission mode in which the destination is not specified in advance) or communication (here, a transmission mode in which the destination is specified in advance). That is, the transmission of the modulated signal may be realized by any of wireless broadcasting, wired broadcasting, wireless communication, and wired communication.
 For example, a broadcasting station (broadcasting equipment or the like) / receiving station (television receiver or the like) of terrestrial digital broadcasting is an example of a transmission device PROD_A / reception device PROD_B that transmits and receives modulated signals by wireless broadcasting. A broadcasting station (broadcasting equipment or the like) / receiving station (television receiver or the like) of cable television broadcasting is an example of a transmission device PROD_A / reception device PROD_B that transmits and receives modulated signals by wired broadcasting.
 A server (workstation or the like) / client (television receiver, personal computer, smartphone, or the like) of a VOD (Video On Demand) service or a video sharing service using the Internet is an example of a transmission device PROD_A / reception device PROD_B that transmits and receives modulated signals by communication (normally, either a wireless or a wired transmission medium is used in a LAN, and a wired transmission medium is used in a WAN). Here, personal computers include desktop PCs, laptop PCs, and tablet PCs. Smartphones also include multifunctional mobile phone terminals.
 A client of a video sharing service has, in addition to a function of decoding encoded data downloaded from the server and displaying it on a display, a function of encoding video captured by a camera and uploading it to the server. That is, a client of a video sharing service functions as both the transmission device PROD_A and the reception device PROD_B.
 Next, the fact that the video encoding device 11 and the video decoding device 31 described above can be used for recording and reproducing video will be described with reference to FIG. 39.
 FIG. 39(a) is a block diagram showing the configuration of a recording device PROD_C equipped with the video encoding device 11 described above. As shown in FIG. 39(a), the recording device PROD_C includes an encoding unit PROD_C1 that obtains encoded data by encoding video, and a writing unit PROD_C2 that writes the encoded data obtained by the encoding unit PROD_C1 to a recording medium PROD_M. The video encoding device 11 described above is used as this encoding unit PROD_C1.
 The recording medium PROD_M may be (1) of a type built into the recording device PROD_C, such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive), (2) of a type connected to the recording device PROD_C, such as an SD memory card or a USB (Universal Serial Bus) flash memory, or (3) of a type loaded into a drive device (not shown) built into the recording device PROD_C, such as a DVD (Digital Versatile Disc) or a BD (Blu-ray (registered trademark) Disc).
 The recording device PROD_C may further include, as sources of the video to be input to the encoding unit PROD_C1, a camera PROD_C3 that captures video, an input terminal PROD_C4 for inputting video from the outside, a reception unit PROD_C5 for receiving video, and an image processing unit PROD_C6 that generates or processes images. FIG. 39(a) illustrates a configuration in which the recording device PROD_C includes all of these, but some may be omitted.
 The reception unit PROD_C5 may receive unencoded video, or may receive encoded data encoded by a transmission encoding scheme different from the recording encoding scheme. In the latter case, a transmission decoding unit (not shown) that decodes encoded data encoded by the transmission encoding scheme may be interposed between the reception unit PROD_C5 and the encoding unit PROD_C1.
 Examples of such a recording device PROD_C include a DVD recorder, a BD recorder, and an HDD (Hard Disk Drive) recorder (in this case, the input terminal PROD_C4 or the reception unit PROD_C5 is the main source of video). A camcorder (in this case, the camera PROD_C3 is the main source of video), a personal computer (in this case, the reception unit PROD_C5 or the image processing unit PROD_C6 is the main source of video), and a smartphone (in this case, the camera PROD_C3 or the reception unit PROD_C5 is the main source of video) are also examples of such a recording device PROD_C.
 FIG. 39(b) is a block diagram showing the configuration of a playback device PROD_D equipped with the video decoding device 31 described above. As shown in FIG. 39(b), the playback device PROD_D includes a reading unit PROD_D1 that reads encoded data written to the recording medium PROD_M, and a decoding unit PROD_D2 that obtains video by decoding the encoded data read by the reading unit PROD_D1. The video decoding device 31 described above is used as this decoding unit PROD_D2.
 The recording medium PROD_M may be (1) of a type built into the playback device PROD_D, such as an HDD or an SSD, (2) of a type connected to the playback device PROD_D, such as an SD memory card or a USB flash memory, or (3) of a type loaded into a drive device (not shown) built into the playback device PROD_D, such as a DVD or a BD.
 The playback device PROD_D may further include, as destinations of the video output by the decoding unit PROD_D2, a display PROD_D3 that displays the video, an output terminal PROD_D4 for outputting the video to the outside, and a transmission unit PROD_D5 that transmits the video. FIG. 39(b) illustrates a configuration in which the playback device PROD_D includes all of these, but some may be omitted.
 The transmission unit PROD_D5 may transmit unencoded video, or may transmit encoded data encoded by a transmission encoding scheme different from the recording encoding scheme. In the latter case, an encoding unit (not shown) that encodes video by the transmission encoding scheme may be interposed between the decoding unit PROD_D2 and the transmission unit PROD_D5.
 Examples of such a playback device PROD_D include a DVD player, a BD player, and an HDD player (in this case, the output terminal PROD_D4 to which a television receiver or the like is connected is the main destination of video). A television receiver (in this case, the display PROD_D3 is the main destination of video), digital signage (also called an electronic signboard or electronic bulletin board; the display PROD_D3 or the transmission unit PROD_D5 is the main destination of video), a desktop PC (in this case, the output terminal PROD_D4 or the transmission unit PROD_D5 is the main destination of video), a laptop or tablet PC (in this case, the display PROD_D3 or the transmission unit PROD_D5 is the main destination of video), and a smartphone (in this case, the display PROD_D3 or the transmission unit PROD_D5 is the main destination of video) are also examples of such a playback device PROD_D.
  (Hardware implementation and software implementation)
 Each block of the video decoding device 31 and the video encoding device 11 described above may be realized in hardware by a logic circuit formed on an integrated circuit (IC chip), or may be realized in software using a CPU (Central Processing Unit).
 In the latter case, each of the above devices includes a CPU that executes the instructions of a program realizing each function, a ROM (Read Only Memory) that stores the program, a RAM (Random Access Memory) into which the program is loaded, and a storage device (recording medium) such as a memory that stores the program and various data. The object of an embodiment of the present invention can also be achieved by supplying to each of the above devices a recording medium on which the program code (executable program, intermediate code program, source program) of the control program for each device, which is software realizing the functions described above, is recorded in a computer-readable manner, and by having the computer (or a CPU or MPU) read and execute the program code recorded on the recording medium.
 Examples of the recording medium include tapes such as magnetic tapes and cassette tapes; disks including magnetic disks such as floppy (registered trademark) disks / hard disks and optical discs such as CD-ROM (Compact Disc Read-Only Memory) / MO disc (Magneto-Optical disc) / MD (Mini Disc) / DVD (Digital Versatile Disc) / CD-R (CD Recordable) / Blu-ray Disc (Blu-ray (registered trademark) Disc); cards such as IC cards (including memory cards) / optical cards; semiconductor memories such as mask ROM / EPROM (Erasable Programmable Read-Only Memory) / EEPROM (Electrically Erasable and Programmable Read-Only Memory: registered trademark) / flash ROM; and logic circuits such as PLDs (Programmable Logic Devices) and FPGAs (Field Programmable Gate Arrays).
 Each of the above devices may also be configured to be connectable to a communication network, and the program code may be supplied via the communication network. This communication network is not particularly limited as long as it can transmit the program code. For example, the Internet, an intranet, an extranet, a LAN (Local Area Network), an ISDN (Integrated Services Digital Network), a VAN (Value-Added Network), a CATV (Community Antenna Television / Cable Television) communication network, a virtual private network (Virtual Private Network), a telephone network, a mobile communication network, a satellite communication network, or the like can be used. The transmission medium constituting this communication network may also be any medium capable of transmitting the program code and is not limited to a specific configuration or type. For example, it can be used over wired media such as IEEE (Institute of Electrical and Electronic Engineers) 1394, USB, power-line carrier, cable TV lines, telephone lines, and ADSL (Asymmetric Digital Subscriber Line) lines, or over wireless media such as infrared as in IrDA (Infrared Data Association) and remote controls, Bluetooth (registered trademark), IEEE 802.11 wireless, HDR (High Data Rate), NFC (Near Field Communication), DLNA (registered trademark) (Digital Living Network Alliance: registered trademark), mobile phone networks, satellite links, and terrestrial digital broadcasting networks. An embodiment of the present invention can also be realized in the form of a computer data signal embedded in a carrier wave, in which the program code is embodied by electronic transmission.
 Embodiments of the present invention are not limited to the embodiments described above, and various modifications are possible within the scope of the claims. That is, embodiments obtained by combining technical means modified as appropriate within the scope of the claims are also included in the technical scope of the present invention.
 Embodiments of the present invention can be suitably applied to a video decoding device that decodes encoded data in which image data is encoded, and to a video encoding device that generates encoded data in which image data is encoded. They can also be suitably applied to the data structure of encoded data generated by a video encoding device and referenced by a video decoding device.
11 Video encoding device
31 Video decoding device
41 Video display device
2002 Tile decoding unit
2012 Tile encoding unit

Claims (12)

  1.  A video decoding device that divides an image into tiles and decodes a video in units of tiles, the device comprising:
     a header information decoding unit that decodes header information from an encoded stream and calculates tile information;
     a tile decoding unit that decodes the encoded data of each tile and generates a decoded image of the tile; and
     a synthesis unit that combines the decoded images of the tiles with reference to the tile information to generate a display image,
     wherein the tile consists of a tile active area, which is a unit into which a picture is partitioned without overlap, and a hidden area (tile extension area), and
     the area obtained by adding the tile extension area to the tile active area is decoded in units of CTUs.
  2.  The video decoding device according to claim 1, wherein the tile extension area consists of an overlap area, which overlaps the tile active area of an adjacent tile and is used for reference and decoding, and a crop offset area (tile invalid area), which is neither referenced nor decoded.
  3.  The video decoding device according to claim 2, wherein the size obtained by adding the tile active area and the overlap area need not be an integer multiple of the CTU size, and the top-left coordinates of the tile are not restricted to positions at integer multiples of the CTU size.
  4.  A video decoding device that divides an image into tiles and decodes a video in units of tiles, the device comprising:
     a header information decoding unit that decodes header information from an encoded stream and calculates tile information;
     a tile decoding unit that decodes the encoded data of each tile and generates a decoded image of the tile; and
     a synthesis unit that combines the decoded images of the tiles with reference to the tile information to generate a display image,
     wherein the tile consists of a tile valid area, which is used for decoding and output, and a crop offset area (tile invalid area), which is not used for decoding or output,
     the tile valid area consists of a tile active area, which is a unit into which a picture is partitioned, and an overlap area, which overlaps the tile active area of an adjacent tile and is used for reference and decoding, and
     the tile valid area is decoded in units of CTUs.
  5.  The video decoding device according to claim 4, wherein the size obtained by adding the tile valid area and the crop offset area need not be an integer multiple of the CTU size, and the top-left coordinates of the tile are not restricted to positions at integer multiples of the CTU size.
  6.  The video decoding device according to any one of claims 1 to 5, wherein the tile decoding unit decodes a target tile by referring only to information on the target tile and information on the collocated tile of the target tile.
  7.  A video encoding device that divides an image into tiles and encodes a video in units of tiles, the device comprising:
     a tile information calculation unit that calculates tile information;
     a partition unit that divides an image into tiles; and
     a tile encoding unit that encodes the tiles and generates an encoded stream,
     wherein the tile consists of a tile active area, which is a unit into which a picture is partitioned without overlap, and a hidden area (tile extension area), and
     the area obtained by adding the tile extension area to the tile active area is encoded in units of CTUs.
  8.  The video encoding device according to claim 7, wherein the tile extension area consists of an overlap area, which overlaps the tile active area of an adjacent tile and is used for reference and encoding, and a crop offset area (tile invalid area), which is neither referenced nor encoded.
  9.  The video encoding device according to claim 8, wherein the size obtained by adding the tile active area and the overlap area need not be an integer multiple of the CTU size, and the top-left coordinates of the tile are not restricted to positions at integer multiples of the CTU size.
  10.  A video encoding device that divides an image into tiles and encodes a video in units of tiles, the device comprising:
     a tile information calculation unit that calculates tile information;
     a partition unit that divides an image into tiles; and
     a tile encoding unit that encodes the tiles and generates an encoded stream,
     wherein the tile consists of a tile valid area, which is used for encoding and output, and a crop offset area (tile invalid area), which is not used for encoding or output,
     the tile valid area consists of a tile active area, which is a unit into which a picture is partitioned, and an overlap area, which overlaps the tile active area of an adjacent tile and is used for reference and encoding, and
     the tile valid area is encoded in units of CTUs.
  11.  The video encoding device according to claim 10, wherein the size obtained by adding the tile valid area and the crop offset area need not be an integer multiple of the CTU size, and the top-left coordinates of the tile are not restricted to positions at integer multiples of the CTU size.
  12.  The video encoding device according to any one of claims 7 to 11, wherein the tile encoding unit encodes a target tile by referring only to information on the target tile and information on the collocated tile of the target tile.
PCT/JP2019/004497 2018-02-14 2019-02-07 Moving image encoding device and moving image decoding device WO2019159820A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2018-023894 2018-02-14
JP2018023894A JP2021064817A (en) 2018-02-14 2018-02-14 Moving image encoding device and moving image decoding device
JP2018054270A JP2021064819A (en) 2018-03-22 2018-03-22 Moving image encoding device and moving image decoding device
JP2018-054270 2018-03-22

Publications (1)

Publication Number Publication Date
WO2019159820A1 true WO2019159820A1 (en) 2019-08-22

Family

ID=67620983

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/004497 WO2019159820A1 (en) 2018-02-14 2019-02-07 Moving image encoding device and moving image decoding device

Country Status (1)

Country Link
WO (1) WO2019159820A1 (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004248268A (en) * 2003-01-22 2004-09-02 Ricoh Co Ltd Image processor, image forming apparatus, image decoder, image processing method, program, and memory medium
JP2010130622A (en) * 2008-12-01 2010-06-10 Ricoh Co Ltd Encoding apparatus, encoding method, program, and recording medium
JP2016178698A (en) * 2012-04-06 2016-10-06 ソニー株式会社 Encoding device and encoding method
JP2015213277A (en) * 2014-05-07 2015-11-26 日本電信電話株式会社 Encoding method and encoding program
WO2016064862A1 (en) * 2014-10-20 2016-04-28 Google Inc. Continuous prediction domain

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KIRAN MISRA ET AL.: "Description of SDR and HDR video coding technology proposal by Sharp and Foxconn, Joint Video Exploration Team (JVET) 10th Meeting: San Diego", JVET-J0026-R1. DOCX, JVET-J0026-VLL, 20 April 2018 (2018-04-20) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200137401A1 (en) * 2017-07-03 2020-04-30 Iucf-Hyu (Industry-University Cooperation Foundation Hanyang University) Method and device for decoding image by using partition unit including additional region
US10986351B2 (en) * 2017-07-03 2021-04-20 Iucf-Hyu (Industry-University Cooperation Foundation Hanyang University) Method and device for decoding image by using partition unit including additional region
US11509914B2 (en) 2017-07-03 2022-11-22 Iucf-Hyu (Industry-University Cooperation Foundation Hanyang University) Method and device for decoding image by using partition unit including additional region
WO2021199783A1 (en) * 2020-03-30 2021-10-07 Kddi株式会社 Image decoding device, image decoding method, and program
JP2021164005A (en) * 2020-03-30 2021-10-11 Kddi株式会社 Image decoding device, image decoding method, and program
CN111953981A (en) * 2020-08-25 2020-11-17 西安万像电子科技有限公司 Encoding method and device, and decoding method and device
CN111953981B (en) * 2020-08-25 2023-11-28 西安万像电子科技有限公司 Encoding method and device, decoding method and device

Similar Documents

Publication Publication Date Title
JP7223886B2 (en) Image decoding method
WO2018221368A1 (en) Moving image decoding device, and moving image encoding device
WO2018199001A1 (en) Image decoding device and image coding device
WO2018037853A1 (en) Image decoding apparatus and image coding apparatus
WO2018037896A1 (en) Image decoding apparatus, image encoding apparatus, image decoding method, and image encoding method
WO2018116925A1 (en) Intra prediction image generating device, image decoding device, and image coding device
WO2019221072A1 (en) Image encoding device, encoded stream extraction device, and image decoding device
JP2021010046A (en) Image encoding device and image decoding device
KR102606330B1 (en) Aps signaling-based video or image coding
WO2018110203A1 (en) Moving image decoding apparatus and moving image encoding apparatus
US20190037242A1 (en) Image decoding device, an image encoding device, and an image decoding method
WO2017195532A1 (en) Image decoding device and image encoding device
WO2018110462A1 (en) Image decoding device and image encoding device
KR20220041897A (en) In-loop filtering-based video coding apparatus and method
WO2018216688A1 (en) Video encoding device, video decoding device, and filter device
KR20220050088A (en) Cross-component adaptive loop filtering-based video coding apparatus and method
KR20220080738A (en) Image encoding/decoding method, apparatus, and bitstream transmission method using lossless color conversion
WO2019159820A1 (en) Moving image encoding device and moving image decoding device
CA3152954A1 (en) Apparatus and method for image coding based on filtering
JP2020061701A (en) Dynamic image coding device and dynamic image decoding device
WO2019230904A1 (en) Image decoding device and image encoding device
WO2018037723A1 (en) Image decoding device and image coding device
KR20220041898A (en) Apparatus and method for video coding based on adaptive loop filtering
WO2018173862A1 (en) Image decoding device and image coding device
WO2018143289A1 (en) Image encoding device and image decoding device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19754539; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19754539; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: JP)