WO2014050554A1 - Image decoding device and image encoding device - Google Patents

Image decoding device and image encoding device

Info

Publication number
WO2014050554A1
WO2014050554A1 (PCT/JP2013/074483)
Authority
WO
WIPO (PCT)
Prior art keywords
image
filter
prediction
unit
layer
Prior art date
Application number
PCT/JP2013/074483
Other languages
English (en)
Japanese (ja)
Other versions
WO2014050554A8 (fr)
Inventor
将伸 八杉
知宏 猪飼
Original Assignee
シャープ株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by シャープ株式会社 filed Critical シャープ株式会社
Publication of WO2014050554A1
Publication of WO2014050554A8

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/36 Scalability techniques involving formatting the layers as a function of picture distortion after decoding, e.g. signal-to-noise [SNR] scalability
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117 Filters, e.g. for pre-processing or post-processing
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/189 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/196 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques

Definitions

  • the present invention relates to an image decoding apparatus that decodes hierarchically encoded data in which an image is hierarchically encoded, and an image encoding apparatus that generates hierarchically encoded data by hierarchically encoding an image.
  • Images and moving images are among the information transmitted in communication systems and the information recorded in storage devices. Techniques for encoding images (hereinafter including moving images) for transmission and storage are conventionally known.
  • Known video coding schemes include H.264/MPEG-4 AVC and its successor codec HEVC (High-Efficiency Video Coding) (Non-Patent Document 1).
  • In such coding schemes, a predicted image is usually generated based on a locally decoded image obtained by encoding and decoding the input image, and the prediction residual (sometimes referred to as a "difference image" or "residual image") obtained by subtracting the predicted image from the input image (original image) is encoded.
  • Examples of methods for generating a predicted image include inter-picture prediction (inter prediction) and intra-picture prediction (intra prediction).
  • In intra prediction, predicted images within a frame are generated sequentially based on the locally decoded image of the same frame.
  • In inter prediction, a predicted image is generated by motion compensation between frames.
  • Information relating to motion compensation is often not encoded directly, in order to reduce the code amount; instead, in inter prediction, motion compensation parameters are estimated based on the decoding situation around the target block and the like.
  • Hierarchical (scalable) coding schemes include H.264/AVC Annex G SVC, standardized by ISO/IEC and ITU-T.
  • SVC supports spatial scalability, temporal scalability, and SNR scalability.
  • In spatial scalability, an image obtained by down-sampling the original image to a desired resolution is encoded as the lower layer with H.264/AVC.
  • In the upper layer, inter-layer prediction is performed to remove redundancy between layers.
  • Examples of inter-layer prediction include motion information prediction, in which information related to motion prediction is predicted from the information of the lower layer at the same time instant, and texture prediction, in which prediction is performed from an image obtained by up-sampling the decoded image of the lower layer at the same time instant (Non-Patent Document 1).
  • In motion information prediction, motion information is encoded using the motion information of the reference layer as an estimate.
  • Filter techniques that switch processing according to filter parameters include SAO (sample adaptive offset, an adaptive offset filter), which adds an offset selected according to the classification of the pixel to be filtered, and ALF (adaptive loop filter), which uses the product-sum of the pixels surrounding the filter target and the filter coefficients (Non-Patent Document 2).
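  • As a rough, non-normative sketch of the two kinds of adaptive filtering mentioned above, the following shows an SAO-style per-class offset and an ALF-style product-sum of neighboring pixels and decoded filter coefficients. The function names, the 3x3 tap shape, the classification callback, and the fixed-point normalization are illustrative assumptions only.

        // SAO-style filtering: add an offset chosen by a per-pixel classification.
        // classify() and the offset table stand in for the signalled filter parameters.
        int saoFilterPixel(const int* pic, int x, int y, int stride,
                           const int offsets[4], int (*classify)(const int*, int, int, int)) {
            int cls = classify(pic, x, y, stride);      // e.g. an edge or band class
            return pic[y * stride + x] + offsets[cls];  // add the class-dependent offset
        }

        // ALF-style filtering: product-sum of surrounding pixels and filter coefficients plus an offset.
        // coeff[] and offset correspond to signalled coefficients; x, y are assumed to be interior pixels.
        int alfFilterPixel(const int* pic, int x, int y, int stride,
                           const int coeff[9], int offset) {
            static const int dx[9] = {-1, 0, 1, -1, 0, 1, -1, 0, 1};
            static const int dy[9] = {-1, -1, -1, 0, 0, 0, 1, 1, 1};
            int acc = offset;
            for (int k = 0; k < 9; ++k)
                acc += coeff[k] * pic[(y + dy[k]) * stride + (x + dx[k])];
            return acc >> 8;  // assumed fixed-point normalization of the coefficients
        }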
  • Such adaptive filter processing can improve objective and subjective image quality compared with conventional scaling and motion compensation processing that uses fixed filter parameters.
  • However, the processing load is heavier than that of a conventional filter with fixed filter parameters.
  • In addition, when the filter parameters are encoded for each region (filter unit) in order to switch the filter processing, the code amount of the filter parameters becomes large.
  • The present invention has been made in view of the above problems. An object of the present invention is to improve the objective and subjective quality of the predicted image by using an adaptive filter when generating a predicted image using a reference layer image, while reducing the amount of filter processing and the code amount of the filter parameters.
  • In order to solve the above problems, an image decoding device of the present invention decodes hierarchically encoded data in which image information relating to images of different quality is hierarchically encoded for each layer, and restores the image of the target layer to be decoded.
  • The device comprises predicted image generation means for generating a predicted image of the target layer based on prediction parameters, taking an already decoded reference layer image and an already decoded target layer image as input. The predicted image generation means includes filter means for generating a filtered reference layer image by applying adaptive filter processing to the reference layer image based on filter parameters, and generates the predicted image using the filtered reference layer image only when the prediction parameters satisfy a predetermined condition.
  • Since the adaptive filter processing of the reference layer image is limited to the case where the prediction parameters satisfy a predetermined condition, the amount of adaptive filter processing is reduced.
  • In one aspect, the predicted image generation means generates the predicted image using the filtered reference layer image on the predetermined condition that the prediction unit is of a predetermined size or larger.
  • Since the adaptive filter processing of the reference layer image is limited to prediction units of a predetermined size or larger, the amount of adaptive filter processing can be reduced.
  • In another aspect, the predicted image generation means generates the predicted image using the filtered reference layer image on the predetermined condition that the prediction unit uses uni-prediction.
  • Since the adaptive filter processing of the reference layer image is limited to the case where the prediction unit uses uni-prediction, the amount of adaptive filter processing is reduced.
  • In another aspect, the predicted image generation means generates the predicted image using a reference image included in a reference image list, on the predetermined condition that the reference image list is a predetermined list.
  • Since the adaptive filter processing of the reference layer image is limited to reference images included in a predetermined reference image list, the amount of adaptive filter processing can be reduced.
  • In another aspect, the predicted image generation means further includes scaling means for applying scaling processing to the reference layer image, and generates the predicted image using the filtered reference layer image on the predetermined condition that the reference layer image is not scaled by the scaling means.
  • Since the adaptive filter processing of the reference layer image is limited to the case where scaling processing is not applied to the reference layer, the amount of adaptive filter processing is reduced.
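  • As a rough illustrative sketch (not the normative decoding process), the conditions listed above can be combined into a single gate that decides whether the adaptive inter-layer filter is applied before a reference layer image is used for prediction; every name and default value below is an assumption for illustration.

        struct PredParams {
            int  puWidth, puHeight;   // prediction unit size
            bool isUniPred;           // true if uni-prediction (a single reference)
            int  refListIdx;          // which reference image list is used
            bool refIsRefLayer;       // the reference image is a reference layer image
            bool refLayerScaled;      // scaling was applied to the reference layer image
        };

        // Returns true when the filtered reference layer image may be used for prediction.
        bool useInterLayerFilter(const PredParams& p,
                                 int minPuSize = 16, int allowedList = 0) {
            if (!p.refIsRefLayer) return false;                                 // only for reference layer images
            if (p.puWidth < minPuSize || p.puHeight < minPuSize) return false;  // size condition
            if (!p.isUniPred) return false;                                     // uni-prediction condition
            if (p.refListIdx != allowedList) return false;                      // reference list condition
            if (p.refLayerScaled) return false;                                 // no-scaling condition
            return true;
        }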
  • In order to solve the above problems, an image encoding device of the present invention generates hierarchically encoded data in which image information relating to images of different quality is encoded for each layer, by encoding the image of the target layer to be encoded.
  • The device comprises predicted image generation means for generating a predicted image of the target layer based on prediction parameters, taking an already encoded reference layer image and an already encoded target layer image as input.
  • The predicted image generation means includes filter means for generating a filtered reference layer image by applying adaptive filter processing to the reference layer image based on filter parameters, and generates the predicted image using the filtered reference layer image only when the prediction parameters satisfy a predetermined condition.
  • With this configuration, since the adaptive filter processing of the reference layer image is limited to the case where the prediction parameters satisfy a predetermined condition, the amount of adaptive filter processing is reduced.
  • Another image decoding device of the present invention decodes hierarchically encoded data in which image information relating to images of different quality is encoded for each layer, and restores the image of the target layer to be decoded.
  • The device comprises predicted image generation means for generating a predicted image of the target layer based on prediction parameters, taking an already decoded reference layer image and an already decoded target layer image as input.
  • The predicted image generation means includes filter means that applies adaptive filter processing to the reference layer image based on filter parameters and generates a filtered reference layer image.
  • The device further comprises filter parameter decoding means for decoding the filter parameters. The filter parameter decoding means initializes a filter-parameter-decoded flag to 0 for each filter unit, which is a predetermined unit, and, when a reference layer image is used for generation of a predicted image in the predicted image generation means and the filter-parameter-decoded flag is 0, decodes the filter parameters and then sets the flag to 1.
  • In one aspect, the predicted image generation means determines whether the reference image it uses is a reference layer image, and thereby determines whether a reference layer image is used for generation of the predicted image.
  • In another aspect, the predicted image generation means generates the predicted image using the filtered reference layer image only when the prediction parameters satisfy a predetermined condition, and determines whether the prediction parameters of the prediction units within the filter unit satisfy that condition, thereby determining whether a reference layer image is used for generation of the predicted image.
  • Since the filter parameters are not encoded when the adaptive filter is not applied to a reference layer image, the code amount of the filter parameters is reduced.
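  • The per-filter-unit flag logic described above can be pictured roughly as follows. This is a non-normative sketch; the types and the decode callback are assumptions for illustration.

        #include <vector>

        struct FilterParams { std::vector<int> coeff; int offset = 0; };

        struct FilterUnitState {
            bool filterParamDecoded = false;  // "filter parameter decoded flag", initialized to 0 per filter unit
            FilterParams params;              // holds the decoded filter coefficients and offset
        };

        // Called for each CU (or PU) in the filter unit; decodeFromBitstream stands in for the
        // actual entropy decoding of the filter parameters.
        template <typename DecodeFn>
        void maybeDecodeFilterParams(FilterUnitState& fu, bool refLayerUsedForPrediction,
                                     DecodeFn decodeFromBitstream) {
            if (refLayerUsedForPrediction && !fu.filterParamDecoded) {
                fu.params = decodeFromBitstream();   // decode the filter parameters once
                fu.filterParamDecoded = true;        // flag set to 1: not decoded again in this filter unit
            }
        }

  • The encoder side mirrors this sketch, with a filter-parameter-encoded flag and an encode call in place of the decode call.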
  • Similarly, another image encoding device of the present invention generates hierarchically encoded data in which image information relating to images of different quality is encoded for each layer, by encoding the image of the target layer to be encoded.
  • The device comprises predicted image generation means for generating a predicted image of the target layer based on prediction parameters, taking an already encoded reference layer image and an already encoded target layer image as input.
  • The predicted image generation means includes filter means that applies adaptive filter processing to the reference layer image based on filter parameters and generates a filtered reference layer image.
  • The device further comprises filter parameter encoding means. The filter parameter encoding means initializes a filter-parameter-encoded flag to 0 for each filter unit, which is a predetermined unit, and, when a reference layer image is used for generation of a predicted image in the predicted image generation means and the filter-parameter-encoded flag is 0, encodes the filter parameters and then sets the flag to 1.
  • As described above, when the image decoding device according to the present invention generates a predicted image using a reference layer image, the adaptive filter (inter-layer filter) is applied only when the prediction parameters satisfy a predetermined condition and is not applied otherwise, so that the range of inter-layer filtering is limited and the processing amount of the filter processing is reduced.
  • In addition, the filter parameters are decoded or encoded only for the first CU (or PU) within a filter unit for which a reference layer image is used to generate the predicted image, or for the first CU (or PU) for which the prediction parameters satisfy the predetermined condition and the inter-layer filter processing is applied, which reduces the code amount of the filter parameters.
  • FIG. 1 is a functional block diagram illustrating the schematic configuration of a hierarchical video decoding device.
  • FIG. 2 illustrates the layer structure of the hierarchically encoded data according to the embodiment: (a) shows the hierarchical video encoding device side, and (b) shows the hierarchical video decoding device side.
  • FIG. 3 illustrates the structure of the hierarchically encoded data according to the embodiment: (a) shows the sequence layer that defines the sequence SEQ, and the remaining parts show the lower levels of the data structure.
  • FIG. 4 illustrates the PU partition type patterns, where (a) to (h) show the partition shapes for the PU partition types 2N×2N, 2N×N, 2N×nU, 2N×nD, N×2N, nL×2N, nR×2N, and N×N, respectively.
  • FIG. 5 is a functional block diagram illustrating the schematic configuration of the prediction parameter restoration unit included in the hierarchical video decoding device.
  • Further figures include a functional block diagram showing the schematic configuration of the inter-layer filter unit included in the hierarchical video decoding device, flowcharts showing operation examples of the inter-layer filter control unit and details thereof, and a diagram showing the structure of the encoded data decoded by the variable length decoding unit.
  • Further figures include flowcharts showing operation examples of decoding encoded data including filter parameters in the variable length decoding unit 12, a diagram showing another structure of the encoded data decoded by the variable length decoding unit, and a diagram showing the structure of the filter parameters encoded per filter unit in the variable length decoding unit.
  • Further figures include flowcharts showing operation examples of encoding encoded data including filter parameters in the variable length encoding unit 25, and a functional block diagram showing the schematic configuration of a hierarchical video encoding device according to an embodiment of the present invention.
  • the hierarchical moving picture decoding apparatus 1 and the hierarchical moving picture encoding apparatus 2 according to an embodiment of the present invention will be described below with reference to FIGS.
  • a hierarchical video decoding device (image decoding device) 1 receives encoded data that has been subjected to scalable video coding (SVC) by a hierarchical video encoding device (image encoding device) 2.
  • Scalable video coding is a coding method that hierarchically encodes moving images from low quality to high quality.
  • Scalable video coding is standardized in, for example, H.264/AVC Annex G SVC.
  • The quality of a moving image here broadly means any element that affects the subjective and objective appearance of the moving image.
  • The quality of a moving image includes, for example, "resolution", "frame rate", "image quality", and "pixel representation accuracy".
  • When moving images are said to differ in quality, this means, for example, that their "resolution" differs, but it is not limited thereto.
  • Moving images that differ in, for example, frame rate, image quality, or coding noise also differ in quality.
  • SVC is also classified into (1) spatial scalability, (2) temporal scalability, and (3) SNR (Signal-to-Noise-Ratio) scalability from the viewpoint of the type of information layered.
  • Spatial scalability is a technique for layering by resolution or image size.
  • Temporal scalability is a technique for layering by frame rate (the number of frames per unit time).
  • SNR scalability is a technique for layering by coding noise.
  • FIG. 2 is a diagram schematically illustrating a case where a moving image is hierarchically encoded / decoded by three layers of a lower layer L3, a middle layer L2, and an upper layer L1. That is, in the example shown in FIGS. 2A and 2B, of the three layers, the upper layer L1 is the highest layer and the lower layer L3 is the lowest layer.
  • A decoded image of a specific quality that can be decoded from hierarchically encoded data is referred to as a decoded image of a specific layer (or a decoded image corresponding to a specific layer); for example, the decoded image POUT#A of the upper layer L1.
  • FIG. 2A shows hierarchical video encoding devices 2#A to 2#C that generate encoded data DATA#A to DATA#C by hierarchically encoding input images PIN#A to PIN#C, respectively.
  • FIG. 2B shows hierarchical video decoding devices 1#A to 1#C that generate decoded images POUT#A to POUT#C by decoding the hierarchically encoded data DATA#A to DATA#C, respectively.
  • the input images PIN # A, PIN # B, and PIN # C that are input on the encoding device side have the same original image but different image quality (resolution, frame rate, image quality, and the like).
  • the image quality decreases in the order of the input images PIN # A, PIN # B, and PIN # C.
  • the hierarchical video encoding device 2 # C of the lower hierarchy L3 encodes the input image PIN # C of the lower hierarchy L3 to generate encoded data DATA # C of the lower hierarchy L3.
  • The encoded data DATA#C of the lower layer L3 includes the basic information necessary for decoding the decoded image POUT#C of the lower layer L3 (indicated by "C" in FIG. 2). Since the lower layer L3 is the lowest layer, the encoded data DATA#C of the lower layer L3 is also referred to as basic encoded data.
  • The hierarchical video encoding device 2#B of the middle layer L2 encodes the input image PIN#B of the middle layer L2 with reference to the encoded data DATA#C of the lower layer to generate encoded data DATA#B of the middle layer L2.
  • In addition to the basic information "C", the encoded data DATA#B of the middle layer L2 includes the additional information (indicated by "B" in FIG. 2) necessary for decoding the decoded image POUT#B of the middle layer.
  • Further, the hierarchical video encoding device 2#A of the upper layer L1 encodes the input image PIN#A of the upper layer L1 with reference to the encoded data DATA#B of the middle layer L2 to generate encoded data DATA#A of the upper layer L1.
  • The encoded data DATA#A of the upper layer L1 includes the basic information "C" necessary for decoding the decoded image POUT#C of the lower layer L3, the additional information "B" necessary for decoding the decoded image POUT#B of the middle layer L2, and the additional information (indicated by "A" in FIG. 2) necessary for decoding the decoded image POUT#A of the upper layer.
  • Thus, the encoded data DATA#A of the upper layer L1 includes information relating to decoded images of a plurality of different qualities.
  • Next, the decoding device side will be described with reference to FIG. 2B.
  • On the decoding device side, the decoding devices 1#A, 1#B, and 1#C corresponding to the upper layer L1, the middle layer L2, and the lower layer L3 decode the encoded data DATA#A, DATA#B, and DATA#C and output the decoded images POUT#A, POUT#B, and POUT#C, respectively.
  • Note that a lower-layer decoded image can also be obtained from part of the higher-layer hierarchically encoded data. For example, the hierarchical video decoding device 1#B of the middle layer L2 may decode the decoded image POUT#B by extracting the information necessary for decoding POUT#B (that is, "B" and "C") from the hierarchically encoded data DATA#A of the upper layer L1.
  • In other words, the decoded images POUT#A, POUT#B, and POUT#C can all be decoded based on the information included in the hierarchically encoded data DATA#A of the upper layer L1.
  • The hierarchically encoded data is not limited to the three-layer example above; it may be hierarchically encoded with two layers, or with more than three layers.
  • Also, the hierarchically encoded data need not be configured as described above. For example, although it has been described with reference to FIGS. 2A and 2B that "C" and "B" are referred to in decoding the decoded image POUT#B, the present invention is not limited thereto; the hierarchically encoded data can also be configured so that the decoded image POUT#B can be decoded using "B" alone.
  • Hierarchically encoded data can also be generated so that, for example, SNR scalability is realized; in that case, the lower-layer hierarchical video encoding device generates the hierarchically encoded data by quantizing the prediction residual with a larger quantization width than the upper-layer hierarchical video encoding device.
  • Upper layer: A layer located above a given layer is referred to as an upper layer.
  • the upper layers of the lower layer L3 are the middle layer L2 and the upper layer L1.
  • the decoded image of the upper layer means a decoded image with higher quality (for example, high resolution, high frame rate, high image quality, etc.).
  • Lower layer: A layer located below a given layer is referred to as a lower layer.
  • the lower layers of the upper layer L1 are the middle layer L2 and the lower layer L3.
  • the decoded image of the lower layer refers to a decoded image with lower quality.
  • Target layer: The layer that is the target of decoding or encoding.
  • Reference layer: A specific lower layer that is referred to in decoding the decoded image corresponding to the target layer is referred to as a reference layer.
  • the reference layers of the upper hierarchy L1 are the middle hierarchy L2 and the lower hierarchy L3.
  • the hierarchically encoded data can be configured so that it is not necessary to refer to all of the lower layers in decoding of the specific layer.
  • the hierarchical encoded data can be configured such that the reference layer of the upper hierarchy L1 is either the middle hierarchy L2 or the lower hierarchy L3.
  • Base layer: The layer located at the bottom is referred to as the base layer.
  • the base layer decoded image is the lowest quality decoded image that can be decoded from the encoded data, and is referred to as the base decoded image.
  • the basic decoded image is a decoded image corresponding to the lowest layer.
  • the partially encoded data of the hierarchically encoded data necessary for decoding the basic decoded image is referred to as basic encoded data.
  • the basic information “C” included in the hierarchically encoded data DATA # A of the upper hierarchy L1 is the basic encoded data.
  • Enhancement layer: A layer above the base layer is called an enhancement layer.
  • the layer identifier is for identifying the hierarchy, and corresponds to the hierarchy one-to-one.
  • the hierarchically encoded data includes a hierarchical identifier used for selecting partial encoded data necessary for decoding a decoded image of a specific hierarchy.
  • a subset of hierarchically encoded data associated with a layer identifier corresponding to a specific layer is also referred to as a layer representation.
  • In decoding the decoded image of a specific layer, the layer representation of that layer and/or the layer representations corresponding to its lower layers are used; that is, in decoding the decoded image of the target layer, the layer representation of the target layer and/or the layer representations of one or more layers below the target layer are used.
  • Inter-layer prediction predicts syntax element values of the target layer, encoding parameters used for decoding of the target layer, and the like, based on syntax element values included in the layer representation of a layer (reference layer) different from that of the target layer, on values derived from those syntax element values, and on decoded images. Inter-layer prediction that predicts information related to motion prediction from reference layer information (at the same time instant) may be called motion information prediction, and inter-layer prediction that predicts from an up-sampled decoded image of a lower layer (at the same time instant) may be called texture prediction. The layer used for inter-layer prediction is, for example, a lower layer of the target layer. Performing prediction within the target layer without using a reference layer may be called intra-layer prediction.
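  • As a non-normative sketch of texture prediction, a reference layer decoded image can be up-sampled to the target layer resolution before it is used for prediction. The nearest-neighbour interpolation below is only an assumption for illustration; actual codecs use multi-tap interpolation filters.

        #include <vector>

        // Up-samples a refW x refH reference layer picture to dstW x dstH (nearest neighbour).
        std::vector<int> upsampleRefLayer(const std::vector<int>& ref, int refW, int refH,
                                          int dstW, int dstH) {
            std::vector<int> dst(dstW * dstH);
            for (int y = 0; y < dstH; ++y)
                for (int x = 0; x < dstW; ++x) {
                    int sx = x * refW / dstW;   // map the target-layer position to the reference layer
                    int sy = y * refH / dstH;
                    dst[y * dstW + x] = ref[sy * refW + sx];
                }
            return dst;
        }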
  • the lower layer and the upper layer may be encoded by different encoding methods.
  • The encoded data of each layer may be supplied to the hierarchical video decoding device 1 via different transmission paths, or via the same transmission path.
  • For example, when ultra-high-definition video (a moving image, 4K video data) is transmitted scalably with a base layer and one enhancement layer, the base layer may encode the 4K video data, down-scaled and interlaced, with MPEG-2 or H.264/AVC and transmit it over a television broadcast network, while the enhancement layer may encode the 4K video (progressive) with HEVC and transmit it over the Internet.
  • FIG. 3 is a diagram illustrating a data structure of encoded data (hierarchically encoded data DATA # C in the example of FIG. 2) that can be employed in the base layer.
  • Hierarchically encoded data DATA # C illustratively includes a sequence and a plurality of pictures constituting the sequence.
  • FIG. 3 shows the hierarchical structure of the data in the hierarchically encoded data DATA#C.
  • FIGS. 3A to 3E show the sequence layer that defines the sequence SEQ, the picture layer that defines a picture PICT, the slice layer that defines a slice S, the tree block layer that defines a tree block TBLK, and the CU layer that defines a coding unit (Coding Unit; CU) included in the tree block TBLK.
  • Sequence layer: In the sequence layer, a set of data referred to by the hierarchical video decoding device 1 to decode the sequence SEQ to be processed (hereinafter also referred to as the target sequence) is defined.
  • The sequence SEQ includes a sequence parameter set SPS (Sequence Parameter Set), a picture parameter set PPS (Picture Parameter Set), an adaptation parameter set APS (Adaptation Parameter Set), pictures PICT1 to PICTNP (NP is the total number of pictures included in the sequence SEQ), and supplemental enhancement information SEI (Supplemental Enhancement Information).
  • the sequence parameter set SPS defines a set of encoding parameters that the hierarchical video decoding device 1 refers to in order to decode the target sequence.
  • In the picture parameter set PPS, a set of encoding parameters referred to by the hierarchical video decoding device 1 to decode each picture in the target sequence is defined.
  • Note that a plurality of PPSs may exist; in that case, one of the PPSs is selected for each picture in the target sequence.
  • Similarly, the adaptation parameter set APS defines a set of encoding parameters referred to by the hierarchical video decoding device 1 to decode each slice in the target sequence. A plurality of APSs may exist; in that case, one of the APSs is selected for each slice in the target sequence.
  • Picture layer: In the picture layer, a set of data referred to by the hierarchical video decoding device 1 to decode the picture PICT to be processed (hereinafter also referred to as the target picture) is defined. As shown in FIG. 3B, the picture PICT includes a picture header PH and slices S1 to SNS (NS is the total number of slices included in the picture PICT).
  • the picture header PH includes a coding parameter group referred to by the hierarchical video decoding device 1 in order to determine a decoding method of the target picture.
  • the encoding parameter group is not necessarily included directly in the picture header PH, and may be included indirectly, for example, by including a reference to the picture parameter set PPS.
  • Slice layer: In the slice layer, a set of data referred to by the hierarchical video decoding device 1 to decode the slice S to be processed (also referred to as the target slice) is defined. As shown in FIG. 3C, the slice S includes a slice header SH and a sequence of tree blocks TBLK1 to TBLKNC (NC is the total number of tree blocks included in the slice S).
  • the slice header SH includes a coding parameter group that the hierarchical video decoding device 1 refers to in order to determine a decoding method of the target slice.
  • Slice type designation information (slice_type) for designating a slice type is an example of an encoding parameter included in the slice header SH.
  • Slice types that can be designated include (1) I slices that use only intra prediction at the time of encoding, (2) P slices that use unidirectional prediction or intra prediction at the time of encoding, and (3) B slices that use unidirectional prediction, bidirectional prediction, or intra prediction at the time of encoding.
  • the slice header SH may include a reference to the picture parameter set PPS (pic_parameter_set_id) and a reference to the adaptive parameter set APS (aps_id) included in the sequence layer.
  • the slice header SH includes a filter parameter FP that is referred to by an adaptive filter included in the hierarchical video decoding device 1.
  • the filter parameter FP can include, for example, a filter coefficient group.
  • The filter coefficient group includes (1) filter specification information specifying the filter shape and the number of taps, (2) filter coefficients fcoeff[0] to fcoeff[NT-1] (NT is the total number of filter coefficients included in the filter coefficient group), and (3) an offset.
  • the filter parameters may include a filter on / off flag that is a flag for switching whether or not to apply a filter in units of filter units, which will be described later, and a filter index that switches a filter parameter to be used in units of filter units.
  • the filter parameters can be switched in units of filter units corresponding to a predetermined image area.
  • a filter unit that is a block for specifying a predetermined image region may be a CTB, or a group of a plurality of CTBs, a CU, or a PU, but the same filter parameter is used in the filter unit.
  • the filter unit is a CTB unit.
  • the filter parameters are composed of filter parameters that are encoded in units of filter units in the tree block layer and below and filter parameters that are commonly used by a plurality of filter units.
  • The filter parameters encoded per filter unit are encoded in units of CTBs, CUs, or PUs, and the filter parameters shared by a plurality of filter units are encoded in the slice header SH, the sequence parameter set (SPS), the picture parameter set (PPS), the adaptation parameter set (APS), and the like, as described above.
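  • The filter parameter content described above (filter specification information, the coefficients fcoeff[0] to fcoeff[NT-1], an offset, plus an optional per-filter-unit on/off flag and filter index) can be pictured roughly as the following structures. The field names are illustrative assumptions, not the syntax of the encoded data.

        #include <vector>

        struct FilterParamFP {
            int  filterShape = 0;         // filter specification: shape
            int  numTaps = 0;             // filter specification: number of taps (NT)
            std::vector<int> fcoeff;      // fcoeff[0] .. fcoeff[NT-1]
            int  offset = 0;              // offset added after the product-sum
        };

        struct FilterUnitSyntax {
            bool filterOnOffFlag = true;  // switches the filter on/off per filter unit
            int  filterIdx = 0;           // selects which filter parameter set to use in this filter unit
        };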
  • Tree block layer: In the tree block layer, a set of data referred to by the hierarchical video decoding device 1 to decode the tree block TBLK to be processed (hereinafter also referred to as the target tree block) is defined.
  • the tree block may be referred to as a coding tree block (CTB) or a maximum coding unit (LCU).
  • the tree block TBLK includes a tree block header TBLKH and coding unit information CU1 to CUNL (NL is the total number of coding unit information included in the tree block TBLK).
  • the tree block TBLK is divided into partitions for specifying a block size for each process of intra prediction or inter prediction and conversion.
  • the above partition of the tree block TBLK is divided by recursive quadtree partitioning.
  • the tree structure obtained by this recursive quadtree partitioning is hereinafter referred to as a coding tree.
  • a partition corresponding to a leaf that is a node at the end of the coding tree is referred to as a coding node.
  • the encoding node is also referred to as an encoding unit (CU).
  • the coding node may be called a coding block (CB: Coding Block).
  • That is, the coding unit information (hereinafter referred to as CU information) CU1 to CUNL is information corresponding to each coding node (coding unit) obtained by recursively quadtree-partitioning the tree block TBLK.
  • the root of the coding tree is associated with the tree block TBLK.
  • the tree block TBLK is associated with the highest node of the tree structure of the quadtree partition that recursively includes a plurality of encoding nodes.
  • each encoding node is half the size of the encoding node to which the encoding node directly belongs (that is, the partition of the node one layer higher than the encoding node).
  • The size of the tree block TBLK and the sizes that each coding node can take are determined by size specification information of the minimum coding node and by the difference in hierarchy depth between the maximum and minimum coding nodes, both included in the sequence parameter set SPS of the hierarchically encoded data DATA#C.
  • For example, when the size of the minimum coding node is 8×8 pixels and the difference in hierarchy depth between the maximum and minimum coding nodes is 3, the size of the tree block TBLK is 64×64 pixels, and the size of a coding node can be one of four sizes: 64×64, 32×32, 16×16, or 8×8 pixels.
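  • The relationship in the numbers above can be checked directly: the tree block size is the minimum coding node size shifted left by the hierarchy depth difference. The small sketch below is only an illustration of that arithmetic, not part of the encoded syntax.

        #include <cstdio>

        int main() {
            int minCodingNodeSize = 8;   // minimum coding node: 8x8 pixels
            int depthDiff = 3;           // hierarchy depth difference between max and min coding nodes
            int treeBlockSize = minCodingNodeSize << depthDiff;   // 8 << 3 = 64
            // possible coding node sizes: 64, 32, 16, 8
            for (int d = 0; d <= depthDiff; ++d)
                std::printf("coding node size: %d\n", treeBlockSize >> d);
            return 0;
        }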
  • The tree block header TBLKH includes encoding parameters referred to by the hierarchical video decoding device 1 to determine the decoding method of the target tree block. Specifically, as shown in FIG. 3D, it includes tree block division information SP_TBLK that specifies the division pattern of the target tree block into CUs, and a quantization parameter difference Δqp (qp_delta) that specifies the size of the quantization step.
  • The tree block division information SP_TBLK is information representing the coding tree for dividing the tree block; specifically, it specifies the shape and size of each CU included in the target tree block and its position within the target tree block.
  • the tree block division information SP_TBLK may not explicitly include the shape or size of the CU.
  • the tree block division information SP_TBLK may be a set of flags indicating whether the entire target tree block or a partial region of the tree block is to be divided into four. In that case, the shape and size of each CU can be specified by using the shape and size of the tree block together.
  • The quantization parameter difference Δqp is the difference qp − qp' between the quantization parameter qp of the target tree block and the quantization parameter qp' of the tree block encoded immediately before the target tree block.
  • CU layer: In the CU layer, a set of data referred to by the hierarchical video decoding device 1 to decode the CU to be processed (hereinafter also referred to as the target CU) is defined.
  • the encoding node is a node at the root of a prediction tree (PT) and a transformation tree (TT).
  • In the prediction tree, the coding node is divided into one or more prediction blocks, and the position and size of each prediction block are defined.
  • the prediction block is one or a plurality of non-overlapping areas constituting the encoding node.
  • the prediction tree includes one or a plurality of prediction blocks obtained by the above division.
  • Prediction processing is performed for each prediction block.
  • a prediction block that is a unit of prediction is also referred to as a prediction unit (PU).
  • There are roughly two types of partitioning in the prediction tree (hereinafter abbreviated as PU partitioning): the case of intra prediction and the case of inter prediction.
  • The division methods are 2N×2N (the same size as the coding node), 2N×N, 2N×nU, 2N×nD, N×2N, nL×2N, nR×2N, N×N, and so on.
  • the types of PU division will be described later with reference to the drawings.
  • In the transform tree, the coding node is divided into one or more transform blocks, and the position and size of each transform block are defined.
  • the transform block is one or a plurality of non-overlapping areas constituting the encoding node.
  • the conversion tree includes one or a plurality of conversion blocks obtained by the above division.
  • Divisions in the transform tree include assigning a region of the same size as the coding node as a transform block, and recursive quadtree division similar to the tree block division described above.
  • transform processing is performed for each conversion block.
  • the transform block which is a unit of transform is also referred to as a transform unit (TU).
  • Specifically, the CU information CU includes a skip flag SKIP, prediction tree information (hereinafter abbreviated as PT information) PTI, and transform tree information (hereinafter abbreviated as TT information) TTI.
  • the skip flag SKIP is a flag indicating whether or not the skip mode is applied to the target PU.
  • When the value of the skip flag SKIP is 1, that is, when the skip mode is applied to the target CU, part of the PT information PTI and the TT information TTI in the CU information CU is omitted. Note that the skip flag SKIP is omitted for I slices.
  • the PT information PTI is information related to a prediction tree (hereinafter abbreviated as PT) included in the CU.
  • the PT information PTI is a set of information regarding each of one or a plurality of PUs included in the PT, and is referred to when the predicted image is generated by the hierarchical video decoding device 1 ′.
  • the PT information PTI includes prediction type information PType and prediction information PInfo.
  • Prediction type information PType is information that specifies whether intra prediction or inter prediction is used as a prediction image generation method for the target PU.
  • the prediction information PInfo includes intra prediction information PP_Intra or inter prediction information PP_Inter depending on which prediction method the prediction type information PType specifies.
  • a PU to which intra prediction is applied is also referred to as an intra PU
  • a PU to which inter prediction is applied is also referred to as an inter PU.
  • Inter prediction information PP_Inter includes an encoding parameter that is referred to when the hierarchical video decoding device 1 generates an inter prediction image by inter prediction. More specifically, the inter prediction information PP_Inter includes inter PU division information that specifies a division pattern of the target CU into each inter PU, and inter prediction parameters for each inter PU.
  • the intra prediction information PP_Intra includes an encoding parameter that is referred to when the hierarchical video decoding device 1 generates an intra predicted image by intra prediction. More specifically, the intra prediction information PP_Intra includes intra PU division information that specifies a division pattern of the target CU into each intra PU, and intra prediction parameters for each intra PU.
  • the intra prediction parameter is a parameter for designating an intra prediction method (prediction mode) for each intra PU.
  • the PU partition information may include information specifying the shape, size, and position of the target PU. Details of the PU partition information will be described later.
  • the TT information TTI is information regarding a conversion tree (hereinafter abbreviated as TT) included in the CU.
  • the TT information TTI is a set of information regarding each of one or a plurality of TUs included in the TT, and is referred to when the hierarchical video decoding device 1 decodes residual data.
  • a TU may be referred to as a block.
  • The TT information TTI includes TT division information SP_TT that specifies the division pattern of the target CU into transform blocks, and quantized prediction residuals QD1 to QDNT (NT is the total number of blocks included in the target CU).
  • TT division information SP_TT is information for determining the shape and size of each TU included in the target CU and the position in the target CU.
  • Specifically, the TT division information SP_TT is realized by information indicating whether or not the target node is split (split_transform_unit_flag) and information indicating the split depth (trafoDepth).
  • each TU obtained by the division can have a size from 32 ⁇ 32 pixels to 4 ⁇ 4 pixels.
  • Each quantized prediction residual QD is encoded data generated by the hierarchical video encoding device 2 applying the following Processes 1 to 3 to the target block being processed.
  • Process 1: Apply a frequency transform (for example, a DCT (Discrete Cosine Transform) or DST (Discrete Sine Transform)) to the prediction residual obtained by subtracting the predicted image from the image to be encoded;
  • Process 2: Quantize the transform coefficients obtained in Process 1;
  • Process 3: Variable-length encode the transform coefficients quantized in Process 2.
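  • Processes 1 and 2 above (frequency transform of the prediction residual, then quantization) can be sketched as below. The floating-point DCT-II and the simple uniform quantizer are assumptions for illustration and are not the normative integer transform or quantizer of the codec.

        #include <vector>
        #include <cmath>

        // Process 1: 1-D DCT-II applied to a row or column of residual samples
        // (a stand-in for the codec's separable integer transform).
        std::vector<double> dct1d(const std::vector<double>& r) {
            const double PI = 3.14159265358979323846;
            const int n = (int)r.size();
            std::vector<double> c(n, 0.0);
            for (int k = 0; k < n; ++k)
                for (int i = 0; i < n; ++i)
                    c[k] += r[i] * std::cos(PI * (i + 0.5) * k / n);
            return c;
        }

        // Process 2: quantize the transform coefficients with a quantization step derived from qp.
        std::vector<int> quantize(const std::vector<double>& coeff, double qstep) {
            std::vector<int> q(coeff.size());
            for (std::size_t i = 0; i < coeff.size(); ++i)
                q[i] = (int)std::lround(coeff[i] / qstep);  // uniform scalar quantization
            return q;
        }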
  • the prediction information PInfo includes an inter prediction parameter or an intra prediction parameter.
  • the inter prediction parameters include, for example, a merge flag (merge_flag), a merge index (merge_idx), an estimated motion vector index (mvp_idx), a reference image index (ref_idx), an inter prediction flag (inter_pred_flag), and a motion vector residual (mvd). Is mentioned.
  • examples of the intra prediction parameters include an estimated prediction mode flag, an estimated prediction mode index, and a residual prediction mode index.
  • the PU partition type specified by the PU partition information includes the following eight patterns in total, assuming that the size of the target CU is 2N ⁇ 2N pixels. That is, 4 symmetric splittings of 2N ⁇ 2N pixels, 2N ⁇ N pixels, N ⁇ 2N pixels, and N ⁇ N pixels, and 2N ⁇ nU pixels, 2N ⁇ nD pixels, nL ⁇ 2N pixels, And four asymmetric splittings of nR ⁇ 2N pixels.
  • Here, N = 2^m (m is an arbitrary integer of 1 or more).
  • an area obtained by dividing the target CU is also referred to as a partition.
  • FIGS. 4(a) to 4(h) specifically show the positions of the PU partition boundaries in the CU for each partition type.
  • FIG. 4A shows a 2N ⁇ 2N PU partition type that does not perform CU partitioning.
  • FIGS. 4B, 4C, and 4D show the partition shapes when the PU partition types are 2N ⁇ N, 2N ⁇ nU, and 2N ⁇ nD, respectively.
  • FIGS. 4(e), (f), and (g) show the shapes of the partitions when the PU partition types are N×2N, nL×2N, and nR×2N, respectively.
  • FIG. 4H shows the shape of the partition when the PU partition type is N ⁇ N.
  • the PU partition types shown in FIGS. 4A and 4H are also referred to as square partitions based on the shape of the partition.
  • the PU partition types shown in FIGS. 4B to 4G are also referred to as non-square partitioning.
  • the numbers assigned to the respective regions indicate the region identification numbers, and the regions are processed in the order of the identification numbers. That is, the identification number represents the scan order of the area.
  • Partition type for inter prediction: In inter PUs, seven of the above eight division types are defined, all except N×N (FIG. 4(h)). The four asymmetric partitions are sometimes called AMP (Asymmetric Motion Partition).
  • a specific value of N is defined by the size of the CU to which the PU belongs, and specific values of nU, nD, nL, and nR are determined according to the value of N.
  • For example, a 128×128 pixel inter CU can be divided into inter PUs of 128×128, 128×64, 64×128, 64×64, 128×32, 128×96, 32×128, and 96×128 pixels.
  • Partition type for intra prediction: In intra PUs, the following two division patterns are defined: 2N×2N, in which the CU is not divided, and N×N, in which the CU is divided into four PUs.
  • In the example shown in FIG. 4, these correspond to the division patterns (a) and (h).
  • an 128 ⁇ 128 pixel intra CU can be divided into 128 ⁇ 128 pixel and 64 ⁇ 64 pixel intra PUs.
  • Enhancement layer: For the encoded data of the enhancement layer, for example, a data structure substantially similar to that shown in FIG. 3 can be adopted. However, in the encoded data of the enhancement layer, additional information can be added or parameters can be omitted, as follows.
  • Information indicating hierarchical coding may be encoded in the SPS.
  • In addition, identification information for the layers of spatial scalability, temporal scalability, and SNR scalability may be encoded.
  • Filter information and filter on / off information can be encoded by a PPS, a slice header, a macroblock header, or the like.
  • In the CU information, a skip flag (skip_flag), a base mode flag (base_mode_flag), and a prediction mode flag (pred_mode_flag) may be encoded, and these flags may be used to determine whether the CU type of the target CU is an intra CU, an inter CU, a skip CU, or a base skip CU.
  • Intra CU and skip CU can be defined in the same manner as in the HEVC method described above. For example, in the skip CU, “1” is set in the skip flag. If it is not a skip CU, “0” is set in the skip flag. In the intra CU, “0” is set in the prediction mode flag.
  • The inter CU may be defined as a non-skip CU to which motion compensation (MC) is applied.
  • The base skip CU is a CU type in which CU or PU information is estimated from the reference layer. That is, in the base skip CU, "1" is set in the skip flag and "1" is set in the base mode flag.
  • Similarly, the PU type of the target PU is an intra PU, an inter PU, a merge PU, or a base merge PU.
  • Intra PU, inter PU, and merge PU can be defined in the same manner as in the HEVC method described above.
  • the base merge PU is a PU type for estimating PU information from a reference layer. Further, for example, in the PT information PTI, a merge flag and a base mode flag may be encoded, and using these flags, it may be determined whether or not the target PU is a PU that performs base merge. That is, in the base merge PU, “1” is set to the merge flag and “1” is set to the base mode flag.
  • Of the motion vector information in the enhancement layer, motion vector information that can be derived from the motion vector information included in the lower layer can be omitted from the enhancement layer.
  • the code amount of the enhancement layer can be reduced, so that the coding efficiency is improved.
  • the encoded data of the enhancement layer may be generated by an encoding method different from the encoding method of the lower layer. That is, the encoding / decoding process of the enhancement layer does not depend on the type of the lower layer codec.
  • For example, the lower layer may be encoded by the MPEG-2 or H.264/AVC format.
  • In that case, the parameters of the reference layer are converted into the corresponding parameters of the target layer or into similar parameters, so that compatibility between the layers can be maintained.
  • For example, a macroblock in the H.264/AVC format can be interpreted as a CTB in HEVC.
  • the parameters described above may be encoded independently, or a plurality of parameters may be encoded in combination.
  • When parameters are encoded in combination, an index is assigned to each combination of parameter values and the assigned index is encoded.
  • In some cases, the encoding of a parameter can be omitted.
  • FIG. 1 is a functional block diagram showing a schematic configuration of the hierarchical video decoding device 1.
  • the hierarchical video decoding device 1 decodes the hierarchical encoded data DATA supplied from the hierarchical video encoding device 2 by the HEVC method, and generates a decoded image POUT # T of the target layer.
  • As shown in FIG. 1, the hierarchical video decoding device 1 includes a NAL demultiplexing unit 11, a variable length decoding unit 12, a prediction parameter restoration unit 14, a texture restoration unit 15, a base decoding unit 13, and a filter parameter restoration unit 16.
  • the NAL demultiplexing unit 11 demultiplexes hierarchically encoded data DATA transmitted in units of NAL units in NAL (Network Abstraction Layer).
  • NAL is a layer provided to abstract communication between a VCL (Video Coding Layer) and a lower system that transmits and stores encoded data.
  • VCL is a layer that performs video encoding processing, and encoding is performed in the VCL.
  • The lower system here corresponds to, for example, the H.264/AVC and HEVC file formats and MPEG-2 systems. In the example shown below, the lower system corresponds to the decoding processes in the target layer and the reference layer.
  • In the NAL, the bit stream generated by the VCL is divided into units called NAL units and transmitted to the destination lower system.
  • The NAL unit includes the encoded data encoded by the VCL and a header for appropriately delivering the encoded data to the destination lower system. The encoded data of each layer is stored in NAL units, NAL-multiplexed, and transmitted to the hierarchical video decoding device 1.
  • the NAL demultiplexing unit 11 demultiplexes the hierarchical encoded data DATA, and extracts the target layer encoded data DATA # T and the reference layer encoded data DATA # R. Further, the NAL demultiplexing unit 11 supplies the target layer encoded data DATA # T to the variable length decoding unit 12, and also supplies the reference layer encoded data DATA # R to the base decoding unit 13.
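  • A minimal sketch of this demultiplexing step: NAL units are routed either to the target layer data DATA#T or to the reference layer data DATA#R according to a layer identifier carried with each NAL unit. The NalUnit fields and the routing function below are assumptions for illustration, not the actual NAL unit syntax.

        #include <vector>

        struct NalUnit {
            int layerId;                        // layer identifier carried in the NAL unit header
            std::vector<unsigned char> payload; // encoded data produced by the VCL
        };

        // Splits the hierarchically encoded data into target layer data (DATA#T) and
        // reference layer data (DATA#R) based on the layer identifier.
        void demultiplex(const std::vector<NalUnit>& data, int targetLayerId,
                         std::vector<NalUnit>& dataT, std::vector<NalUnit>& dataR) {
            for (const NalUnit& nal : data) {
                if (nal.layerId == targetLayerId) dataT.push_back(nal);
                else                              dataR.push_back(nal);
            }
        }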
  • The variable length decoding unit 12 decodes various syntax values from the binary data included in the target layer encoded data DATA#T.
  • variable length decoding unit 12 decodes prediction information, encoding information, transform coefficient information, and filter parameter information from the encoded data DATA # T as follows.
  • variable length decoding unit 12 decodes prediction information regarding each CU or PU from the encoded data DATA # T.
  • the prediction information includes, for example, designation of a CU type or a PU type.
  • The variable length decoding unit 12 also decodes the PU partition information from the encoded data DATA#T. In addition, for each PU, it further decodes, as prediction information, motion information such as a reference image index RI, an estimated motion vector index PMVI, and a motion vector residual MVD, as well as mode information, from the encoded data DATA#T.
  • When the CU is an intra CU, the variable length decoding unit 12 further decodes, as prediction information, intra prediction information including (1) size designation information designating the size of the prediction unit and (2) prediction index designation information designating the prediction index, from the encoded data DATA#T.
  • variable length decoding unit 12 decodes the encoded information from the encoded data DATA # T.
  • the encoded information includes information for specifying the shape, size, and position of the CU. More specifically, the encoded information includes tree block division information that specifies the division pattern of the target tree block into CUs, that is, the shape and size of each CU included in the target tree block and its position within the target tree block.
  • variable length decoding unit 12 supplies the decoded prediction information and encoded information to the prediction parameter restoration unit 14.
  • variable length decoding unit 12 decodes the quantization prediction residual QD for each block and the quantization parameter difference Δqp for the tree block including the block from the encoded data DATA # T.
  • the variable length decoding unit 12 supplies the decoded quantization prediction residual QD and the quantization parameter difference Δqp to the texture restoration unit 15 as transform coefficient information.
  • variable length decoding unit 12 decodes the filter parameter information for deriving the filter parameter from the encoded data DATA # T and supplies it to the filter parameter restoration unit 16.
  • the base decoding unit 13 decodes base decoding information, which is information on a reference layer that is referred to when decoding a decoded image corresponding to the target layer, from the reference layer encoded data DATA # R.
  • the base decoding information includes a base prediction parameter, a base transform coefficient, a base decoded image, and a filter parameter.
  • the base decoding unit 13 supplies the decoded base decoding information to the prediction parameter restoration unit 14 and the texture restoration unit 15.
  • the prediction parameter restoration unit 14 restores the prediction parameter using the prediction information and the base decoding information.
  • the prediction parameter restoration unit 14 supplies the restored prediction parameter to the texture restoration unit 15.
  • the prediction parameter restoration unit 14 can refer to the motion information stored in the frame memory 155 provided in the texture restoration unit 15 when restoring the prediction parameter.
  • the filter parameter restoration unit 16 derives a filter parameter using the filter parameter information and supplies it to the texture restoration unit 15.
  • the texture restoration unit 15 generates a decoded image POUT # T using the transform coefficient information, the base decoding information, the prediction parameter, and the filter parameter, and outputs the decoded image POUT # T to the outside.
  • the texture restoration unit 15 stores information on the restored decoded image in a frame memory 155 provided therein.
  • FIG. 5 is a functional block diagram illustrating the configuration of the prediction parameter restoration unit 14.
  • the prediction parameter restoration unit 14 includes a prediction type selection unit 141, a switch 142, an intra prediction mode restoration unit 143, a motion vector candidate derivation unit 144, a motion information restoration unit 145, a merge candidate derivation unit 146, and A merge information restoration unit 147 is provided.
  • the prediction type selection unit 141 sends a switching instruction to the switch 142 according to the CU type or the PU type, and controls the prediction parameter derivation process. Specifically, it is as follows.
  • when the target CU is an intra CU, the prediction type selection unit 141 controls the switch 142 so that the prediction parameter is derived using the intra prediction mode restoration unit 143.
  • when the target PU is an inter PU that is not merged, the prediction type selection unit 141 controls the switch 142 so that the prediction parameter is derived using the motion information restoration unit 145.
  • when the target PU is a merged PU, the prediction type selection unit 141 controls the switch 142 so that the prediction parameter is derived using the merge information restoration unit 147.
  • the switch 142 supplies the prediction information to any of the intra prediction mode restoration unit 143, the motion information restoration unit 145, and the merge information restoration unit 147 in accordance with an instruction from the prediction type selection unit 141.
  • a prediction parameter is derived at a supply destination of the prediction information.
  • the intra prediction mode restoration unit 143 derives a prediction mode from the prediction information. That is, the intra prediction mode restoration unit 143 restores the prediction parameter in the prediction mode.
  • FIG. 6 shows the definition of the prediction mode.
  • 36 types of prediction modes are defined, and each prediction mode is specified by a number (intra prediction mode index) from “0” to “35”.
  • the following names are assigned to each prediction mode. That is, “0” is “Intra_Planar (planar prediction mode, plane prediction mode)”, “1” is “Intra DC (intra DC prediction mode)”, “2” to “34” are “Intra Angular (direction prediction)”, and “35” is “Intra From Luma”.
  • “35” is unique to the color difference prediction mode, and is a mode for performing color difference prediction based on luminance prediction.
  • the color difference prediction mode “35” is a prediction mode using the correlation between the luminance pixel value and the color difference pixel value.
  • the color difference prediction mode “35” is also referred to as an LM mode.
  • the number of prediction modes (intraPredModeNum) is “35” regardless of the size of the target block.
  • the motion vector candidate derivation unit 144 uses the base decoding information to derive an estimated motion vector candidate by intra-layer motion estimation processing or inter-layer motion estimation processing.
  • the motion vector candidate derivation unit 144 supplies the derived motion vector candidates to the motion information restoration unit 145.
  • the motion information restoration unit 145 restores motion information related to each inter PU that is not merged. That is, the motion information restoring unit 145 restores motion information as a prediction parameter.
  • the motion information restoration unit 145 restores motion information from the prediction information when the target PU is an inter CU and an inter PU. More specifically, the motion information restoration unit 145 acquires a motion vector residual (mvd), an estimated motion vector index (mvp_idx), an inter prediction flag (inter_pred_flag), and a reference image index (refIdx). Then, based on the value of the inter prediction flag, a reference image list use flag is determined for each of the reference image list L0 and the reference image list L1.
  • the motion information restoration unit 145 derives an estimated motion vector based on the value of the estimated motion vector index, and derives a motion vector based on the motion vector residual and the estimated motion vector.
  • the motion information restoration unit 145 outputs, as the motion compensation parameter, the derived motion vector together with the reference image list use flag and the reference image index.
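  • as a minimal illustration of this restoration step, the sketch below (with hypothetical type and function names) reconstructs the motion vector by adding the motion vector residual to the estimated motion vector selected by the index:

```cpp
#include <vector>

// Hypothetical types and names, for illustration only; they are not part of
// the encoded data syntax.
struct MotionVector { int x; int y; };

// Reconstruct a motion vector from the estimated-motion-vector candidate list,
// the estimated motion vector index (mvp_idx) and the motion vector residual (mvd).
MotionVector restoreMotionVector(const std::vector<MotionVector>& mvpCandidates,
                                 int mvpIdx, MotionVector mvd) {
    const MotionVector& mvp = mvpCandidates[mvpIdx];   // estimated motion vector
    return { mvp.x + mvd.x, mvp.y + mvd.y };            // motion vector = mvp + mvd
}
```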
  • the merge candidate derivation unit 146 derives various merge candidates using the decoded motion information supplied from the frame memory 155 and / or the base decoding information supplied from the base decoding unit 13.
  • the merge candidate derivation unit 146 supplies the derived merge candidates to the merge information restoration unit 147.
  • the merge information restoration unit 147 restores motion information regarding each PU that is merged within a layer or between layers. That is, the merge information restoration unit 147 restores motion information as a prediction parameter.
  • the merge information restoration unit 147 restores the motion information by deriving, from the merge candidate list derived by the merge candidate derivation unit 146, the motion compensation parameter corresponding to the merge index (merge_idx) included in the prediction information.
  • FIG. 8 is a functional block diagram illustrating the configuration of the texture restoration unit 15.
  • the texture restoration unit 15 includes an inverse orthogonal transform / inverse quantization unit 151, a texture prediction unit 152, an adder 153, a loop filter unit 154, and a frame memory 155.
  • the inverse orthogonal transform / inverse quantization unit 151 (1) inversely quantizes the quantized prediction residual QD included in the transform coefficient information supplied from the variable length decoding unit 12, (2) performs an inverse orthogonal transform (for example, an inverse DCT (Discrete Cosine Transform)) on the DCT coefficients obtained by the inverse quantization, and (3) supplies the prediction residual D obtained by the inverse orthogonal transform to the adder 153.
  • the inverse orthogonal transform / inverse quantization unit 151 derives a quantization step QP from the quantization parameter difference Δqp included in the transform coefficient information.
  • the texture prediction unit 152 refers to the base decoded image included in the base decoding information, or to an already decoded image stored in the frame memory, according to the prediction parameter, and generates a predicted image.
  • the texture prediction unit 152 includes an inter prediction unit 152A and an intra prediction unit 152B.
  • the inter prediction unit 152A generates a prediction image related to each inter prediction partition by inter prediction. Specifically, the inter prediction unit 152A generates a prediction image from the reference image using the motion information supplied as a prediction parameter from the motion information restoration unit 145 or the merge information restoration unit 147.
  • the intra prediction unit 152B generates a prediction image related to each intra prediction partition by intra prediction. Specifically, the intra prediction unit 152B generates a prediction image from a decoded image or base decoded image that has been decoded in the target partition, using the prediction mode supplied as a prediction parameter from the intra prediction mode restoration unit 143.
  • the texture prediction unit 152 supplies the prediction image generated by the inter prediction unit 152A and the intra prediction unit 152B to the adder 153.
  • the adder 153 generates a decoded image by adding the prediction image generated by the texture prediction unit 152 and the prediction residual D supplied from the inverse orthogonal transform / inverse quantization unit 151.
  • the loop filter unit 154 performs a deblocking process, a filter process using an adaptive offset filter, and an adaptive loop filter on the decoded image supplied from the adder 153.
  • the frame memory 155 stores the decoded image that has been filtered by the loop filter unit 154.
  • FIG. 9 is a functional block diagram illustrating the configuration of the base decoding unit 13.
  • the base decoding unit 13 includes a variable length decoding unit 131, a base prediction parameter restoration unit 132, a base transform coefficient restoration unit 133, and a base texture restoration unit 134.
  • variable length decoding unit 131 performs a decoding process of information for decoding various syntax values from the binary included in the reference layer encoded data DATA # R.
  • variable length decoding unit 131 decodes prediction information and transform coefficient information from the encoded data DATA # R. Since the syntax of the prediction information and transform coefficient information decoded by the variable length decoding unit 131 is the same as that of the variable length decoding unit 12, detailed description thereof is omitted here.
  • variable length decoding unit 131 supplies the decoded prediction information to the base prediction parameter restoring unit 132, and supplies the decoded transform coefficient information to the base transform coefficient restoring unit 133.
  • the base prediction parameter restoration unit 132 restores the base prediction parameter based on the prediction information supplied from the variable length decoding unit 131.
  • the method by which the base prediction parameter restoration unit 132 restores the base prediction parameter is the same as that of the prediction parameter restoration unit 14, and thus detailed description thereof is omitted here.
  • the base prediction parameter restoration unit 132 supplies the restored base prediction parameter to the base texture restoration unit 134 and outputs it to the outside.
  • the base transform coefficient restoration unit 133 restores transform coefficients based on the transform coefficient information supplied from the variable length decoding unit 131.
  • the method by which the base transform coefficient restoration unit 133 restores the transform coefficient is the same as that of the inverse orthogonal transform / inverse quantization unit 151, and thus detailed description thereof is omitted here.
  • the base transform coefficient restoration unit 133 supplies the restored base transform coefficient to the base texture restoration unit 134 and outputs it to the outside.
  • the base texture restoration unit 134 uses the base prediction parameters supplied from the base prediction parameter restoration unit 132 and the base transform coefficients supplied from the base transform coefficient restoration unit 133 to generate a decoded image. Specifically, the base texture restoration unit 134 performs the same texture prediction as the texture prediction unit 152 based on the base prediction parameter, and generates a predicted image. Further, the base texture restoration unit 134 generates a prediction residual based on the base transform coefficient, and generates a base decoded image by adding the generated prediction residual and the predicted image generated by the texture prediction.
  • the base texture restoration unit 134 may perform filter processing similar to that of the loop filter unit 154 on the base decoded image. Further, the base texture restoration unit 134 may include a frame memory for storing the decoded base decoded image, and may refer to the decoded base decoded image stored in the frame memory in texture prediction.
  • FIG. 10 is a diagram illustrating a configuration of the inter prediction unit 152A.
  • the inter prediction unit 152A includes a motion compensation filter unit 1521, an inter layer filter unit 1522, and a selection synthesis unit 1523.
  • Prediction information PInfo (motion compensation parameter) and filter parameters are input to the inter prediction unit 152A.
  • the motion compensation parameters are composed of the upper left coordinates (xP, yP) of the PU; the PU width and height nPSW and nPSH, which represent the size of the PU; the reference image list use flag predFlagL0 for the reference image of the reference image list L0; the reference image index RefIdxL0, which is an index for specifying a reference image in that reference image list; the motion vector MvL0[]; the reference image list use flag predFlagL1 for the reference image of the reference image list L1; the reference image index RefIdxL1, which is an index for specifying a reference image in that reference image list; and the motion vector MvL1[].
  • the motion compensation filter unit 1521 performs motion compensation on the target layer image based on the target layer image and the motion compensation parameter input to the inter prediction unit 152A, and generates a motion compensated image. Specifically, when the reference image list use flag predFlagLX indicates, for the reference image list L0 or L1, that the list is used (that is, when predFlagLX is not 0) and the reference image specified by the reference image index RefIdxLX is a target layer image, the motion compensation filter unit 1521 generates a motion compensated image.
  • an image region required by the motion compensation filter, determined by the PU size, is extracted from the position shifted by the motion vector MvLX[] starting from the upper left coordinates (xP, yP) of the PU, and a filter process corresponding to the motion vector MvLX[] is performed to generate the motion compensated image.
  • for this filter process, a separable filter with 8 horizontal taps and 8 vertical taps is used.
  • the inter layer filter unit 1522 generates a layer prediction image from the reference layer image based on the reference layer image, the motion compensation parameter, and the filter parameter input to the inter prediction unit 152A. Specifically, when the reference image list use flag predFlagLX indicates, for the reference image list L0 or L1, that the list is used (that is, when predFlagLX is not 0) and the reference image specified by the reference image index RefIdxLX is a reference layer image, the inter-layer filter unit 1522 generates a layer prediction image.
  • whether or not the reference image is a reference layer image can be determined by, for example, the following methods.
  • (determination method 1) when the layer ID of the reference image is different from the layer ID of the target layer, the reference image is determined to be a reference layer image. Here, layerID is the layer ID of the target image, and ReflayerID(X, Y) is the layer ID of the reference image specified by the reference image index X and the reference image list Y.
  • the layer ID is an identifier for identifying the layer of an image. Usually, 0 is assigned to the base layer and a non-zero value is assigned to layers other than the base layer.
  • (determination method 2) when the POC of the reference image is equal to the POC of the target image, the reference image is determined to be a reference layer image. Here, POC is the POC of the target image, and RefPOC(X, Y) is the POC of the reference image specified by the reference image index X and the reference image list Y.
  • the POC is a value indicating the order of image display times; images with the same display time have the same POC.
  • (determination method 3) when the view ID of the reference image is different from the view ID of the target image, the reference image is determined to be a reference layer image. Here, ViewID is the view ID of the target image, and RefViewID(X, Y) is the view ID of the reference image specified by the reference image index X and the reference image list Y.
  • the view ID is information indicating the viewpoint of an image, and is assigned to each viewpoint when the image is view-scalable encoded. Normally, 0 is assigned to the base view, and a non-zero value is assigned to views other than the base view.
  • (determination method 4) when the reference image is a long-term reference picture, the reference image is determined to be a reference layer image.
  • the judgment formula is as follows: LongTermPic(RefIdxLX, ListX), where LongTermPic(X, Y) is a function that is true when the reference image specified by the reference image index X and the reference image list Y is a long-term reference picture.
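  • the sketch below illustrates the four determination methods; the PictureInfo structure and the function names are hypothetical stand-ins for the attributes a decoder would obtain from reference picture management, and the equality test used for determination method 2 (same POC) follows the interpretation given above:

```cpp
// Hypothetical per-picture attributes, for illustration only.
struct PictureInfo {
    int layerId;      // layer ID (0 for the base layer)
    int poc;          // picture order count (display time order)
    int viewId;       // view ID (0 for the base view)
    bool isLongTerm;  // true for a long-term reference picture
};

// Determination method 1: different layer ID -> reference layer image.
bool isRefLayerByLayerId(const PictureInfo& target, const PictureInfo& ref) {
    return target.layerId != ref.layerId;
}

// Determination method 2 (interpreted): same POC, i.e. same display time,
// -> reference layer image.
bool isRefLayerByPoc(const PictureInfo& target, const PictureInfo& ref) {
    return target.poc == ref.poc;
}

// Determination method 3: different view ID -> reference layer image.
bool isRefLayerByViewId(const PictureInfo& target, const PictureInfo& ref) {
    return target.viewId != ref.viewId;
}

// Determination method 4: long-term reference picture -> reference layer image.
bool isRefLayerByLongTerm(const PictureInfo& ref) {
    return ref.isLongTerm;
}
```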
  • the motion compensation image and the layer prediction image are collectively referred to as an LX prediction image.
  • a predicted image using the reference image of the L0 reference list is called an L0 predicted image
  • a predicted image using the reference image of the L1 reference list is called an L1 predicted image.
  • the selection synthesis unit 1523 generates a prediction image using one or more LX prediction images generated from the motion compensation filter unit 1521 and the interlayer filter unit 1522, and outputs the prediction image as the prediction image of the inter prediction unit 152A.
  • according to the prediction parameter (motion compensation parameter), in the case of uni-prediction one LX prediction image is output, and in the case of bi-prediction two LX prediction images are combined and output by weighted prediction or averaging.
  • in the case of uni-prediction, the LX prediction image whose predFlagLX is valid is output directly as the prediction image.
  • in the case of bi-prediction, a prediction image is generated from a simple average or a weighted average of the L0 prediction image and the L1 prediction image, and is output as the prediction image of the inter prediction unit 152A.
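  • a minimal sketch of this selection / synthesis step is shown below; the image representation and the rounding used for the simple average are illustrative assumptions:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// predL0 / predL1 are the LX prediction images produced by the motion
// compensation filter unit or the inter-layer filter unit.
std::vector<uint8_t> selectAndCombine(const std::vector<uint8_t>& predL0,
                                      const std::vector<uint8_t>& predL1,
                                      bool predFlagL0, bool predFlagL1) {
    if (predFlagL0 && predFlagL1) {
        // Bi-prediction: simple average of the L0 and L1 prediction images
        // (weighted prediction could be used instead).
        std::vector<uint8_t> out(predL0.size());
        for (std::size_t i = 0; i < out.size(); ++i)
            out[i] = static_cast<uint8_t>((predL0[i] + predL1[i] + 1) >> 1);
        return out;
    }
    // Uni-prediction: output the LX prediction image whose flag is valid.
    return predFlagL0 ? predL0 : predL1;
}
```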
  • FIG. 11A is a diagram illustrating a configuration of the interlayer filter unit 1522.
  • the interlayer filter unit 1522 includes a filter unit 15221 and an interlayer filter control unit 15222.
  • in a layer image extraction unit (not shown) of the inter layer filter unit 1522, an image region required for the layer prediction image, determined from the PU size, is extracted from the reference image of the reference layer specified by the reference image index RefIdxLX, at the position shifted by the motion vector MvLX[] starting from the upper left coordinates (xP, yP) of the PU, and is output to the filter unit 15221 and the inter layer filter control unit 15222.
  • the filter unit 15221 uses the filter parameters and the reference layer image input to the inter-layer filter unit 1522 to perform the reference layer image filtering process.
  • as the filter process, an existing filter process controlled by a filter parameter can be applied.
  • the case of the adaptive offset filter and the case of the adaptive spatial filter will be described.
  • <Adaptive offset filter (sample adaptive filter, SAO)>
  • when the filter parameter is a filter parameter of an adaptive offset filter based on pixel classification, an offset type SaoTypeIdx, an offset SaoOffsetVal, and a band position sao_band_position are input as filter parameters, and the following processing is performed.
  • when the offset type SaoTypeIdx indicates an edge offset, an edge index edgeIdx indicating the classification of the filter target pixel is derived from the relationship between the filter target pixel and its surrounding pixels, and the offset indicated by the edge index edgeIdx is added to the filter target pixel. Specifically, the following formula is calculated.
  • fltPicture[xC + i][yC + j] = recPicture[xC + i][yC + j] + SaoOffsetVal[edgeIdx]
  • recPicture is the pre-filter image
  • fltPicture is the post-filter image
  • xC + i and yC + j are the coordinates of the pixel to be filtered
  • xC and yC are the upper left coordinates of the filter unit
  • i and j are the relative coordinates from the upper left of the filter unit.
  • SaoOffsetVal [Idx] is a variable that represents the magnitude of the offset to be added from the index Idx that indicates the classification result of the filter target pixel. Although this variable may differ for every filter unit, for example, it takes the following values.
  • SaoOffsetVal[] = {0, -1, 1, 2}
  • the edge index edgeIdx is derived, for example, by edgeIdx = 2 + Σ Sign3(recPicture[xC + i][yC + j] - recPicture[xC + i + hPos[k]][yC + j + vPos[k]]), where Σ calculates the sum over k from 0 to 1.
  • Sign3(x) is a function that takes 1 when x is positive, 0 when x is 0, and -1 when x is negative.
  • hPos and vPos are tables indicating the relative coordinates, from the filter target pixel, of the surrounding pixels used for the classification.
  • when the offset type SaoTypeIdx indicates a band offset, a band index bandIdx indicating the classification of the filter target pixel is derived from the pixel value of the filter target pixel, and the offset indicated by the band index bandIdx is added to the filter target pixel. Specifically, the following formula is calculated.
  • fltPicture[xC + i][yC + j] = recPicture[xC + i][yC + j] + SaoOffsetVal[bandIdx]
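  • the sketch below illustrates the edge-offset branch of the adaptive offset filter; the Sign3 classification and the offset addition follow the text, while using 2 + Σ Sign3(...) directly as the index into an offset table with five entries (the example table above has only four) and skipping the picture border are simplifying assumptions:

```cpp
#include <vector>

static int sign3(int x) { return (x > 0) - (x < 0); }   // 1, 0 or -1

// Edge-offset branch of the adaptive offset filter (sketch).
void saoEdgeOffset(std::vector<int>& fltPicture, const std::vector<int>& recPicture,
                   int width, int height,
                   const int hPos[2], const int vPos[2],    // neighbour positions for the chosen edge class
                   const std::vector<int>& saoOffsetVal) {  // offsets per edge index (>= 5 entries here)
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            int c = recPicture[y * width + x];
            int edgeIdx = 2;
            for (int k = 0; k < 2; ++k)                     // sum over k = 0..1
                edgeIdx += sign3(c - recPicture[(y + vPos[k]) * width + (x + hPos[k])]);
            fltPicture[y * width + x] = c + saoOffsetVal[edgeIdx];
        }
    }
}
```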
  • <Adaptive spatial filter> when the filter parameter is a filter parameter of an adaptive spatial filter, which is a spatial filter over the pixels around the filter target pixel, N filter coefficients fcoeff[0] to fcoeff[N-1] are input as the filter parameters, and the filter processing is performed from the product sum of the filter target pixel and its surrounding pixels with the filter coefficients (weighting coefficients). Specifically, the following filter processing is performed.
  • fltPicture[xC + i][yC + j] = ((Σ fcoeff[ii] * recPicture[xC + i + dx[ii]][yC + j + dy[ii]]) + fcoeff[N - 1] + 128)
  • recPicture is the pre-filter image
  • fltPicture is the post-filter image
  • Σ denotes the sum over ii from 0 to N - 2.
  • dx [] and dy [] are tables indicating relative coordinates from the pixel to be filtered of surrounding pixels that are reference pixels of the adaptive spatial filter.
  • fcoeff [0] to fcoeff [N-2] are weighting factors
  • the last fcoeff [N-1] is an offset component.
  • although an offset component is used in this example, a configuration that does not use an offset component in the filter parameters and the filter processing, that is, a configuration using only weighting coefficients, may also be used.
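  • a minimal per-pixel sketch of the adaptive spatial filter is shown below; the normalization (right shift by 8 after adding the rounding term 128) is an assumption, since the text specifies only the weighted sum, the offset component fcoeff[N-1], and the +128 term:

```cpp
#include <vector>

// Adaptive spatial filter for one filter target pixel (sketch).
int adaptiveSpatialFilterPixel(const std::vector<int>& recPicture, int width,
                               int x, int y,                     // filter target pixel position
                               const std::vector<int>& fcoeff,   // N coefficients, last one is the offset
                               const std::vector<int>& dx,
                               const std::vector<int>& dy) {
    const int N = static_cast<int>(fcoeff.size());
    int acc = 0;
    for (int ii = 0; ii <= N - 2; ++ii)                          // sum over ii = 0..N-2
        acc += fcoeff[ii] * recPicture[(y + dy[ii]) * width + (x + dx[ii])];
    acc += fcoeff[N - 1] + 128;                                  // offset component + rounding term
    return acc >> 8;                                             // assumed normalization
}
```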
  • for example, a predetermined table (for example, a table layer_filter_index_table) can be referenced to switch the filter index according to the position of the pixel in the target layer image.
  • pic_width and pic_height are the width and height of the target layer image.
  • when the inter-layer filter unit that performs the filtering process on the reference layer image and generates the predicted image does not have an independent scaling unit, and the width and height of the reference layer image are different from the width and height of the target layer image, scaling is performed by the adaptive spatial filter of the filter unit 15221.
  • in this case, the pixel at the position to be acquired is derived from a plurality of pixels determined as filter input pixels. For example, when the width of the reference layer image is half that of the target layer image, a two-fold enlargement process is required.
  • the pixel at the fractional pixel position of the reference layer image corresponding to the filter target pixel on the target layer image can be obtained by the product sum of the surrounding reference layer pixels and the filter coefficients.
  • in the above description, the weighting coefficient is 128; specifically, the filter coefficients fcoeff included in the filter parameter are used.
  • in this case, M sets of filter coefficients, differing for each fractional pixel position on the reference layer image, are used as filter parameters.
  • aifPicture[xC + i][yC + j] = ((Σ fcoeff[jj][ii] * refPicture[xC' + i + dx[ii]][yC' + j + dy[ii]]) + fcoeff[N - 1] + 128)
  • jj is an index (filter index) for designating a set of filter coefficients.
  • the coordinates xC ′ and yC ′ on the reference layer are values derived by calculation or table lookup from the pixel value positions xC and yC on the target layer.
  • in the expressions used for this derivation, & indicates a logical product (bitwise AND) and << indicates a left shift.
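  • the sketch below shows the same product-sum filtering using one of the M coefficient sets selected by the index jj (for example, according to the fractional pixel position when the reference layer image must be enlarged); the normalization shift is again an assumption:

```cpp
#include <vector>

// Inter-layer adaptive spatial filter with a selectable coefficient set (sketch).
int interLayerFilterPixel(const std::vector<int>& refPicture, int refWidth,
                          int xRef, int yRef,                          // xC', yC' on the reference layer
                          const std::vector<std::vector<int>>& fcoeff, // M sets of N coefficients
                          int jj,                                      // filter set index
                          const std::vector<int>& dx,
                          const std::vector<int>& dy) {
    const std::vector<int>& c = fcoeff[jj];
    const int N = static_cast<int>(c.size());
    int acc = 0;
    for (int ii = 0; ii <= N - 2; ++ii)
        acc += c[ii] * refPicture[(yRef + dy[ii]) * refWidth + (xRef + dx[ii])];
    acc += c[N - 1] + 128;   // offset component + rounding term from the text
    return acc >> 8;         // assumed normalization
}
```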
  • the inter-layer filter control unit 15222 selects, using the motion compensation parameter and the filter parameter input to the inter-layer filter unit 1522, whether to use the reference layer image before filtering or the filtered reference layer image that is the output of the filter unit 15221.
  • FIG. 12 is a flowchart showing an operation example of the interlayer filter control unit 15222.
  • the interlayer filter control unit 15222 determines whether the motion compensation parameter satisfies a predetermined condition (S201). If it is determined that the predetermined condition is satisfied (YES in S201), a filtered reference layer image is selected (S203). If it is not determined that the predetermined condition is satisfied (NO in S201), a reference layer image is selected (S202).
  • accordingly, an operation in which the inter-layer filter processing is applied only when the motion compensation parameter satisfies the predetermined condition (inter-layer filter on) and is otherwise not applied (inter-layer filter off) can be realized.
  • the inter-layer filter control unit 15222 in the texture restoration unit 15 switches the filter processing by determining that the inter-layer filter processing is applied (inter-layer filter on) when filter_enable_flag is 1, and that the inter-layer filter processing is not applied (inter-layer filter off) when filter_enable_flag is 0.
  • FIG. 13 is a flowchart showing another example of operation of the interlayer filter control unit 15222.
  • in FIG. 13, processing (S200) for determining whether or not the filter on / off flag included in the filter parameter is on is added.
  • in this example, the filter parameter includes an on / off flag indicating whether or not to apply the inter-layer filter, and application / non-application of the inter-layer filter is switched by the filter on / off flag and the motion compensation parameter.
  • when the filter on / off flag is off, the inter-layer filter is not applied. When the filter on / off flag is on, it is determined whether or not the motion compensation parameter satisfies a predetermined condition, and the inter-layer filter is applied when the condition is satisfied.
  • specifically, when the filter on / off flag is off, the reference layer image is selected (S202). In other cases, it is determined whether or not the motion compensation parameter satisfies the predetermined condition (S201). The subsequent processing is the same as in the flowchart described above.
  • FIG. 14A shows a case where the predetermined condition is the size of the prediction unit.
  • in this case, the inter-layer flag is turned on only when the size of the prediction unit is equal to or larger than a predetermined size. More specifically, whether or not the prediction unit is equal to or larger than the predetermined size is determined (S201B) by the following formula.
  • nPSW + nPSH >= TH
  • the size of the prediction unit used for this determination is not limited to the sum of the width and height of the prediction unit, but may be a product of width and height. Further, the operation may be such that the interlayer flag is turned off when the size is equal to or larger than the predetermined size, and the interlayer flag is turned on when the size is smaller than the predetermined size.
  • with this configuration, when the prediction unit is smaller than the predetermined size, the inter-layer filter is not applied, and thus the processing load associated with the inter-layer filter can be reduced.
  • since the inter-layer filter processing is heavier than a normal motion compensation filter (the operation of the adaptive offset filter and the filter coefficients (weighting coefficients) of the adaptive spatial filter are variable depending on the filter parameter), this configuration has the effect of reducing that processing.
  • the operation of the interlayer filter can be switched in accordance with the existing conditions without explicitly encoding the flag as a filter on / off flag. With this configuration, since the code amount of the filter on / off flag is reduced and only a region having a high filter effect can be filtered, the encoding efficiency is improved.
  • FIG. 14B shows a case where the predetermined condition is single prediction.
  • in this case, the inter-layer flag is turned on only when the motion compensation parameter of the prediction unit indicates uni-prediction. More specifically, whether or not the motion compensation parameter of the prediction unit indicates uni-prediction is determined (S201B) by the following formula.
  • predFlagL0 + predFlagL1 < 2
  • note that the process of turning the inter-layer flag off when the motion compensation parameter indicates bi-prediction and turning it on when it indicates uni-prediction is the same operation.
  • with this configuration, the inter-layer filter is not applied in bi-prediction, so the processing load associated with the inter-layer filter can be reduced.
  • in bi-prediction, two LX prediction images are generated by the motion compensation filter or the inter-layer filter, and a process for combining the two LX prediction images is required, which increases the processing load.
  • this configuration therefore has the effect of reducing the processing in the case where the processing load is highest.
  • the operation of the interlayer filter can be switched in accordance with the existing conditions without explicitly encoding the flag as a filter on / off flag.
  • in bi-prediction, a blurred prediction image is often generated by combining two LX prediction images (the effect of a low-pass filter).
  • the interlayer filter also has the effect of a low-pass filter, resulting in a blurred predicted image. If two low-pass filters are applied, more blur will occur. In the case of bi-prediction, the image quality of the predicted image is higher when the interlayer filter is turned off. Therefore, with this configuration, since the code amount of the filter on / off flag is reduced and only a region having a high filter effect can be filtered, the coding efficiency is improved.
  • FIG. 15A shows a case where the predetermined condition is whether or not a predetermined reference image list is used.
  • in this case, the inter-layer flag is turned on only when the motion compensation parameter of the prediction unit uses a predetermined reference image list (LX in the figure, which is L0 or L1). More specifically, whether or not the motion compensation parameter indicates LX prediction is determined (S201C), for example by checking whether predFlagLX is 1.
  • the determination is not limited to the L0 list; the same determination may be made for the L1 list.
  • the operation of the interlayer filter can be switched in accordance with the existing conditions without explicitly encoding the flag as a filter on / off flag.
  • a plurality of the determinations described above may be combined. Specifically, an operation of turning on the inter-layer filter only when the prediction unit is equal to or larger than the predetermined size and uni-prediction is used may be performed, for example by an expression such as (nPSW + nPSH >= TH) && (predFlagL0 + predFlagL1 < 2).
  • && is an operator that takes a logical product, and becomes true when both the left side and the right side are true.
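  • the sketch below combines the size condition and the uni-prediction condition into a single decision, corresponding to S201; the threshold TH and the function name are illustrative:

```cpp
// Inter-layer filter control decision combining the conditions described above (sketch).
bool interLayerFilterEnabled(int nPSW, int nPSH,
                             int predFlagL0, int predFlagL1, int TH) {
    bool largeEnough = (nPSW + nPSH) >= TH;            // prediction unit size condition
    bool uniPred     = (predFlagL0 + predFlagL1) < 2;  // uni-prediction condition
    return largeEnough && uniPred;                     // combined condition
}
```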
  • FIG. 11B is a diagram illustrating another configuration example of the interlayer filter unit 1522.
  • An interlayer filter unit 1522b of another configuration example illustrated in FIG. 11B is different from the configuration of FIG. 11A in that a scaling unit 15223 is included. Since the operation of the other components is the same as that of the interlayer filter unit 1522, the description thereof is omitted.
  • in a layer image extraction unit (not shown) of the inter-layer filter unit 1522b, an area necessary for generating the layer prediction image, determined from the PU size, is extracted from the reference image of the reference layer specified by the reference image index RefIdxLX, at the position shifted by the motion vector MvLX[] starting from the upper left coordinates (xP, yP) of the PU, and is output to the scaling unit 15223.
  • the scaling unit 15223 performs an enlargement process on the extracted image and outputs it to the filter unit 15221 and the inter-layer filter control unit 15222 when the scaling parameter indicates that scaling is to be performed.
  • the scaling parameter is information indicating whether or not the width and height of the reference layer image are different from the width and height of the target layer image; when they are different, the scaling process is performed.
  • with this configuration, the inter-layer filter unit can generate a prediction image even when the width and height of the reference layer image differ from the width and height of the target layer image, without performing the scaling by the adaptive spatial filter.
  • FIG. 16 shows a case where the predetermined condition is whether or not to perform scaling.
  • the interlayer flag is turned on only when the prediction unit does not perform scaling.
  • with this configuration, when the prediction unit requires scaling, the inter-layer filter is not applied; therefore, when the load of generating the predicted image from the reference layer image is heavy, as in the case of scaling, the processing load associated with the inter-layer filter can be reduced.
  • the operation of the interlayer filter can be switched in accordance with the existing conditions without explicitly encoding the flag as a filter on / off flag.
  • with this configuration, since the code amount of the filter on / off flag is reduced and only regions with a high filter effect are filtered, the coding efficiency is improved.
  • FIG. 18 is a diagram illustrating an operation of decoding encoded data including a filter parameter in the variable length decoding unit 12.
  • the variable length decoding unit 12 performs the following processing in each CTB. First, in order to indicate that the filter parameter of the filter unit has not yet been decoded, a filter coding flag filter_coded_flag is set to 0 (S301). Subsequently, a CU loop is started (S302). The CU loop sequentially processes all the CUs included in the CTB. Subsequently, a PU loop is started (S303).
  • the PU loop is performed by sequentially processing all PUs included in the CU.
  • information corresponding to the PU is decoded from the prediction information PInfo excluding the filter parameter (S304). For example, a merge index, a merge flag, an inter prediction flag, a reference index, an estimated motion vector index, and a motion vector residual are decoded.
  • next, it is determined whether or not the reference layer image is used to generate the predicted image (S305). Specifically, it is determined whether or not the reference image indicated by the reference index is a reference layer image; if it is, the reference layer image is used for the predicted image.
  • when the reference layer image is used for generation of the predicted image (YES in S305), the process proceeds toward decoding of the filter parameter of the filter unit (transition to S306); when the reference layer image is not used for generation of the predicted image (NO in S305), decoding of the filter parameter of the filter unit is omitted (transition to S310).
  • when the reference layer image is used to generate the predicted image, it is determined whether or not to apply the inter-layer filter (S306). In this determination, filter_enable_flag is set to 1 when the motion compensation parameter satisfies the predetermined condition described above.
  • when the inter-layer filter is applied (filter_enable_flag is 1), the filter parameter of the filter unit is decoded (S308); otherwise, decoding of the filter parameter of the filter unit is omitted.
  • S310 is the end of the loop in PU units
  • S311 is the end of the loop in CU units.
  • in this way, among the prediction units included in the filter unit, the variable length decoding unit 12 decodes the filter parameter of the filter unit in the prediction unit to which the inter-layer filter is first applied.
  • decoding of the filter parameter of the filter unit is omitted in the second and subsequent prediction units, included in the filter unit, to which the inter-layer filter is applied. Since the filter parameter of the filter unit is decoded only in the first PU to which the inter-layer filter is applied, and is not encoded at all when the inter-layer filter is off in all prediction units included in the filter unit, the effect of reducing the code amount of the filter parameter is obtained.
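  • the sketch below shows the structure of this decoding flow (S301 to S311); the reader and predicate functions are hypothetical stand-ins, and the filter unit is assumed to coincide with the CTB as in the description above:

```cpp
#include <vector>

struct PU { };                                   // prediction-unit placeholder
struct CU { std::vector<PU> pus; };              // coding-unit placeholder
struct CTB { std::vector<CU> cus; };             // tree-block placeholder

static bool usesReferenceLayer(const PU&)       { return true; }  // stand-in for S305
static bool interLayerFilterEnabled(const PU&)  { return true; }  // stand-in for S306
static void decodeMotionCompensationParams(PU&) { }               // stand-in for S304
static void decodeFilterUnitParams()            { }               // stand-in for S308

void decodeCtb(CTB& ctb) {
    bool filterCodedFlag = false;                    // S301
    for (CU& cu : ctb.cus) {                         // S302: CU loop
        for (PU& pu : cu.pus) {                      // S303: PU loop
            decodeMotionCompensationParams(pu);      // S304
            if (usesReferenceLayer(pu) &&            // S305
                interLayerFilterEnabled(pu) &&       // S306
                !filterCodedFlag) {                  // decode only in the first such PU
                decodeFilterUnitParams();            // S308
                filterCodedFlag = true;
            }
        }                                            // S310: end of PU loop
    }                                                // S311: end of CU loop
}
```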
  • FIG. 19 is a diagram illustrating another example of the operation of decoding the encoded data including the filter parameter in the variable length decoding unit 12. Steps having the same numbers as those in FIG. 18 perform the same operations, and thus description thereof is omitted.
  • the operation from S304 to S305 is the same. If YES in S305, that is, if the reference layer image is used to generate the predicted image, it is further determined whether or not the filter parameter of the filter unit has already been decoded (S307'). If the filter parameter of the filter unit has not yet been decoded, the filter parameter of the filter unit is decoded (S308); conversely, if the filter parameter of the filter unit has already been decoded, its decoding is omitted.
  • FIG. 20 is a diagram illustrating still another example of the operation of decoding the encoded data including the filter parameter in the variable length decoding unit 12.
  • This example has the same basic individual operations as the example of FIG. 19, but the processing procedure is different.
  • the processing of S305, S307 ', S308, and S309 is performed between the end of the PU unit loop (S310) and the end of the CU unit loop (S311).
  • the filter parameter of the filter unit is decoded not in units of PU but in units of CU.
  • the variable length decoding unit 12 decodes the filter parameter of the filter unit in the first CU, among the CUs included in the filter unit, in which the reference layer is used for generation of the predicted image.
  • accordingly, when the reference layer is not used in any CU included in the filter unit, the filter parameter of the filter unit is not encoded, so that the effect of reducing the code amount of the filter parameter is obtained.
  • in this configuration, it is not necessary to switch whether or not to apply the inter-layer filter according to the motion compensation parameters of the PU (parameters other than the reference index that identifies the reference layer).
  • FIG. 17 is a diagram illustrating an encoded data portion (corresponding to the prediction information PInfo in FIG. 3) of a prediction unit of the configuration of the encoded data decoded by the variable length decoding unit 12.
  • the filter parameter or a part of the filter parameter is encoded for each predetermined filter unit.
  • the encoded data of the prediction unit includes the motion compensation parameters indicated by SYN1A in FIG. 17 (merge index merge_idx, merge flag merge_flag, inter prediction flag inter_pred_flag, reference index ref_idx_l0 / ref_idx_l1, estimated motion vector index mvp_l0_flag / mvp_l1_flag, and motion vector residual mvd_coding) and the filter parameter filter_param corresponding to the filter unit indicated by SYN2A.
  • the encoding is arranged in the order of the motion compensation parameter and the filter parameter.
  • it is thereby possible to switch whether or not to decode the filter parameter of the filter unit.
  • in this configuration, the filter parameter of the filter unit is included in the PU unit in which the inter-layer filter is first turned on (in which the reference layer image is first used for generation of the predicted image), and the filter parameter is not included in the other PU units. Accordingly, when the inter-layer filter is off (the reference layer image is not used for generating the predicted image) in all PU units included in the filter unit, the filter parameter of the filter unit is not encoded, and the effect of reducing the code amount of the filter parameter is obtained.
  • FIG. 21 shows another configuration of the encoded data decoded by the variable length decoding unit 12 (corresponding to the prediction information PInfo of FIG. 3).
  • this example corresponds to the structure of the encoded data when the operation already described with reference to FIG. 20 is performed.
  • in this configuration, the filter parameter of the filter unit is encoded not in PU units but in CU units (coding_unit).
  • the encoded data in the CU unit includes the prediction information indicated by SYN1B in FIG. 21 (prediction mode pred_mode_flag, PU partition mode part_mode, intra prediction mode intra_pred_mode) and the filter parameter filter_param corresponding to the filter unit indicated by SYN2B.
  • the encoding is arranged in the order of the motion compensation parameter and the filter parameter.
  • whether or not to decode the filter parameter of the filter unit can be switched for each filter unit. For this, there is the following determination: !filter_coded_flag && filter_enable_flag
  • the filter parameter of the filter unit is included in the CU unit in which the reference layer image is first used to generate the predicted image, and the filter parameter is not included in other cases. Therefore, when the interlayer filter is turned off in all CU units included in the filter unit, the filter parameter of the filter unit is not encoded, so that an effect of reducing the code amount of the filter parameter can be obtained.
  • FIG. 22 is a diagram illustrating a configuration of encoded data of filter parameters in units of filter units.
  • FIG. 22A shows an example of an adaptive offset filter
  • FIGS. 22B and 22C show examples of an adaptive spatial filter.
  • the offset type SaoTypeIdx, the offset SaoOffsetVal, the edge offset class EdgeClass, and the band offset position sao_band_pos are encoded by the syntax elements sao_type, sao_offset, sao_eo_class, and sao_band_pos, respectively.
  • the value of each syntax element corresponds directly to the filter parameter of the adaptive offset filter, but, for example, the offset may be divided into an absolute value and a sign and encoded.
  • a merge flag indicating that the filter parameters of an adjacent filter unit that has already been decoded are used (copied) as they are may also be included.
  • FIG. 22B is an example of the filter parameters of the filter unit of the adaptive spatial filter.
  • in this example, a filter coefficient and an on / off flag are used as filter parameters, but only the on / off flag is encoded for each filter unit. This is because the code amount of the filter coefficients is large, so it is not appropriate in terms of coding efficiency to encode them in relatively small units.
  • layer_flt_onoff_flag in FIG. 22B is a flag indicating whether or not to apply the inter-layer filter in the filter unit. As already described for S307, even when layer_flt_onoff_flag indicates that the inter-layer filter is applied, the inter-layer filter is not performed when the reference layer is not used for prediction, or when it is determined from other prediction information and motion compensation parameters that the inter-layer filter is not necessary.
  • FIG. 22C is another example of the filter parameter of the filter unit of the adaptive spatial filter.
  • layer_flt_idx is an index for switching the set of filter coefficients used for the adaptive spatial filter in the inter-layer filter of the filter unit, and takes a value from 0 to a predetermined value M - 1.
  • the filter coefficient to be used can be switched for each filter unit.
  • the filter coefficients may be stored in advance as a table in the decoding device, or may be decoded by being included in the encoded data in a slice header, a picture parameter set, or the like. Alternatively, some filter coefficient sets may be held in the decoding device, and the other filter coefficient set may be decoded from the encoded data.
  • the value 0 of layer_flt_idx may function as a flag indicating that the inter-layer filter is not applied. In this case, the values of layer_flt_idx from 1 to M - 1 are used to switch the set of filter coefficients.
  • FIG. 23 shows a structure of encoded data when the filter coefficient of the adaptive spatial filter is encoded by the slice header.
  • fcoeff [i] [j] is an individual filter coefficient (filter coefficient and offset).
  • j is an index indicating each filter coefficient set, and
  • i is an index indicating a filter coefficient in the filter coefficient set.
  • the encoded data includes M filter coefficient sets, each composed of N filter coefficients.
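  • a minimal sketch of parsing these coefficients is shown below; readSignedValue() is a hypothetical placeholder for the actual entropy decoding, and the sets are indexed here as fcoeff[set][coefficient], following the notation of the filter formula:

```cpp
#include <vector>

static int readSignedValue() { return 0; }   // placeholder for entropy decoding of one fcoeff value

// Read M filter coefficient sets (N coefficients each, including the offset component).
std::vector<std::vector<int>> readFilterCoefficientSets(int M, int N) {
    std::vector<std::vector<int>> fcoeff(M, std::vector<int>(N));
    for (int j = 0; j < M; ++j)          // j: index of the filter coefficient set
        for (int i = 0; i < N; ++i)      // i: index of the coefficient within the set
            fcoeff[j][i] = readSignedValue();
    return fcoeff;
}
```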
  • FIG. 27 is a functional block diagram showing a schematic configuration of the hierarchical video encoding device 2.
  • the hierarchical video encoding device 2 encodes the input image PIN # T of the target layer with reference to the reference layer encoded data DATA # R to generate hierarchical encoded data DATA of the target layer. It is assumed that the reference layer encoded data DATA # R has been encoded in the hierarchical video encoding apparatus corresponding to the reference layer.
  • the hierarchical video encoding device 2 includes a prediction parameter determination unit 21, a prediction information generation unit 22, a base decoding unit 23, a texture information generation unit 24, a variable length encoding unit 25, a NAL multiplexing unit 26, and a filter parameter information generation unit 27.
  • the prediction parameter determination unit 21 determines a prediction parameter used for prediction of a prediction image and other encoding settings based on the input image PIN # T.
  • the prediction parameter determination unit 21 determines the encoding settings including the prediction parameters as follows.
  • the prediction parameter determination unit 21 generates a CU image for the target CU by sequentially dividing the input image PIN # T into slice units, tree block units, and CU units.
  • the prediction parameter determination unit 21 generates encoded information (sometimes referred to as header information) based on the result of the division process.
  • the encoding information includes (1) tree block information, which is information about the size and shape of the tree blocks belonging to the target slice and their positions within the target slice, and (2) CU information, which is information about the size and shape of the CUs belonging to each tree block and their positions within the target tree block.
  • the prediction parameter determination unit 21 refers to the CU image, the tree block information, and the CU information, and determines the prediction type of the target CU, the division information of the target CU into PUs, and the prediction parameters (the intra prediction mode in the case of an intra CU, or the motion compensation parameters of each PU in the case of an inter CU).
  • specifically, the prediction parameter determination unit 21 calculates the cost for all combinations of (1) the prediction type of the target CU, (2) the possible division patterns of the target CU into PUs, and (3) the prediction modes that can be assigned to each PU (the intra prediction mode in the case of an intra CU, or the motion compensation parameter in the case of an inter CU), and determines the prediction type, division pattern, and prediction mode with the lowest cost.
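  • the sketch below shows the structure of this exhaustive cost minimization; the candidate contents and the cost function are hypothetical placeholders, and only the evaluate-all-combinations / keep-the-lowest-cost structure follows the text:

```cpp
#include <limits>
#include <vector>

struct Candidate {
    int predType;      // prediction type of the target CU
    int partPattern;   // division pattern into PUs
    int mode;          // intra prediction mode or motion compensation parameter id
};

static double evaluateCost(const Candidate&) { return 0.0; }  // placeholder cost function

Candidate selectBestCandidate(const std::vector<Candidate>& candidates) {
    Candidate best{};
    double bestCost = std::numeric_limits<double>::max();
    for (const Candidate& c : candidates) {
        double cost = evaluateCost(c);               // e.g. a rate-distortion cost
        if (cost < bestCost) { bestCost = cost; best = c; }
    }
    return best;                                     // lowest-cost combination
}
```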
  • the prediction parameter determination unit 21 supplies the encoded information and the prediction parameter to the prediction information generation unit 22 and the texture information generation unit 24. Although not shown for simplicity of explanation, the above-described encoding setting determined by the prediction parameter determination unit 21 can be referred to by each unit of the hierarchical video encoding device 2.
  • the prediction information generation unit 22 generates prediction information including a syntax value related to the prediction parameter based on the prediction parameter supplied from the prediction parameter determination unit 21 and the reference layer encoded data DATA # R.
  • the prediction information generation unit 22 supplies the generated prediction information to the variable length encoding unit 25. Note that the prediction information generation unit 22 can refer to the motion information stored in the frame memory 155 included in the texture information generation unit 24 when restoring the prediction parameters.
  • since the base decoding unit 23 is the same as the base decoding unit 13 of the hierarchical video decoding device 1, its description is omitted here.
  • the texture information generation unit 24 generates transform coefficient information including transform coefficients obtained by orthogonal transform / quantization of the prediction residual obtained by subtracting the predicted image from the input image PIN # T.
  • the texture information generation unit 24 supplies the generated transform coefficient information to the variable length encoding unit 25. Note that the texture information generation unit 24 stores information on the restored decoded image in the frame memory 155 provided therein.
  • the texture information generation unit 24 sets a filter parameter and supplies it to the filter parameter information generation unit 27.
  • the filter parameter information generation unit 27 generates filter parameter information including a syntax value related to the filter parameter supplied from the texture information generation unit 24.
  • the filter parameter information generation unit 27 supplies the generated filter parameter information to the variable length encoding unit 25.
  • the variable length encoding unit 25 generates the target layer encoded data DATA # T by variable-length encoding the prediction information supplied from the prediction information generation unit 22, the transform coefficient information supplied from the texture information generation unit 24, and the filter parameter information supplied from the filter parameter information generation unit 27.
  • the variable length encoding unit 25 supplies the generated target layer encoded data DATA # T to the NAL multiplexing unit 26.
  • the NAL multiplexing unit 26 generates NAL-multiplexed hierarchical video encoded data DATA by storing the target layer encoded data DATA # T supplied from the variable length encoding unit 25 and the reference layer encoded data DATA # R in NAL units, and outputs it to the outside.
  • FIG. 28 is a functional block diagram illustrating the configuration of the prediction information generation unit 22.
  • the prediction information generation unit 22 includes a prediction type selection unit 221, a switch 222, an intra prediction mode derivation unit 223, a motion vector candidate derivation unit 224, a motion information generation unit 225, a merge candidate derivation unit 226, and A merge information generation unit 227 is provided.
  • the prediction type selection unit 221 sends a switching instruction to the switch 222 according to the CU type or PU type, and controls the prediction parameter derivation process. Specifically, it is as follows.
  • when the target CU is an intra CU, the prediction type selection unit 221 controls the switch 222 so that the prediction information can be derived using the intra prediction mode deriving unit 223.
  • when the target PU is an inter PU that is not merged, the prediction type selection unit 221 controls the switch 222 so that the prediction information can be derived using the motion information generation unit 225.
  • when the target PU is a merged PU, the prediction type selection unit 221 controls the switch 222 so that the prediction information can be derived using the merge information generation unit 227.
  • the switch 222 supplies the prediction parameter to any of the intra prediction mode deriving unit 223, the motion information generating unit 225, and the merge information generating unit 227 in accordance with an instruction from the prediction type selecting unit 221.
  • prediction information is generated at the supply destination of the prediction parameter.
  • the intra prediction mode deriving unit 223 derives a syntax value related to the prediction mode. That is, the intra prediction mode deriving unit 223 generates the syntax information related to the prediction mode as the prediction information.
  • the motion vector candidate derivation unit 224 uses the base decoding information to derive an estimated motion vector candidate by intra-layer motion estimation processing or inter-layer motion estimation processing.
  • the motion vector candidate derivation unit 224 supplies the derived motion vector candidates to the motion information generation unit 225.
  • the motion information generation unit 225 generates a syntax value related to motion information in each inter prediction partition that is not merged. That is, the motion information generation unit 225 generates a syntax value related to motion information as prediction information. Specifically, the motion information generation unit 225 derives corresponding syntax element values inter_pred_flag, mvd, mvp_idx, and refIdx from the motion compensation parameter in each PU.
  • the motion information generation unit 225 derives the syntax value based on the motion vector candidates supplied from the motion vector candidate derivation unit 224.
  • the motion information generation unit 225 derives the syntax value based on the motion information included in the prediction parameter.
  • the merge candidate derivation unit 226 derives merge candidates having motion compensation parameters similar to the motion compensation parameter of each PU, using the already decoded motion information supplied from the frame memory 155 and / or the base decoding information supplied from the base decoding unit 23.
  • the merge candidate derivation unit 226 supplies the derived merge candidates to the merge information generation unit 227.
  • the configuration of the merge candidate derivation unit 226 is the same as the configuration of the merge candidate derivation unit 146 included in the hierarchical video decoding device 1, and thus the description thereof is omitted.
  • the merge information generation unit 227 generates a syntax value related to motion information regarding each inter prediction partition to be merged. That is, the merge information generation unit 227 generates a syntax value related to motion information as prediction information. Specifically, the merge information generation unit 227 outputs a syntax element value merge_idx that specifies a merge candidate having a motion compensation parameter similar to the motion compensation parameter in each PU.
  • FIG. 29 is a functional block diagram illustrating the configuration of the texture information generation unit 24.
  • the texture information generation unit 24 includes a texture prediction unit 152, a subtractor 242, an orthogonal transform / quantization unit 243, an inverse orthogonal transform / inverse quantization unit 244, an adder 245, a loop filter unit 246, a frame memory 155, and a filter parameter deriving unit 248.
  • the subtractor 242 generates a prediction residual D by subtracting the prediction image supplied from the texture prediction unit 152 from the input image PIN # T.
  • the subtractor 242 supplies the generated prediction residual D to the orthogonal transform / quantization unit 243.
  • the orthogonal transform / quantization unit 243 generates a quantized prediction residual by performing orthogonal transform and quantization on the prediction residual D.
  • the orthogonal transform refers to an orthogonal transform from the pixel domain to the frequency domain. Examples of the orthogonal transform include the DCT transform (Discrete Cosine Transform) and the DST transform (Discrete Sine Transform).
  • the specific quantization process is as described above, and the description thereof is omitted here.
  • the orthogonal transform / quantization unit 243 supplies the generated transform coefficient information including the quantized prediction residual to the inverse orthogonal transform / inverse quantization unit 244 and the variable length coding unit 25.
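  • As a compact illustration of the processing of the subtractor 242 and the orthogonal transform / quantization unit 243, the sketch below computes a prediction residual, applies a naive 1-D DCT-II, and quantizes the coefficients with a uniform quantization step; the block handling, scaling, and quantization step are simplifications and are not the values used by the device.

        #include <cmath>
        #include <vector>

        // Prediction residual D = input - predicted image (one block of samples).
        std::vector<int> residual(const std::vector<int>& input, const std::vector<int>& pred) {
            std::vector<int> d(input.size());
            for (size_t i = 0; i < input.size(); ++i) d[i] = input[i] - pred[i];
            return d;
        }

        // Naive 1-D DCT-II of length n (a 2-D transform would apply it to rows, then columns).
        std::vector<double> dct1d(const std::vector<int>& x) {
            const double pi = std::acos(-1.0);
            const size_t n = x.size();
            std::vector<double> X(n, 0.0);
            for (size_t k = 0; k < n; ++k)
                for (size_t i = 0; i < n; ++i)
                    X[k] += x[i] * std::cos(pi / n * (i + 0.5) * k);
            return X;
        }

        // Uniform quantization: level = round(coefficient / qstep).
        std::vector<int> quantize(const std::vector<double>& coeff, double qstep) {
            std::vector<int> level(coeff.size());
            for (size_t i = 0; i < coeff.size(); ++i)
                level[i] = static_cast<int>(std::lround(coeff[i] / qstep));
            return level;
        }
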
  • the filter parameter deriving unit 248 derives a filter parameter from the input image PIN#T and the reference layer image that is input to the inter-layer filter unit 1522 of the texture prediction unit 152, and supplies the derived filter parameter to the loop filter unit 246 and the filter parameter information generation unit 27. More specifically, a filter parameter that minimizes the difference between the input image PIN#T and the filtered image of the reference layer image is derived. For example, the offset type SaoTypeIdx and the offset SaoOffsetVal of the adaptive offset filter, and sao_band_position, are derived. In the case of the adaptive spatial filter, a set of filter coefficients is derived. As a derivation method, the same method as the conventional adaptive loop filter (ALF) can be used.
  • in the case where the adaptive spatial filter is realized by selecting an appropriate filter coefficient set from a plurality of predetermined filter coefficient sets, filtered images are generated by performing the filter processing with each of the plurality of filter coefficient sets, and the filter coefficient set whose filtered image has the smallest error from the input image PIN#T is selected (see the sketch below).
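  • A minimal sketch of this coefficient-set selection, assuming a simple 3-tap filter purely for illustration: each predetermined coefficient set is applied to the reference layer image, and the set whose filtered image has the smallest sum of squared differences from the input image PIN#T is chosen.

        #include <climits>
        #include <vector>

        using Image    = std::vector<int>;   // one row of samples, for simplicity
        using CoeffSet = std::vector<int>;   // 3 taps, taps sum to 64 in this sketch

        Image applyFilter(const Image& ref, const CoeffSet& c) {
            Image out(ref.size());
            for (size_t i = 0; i < ref.size(); ++i) {
                int l = ref[i > 0 ? i - 1 : i];
                int m = ref[i];
                int r = ref[i + 1 < ref.size() ? i + 1 : i];
                out[i] = (c[0] * l + c[1] * m + c[2] * r + 32) / 64;  // normalize by 64
            }
            return out;
        }

        // Choose the predetermined coefficient set minimizing the error against the input image.
        int selectFilterParameter(const Image& input, const Image& refLayer,
                                  const std::vector<CoeffSet>& sets) {
            int bestIdx = 0;
            long long bestSse = LLONG_MAX;
            for (int s = 0; s < static_cast<int>(sets.size()); ++s) {
                Image f = applyFilter(refLayer, sets[s]);
                long long sse = 0;
                for (size_t i = 0; i < input.size(); ++i) {
                    long long d = input[i] - f[i];
                    sse += d * d;
                }
                if (sse < bestSse) { bestSse = sse; bestIdx = s; }
            }
            return bestIdx;   // index of the selected filter coefficient set
        }
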
  • the texture prediction unit 152, the inverse orthogonal transform / inverse quantization unit 244, the adder 245, the loop filter unit 246, and the frame memory 155 are similar to the texture prediction unit 152, the inverse orthogonal transform / inverse quantization unit 151, the adder 153, the loop filter unit 154, and the frame memory 155 of the hierarchical video decoding device 1, respectively, and thus their description is omitted here. However, the texture prediction unit 152 supplies the predicted image not only to the adder 245 but also to the subtractor 242.
  • with the above configuration, the texture prediction unit 152 can realize an operation in which the inter-layer filter processing is applied (inter-layer filter on) when a reference layer image is used and the prediction parameter satisfies a predetermined condition, and in which the inter-layer filter processing is not applied (inter-layer filter off) in the other cases.
  • when the prediction parameter does not satisfy the predetermined condition, the inter-layer filter is not applied, so that the processing load associated with the inter-layer filter can be reduced.
  • the inter-layer filter processing is heavier than a normal motion compensation filter, because the operation of the adaptive offset filter and the filter coefficients (weight coefficients) of the adaptive spatial filter vary depending on the filter parameters; turning the filter off in these cases therefore has the effect of reducing this processing.
  • the operation of the inter-layer filter can be switched according to existing conditions without explicitly encoding a filter on/off flag. With this configuration, the code amount of the filter on/off flag is saved and only regions where the filter is highly effective are filtered, so that the coding efficiency is improved (see the sketch below).
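  • The implicit on/off switching can be pictured as follows; the particular predetermined condition used here (a partition-size threshold) is an arbitrary assumption for illustration, not the condition defined by the device.

        struct PredictionParam {
            bool refIsReferenceLayer = false;  // reference index points to the reference layer
            int  puWidth = 0, puHeight = 0;    // partition size
        };

        // No explicit filter on/off flag is coded: the decision is derived from
        // conditions that the encoder and the decoder share.
        bool interLayerFilterEnabled(const PredictionParam& p) {
            const bool predeterminedCondition = (p.puWidth * p.puHeight >= 64);  // assumption
            return p.refIsReferenceLayer && predeterminedCondition;
        }
        // Predicted image: the filtered reference layer image if enabled,
        // the unfiltered reference layer image otherwise.
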
  • FIGS. 24 to 26 are drawings corresponding to FIGS. 18 to 20 in the decoding apparatus.
  • S301 to S311 correspond to S401 to S411, respectively.
  • FIG. 24 is a diagram illustrating an operation of encoding the encoded data including the filter parameter in the variable length encoding unit 25.
  • the variable length encoding unit 25 performs the following processing for each CTB. First, the filter coded flag filter_coded_flag is set to 0 in order to indicate that the filter parameter has not yet been encoded in this filter unit (S401). Subsequently, a CU loop is started (S402). The CU loop sequentially processes all the CUs included in the CTB. Subsequently, a PU loop is started (S403).
  • the PU loop is performed by sequentially processing all PUs included in the CU.
  • information corresponding to the PU in the prediction information PInfo, excluding the filter parameter, is encoded (S404). For example, a merge index, a merge flag, an inter prediction flag, a reference index, an estimated motion vector index, and a motion vector residual are encoded.
  • next, it is determined whether the reference layer image is used to generate the predicted image (S406). Specifically, it is determined whether or not the reference image indicated by the reference index is a reference layer image; if it is, the reference layer image is used for the predicted image.
  • filter_enable_flag is set to 1 when the motion compensation parameter satisfies a predetermined condition as described above.
  • when the inter-layer filter is enabled and the filter parameter has not yet been encoded in this filter unit (filter_coded_flag is 0), the filter parameter of the filter unit is encoded (S408) and filter_coded_flag is set to 1.
  • otherwise, the encoding of the filter parameter of the filter unit is omitted.
  • S410 is the end of the loop in PU units, and S411 is the end of the loop in CU units.
  • in this way, among the prediction units included in the filter unit, the variable length encoding unit 25 encodes the filter parameter of the filter unit in the prediction unit to which the inter-layer filter is first applied.
  • in the subsequent prediction units, the filter parameter of the filter unit is not encoded again, so that the effect of reducing the code amount of the filter parameter is obtained (see the loop sketch below).
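  • The CTB / CU / PU loop of FIG. 24 can be summarized by the sketch below, in which the filter parameter is written only in the first prediction unit of the filter unit for which the inter-layer filter is enabled; the container types, the writer interface, and the step correspondences in the comments are assumptions for illustration.

        #include <vector>

        struct PU  { bool useRefLayer = false; bool filterEnable = false; /* prediction info */ };
        struct CU  { std::vector<PU> pus; };
        struct CTB { std::vector<CU> cus; };

        struct Writer {
            void writePredictionInfo(const PU&) { /* merge_idx, refIdx, mvd, mvp_idx, ... */ }
            void writeFilterParameter()         { /* SaoTypeIdx, offsets, coefficients, ... */ }
        };

        // One filter unit (here: one CTB). filter_coded_flag records whether the
        // filter parameter has already been written in this filter unit.
        void encodeCtb(const CTB& ctb, Writer& w) {
            bool filterCodedFlag = false;                      // S401
            for (const CU& cu : ctb.cus) {                     // CU loop (S402..S411)
                for (const PU& pu : cu.pus) {                  // PU loop (S403..S410)
                    w.writePredictionInfo(pu);                 // S404
                    if (pu.useRefLayer && pu.filterEnable      // reference layer used and
                            && !filterCodedFlag) {             // condition satisfied, not yet coded
                        w.writeFilterParameter();              // S408
                        filterCodedFlag = true;
                    }
                }
            }
        }
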
  • FIG. 25 is a diagram illustrating another example of the operation of encoding the encoded data including the filter parameter in the variable length encoding unit 25. Steps having the same numbers as those in FIG. 24 perform the same operations, and thus their description is omitted. The operations from S404 to S405 are the same. If YES in S405, that is, if the reference layer image is used to generate the predicted image, it is further determined whether or not the filter parameter of the filter unit has not yet been encoded (S407'). If the filter parameter of the filter unit has not yet been encoded (YES in S407'), the filter parameter of the filter unit is encoded (S408). Conversely, if the filter parameter has already been encoded in the filter unit (NO in S407'), the encoding of the filter parameter of the filter unit is omitted.
  • FIG. 26 is a diagram showing still another example of the operation of encoding the encoded data including the filter parameter in the variable length encoding unit 25.
  • in this example, the processing of S405, S407', S408A, and S409 is performed between the end of the PU-unit loop (S410) and the end of the CU-unit loop (S411).
  • that is, the filter parameter of the filter unit is encoded not in units of PUs but in units of CUs.
  • in this way, the variable length encoding unit 25 encodes the filter parameter of the filter unit in the first unit, among the units included in the filter unit, in which the reference layer is used for generation of the predicted image.
  • in the subsequent units, the filter parameter of the filter unit is not encoded again, so that the effect of reducing the code amount of the filter parameter is obtained (see the CU-level sketch below).
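  • For the CU-level variant of FIG. 26, the same idea moves the filter-parameter check out of the PU loop so that it is evaluated once per CU; as in the previous sketch, all structures and the step correspondences are illustrative assumptions.

        #include <vector>

        struct PU  { bool useRefLayer = false; bool filterEnable = false; };
        struct CU  { std::vector<PU> pus; };
        struct CTB { std::vector<CU> cus; };

        struct Writer {
            void writePredictionInfo(const PU&) {}
            void writeFilterParameter()         {}
        };

        // FIG. 26 variant: the filter parameter is checked and written per CU,
        // between the end of the PU loop (S410) and the end of the CU loop (S411).
        void encodeCtbCuLevel(const CTB& ctb, Writer& w) {
            bool filterCodedFlag = false;
            for (const CU& cu : ctb.cus) {
                bool cuUsesFilteredRefLayer = false;
                for (const PU& pu : cu.pus) {
                    w.writePredictionInfo(pu);
                    cuUsesFilteredRefLayer = cuUsesFilteredRefLayer
                                             || (pu.useRefLayer && pu.filterEnable);
                }
                if (cuUsesFilteredRefLayer && !filterCodedFlag) {   // S405, S407'
                    w.writeFilterParameter();                       // S408A
                    filterCodedFlag = true;                         // S409
                }
            }
        }
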
  • the above-described hierarchical video encoding device 2 and hierarchical video decoding device 1 can be used by being mounted on various devices that perform transmission, reception, recording, and reproduction of moving images.
  • the moving image may be a natural moving image captured by a camera or the like, or may be an artificial moving image (including CG and GUI) generated by a computer or the like.
  • Hierarchical video encoding device 2 and hierarchical video decoding device 1 can be used for transmission and reception of video.
  • FIG. 30(a) is a block diagram illustrating a configuration of a transmission device PROD_A in which the hierarchical video encoding device 2 is mounted.
  • the transmission device PROD_A includes an encoding unit PROD_A1 that obtains encoded data by encoding a moving image, a modulation unit PROD_A2 that obtains a modulated signal by modulating a carrier wave with the encoded data obtained by the encoding unit PROD_A1, and a transmission unit PROD_A3 that transmits the modulated signal obtained by the modulation unit PROD_A2.
  • the hierarchical moving image encoding apparatus 2 described above is used as the encoding unit PROD_A1.
  • as supply sources of the moving image input to the encoding unit PROD_A1, the transmission device PROD_A may further include a camera PROD_A4 that captures a moving image, a recording medium PROD_A5 on which the moving image is recorded, an input terminal PROD_A6 for inputting the moving image from the outside, and an image processing unit A7 that generates or processes an image.
  • FIG. 30A illustrates a configuration in which the transmission apparatus PROD_A includes all of these, but a part of the configuration may be omitted.
  • the recording medium PROD_A5 may record a non-encoded moving image, or may record a moving image encoded with a recording encoding scheme different from the transmission encoding scheme. In the latter case, a decoding unit (not shown) that decodes the encoded data read from the recording medium PROD_A5 according to the recording encoding scheme may be interposed between the recording medium PROD_A5 and the encoding unit PROD_A1.
  • FIG. 30(b) is a block diagram illustrating a configuration of a receiving device PROD_B in which the hierarchical video decoding device 1 is mounted.
  • the receiving device PROD_B includes a receiving unit PROD_B1 that receives a modulated signal, a demodulation unit PROD_B2 that obtains encoded data by demodulating the modulated signal received by the receiving unit PROD_B1, and a decoding unit PROD_B3 that obtains a moving image by decoding the encoded data obtained by the demodulation unit PROD_B2.
  • the above-described hierarchical video decoding device 1 is used as the decoding unit PROD_B3.
  • as supply destinations of the moving image output by the decoding unit PROD_B3, the receiving device PROD_B may further include a display PROD_B4 that displays the moving image, a recording medium PROD_B5 for recording the moving image, and an output terminal PROD_B6 for outputting the moving image to the outside.
  • FIG. 30B illustrates a configuration in which the reception device PROD_B includes all of these, but a part of the configuration may be omitted.
  • the recording medium PROD_B5 may record a non-encoded moving image, or may record a moving image encoded with a recording encoding scheme different from the transmission encoding scheme. In the latter case, an encoding unit (not shown) that encodes the moving image acquired from the decoding unit PROD_B3 according to the recording encoding scheme may be interposed between the decoding unit PROD_B3 and the recording medium PROD_B5.
  • the transmission medium for transmitting the modulation signal may be wireless or wired.
  • the transmission mode for transmitting the modulated signal may be broadcasting (here, a transmission mode in which the transmission destination is not specified in advance) or communication (here, a transmission mode in which the transmission destination is specified in advance). That is, the transmission of the modulated signal may be realized by any of wireless broadcasting, wired broadcasting, wireless communication, and wired communication.
  • a terrestrial digital broadcast broadcasting station (broadcasting equipment or the like) / receiving station (such as a television receiver) is an example of a transmitting device PROD_A / receiving device PROD_B that transmits and receives a modulated signal by wireless broadcasting.
  • a broadcasting station (such as broadcasting equipment) / receiving station (such as a television receiver) of cable television broadcasting is an example of a transmitting device PROD_A / receiving device PROD_B that transmits and receives a modulated signal by cable broadcasting.
  • a server (workstation or the like) / client (television receiver, personal computer, smartphone, or the like) of a VOD (Video On Demand) service or a video sharing service using the Internet is an example of a transmitting device PROD_A / receiving device PROD_B that transmits and receives a modulated signal by communication (usually, either a wireless or wired transmission medium is used in a LAN, and a wired transmission medium is used in a WAN).
  • the personal computer includes a desktop PC, a laptop PC, and a tablet PC.
  • the smartphone also includes a multi-function mobile phone terminal.
  • the video sharing service client has a function of encoding a moving image captured by the camera and uploading it to the server. That is, the client of the video sharing service functions as both the transmission device PROD_A and the reception device PROD_B.
  • Hierarchical video encoding device 2 and hierarchical video decoding device 1 can be used for video recording and reproduction.
  • FIG. 31 (a) is a block diagram showing a configuration of a recording apparatus PROD_C in which the above-described hierarchical video encoding apparatus 2 is mounted.
  • the recording device PROD_C includes an encoding unit PROD_C1 that obtains encoded data by encoding a moving image, and a writing unit PROD_C2 that writes the encoded data obtained by the encoding unit PROD_C1 to a recording medium PROD_M.
  • the hierarchical moving image encoding device 2 described above is used as the encoding unit PROD_C1.
  • the recording medium PROD_M may be (1) of a type built into the recording device PROD_C, such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive), (2) of a type connected to the recording device PROD_C, such as an SD memory card or a USB (Universal Serial Bus) flash memory, or (3) of a type loaded into a drive device (not shown) built into the recording device PROD_C, such as a DVD (Digital Versatile Disc) or a BD (Blu-ray Disc: registered trademark).
  • as supply sources of the moving image input to the encoding unit PROD_C1, the recording device PROD_C may further include a camera PROD_C3 that captures a moving image, an input terminal PROD_C4 for inputting the moving image from the outside, a receiving unit PROD_C5 for receiving the moving image, and an image processing unit C6 that generates or processes an image.
  • FIG. 31A illustrates a configuration in which the recording apparatus PROD_C includes all of these, but a part of the configuration may be omitted.
  • the receiving unit PROD_C5 may receive a non-encoded moving image, or may receive encoded data encoded with a transmission encoding scheme different from the recording encoding scheme. In the latter case, a transmission decoding unit (not shown) that decodes the encoded data encoded with the transmission encoding scheme may be interposed between the receiving unit PROD_C5 and the encoding unit PROD_C1.
  • Examples of such a recording device PROD_C include a DVD recorder, a BD recorder, and an HDD (Hard Disk Drive) recorder (in this case, the input terminal PROD_C4 or the receiving unit PROD_C5 is a main supply source of moving images).
  • a camcorder (in this case, the camera PROD_C3 is a main supply source of moving images), a personal computer (in this case, the receiving unit PROD_C5 or the image processing unit C6 is a main supply source of moving images), a smartphone (in this case, the camera PROD_C3 or the receiving unit PROD_C5 is a main supply source of moving images), and the like are also examples of such a recording device PROD_C.
  • FIG. 31(b) is a block diagram showing a configuration of a playback device PROD_D in which the above-described hierarchical video decoding device 1 is mounted.
  • the playback device PROD_D includes a reading unit PROD_D1 that reads encoded data written to the recording medium PROD_M, and a decoding unit PROD_D2 that obtains a moving image by decoding the encoded data read by the reading unit PROD_D1.
  • the hierarchical moving image decoding apparatus 1 described above is used as the decoding unit PROD_D2.
  • the recording medium PROD_M may be (1) of a type built into the playback device PROD_D, such as an HDD or an SSD, (2) of a type connected to the playback device PROD_D, such as an SD memory card or a USB flash memory, or (3) of a type loaded into a drive device (not shown) built into the playback device PROD_D, such as a DVD or a BD.
  • as supply destinations of the moving image output by the decoding unit PROD_D2, the playback device PROD_D may further include a display PROD_D3 that displays the moving image, an output terminal PROD_D4 for outputting the moving image to the outside, and a transmission unit PROD_D5 that transmits the moving image.
  • FIG. 31B illustrates a configuration in which the playback apparatus PROD_D includes all of these, but some of them may be omitted.
  • the transmission unit PROD_D5 may transmit a non-encoded moving image, or may transmit encoded data encoded with a transmission encoding scheme different from the recording encoding scheme. In the latter case, an encoding unit (not shown) that encodes the moving image with the transmission encoding scheme may be interposed between the decoding unit PROD_D2 and the transmission unit PROD_D5.
  • Examples of such a playback device PROD_D include a DVD player, a BD player, and an HDD player (in this case, an output terminal PROD_D4 to which a television receiver or the like is connected is a main supply destination of moving images).
  • a television receiver (in this case, the display PROD_D3 is a main supply destination of moving images), a digital signage (also referred to as an electronic signboard or an electronic bulletin board; in this case, the display PROD_D3 or the transmission unit PROD_D5 is a main supply destination of moving images), a desktop PC (in this case, the output terminal PROD_D4 or the transmission unit PROD_D5 is a main supply destination of moving images), a laptop or tablet PC (in this case, the display PROD_D3 or the transmission unit PROD_D5 is a main supply destination of moving images), a smartphone (in this case, the display PROD_D3 or the transmission unit PROD_D5 is a main supply destination of moving images), and the like are also examples of such a playback device PROD_D.
  • Each block of the above-described hierarchical video decoding device 1 and hierarchical video encoding device 2 may be realized in hardware by a logic circuit formed on an integrated circuit (IC chip), or may be realized in software using a CPU (Central Processing Unit).
  • in the latter case, each device includes a CPU that executes the instructions of a program realizing each function, a ROM (Read Only Memory) that stores the program, a RAM (Random Access Memory) into which the program is loaded, and a storage device (recording medium) such as a memory that stores the program and various data.
  • The object of the present invention can also be achieved by supplying, to each of the above devices, a recording medium on which the program code (executable program, intermediate code program, source program) of the control program for each of the above devices, which is software realizing the above-described functions, is recorded in a computer-readable manner, and by having the computer (or CPU or MPU) read and execute the program code recorded on the recording medium.
  • Examples of the recording medium include tapes such as magnetic tapes and cassette tapes; disks including magnetic disks such as floppy (registered trademark) disks and hard disks, and optical discs such as CD-ROMs (Compact Disc Read-Only Memory) and MO discs (Magneto-Optical discs); and cards such as IC cards (including memory cards) and optical cards.
  • each of the above devices may be configured to be connectable to a communication network, and the program code may be supplied via the communication network.
  • the communication network is not particularly limited as long as it can transmit the program code.
  • for example, the Internet, an intranet, an extranet, a LAN (Local Area Network), an ISDN (Integrated Services Digital Network), a VAN (Value-Added Network), a CATV (Community Antenna Television / Cable Television) communication network, a virtual private network (Virtual Private Network), a telephone line network, a mobile communication network, a satellite communication network, and the like can be used.
  • the transmission medium constituting the communication network may be any medium that can transmit the program code, and is not limited to a specific configuration or type.
  • the present invention can also be realized in the form of a computer data signal embedded in a carrier wave in which the program code is embodied by electronic transmission.
  • the present invention relates to a hierarchical video decoding device that decodes encoded data in which image data is hierarchically encoded, and a hierarchical video encoding device that generates encoded data in which image data is hierarchically encoded. It can be suitably applied to. Further, the present invention can be suitably applied to the data structure of hierarchically encoded data that is generated by a hierarchical video encoding device and referenced by the hierarchical video decoding device.
  • 1 Hierarchical video decoding device (image decoding device)
  • NAL demultiplexing unit
  • Variable length decoding unit
  • Base decoding unit
  • Prediction parameter restoration unit
  • 15 Texture restoration unit (filter application unit, prediction image generation unit)
  • 16 Filter parameter restoration unit (filter parameter deriving means)
  • 2 Hierarchical video encoding device (image encoding device)
  • 21 Prediction parameter determination unit
  • 22 Prediction information generation unit
  • 23 Base decoding unit
  • 24 Texture information generation unit (filter application means)
  • 25 Variable length encoding unit
  • 26 NAL demultiplexing unit
  • 27 Filter parameter information generating unit (filter parameter deriving means)
  • 152A Inter prediction unit
  • 152B Intra prediction unit
  • 1521 Motion compensation filter unit
  • 1522 Inter-layer filter unit
  • 1523 Selection synthesis unit
  • 15221 Filter unit
  • 15222 Inter-layer filter control unit
  • 15223 Scaling unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The objective of the invention is to reduce the amount of filtering processing in a technique for improving the subjective and objective image quality of a predicted image by using an adaptive filter when the predicted image is generated by means of a reference layer image. An image decoding device hierarchically decodes encoded data in which image information relating to images whose quality differs from layer to layer is hierarchically encoded, and reconstructs an image in a target layer to be decoded. The image decoding device comprises: predicted image generation means for generating, in accordance with prediction parameters, a predicted image of the target layer using, as inputs, an already decoded reference layer image and an already decoded target layer image; and filter means for performing adaptive filter processing, based on filter parameters, on the reference layer image and generating a filtered reference layer image. The predicted image generation means generates the predicted image using the filtered reference layer image only when the prediction parameters satisfy a predetermined condition.
PCT/JP2013/074483 2012-09-28 2013-09-11 Dispositif de décodage d'image et dispositif de codage d'image WO2014050554A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2012217593 2012-09-28
JP2012-217593 2012-09-28

Publications (2)

Publication Number Publication Date
WO2014050554A1 true WO2014050554A1 (fr) 2014-04-03
WO2014050554A8 WO2014050554A8 (fr) 2015-03-26

Family

ID=50387961

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2013/074483 WO2014050554A1 (fr) 2012-09-28 2013-09-11 Dispositif de décodage d'image et dispositif de codage d'image

Country Status (1)

Country Link
WO (1) WO2014050554A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015163047A1 (fr) * 2014-04-23 2015-10-29 ソニー株式会社 Dispositif et procédé de traitement d'image

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006295913A (ja) * 2005-04-11 2006-10-26 Sharp Corp 空間的スケーラブルコーディングのためのアダプティブアップサンプリング方法および装置
JP2010516199A (ja) * 2007-01-09 2010-05-13 クゥアルコム・インコーポレイテッド スケーラブルビデオコーディング用の適応アップサンプリング
WO2011086836A1 (fr) * 2010-01-12 2011-07-21 シャープ株式会社 Appareil codeur, appareil décodeur et structure de données
WO2013161690A1 (fr) * 2012-04-24 2013-10-31 シャープ株式会社 Dispositif de décodage d'image et dispositif de codage d'image

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006295913A (ja) * 2005-04-11 2006-10-26 Sharp Corp 空間的スケーラブルコーディングのためのアダプティブアップサンプリング方法および装置
JP2010516199A (ja) * 2007-01-09 2010-05-13 クゥアルコム・インコーポレイテッド スケーラブルビデオコーディング用の適応アップサンプリング
WO2011086836A1 (fr) * 2010-01-12 2011-07-21 シャープ株式会社 Appareil codeur, appareil décodeur et structure de données
WO2013161690A1 (fr) * 2012-04-24 2013-10-31 シャープ株式会社 Dispositif de décodage d'image et dispositif de codage d'image

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DANNY HONG ET AL.: "Scalability Support in HEVC", JOINT COLLABORATIVE TEAM ON VIDEO CODING (JCT-VC) OF ITU-T SG16 WP3 AND ISO/IEC JTC1/SC29/WG11, JCTVC-F290R1, 6TH MEETING, July 2011 (2011-07-01), TORINO, IT, pages 1 - 15 *
HYOMIN CHOI ET AL.: "Scalable structures and inter-layer predictions for HEVC scalable extension", JOINT COLLABORATIVE TEAM ON VIDEO CODING (JCT-VC) OF ITU-T SG16 WP3 AND ISO/IEC JTC1/SC29/WG11, JCTVC-F096_R4, 6TH MEETING, July 2011 (2011-07-01), TORINO, IT, pages 1 - 15 *
KEMAL UGUR ET AL.: "Bilinear chroma interpolation for small block sizes", JOINT COLLABORATIVE TEAM ON VIDEO CODING (JCT-VC) OF ITU-T SG16 WP3 AND ISO/IEC JTC1/SC29/WG11, JCTVC-F242, 6TH MEETING, July 2011 (2011-07-01), TORINO *
TOMOYUKI YAMAMOTO ET AL.: "Description of scalable video coding technology proposal by SHARP (proposal 2)", JOINT COLLABORATIVE TEAM ON VIDEO CODING (JCT-VC) OF ITU-T SG16 WP3 AND ISO/IEC JTC1/SC29/WG11, JCTVC-K0032R1, 11TH MEETING, October 2012 (2012-10-01), SHANGHAI, CN, pages 1 - 18 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015163047A1 (fr) * 2014-04-23 2015-10-29 ソニー株式会社 Dispositif et procédé de traitement d'image
JP2015216626A (ja) * 2014-04-23 2015-12-03 ソニー株式会社 画像処理装置及び画像処理方法
US10477207B2 (en) 2014-04-23 2019-11-12 Sony Corporation Image processing apparatus and image processing method

Also Published As

Publication number Publication date
WO2014050554A8 (fr) 2015-03-26

Similar Documents

Publication Publication Date Title
JP6284661B2 (ja) 画像符号化装置、および画像符号化方法
JP6456535B2 (ja) 画像符号化装置、画像符号化方法および記録媒体
US10136151B2 (en) Image decoding device and image decoding method
US10841600B2 (en) Image decoding device, an image encoding device and a decoding method
JP6352248B2 (ja) 画像復号装置、および画像符号化装置
WO2014007131A1 (fr) Dispositif de décodage d'image et dispositif de codage d'image
WO2014104242A1 (fr) Dispositif de codage d'image et dispositif de décodage d'image
WO2013161690A1 (fr) Dispositif de décodage d'image et dispositif de codage d'image
WO2012121352A1 (fr) Dispositif de décodage de vidéo, dispositif de codage de vidéo et structure de données
JP2014176039A (ja) 画像復号装置、および画像符号化装置
JP2014013975A (ja) 画像復号装置、符号化データのデータ構造、および画像符号化装置
WO2014050554A1 (fr) Dispositif de décodage d'image et dispositif de codage d'image
WO2013161689A1 (fr) Dispositif de décodage de vidéo animée et dispositif de codage de vidéo animée
JP2014082729A (ja) 画像復号装置、および画像符号化装置
JP2014013976A (ja) 画像復号装置、および画像符号化装置
JP2015076807A (ja) 画像復号装置、画像符号化装置、および符号化データのデータ構造

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13840640

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13840640

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP