JP2018182531A

JP2018182531A - Division shape determining apparatus, learning apparatus, division shape determining method, and division shape determining program

Info

Publication number: JP2018182531A
Application number: JP2017079585A
Authority: JP
Inventors: Shota Orihashi (折橋翔太); Shinobu Kudo (工藤忍); Masaki Kitahara (北原正樹); Atsushi Shimizu (清水淳)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2017-04-13
Filing date: 2017-04-13
Publication date: 2018-11-15
Anticipated expiration: 2037-04-13
Also published as: JP6748022B2

Abstract

PROBLEM TO BE SOLVED: To provide a division shape determining apparatus, a learning apparatus, a division shape determining method, and a division shape determining program that can determine a CU division shape that encodes an encoding target image efficiently, even when the amount of computation for determining the CU division shape is reduced.

SOLUTION: The division shape determining apparatus uses a learning model that is a set of nodes forming a hierarchical structure, each node holding a division probability (a probability relating to division). The apparatus includes: a learning unit that updates learning parameters of the learning model in accordance with the division probabilities of the nodes associated with the blocks into which the encoding target image is divided, and that outputs, in association with each node, the division probability obtained as an output of the learning model with the updated learning parameters; and a determination unit that determines, on the basis of the division probability output in association with a node, whether to divide the block associated with that node.

SELECTED DRAWING: Figure 3

Description

The present invention relates to a division shape determination device, a learning device, a division shape determination method, and a division shape determination program.

H.264/AVC (Advanced Video Coding), hereinafter "AVC", is a video coding standard. H.265/HEVC (High Efficiency Video Coding), hereinafter "HEVC", was standardized in 2013 as the successor to AVC. At comparable image quality, HEVC achieves twice the compression performance of AVC. However, HEVC requires far more computation than AVC.

In HEVC, an image to be encoded is partitioned into CTUs (Coding Tree Units), which are blocks of 64 × 64 pixels. The image encoding device performs encoding processing for each CTU. The division shape determination device can recursively divide a CTU into four blocks called CUs (Coding Units), which are the units of encoding. HEVC defines four CU sizes: 64 × 64, 32 × 32, 16 × 16, and 8 × 8 pixels. Hereinafter, n pixels × n pixels is written as "n × n".

FIG. 6 is a diagram showing an example of a CU division shape. Parameters such as those for intra prediction and inter prediction are shared within each CU. A large CU size is chosen where the distribution of luminance values is flat, and a small CU size is chosen where it is complex. Choosing CU sizes in this way allows an HEVC image encoding device to improve coding efficiency.

FIG. 7 is a diagram showing an example of the quadtree data structure used to represent a CU division shape. The division shape of CUs is expressed with a quadtree data structure, which is hierarchical. Each node of the quadtree corresponds to a CU, and each CU is classified by its level in the quadtree (its division depth). Each node is labeled with a flag indicating whether the CU (block) associated with that node is divided. In HEVC this flag is binary: 1 means divided and 0 means not divided.
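The following Python sketch shows how a sequence of split flags describes a division shape by rebuilding the CUs of one CTU from 1/0 flags. It is a minimal sketch and does not follow the HEVC bitstream syntax. The pre-order flag sequence, the `QuadNode` class, and the helpers `build` and `leaves` are hypothetical. The sketch assumes that an 8 × 8 block carries no flag because it cannot be divided further.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

MIN_CU = 8  # smallest HEVC CU size; an 8x8 block is never divided further


@dataclass
class QuadNode:
    """One quadtree node: a square block at (x, y) with side length `size`."""
    x: int
    y: int
    size: int
    split: int = 0  # split flag: 1 = divided, 0 = not divided
    children: List["QuadNode"] = field(default_factory=list)


def build(x: int, y: int, size: int, flags: Iterator[int]) -> QuadNode:
    """Build a quadtree from split flags given in pre-order (hypothetical input format)."""
    node = QuadNode(x, y, size)
    if size > MIN_CU and next(flags) == 1:
        node.split = 1
        half = size // 2
        # Children in z-scan order: top-left, top-right, bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                node.children.append(build(x + dx, y + dy, half, flags))
    return node


def leaves(node: QuadNode) -> List[Tuple[int, int, int]]:
    """Return the CUs (undivided leaf blocks) as (x, y, size) tuples in z-scan order."""
    if not node.split:
        return [(node.x, node.y, node.size)]
    return [cu for child in node.children for cu in leaves(child)]


# Root divided; only the top-right 32x32 block is divided again into four 16x16 CUs.
ctu = build(0, 0, 64, iter([1, 0, 1, 0, 0, 0, 0, 0, 0]))
print(leaves(ctu))
# [(0, 0, 32), (32, 0, 16), (48, 0, 16), (32, 16, 16), (48, 16, 16), (0, 32, 32), (32, 32, 32)]
```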

An HEVC division shape determination device determines the CU division shape on the basis of the rate-distortion optimization specified in reference software such as the HEVC Test Model (HM). Following that rate-distortion optimization, the device computes the CU division shape and prediction mode that minimize the rate-distortion cost function J = D + λR. In J, D is the amount of distortion produced by the chosen parameters, R is the number of bits produced, and λ is a constant called the Lagrange multiplier. The HEVC division shape determination device finds the CU division shape and prediction mode by exhaustive search within rate-distortion optimization, so the computational cost of rate-distortion optimization is enormous.
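The exhaustive search can be sketched as a recursion. At every node, it compares the cost of coding the block as one CU with the summed cost of its four best-coded quadrants. In this Python sketch, the callback `dr` is a hypothetical stand-in for the expensive mode search and returns (D, R) for one block. For simplicity, the bits spent on the split flag itself are ignored, so this is not the HM implementation.

```python
from typing import Callable, List, Tuple

CU = Tuple[int, int, int]  # (x, y, size)
# Hypothetical callback: (D, R) of coding block (x, y, size) as one CU with its best mode.
DRFn = Callable[[int, int, int], Tuple[float, float]]


def best_partition(x: int, y: int, size: int, lam: float, dr: DRFn,
                   min_size: int = 8) -> Tuple[float, List[CU]]:
    """Full-search quadtree RDO: return the minimum J = D + lam * R and its CU list."""
    d, r = dr(x, y, size)
    j_whole = d + lam * r
    if size == min_size:
        return j_whole, [(x, y, size)]
    half = size // 2
    j_split, cus_split = 0.0, []
    for dy in (0, half):
        for dx in (0, half):
            j, cus = best_partition(x + dx, y + dy, half, lam, dr, min_size)
            j_split += j
            cus_split += cus
    # Keep the block whole unless dividing it strictly lowers the cost.
    if j_split < j_whole:
        return j_split, cus_split
    return j_whole, [(x, y, size)]
```

`dr` is evaluated for every node at every depth, so one 64 × 64 CTU needs 1 + 4 + 16 + 64 = 85 calls. Each call is itself a search over prediction modes, and this is the cost that a learned decision tries to avoid.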

A method has therefore been proposed that lets the division shape determination device determine the CU division shape without performing rate-distortion optimization. It uses a neural network learning model trained with CU division shapes as teacher data. In learning with teacher data (supervised learning), a large number of CTUs are prepared as inputs to the learning model. They are paired with correct labels representing the CU division shapes (division patterns), which are the outputs of the learning model.

The division shape determination device makes the learning model learn CU division shapes by repeatedly using the teacher data for each CTU. The CTU teacher data is fed into the learning model repeatedly, and the device updates the learning parameters so that the resulting CU division shape approaches the correct label.

FIG. 8 is a diagram showing an example of correct labels representing CU division shapes. When the division shape determination device learns CU division shapes per CTU, the simplest method is to train a classification model. This model takes the original image of one CTU of the encoding target image as input and outputs a CU division shape as the correct label. However, enumerating every CU division shape of a CTU yields more than 80,000 correct labels. The division shape determination device therefore cannot learn CU division shapes unless an enormous amount of teacher data is prepared.
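The figure of more than 80,000 follows from a short recursion. A block either stays one CU or is divided, and a divided block has each of its four quadrants partitioned independently. The Python sketch below counts square quadtree partitions only, without prediction-unit shapes. It assumes the 64 × 64 CTU and 8 × 8 minimum CU size described above.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def num_partitions(size: int, min_size: int = 8) -> int:
    """Count the distinct CU division shapes of a size x size block."""
    if size == min_size:
        return 1  # an 8x8 CU cannot be divided further
    # Either keep the block as one CU, or divide it and choose each quadrant freely.
    return 1 + num_partitions(size // 2, min_size) ** 4


for s in (8, 16, 32, 64):
    print(s, num_partitions(s))
# 8 1
# 16 2
# 32 17
# 64 83522
```

A classifier over whole-CTU division shapes would therefore need 83,522 output classes. Collecting enough teacher data for every one of those classes is impractical.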

A method has been proposed that lets the division shape determination device learn CU division shapes without an enormous amount of teacher data (see Non-Patent Document 1). It uses learning models that decide division or non-division of CUs at each CU level. Instead of preparing an enormous amount of teacher data, Non-Patent Document 1 prepares a plurality of learning models, each deciding division or non-division of CUs at one level. This allows the division shape determination device to learn CU division shapes.

In Non-Patent Document 1, the division shape determination device determines the CU division shape by applying the learning model for each level of the quadtree in turn. Hereinafter, a block for which division or non-division is being decided is called a "target block". The probability relating to division of the CU (block) associated with a node is called the "division probability". The learning model (a probability distribution model) outputs a label representing the division probability for each target block associated with a node. A division probability of 1 represents division (a positive example), and 0 represents non-division (a negative example). The division probability may also take an ambiguous value within a predetermined range that contains 0.5, the mean of 0 and 1. When the division probability is ambiguous, the division shape determination device of Non-Patent Document 1 determines the division shape of the target CU by the rate-distortion optimization specified in the HEVC Test Model.
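The decision rule described for Non-Patent Document 1 amounts to a three-way threshold on the division probability. In this Python sketch, the band boundaries `low = 0.3` and `high = 0.7` are purely illustrative assumptions. The text states only that the ambiguous range contains 0.5.

```python
from enum import Enum


class Decision(Enum):
    NO_SPLIT = 0
    SPLIT = 1
    USE_RDO = 2  # ambiguous probability: fall back to rate-distortion optimization


def decide(p_split: float, low: float = 0.3, high: float = 0.7) -> Decision:
    """Map the division probability of a target block to a decision."""
    if not 0.0 <= low < 0.5 < high <= 1.0:
        raise ValueError("the ambiguous band (low, high) must contain 0.5")
    if p_split >= high:
        return Decision.SPLIT
    if p_split <= low:
        return Decision.NO_SPLIT
    return Decision.USE_RDO
```

Narrowing the band sends fewer blocks down the expensive RDO path. The trade-off is that the model's less confident predictions are accepted more often.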

Non-Patent Document 1: F. Duanmu, Z. Ma, Y. Wang, "Fast CU Partition Decision Using Machine Learning for Screen Content Compression," IEEE International Conference on Image Processing (ICIP), Sept. 2015.

FIG. 9 is a diagram showing an example of the plurality of learning models prepared in Non-Patent Document 1 to determine the CU division shape. FIG. 10 is a flowchart showing an example of the operation of the division shape determination device of Non-Patent Document 1. As shown in FIGS. 9 and 10, this device determines the CU division shape using a plurality of learning models (division decision models), one prepared for each level of the quadtree.

Using a plurality of learning models has two drawbacks. First, the computation spent on extracting image features increases, so the computation needed to determine the CU division shape becomes enormous. Second, the device decides each CU's division independently, without considering the correlation between neighboring CUs. As a result, it cannot determine a CU division shape that encodes the encoding target image efficiently.

In short, once the computation for determining the CU division shape is reduced, a conventional division shape determination device cannot determine a CU division shape that encodes the encoding target image efficiently.

In view of these circumstances, an object of the present invention is to provide a division shape determination device, a learning device, a division shape determination method, and a division shape determination program. These can determine a CU division shape for efficiently encoding the encoding target image even when the computation for determining the CU division shape is reduced.

One aspect of the present invention is a division shape determination device that uses a learning model. The learning model is a set of nodes forming a hierarchical structure, and each node holds a division probability (a probability relating to division). The device comprises a learning unit and a determination unit. The learning unit updates the learning parameters of the learning model in accordance with the division probabilities of the nodes associated with the blocks into which an encoding target image is divided. It then outputs, in association with each node, the division probability obtained as an output of the learning model with the updated learning parameters. The determination unit determines, on the basis of the division probability output in association with a node, whether to divide the block associated with that node.

One aspect of the present invention is the division shape determination device described above, in which the learning unit decides, according to the division probability held by a node, whether to refer to the division probabilities of the node's child nodes (the nodes one level below it).

One aspect of the present invention is the division shape determination device described above, in which the hierarchical structure is a quadtree data structure. When the division probability held by a node is 0, the learning unit does not refer to the division probabilities of the child nodes when updating the learning parameters.
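One way to apply this rule to the 21-output layout described for FIG. 2 is a masked loss. When the teacher label of a node is 0, the outputs of its descendants are left out of the loss, so they contribute no gradient. The Python sketch below makes three illustrative assumptions: the "division probability held by the node" is the teacher label, the loss is binary cross-entropy, and y[5] to y[20] are grouped in fours under y[1] to y[4] in order. The name `masked_bce` is hypothetical.

```python
import math
from typing import List, Optional


def parent(i: int) -> Optional[int]:
    """Parent index: y[0] is the root, y[1..4] its children, y[5..20] four per y[1..4]."""
    if i == 0:
        return None
    if i <= 4:
        return 0
    return 1 + (i - 5) // 4


def masked_bce(pred: List[float], label: List[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over nodes whose ancestors are all labeled 1 (divided)."""
    active = [False] * len(label)
    total, count = 0.0, 0
    for i, (p, t) in enumerate(zip(pred, label)):
        par = parent(i)
        # A child is referenced only if its parent is active and labeled "divided".
        active[i] = par is None or (active[par] and label[par] == 1)
        if not active[i]:
            continue
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
        count += 1
    return total / count
```

Because parents always have smaller indices than their children in this layout, a single left-to-right pass is enough to propagate the mask.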

One aspect of the present invention is the division shape determination device described above, in which the determination unit determines whether to divide the block associated with a node on the basis of the division probabilities held by the node's child nodes (the nodes one level below it).

One aspect of the present invention is the division shape determination device described above, in which the division probability is a probability expressed by three or more values.

One aspect of the present invention is a learning device that uses a learning model. The learning model is a set of nodes forming a hierarchical structure, and each node holds a probability. The device comprises a learning unit that updates the learning parameters of the learning model. When the probability of a node is a predetermined value, the update does not rely on the probabilities of that node's child nodes.

One aspect of the present invention is a division shape determination method executed by a division shape determination device that determines the division shape of the blocks into which an encoding target image is divided. The method uses a learning model that is a set of nodes forming a hierarchical structure, each node holding a division probability (a probability relating to division). The method comprises two steps. The first step updates the learning parameters of the learning model in accordance with the division probabilities of the nodes associated with the blocks. It then outputs, in association with each node, the division probability obtained as an output of the learning model with the updated learning parameters. The second step determines, on the basis of the division probability output in association with a node, whether to divide the block associated with that node.

One aspect of the present invention is a division shape determination program that causes a computer to execute two procedures. The program uses a learning model that is a set of nodes forming a hierarchical structure, each node holding a division probability (a probability relating to division). The first procedure updates the learning parameters of the learning model in accordance with the division probabilities of the nodes associated with the blocks into which an encoding target image is divided. It then outputs, in association with each node, the division probability obtained as an output of the learning model with the updated learning parameters. The second procedure determines, on the basis of the division probability output in association with a node, whether to divide the block associated with that node.

According to the present invention, a CU division shape that encodes the encoding target image efficiently can be determined even when the computation for determining the CU division shape is reduced.

FIG. 1 is a diagram showing an example of the configuration of the image encoding device 1 according to the first embodiment.
FIG. 2 is a diagram showing an example of a quadtree data structure and an output label according to the first embodiment.
FIG. 3 is a diagram showing an example of the configuration of the division shape determination device according to the first embodiment.
FIG. 4 is a flowchart showing an example of the operation of the division shape determination device according to the first embodiment.
FIG. 5 is a flowchart showing an example of the operation of the division shape determination device according to the second embodiment.
FIG. 6 is a diagram showing an example of a CU division shape.
FIG. 7 is a diagram showing an example of a quadtree data structure for representing a CU division shape.
FIG. 8 is a diagram showing an example of correct labels representing CU division shapes.
FIG. 9 is a diagram showing an example of a plurality of learning models prepared to determine a CU division shape.
FIG. 10 is a flowchart showing an example of the operation of a division shape determination device.

Embodiments of the present invention will be described in detail with reference to the drawings.

First Embodiment

FIG. 1 is a diagram showing an example of the configuration of the image encoding device 1. The image encoding device 1 is an information processing device such as a personal computer, a smartphone, a tablet terminal, or a server. It encodes the images (frames) that make up a moving image as encoding target images. Each encoding target image is partitioned into CTU blocks of 64 × 64 pixels.

The image encoding device 1 includes the following components:
- a division shape determination device 10
- a subtractor 11
- an orthogonal transform/quantization unit 12
- a variable-length coding unit 13
- an inverse quantization/inverse orthogonal transform unit 14
- an adder 15
- a loop filter unit 16
- a decoded picture memory 17
- an intra prediction unit 18
- an inter prediction unit 19
- an intra/inter changeover switch 20

The image encoding device 1 may further include, as a storage unit, a non-volatile (non-transitory) recording medium such as a magnetic hard disk drive or a semiconductor storage device.

Some or all of these components, apart from the decoded picture memory 17, may be implemented in one of two ways. One is a processor such as a CPU (Central Processing Unit) executing a program stored in the storage unit. The other is hardware such as an LSI (Large Scale Integration) circuit or an ASIC (Application Specific Integrated Circuit).

The division shape determination device 10 is an information processing device (learning device) that learns using a single learning model. In this learning model, a plurality of nodes, each holding a division probability, form a hierarchical structure. Any learning model that outputs a label for each node of the quadtree data structure may be used. It may be a neural network or another kind of model, for example one produced by genetic programming. The division shape determination device 10 learns how to make decisions about general data represented by a quadtree data structure, and it outputs the learned result to a predetermined functional unit.

In the following example, the division shape determination device 10 acquires the encoding target image CTU by CTU. It learns how to determine the CU division shape represented by a quadtree data structure, using the learning model. The device can recursively divide a CTU into four blocks at a time, down to CU-sized blocks. On the basis of the learned result, it determines the CU division shape (division pattern) for each CTU, for example the CU division shape used in HEVC.

For each CTU whose CU division shape has been determined, the subtractor 11 acquires the encoding target image from the division shape determination device 10. It also acquires the output label representing the division probability of each node from the same device. The subtractor 11 acquires the predicted image of the CTU from the intra prediction unit 18 or the inter prediction unit 19 via the intra/inter changeover switch 20. It outputs the difference between the CTU of the encoding target image and the predicted image to the orthogonal transform/quantization unit 12.

The orthogonal transform/quantization unit 12 applies orthogonal transform processing and quantization processing to the difference between the CTU and the predicted image. It outputs the resulting quantized coefficients to the variable-length coding unit 13 and the inverse quantization/inverse orthogonal transform unit 14.

The variable-length coding unit 13 is a coding unit that performs variable-length coding. It outputs encoded data, including the variable-length-coded quantized coefficients, to an image decoding device or the like. The encoded data may also include coding parameters such as motion vectors. The coding parameters are determined, for example, on the basis of the result of rate-distortion optimization.

The inverse quantization/inverse orthogonal transform unit 14 applies inverse quantization and inverse orthogonal transform processing to the quantized coefficients and outputs the resulting image to the adder 15. The adder 15 also acquires the predicted image of the CTU from the intra prediction unit 18 or the inter prediction unit 19 via the intra/inter changeover switch 20. The adder 15 adds the inverse-transformed image to the predicted image and outputs the sum to the loop filter unit 16 and the intra prediction unit 18.
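The data path through the subtractor 11, units 12 and 14, and the adder 15 forms the usual reconstruction loop. The Python sketch below follows one block of samples through that loop. It simplifies by omitting the orthogonal transform, so the quantizer acts directly on the residual. The uniform step `qstep` and the function name `code_block` are hypothetical.

```python
from typing import List, Tuple


def code_block(orig: List[int], pred: List[int], qstep: int) -> Tuple[List[int], List[int]]:
    """Toy encode/reconstruct loop for one block (orthogonal transform omitted)."""
    residual = [o - p for o, p in zip(orig, pred)]   # subtractor 11
    levels = [round(r / qstep) for r in residual]     # quantization (part of unit 12)
    dequant = [lv * qstep for lv in levels]           # inverse quantization (part of unit 14)
    recon = [p + d for p, d in zip(pred, dequant)]    # adder 15
    # levels go to variable-length coding unit 13; recon goes to loop filter 16 and intra prediction 18
    return levels, recon


levels, recon = code_block([100, 120, 130, 90], [98, 110, 140, 90], qstep=4)
```

A decoder can rebuild `recon` from `levels` and the same prediction. For that reason, the encoder predicts from its own reconstruction rather than from the original image, which keeps encoder and decoder in step.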

The loop filter unit 16 applies a loop filter to the sum produced by the adder 15 and outputs the filtered result to the decoded picture memory 17.

The decoded picture memory 17 is, for example, a volatile recording medium such as RAM (Random Access Memory). It may instead be a non-volatile (non-transitory) recording medium such as a semiconductor storage device. By storing the loop-filtered sums (the reconstructed signal), the decoded picture memory 17 holds a plurality of images (frames). It outputs the loop-filtered result to the inter prediction unit 19.

The intra prediction unit 18 acquires the sum produced by the adder 15, before loop filtering, and uses it as a reference image. Based on this reference image, it generates the predicted image of the CTU of the encoding target image by intra prediction.

For each CTU whose CU division shape has been determined, the inter prediction unit 19 acquires the encoding target image from the division shape determination device 10. It acquires the loop-filtered sums from the decoded picture memory 17 and uses them as reference images. Based on these reference images, it generates the predicted image of the CTU of the encoding target image by inter prediction.

When the prediction mode of the CTU is intra prediction, the intra/inter changeover switch 20 outputs the predicted image generated by the intra prediction unit 18 to the subtractor 11 and the adder 15. When the prediction mode is inter prediction, it outputs the predicted image generated by the inter prediction unit 19 to the subtractor 11 and the adder 15.

Next, examples of the quadtree data structure and the output label are described.
FIG. 2 is a diagram showing an example of the quadtree data structure and the output label. The division shape of the CUs in one CTU is represented by one quadtree data structure. Each node of the quadtree data structure is labeled with a probability related to the division of the CU associated with that node (the division probability). One quadtree data structure thus represents the division probability of each CU in one CTU.

The learning model takes a CTU of the encoding target image as input. Based on the CU division shape of the input CTU and the learning parameters, the learning model outputs a label representing the division probability of each node of the quadtree data structure. The number of elements of the label output by the learning model (hereinafter the "output label") equals the maximum number of nodes of the quadtree data structure in one CTU. The output label consists of the division probabilities y[n] (n is an integer from 0 to 20) of the CUs in the CTU. For the quadtree data structure of FIG. 2, the learning model has 21 output units (1 + 4 + 16 nodes), so the output label consists of y[0], y[1], ..., y[20].

In the output label, y[0] is the division probability of the 64×64 CU at the shallowest level. The division probabilities of the 32×32 CUs, one level below the 64×64 CU, are y[1] to y[4]. The division probabilities of the 16×16 CUs, one level below the 32×32 CUs, are y[5] to y[20].
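The index layout of the output label can be made concrete with a short Python sketch. It assumes, consistent with the statement that t[5] to t[8] are the child nodes of t[1], that node n has children 4n+1 to 4n+4, and it further assumes that the four children are listed in Z-scan order; the helper names and that ordering are hypothetical and do not come from the source.

```python
from __future__ import annotations

# Hypothetical helpers for the 21-element label layout of FIG. 2.
# Assumptions (not stated in the source): node n has children 4n+1..4n+4,
# and the four children are ordered in Z-scan order
# (top-left, top-right, bottom-left, bottom-right).

CTU_SIZE = 64
NUM_NODES = 21  # 1 node (64x64) + 4 nodes (32x32) + 16 nodes (16x16)


def depth(n: int) -> int:
    """Hierarchy level of node n: 0 for y[0], 1 for y[1..4], 2 for y[5..20]."""
    if not 0 <= n < NUM_NODES:
        raise ValueError(f"node index out of range: {n}")
    return 0 if n == 0 else (1 if n <= 4 else 2)


def cu_size(n: int) -> int:
    """Width and height, in pixels, of the CU associated with node n."""
    return CTU_SIZE >> depth(n)


def parent(n: int) -> int | None:
    """Index of the parent node, or None for the root node y[0]."""
    return None if n == 0 else (n - 1) // 4


def children(n: int) -> list[int]:
    """Indices of the four child nodes; empty for the 16x16 nodes."""
    return [4 * n + k for k in range(1, 5)] if depth(n) < 2 else []


def cu_position(n: int) -> tuple[int, int]:
    """Top-left (x, y) of node n's CU inside the CTU."""
    p = parent(n)
    if p is None:
        return (0, 0)
    px, py = cu_position(p)
    k = (n - 1) % 4  # 0: top-left, 1: top-right, 2: bottom-left, 3: bottom-right
    s = cu_size(n)
    return (px + (k % 2) * s, py + (k // 2) * s)
```

For example, `children(1)` returns `[5, 6, 7, 8]`, and `cu_position(20)` returns `(48, 48)`, the bottom-right 16×16 CU of the CTU under the assumed ordering.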

The number of elements of the correct label of the learning model also equals the maximum number of nodes of the quadtree data structure in one CTU. The correct label consists of the division probabilities t[n] of the CUs in the CTU. For the quadtree data structure of FIG. 2, the correct label t consists of t[0], t[1], ..., t[20], corresponding element by element to the output label y = (y[0], y[1], ..., y[20]).

In the correct label, t[0] is the division probability of the 64×64 CU. The division probabilities of the 32×32 CUs, one level below the 64×64 CU, are t[1] to t[4]. The division probabilities of the 16×16 CUs, one level below the 32×32 CUs, are t[5] to t[20]. During learning, the division shape determination device 10 shown in FIG. 1 updates the learning parameters of the learning model so that the output label representing the CU division shape approaches the correct label.

Based on the output label produced by the learning model with updated learning parameters, the division shape determination device 10 sets to 1 the division probability of each node whose division probability exceeds the division probability threshold. That is, the division shape determination device 10 decides to divide the CU associated with such a node.

Based on the same output label, the division shape determination device 10 sets to 0 the division probability of each node whose division probability does not exceed the division probability threshold. That is, the division shape determination device 10 decides not to divide the CU associated with such a node.

When the division probability of a parent node in the quadtree data structure represents non-division (is 0), the division shape determination device 10 does not determine the division probabilities of the child nodes below that parent node. That is, the division shape determination device 10 does not determine division probabilities for the child nodes of a parent node whose CU is not divided.
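The threshold rule above, including the skipping of child nodes under a non-divided parent, might be sketched in Python as follows. The 0.5 threshold is an illustrative value, and the node layout (the parent of node n is (n − 1) // 4) is an assumption based on the 21-element label of FIG. 2; the function name is hypothetical.

```python
from __future__ import annotations

# Minimal sketch of the threshold rule, assuming the 21-node layout in which
# node n's parent is (n - 1) // 4. The 0.5 threshold is an illustrative value.


def decide_split_flags(y: list[float], threshold: float = 0.5) -> list[int | None]:
    """Map output-label probabilities to 1 (divide), 0 (do not divide),
    or None (not determined because an ancestor is not divided)."""
    flags: list[int | None] = [None] * len(y)
    for n, p in enumerate(y):  # a parent always has a smaller index than its children
        if n > 0 and flags[(n - 1) // 4] != 1:
            continue  # parent is not divided (or undetermined): skip this node
        flags[n] = 1 if p > threshold else 0
    return flags
```

A probability exactly equal to the threshold does not exceed it, so the sketch treats it as non-division, matching the "does not exceed" wording.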

Next, an example of the configuration of the division shape determination device 10 is described.
FIG. 3 is a diagram showing an example of the configuration of the division shape determination device 10. The division shape determination device 10 includes a feature extraction unit 100 implemented as a single learning model, and further includes a determination unit 110.

The feature extraction unit 100 (learning unit) acquires the original image or a feature amount of the encoding target image for each CTU. Based on this input, the feature extraction unit 100 outputs the division probability of each node of the quadtree data structure as the output label of a single learning model. Through repeated learning, the feature extraction unit 100 updates the learning parameters of the learning model so that the output label approaches the correct label. The feature extraction unit 100 then calculates the division probability of each node of the quadtree data structure using the learning model whose learning parameters have been updated. The determination unit 110 outputs an output label containing the division probability determined for each node of the quadtree data structure to the subtractor 11.

In FIG. 3, the learning model is, as an example, a convolutional neural network. The feature extraction unit 100 includes a convolution layer 101, a pooling layer 102, a convolution layer 103, a pooling layer 104, and a fully connected layer 105.

The convolution layer 101 (update unit) updates learning parameters such as filter coefficients as a result of learning. The convolution layer 101 may apply an activation function to each value of the two-dimensional array. The pooling layer 102 performs downsampling using, for example, the maximum or average value within each kernel. That is, the pooling layer 102 retains the significant values among the values of the two-dimensional array output by the convolution layer 101.

The convolution layer 103 (update unit) updates learning parameters such as filter coefficients as a result of learning. The convolution layer 103 may apply an activation function to each value of the two-dimensional array output by the pooling layer 102. The pooling layer 104 performs downsampling using, for example, the maximum or average value within each kernel. That is, the pooling layer 104 retains the significant values among the values of the two-dimensional array output by the convolution layer 103. The fully connected layer 105 (division probability output unit) combines the outputs of the pooling layer 104 and outputs an output label representing the division probability of each node.
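A minimal NumPy sketch of the forward pass of FIG. 3 follows. The channel counts (8 and 16), the 3×3 kernels, "same" padding, 2×2 max pooling, and one sigmoid per node on the 21 outputs are all assumptions for illustration; the source names only the layer types. Training of the parameters is not shown, and `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

NUM_NODES = 21  # size of the output label y


def conv2d_relu(x, w, b):
    """'Same'-padded, stride-1 convolution followed by ReLU.
    x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,) -> (C_out, H, W)."""
    k = w.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    windows = sliding_window_view(xp, (k, k), axis=(1, 2))  # (C_in, H, W, k, k)
    out = np.einsum("chwij,ocij->ohw", windows, w) + b[:, None, None]
    return np.maximum(out, 0.0)


def max_pool2(x):
    """2x2 max pooling with stride 2 (downsampling). x: (C, H, W)."""
    c, h, w = x.shape
    x = x[:, : h // 2 * 2, : w // 2 * 2]
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def init_params(rng, c1=8, c2=16, ctu=64, k=3):
    """Randomly initialised learning parameters w (hypothetical sizes)."""
    flat = c2 * (ctu // 4) * (ctu // 4)
    return {
        "w1": rng.normal(0, 0.1, (c1, 1, k, k)), "b1": np.zeros(c1),
        "w2": rng.normal(0, 0.1, (c2, c1, k, k)), "b2": np.zeros(c2),
        "wf": rng.normal(0, 0.01, (NUM_NODES, flat)), "bf": np.zeros(NUM_NODES),
    }


def forward(ctu, params):
    """ctu: (64, 64) luma block -> 21 division probabilities in (0, 1)."""
    h = conv2d_relu(ctu[None], params["w1"], params["b1"])  # convolution layer 101
    h = max_pool2(h)                                         # pooling layer 102
    h = conv2d_relu(h, params["w2"], params["b2"])           # convolution layer 103
    h = max_pool2(h)                                         # pooling layer 104
    z = params["wf"] @ h.ravel() + params["bf"]              # fully connected layer 105
    return 1.0 / (1.0 + np.exp(-z))                          # one sigmoid per node
```

A sigmoid per output unit is used here because each node carries its own independent divide/do-not-divide probability; a softmax over all 21 outputs would not match that reading.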

The determination unit 110 (division probability determination unit) determines the division probability of the target block associated with each node based on the output label of the fully connected layer 105. That is, based on the output label of the fully connected layer 105, the determination unit 110 determines whether to divide the target block associated with each node. For each CTU, the determination unit 110 outputs an output label containing the division probability determined for each node of the quadtree data structure to the subtractor 11 shown in FIG. 1.

Next, the learning method of the learning model in the feature extraction unit 100 is described.
When learning the correct label of the CU division shape, the feature extraction unit 100 acquires the original image or a feature amount of the encoding target image for each CTU. The fully connected layer 105 outputs an output label y, which represents the division probability of each node of the quadtree data structure. The output label y is expressed by equation (1), and the correct label t corresponding to y is expressed by equation (2).

$$\mathbf{y} = \left[\, y[0],\ y[1],\ \ldots,\ y[20] \,\right]^{\mathsf{T}} \qquad (1)$$

$$\mathbf{t} = \left[\, t[0],\ t[1],\ \ldots,\ t[20] \,\right]^{\mathsf{T}} \qquad (2)$$

The convolution layers 101 and 103 calculate the value of an error function E representing the error between the output label y and the correct label t. The error function E is defined using, for example, the cross entropy or the mean squared error between the output label y and the correct label t. The convolution layers 101 and 103 update their learning parameters w, for example by backpropagation, so that the value of the error function E decreases.

When performing backpropagation, the convolution layers 101 and 103 may use gradient descent to update the learning parameters w of the learning model in the direction that decreases the value of the error function E. That is, the convolution layer 101 uses the gradient ∇E given by equation (3) to update its learning parameters w according to equation (4), moving in the direction that decreases E. The convolution layer 103 updates its own learning parameters w in the same way. In equation (3), M is the number of elements of the learning parameter w. In equation (4), ε is the learning rate.

$$\nabla E = \frac{\partial E}{\partial \mathbf{w}} = \left[\, \frac{\partial E}{\partial w_1},\ \frac{\partial E}{\partial w_2},\ \ldots,\ \frac{\partial E}{\partial w_M} \,\right]^{\mathsf{T}} \qquad (3)$$

$$\mathbf{w} \leftarrow \mathbf{w} - \varepsilon \nabla E \qquad (4)$$
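As a toy illustration of equations (3) and (4), the Python sketch below trains a single sigmoid output unit with a cross-entropy error. The model, its inputs, and the function names are hypothetical and far smaller than the network of FIG. 3, where ∇E would be obtained by backpropagation through every layer; the sketch shows only how the gradient vector and the update w ← w − ε∇E fit together.

```python
from __future__ import annotations

import math

# Toy illustration of equations (3) and (4): one sigmoid output unit
# y = sigmoid(w . x) trained against a target t with the cross-entropy
# error E. The model and values are hypothetical; a CNN would obtain
# grad E by backpropagation through every layer instead.


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def error(w: list[float], x: list[float], t: float) -> float:
    """Cross-entropy error E between the output y and the correct label t."""
    y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return -(t * math.log(y) + (1.0 - t) * math.log(1.0 - y))


def gradient(w: list[float], x: list[float], t: float) -> list[float]:
    """Equation (3): [dE/dw_1, ..., dE/dw_M].
    For a sigmoid output with cross entropy, dE/dw_m = (y - t) * x_m."""
    y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(y - t) * xi for xi in x]


def update(w: list[float], grad: list[float], lr: float) -> list[float]:
    """Equation (4): w <- w - epsilon * grad E, with epsilon = lr."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]
```

Repeating `w = update(w, gradient(w, x, t), lr)` drives E down for a fixed training example, which is the behaviour equation (4) is meant to produce.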

In the first embodiment, each element of the correct label t is a division probability obtained by rate-distortion optimization in reference software such as the HEVC test model (HM). In the first embodiment, the division probability of each node in the correct label t is binary (division or non-division).

When learning the correct label t of the CU division shape, the convolution layers 101 and 103 do not refer to the division probabilities of the child nodes of a parent node that represents non-division in the correct label t. For example, when the division probability of the node for t[1] represents non-division (the division probability equals the predetermined value 0), the convolution layers 101 and 103 do not refer to the division probabilities t[5] to t[8] of that node's child nodes.

The convolution layers 101 and 103 do not use, for learning, the division probabilities that are not referred to in the correct label t. That is, the convolution layers 101 and 103 update the learning parameters of the learning model based on a learning result in which the child nodes of a parent node whose division probability represents non-division are treated as having no division probability.
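Excluding unreferenced correct-label entries amounts to masking them out of the error function. The Python sketch below assumes binary correct labels, binary cross entropy as E, and the 21-node layout in which the parent of node n is (n − 1) // 4; the function names are hypothetical.

```python
from __future__ import annotations

import math

# Minimal sketch of learning that ignores correct-label entries below a
# non-divided parent. Assumptions: the 21-node layout with parent (n - 1) // 4,
# binary correct labels t[n] in {0, 1}, and binary cross entropy as E.


def reference_mask(t: list[float]) -> list[bool]:
    """True where t[n] is referenced; False for descendants of a node with t = 0."""
    mask = [True] * len(t)
    for n in range(1, len(t)):
        p = (n - 1) // 4
        mask[n] = mask[p] and t[p] != 0
    return mask


def masked_cross_entropy(y: list[float], t: list[float], eps: float = 1e-12) -> float:
    """Mean binary cross entropy over the referenced nodes only."""
    mask = reference_mask(t)
    terms = [
        -(tn * math.log(yn + eps) + (1.0 - tn) * math.log(1.0 - yn + eps))
        for yn, tn, keep in zip(y, t, mask)
        if keep
    ]
    return sum(terms) / len(terms)  # the root is always referenced, so len >= 1
```

Because masked entries never enter E, their gradient is zero, so the output units for those nodes receive no training signal from that CTU.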

Next, an example of the operation of the division shape determination device 10 is described.
FIG. 4 is a flowchart showing an example of the operation of the division shape determination device 10. The feature extraction unit 100 acquires the encoding target image for each CTU and extracts feature amounts, such as luminance values, from the CTU of the encoding target image. The fully connected layer 105 calculates the division probability of each node using the learning model whose learning parameters have been updated (step S101). The determination unit 110 processes target blocks corresponding to shallower nodes of the quadtree data structure first.

The determination unit 110 determines whether the node corresponding to the target block is at the deepest level of the quadtree data structure (step S102). If it is not (step S102: NO), the determination unit 110 determines whether the division probability of the target block exceeds the division probability threshold (step S103). If it does (step S103: YES), the determination unit 110 decides to divide the target block and sets the division probability of the node corresponding to the target block to 1 (step S104). The determination unit 110 then sets, as the new target block, the next block one level lower in a processing order such as Z-scan order (step S105), and returns to step S102.

If the node corresponding to the target block is at the deepest level of the quadtree data structure (step S102: YES), the determination unit 110 proceeds to step S106. If the division probability does not exceed the division probability threshold (step S103: NO), the determination unit 110 decides not to divide the target block. The determination unit 110 sets the division probability of the node corresponding to the target block to 0 (step S106).

The determination unit 110 determines whether it has decided division or non-division for all blocks (CUs) in the CTU (step S107). If any block (CU) in the CTU has not yet been decided (step S107: NO), the determination unit 110 sets the next block in the processing order as the target block (step S108) and returns to step S102. If division or non-division has been decided for all blocks (CUs) in the CTU (step S107: YES), the determination unit 110 ends the process.
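The flow of FIG. 4 can be read as a depth-first traversal in Z-scan order. The Python sketch below assumes a 64×64 CTU and children of node n at 4n+1 to 4n+4 in Z-scan order, and it interprets the "deepest level" of step S102 as the minimum CU size of 8×8, which has no node in the 21-element label. These are assumptions, not details stated in the source, and the function name is hypothetical.

```python
from __future__ import annotations

# Sketch of the FIG. 4 flow as a depth-first traversal in Z-scan order.
# Assumptions: 64x64 CTU, node n has children 4n+1..4n+4 in Z-scan order,
# and the "deepest level" is the minimum CU size (8x8), which has no node
# of its own and therefore is never divided.


def determine_cus(probs: list[float], threshold: float = 0.5,
                  ctu_size: int = 64, min_cu: int = 8):
    flags: list[int | None] = [None] * len(probs)  # 1/0 per node, None if undetermined
    cus: list[tuple[int, int, int]] = []           # leaf CUs as (x, y, size)

    def visit(n: int, x0: int, y0: int, size: int) -> None:
        if size == min_cu:                 # S102: YES -> not divided (no node here)
            cus.append((x0, y0, size))
            return
        if probs[n] > threshold:           # S103: YES
            flags[n] = 1                   # S104: divide
            half = size // 2
            for k in range(4):             # S105/S108: sub-blocks in Z-scan order
                visit(4 * n + 1 + k, x0 + (k % 2) * half, y0 + (k // 2) * half, half)
        else:                              # S103: NO
            flags[n] = 0                   # S106: do not divide
            cus.append((x0, y0, size))

    visit(0, 0, 0, ctu_size)  # S107: recursion ends once every block is decided
    return cus, flags
```

The returned leaf CUs always tile the CTU exactly, which is a convenient sanity check on any implementation of this flow.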

As described above, the division shape determination device 10 of the first embodiment includes the feature extraction unit 100 as a learning unit and the determination unit 110. The nodes holding division probabilities form a hierarchical structure. The feature extraction unit 100 updates the learning parameter w of the learning model, which is a set of nodes, according to the division probabilities of the nodes associated with the blocks into which the encoding target image is partitioned. The feature extraction unit 100 outputs, in association with each node, the division probability obtained as the output of the learning model whose learning parameters have been updated. The determination unit 110 determines whether to divide the block associated with each node based on the division probability output for that node.

As a result, the division shape determination device 10 of the first embodiment can determine a CU division shape that encodes the encoding target image efficiently, even when the amount of computation for determining the CU division shape is reduced.

The feature extraction unit 100 of the first embodiment decides, according to the division probability held by a node, whether to refer to the division probabilities of that node's child nodes. When the division probability held by a node is 0, the feature extraction unit 100 does not refer to the division probabilities of the child nodes when updating the learning parameters. In other words, for a learning model whose probability-holding nodes form a hierarchical structure, the feature extraction unit 100 updates the learning parameters without relying on the probabilities of a node's child nodes whenever that node's probability equals a predetermined value.

In general, if a learning model were to learn correct labels for every possible CU division shape in a CTU, it could not learn the CU division shapes efficiently, because the number of correct labels for the division shapes is enormous (for a 64×64 CTU with a minimum CU size of 8×8, there are 83,522 possible division shapes). In Non-Patent Document 1, the learning model can learn efficiently to a certain extent. However, when determining the CU division shape before the encoding process, the division shape determination device of Non-Patent Document 1 repeatedly extracts feature amounts from the original image using multiple learning models (division determination models) in series. As a result, in Non-Patent Document 1 the amount of computation for extracting feature amounts from the original image is enormous. Moreover, the learning model of Non-Patent Document 1 cannot learn the CU division shape based on the correlation of spatial positions within the CTU.

In contrast, in the division shape determination device 10 of the first embodiment, a single learning model learns the division probabilities of the nodes of the quadtree data structure, so the device can determine a CU division shape that encodes the encoding target image efficiently even with a small amount of computation. Because the device determines the CU division shape with a single learning model, it does not need to chain multiple learning models in series, and the number of output units (elements) of the learning model can be kept to a practical number. The device can also reduce the amount of computation required to extract feature amounts, such as luminance values, from the encoding target image. Because the single learning model of the first embodiment extracts the feature amounts of the input image all at once, it can learn the CU division shape based on the correlation of spatial positions within the CTU, and the device can likewise determine the CU division shape based on that correlation. Furthermore, because the learning model does not refer during learning to elements of the correct label that do not contribute to the learning error, the division probabilities of the child nodes of a parent node whose division probability represents non-division need not be defined. Note that such child nodes still exist even when their division probabilities are not defined.

Second Embodiment
The second embodiment differs from the first embodiment in that, when the division probability of a parent node is ambiguous, the division shape determination device 10 evaluates the division probabilities of that parent node's child nodes. Only the differences from the first embodiment are described in the second embodiment.

When the division probability of the node associated with the target block is ambiguous (a value within a predetermined range that includes 0.5), the determination unit 110 compares the division probabilities of the child nodes one level below that node with a predetermined division probability threshold. For the multiple child nodes of the parent node, the determination unit 110 may compare, for example, the average, maximum, or minimum of their division probabilities with the division probability threshold. The determination unit 110 may select which of these values (average, maximum, minimum, and so on) to use for the comparison.

If the division probability of the child nodes exceeds the division probability threshold, the determination unit 110 decides to divide the target block associated with the parent node one level above the child nodes. If the division probability of the child nodes does not exceed the threshold, the determination unit 110 decides not to divide the target block associated with that parent node.
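The second-embodiment rule might look as follows in Python. The ambiguity band [0.4, 0.6], the threshold of 0.5, and the aggregation names are hypothetical; the source specifies only that the range contains 0.5 and that an average, maximum, or minimum may be used. For a 16×16 node, which has no child entries in the 21-element label, the sketch falls back to the node's own probability, which is also an assumption.

```python
from statistics import mean

# Minimal sketch of the second-embodiment rule. Hypothetical choices: the
# ambiguity band [0.4, 0.6], the 0.5 threshold, and the aggregation names.
# Node n's children are assumed to be 4n+1..4n+4 in the 21-node layout.

AGGREGATORS = {"mean": mean, "max": max, "min": min}


def decide_node(probs, n, threshold=0.5, band=(0.4, 0.6), aggregate="mean"):
    """Return 1 (divide) or 0 (do not divide) for node n."""
    p = probs[n]
    low, high = band
    kids = [c for c in range(4 * n + 1, 4 * n + 5) if c < len(probs)]
    if low <= p <= high and kids:
        # Ambiguous parent: decide from its children's division probabilities
        p = AGGREGATORS[aggregate]([probs[c] for c in kids])
    return 1 if p > threshold else 0
```

Choosing "max" biases ambiguous blocks toward division and "min" toward non-division, while "mean" sits between the two.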

Next, an example of the operation of the division shape determination device 10 is described.
FIG. 5 is a flowchart showing an example of the operation of the division shape determination device 10. Steps S201 and S202 are the same as steps S101 and S102 in FIG. 4. The determination unit 110 determines whether the division probability of the target block is ambiguous, that is, whether it is close to 0.5 (step S203). If the division probability is not ambiguous (step S203: NO), the determination unit 110 proceeds to step S204. Steps S204 to S206 are the same as steps S103 to S105 in FIG. 4.

If the division probability is ambiguous (step S203: YES), the determination unit 110 acquires the division probabilities of the child nodes one level below the target block (step S207). The determination unit 110 determines whether the division probability of the child nodes exceeds the division probability threshold (step S208). If it does (step S208: YES), the determination unit 110 proceeds to step S205. If it does not (step S208: NO), the determination unit 110 proceeds to step S209. Steps S209 to S211 are the same as steps S106 to S108 in FIG. 4.

As described above, the determination unit 110 of the second embodiment decides whether to divide the block associated with a node based on the division probabilities held by that node's child nodes. As a result, even when a division probability in the output label is ambiguous, the division shape determination device 10 of the second embodiment can determine a CU division shape that encodes the encoding target image efficiently while keeping the amount of computation for that determination small.

The division shape determination device 10 of the second embodiment obtains the division probabilities of the nodes at all levels of the quadtree data structure representing one CTU in parallel, as the output label y of the learning model. Because these probabilities are all available at once, the device can obtain the division probabilities of the child nodes of the parent node corresponding to the target block. Consequently, even when a division probability in the output label is ambiguous, the division shape determination device 10 of the second embodiment can make a highly reliable decision without performing rate-distortion optimization.

Third Embodiment
The third embodiment differs from the first embodiment in that the division probability of a node in the correct label takes three or more values. Only the differences from the first embodiment are described in the third embodiment.

For the correct label t, the determination unit 110 calculates the difference between two values of the rate-distortion cost function J: its value when the division probability of a node is 1, and its value when that probability is 0. If the calculated difference is equal to or greater than a predetermined cost threshold, the determination unit 110 sets the weighting factor equal to or greater than a predetermined coefficient threshold. This allows a division probability far from the division-probability threshold to be included as an element of the correct label t. If the calculated difference is less than the cost threshold, the determination unit 110 sets the weighting factor below the coefficient threshold. This allows a division probability close to the division-probability threshold to be included as an element of the correct label t.

The determination unit 110 changes the division probability of each node in the correct label t using the weighting factor that corresponds to the calculated difference. In this way, the determination unit 110 makes the division probability of each node in the correct label t multivalued, with three or more levels. For example, the division probability of a node in the correct label t may take continuous values between 0 and 1.
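The sketch below shows one possible way to turn the rate-distortion cost difference into a multivalued correct-label entry. The text does not give the weighting formula, so the mapping here is an assumption. The weight w is 1 when |ΔJ| reaches the cost threshold; otherwise it scales linearly and stays below the coefficient threshold. The label is then 0.5 ± 0.5·w, with the sign set by whether splitting or not splitting has the lower cost J. The value 0.5 is taken as the split threshold, and the function name is hypothetical.

```python
def soft_label(j_split: float, j_nosplit: float,
               cost_threshold: float, coef_threshold: float = 0.8) -> float:
    """Map J(p=1) and J(p=0) for one node to a correct-label entry in [0, 1] (assumed mapping)."""
    if cost_threshold <= 0.0 or not 0.0 < coef_threshold <= 1.0:
        raise ValueError("need cost_threshold > 0 and 0 < coef_threshold <= 1")
    delta = j_split - j_nosplit  # Delta J = J(p=1) - J(p=0); negative means splitting is cheaper
    mag = abs(delta)
    if mag >= cost_threshold:
        w = 1.0  # large cost gap: weight at or above coef_threshold -> label far from 0.5
    else:
        w = coef_threshold * mag / cost_threshold  # small gap: weight below coef_threshold
    direction = 1.0 if delta < 0 else -1.0  # lean towards "split" when splitting costs less
    return 0.5 + 0.5 * w * direction
```

For example, J(p=1) = 100 and J(p=0) = 110 with a cost threshold of 50 gives a label of about 0.58. This is only a mild preference for splitting, whereas a cost gap of 50 or more would give exactly 1.0 or 0.0.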

The convolution layers 101 and 103 calculate the value of an error function E that represents the error between the output label y and the correct label t. The error function E is defined using, for example, the mean squared error between y and t. The convolution layers 101 and 103 update the learning parameters w of each layer of the quadtree data structure by backpropagation so that the value of E decreases.
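The following sketch makes the training step concrete. It computes the mean squared error E between an output label y and a correct label t, then reduces E by gradient descent, obtaining the gradients through the chain rule as in backpropagation. As a simplification, a single fully connected layer with sigmoid outputs stands in for the convolution layers 101 and 103. The feature size, the 21 output nodes, the learning rate, and the random data are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(W: Matrix, b: List[float], x: Sequence[float]) -> List[float]:
    """Output label y: one sigmoid split probability per quadtree node."""
    return [sigmoid(sum(w * xj for w, xj in zip(row, x)) + bk) for row, bk in zip(W, b)]


def mse(y: Sequence[float], t: Sequence[float]) -> float:
    """Error function E: mean squared error between output label y and correct label t."""
    return sum((yk - tk) ** 2 for yk, tk in zip(y, t)) / len(y)


def train_step(W: Matrix, b: List[float], x: Sequence[float],
               t: Sequence[float], lr: float) -> float:
    """Apply one gradient-descent update in place; return E measured before the update."""
    y = forward(W, b, x)
    n = len(y)
    for k in range(n):
        # Chain rule: dE/dz_k = dE/dy_k * dy_k/dz_k, where dy/dz = y(1 - y) for a sigmoid.
        grad_z = 2.0 * (y[k] - t[k]) / n * y[k] * (1.0 - y[k])
        for j, xj in enumerate(x):
            W[k][j] -= lr * grad_z * xj
        b[k] -= lr * grad_z
    return mse(y, t)


def demo(steps: int = 200, lr: float = 1.0) -> Tuple[float, float]:
    """Train on one random (feature, label) pair; return E before and after training."""
    rng = random.Random(0)
    n_in, n_out = 8, 21  # hypothetical feature size; 21 quadtree nodes
    x = [rng.uniform(-1.0, 1.0) for _ in range(n_in)]
    t = [rng.random() for _ in range(n_out)]  # stand-in multivalued correct label
    W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    first = train_step(W, b, x, t, lr)
    for _ in range(steps - 1):
        train_step(W, b, x, t, lr)
    return first, mse(forward(W, b, x), t)
```

Because t holds continuous values, E is minimized by outputs that match the graded labels rather than by pushing every output to 0 or 1.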

As described above, the division probability in the third embodiment is a probability expressed with three or more values. As a result, the division shape determination device 10 of the third embodiment can determine a CU division shape that encodes the coding target image more efficiently, even when the amount of computation for determining the CU division shape is reduced.

When designing the correct label, the division shape determination device 10 of the third embodiment includes multivalued division probabilities among the elements of the label. Each of these is multiplied by a weighting factor that depends on the rate-distortion cost function J. This allows the device to determine the CU division shape while taking into account how that shape affects coding efficiency. The division shape determination device 10 of the third embodiment can thus narrow the gap between two outputs: the one obtained by machine learning with a feature-based learning model, and the one obtained by exhaustive search in rate-distortion optimization.

At least part of the image coding device, the division shape determination device, and the learning device in the embodiments described above may be realized by a computer. In that case, a program that realizes these functions may be recorded on a computer-readable recording medium, and the program may be read and executed by a computer system. Here, the "computer system" includes an OS and hardware such as peripheral devices. A "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into computer systems. A "computer-readable recording medium" may also include media that hold the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line. It may likewise include media that hold the program for a certain period, such as volatile memory inside a computer system acting as the server or client in that case. The program may realize only some of the functions described above. It may also realize those functions in combination with a program already recorded in the computer system, or be realized using a programmable logic device such as an FPGA (Field Programmable Gate Array).

The embodiments of the present invention have been described above in detail with reference to the drawings. However, the specific configuration is not limited to these embodiments, and designs and the like that do not depart from the gist of the invention are also included.

The present invention is applicable to division shape determination devices that determine the division shape of the blocks into which an image is partitioned, to learning devices that learn general-purpose data represented by a quadtree data structure, and to image coding devices.

Reference Signs List: 1 ... image coding device, 10 ... division shape determination device, 11 ... subtractor, 12 ... orthogonal transform and quantization unit, 13 ... variable-length coding unit, 14 ... inverse quantization and inverse orthogonal transform unit, 15 ... adder, 16 ... loop filter unit, 17 ... decoded picture memory, 18 ... intra prediction unit, 19 ... inter prediction unit, 20 ... intra/inter changeover switch, 100 ... feature extraction unit, 101 ... convolution layer, 102 ... pooling layer, 103 ... convolution layer, 104 ... pooling layer, 105 ... fully connected layer, 110 ... determination unit

Claims

A division shape determination device comprising:
a learning unit in which a plurality of nodes, each holding a division probability that is a probability related to division, form a hierarchical structure, the learning unit updating learning parameters of a learning model that is a set of the nodes according to the division probability of the node associated with a block into which a coding target image is divided, and outputting, in association with the node, the division probability obtained as an output of the learning model whose learning parameters have been updated; and
a determination unit that determines whether to divide the block associated with the node based on the division probability output in association with the node.

The division shape determination device according to claim 1, wherein the learning unit determines whether to refer to the division probability of a child node, which is a subordinate node of the node, according to the division probability held by the node.

The division shape determination device according to claim 2, wherein
the hierarchical structure is a quadtree data structure, and
when the division probability held by the node is 0, the learning unit determines not to refer to the division probability of the child node when updating the learning parameters.

The division shape determination device according to any one of the preceding claims, wherein the determination unit determines whether to perform the division associated with the node based on the division probability held by a child node that is a subordinate node of the node.

The division shape determination device according to any one of claims 1 to 4, wherein the division probability is a probability represented by three or more values.

A learning device comprising a learning unit in which a plurality of nodes, each holding a probability, form a hierarchical structure, the learning unit updating learning parameters of a learning model that is a set of the nodes without relying on the probabilities of the child nodes of a node when the probability of that node is a predetermined value.

A division shape determination method executed by a division shape determination device that determines the division shape of blocks into which an image to be encoded is divided, the method comprising:
a step in which, with a plurality of nodes each holding a division probability that is a probability related to division forming a hierarchical structure, learning parameters of a learning model that is a set of the nodes are updated according to the division probability of the node associated with the block, and the division probability obtained as an output of the learning model whose learning parameters have been updated is output in association with the node; and
a step of determining whether to divide the block associated with the node based on the division probability output in association with the node.

A division shape determination program for causing a computer to execute:
a procedure in which, with a plurality of nodes each holding a division probability that is a probability related to division forming a hierarchical structure, learning parameters of a learning model that is a set of the nodes are updated according to the division probability of the node associated with a block into which a coding target image is divided, and the division probability obtained as an output of the learning model whose learning parameters have been updated is output in association with the node; and
a procedure of determining whether to divide the block associated with the node based on the division probability output in association with the node.
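Claims 3 and 6 describe updating the learning parameters without referring to the child nodes of a node whose division probability is 0. One way to realize this is a masked loss, sketched below under a hypothetical layout: 21 nodes stored breadth-first, with the children of node i at 4i+1 to 4i+4. Two further points are assumptions made for illustration. The correct label t is treated as the probability "held by the node", and all descendants of an unsplit node are excluded. The helper names are likewise hypothetical.

```python
from typing import List, Sequence


def active_mask(t: Sequence[float]) -> List[bool]:
    """True for nodes whose error counts; descendants of a node labelled 0 are excluded."""
    mask = [False] * len(t)
    if t:
        mask[0] = True  # the CTU root is always evaluated
    for i in range(1, len(t)):
        parent = (i - 1) // 4  # parent index in a breadth-first 4-ary layout
        mask[i] = mask[parent] and t[parent] != 0.0
    return mask


def masked_mse(y: Sequence[float], t: Sequence[float]) -> float:
    """Mean squared error over the active nodes only."""
    mask = active_mask(t)
    errs = [(yk - tk) ** 2 for yk, tk, m in zip(y, t, mask) if m]
    return sum(errs) / len(errs)
```

Suppose the correct label marks the first 32×32 node as unsplit (t = 0). Its four 16×16 children then contribute nothing to the loss, so their outputs receive no gradient from this training sample.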