JP6242055B2

JP6242055B2 - Image encoding device

Info

Publication number: JP6242055B2
Application number: JP2013027873A
Authority: JP
Inventors: 一之宮澤
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2013-02-15
Filing date: 2013-02-15
Publication date: 2017-12-06
Anticipated expiration: 2033-02-15
Also published as: JP2014158164A

Description

The present invention relates to an image encoding device that compresses and encodes an image for transmission.

Object recognition technology, which automatically recognizes a subject appearing in video (for example, a moving image) captured by a camera or similar device, is important for applications such as image-based surveillance and the automatic editing and classification of images.
In general, recognizing a subject in an image requires processes such as scanning the image with windows of several sizes and extracting feature points from the image.
However, scanning an image and extracting feature points require an enormous amount of computation and therefore a long processing time. As a result, in an application that performs object recognition in real time, for example, the processing may fail to finish in time.

Images captured by a camera or similar device are generally compressed by an encoding process to reduce the amount of information.
In international standard video coding schemes such as MPEG and ITU-T H.26x, an image is divided into blocks of a fixed size, and the encoding process is applied to each block. Compression performance is improved by recursively partitioning the blocks to fit the subject.
FIG. 11 is an explanatory diagram showing an example of the hierarchical block structure when the maximum block size is 64 × 64 pixels and the number of recursive partitioning levels is three.
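The recursive partitioning described above can be sketched as a quadtree split. This is only an illustration under stated assumptions, not the partitioning rule of any particular standard: the names `split_block` and `contains_detail` are hypothetical, the split decision is a stand-in predicate rather than a real rate-distortion test, and how `max_depth` maps onto the "three levels" of FIG. 11 is an assumption.

```python
def split_block(x, y, size, depth, max_depth, needs_split):
    """Return the leaf blocks (x, y, size) that cover one largest block."""
    if depth >= max_depth or not needs_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += split_block(x + dx, y + dy, half, depth + 1, max_depth, needs_split)
    return leaves


# Hypothetical predicate: split any block that contains the pixel (10, 10),
# standing in for "this block overlaps a detailed part of the subject".
def contains_detail(x, y, size):
    return x <= 10 < x + size and y <= 10 < y + size


leaves = split_block(0, 0, 64, depth=0, max_depth=2, needs_split=contains_detail)
for leaf in leaves:
    print(leaf)
```

With this predicate, the 64 × 64 block is split once, and only the quadrant containing the detail is split again, so small blocks cluster around the detail while the rest stays large.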

Looking at the block sizes determined during encoding, the block size tends to be small around a subject and large in the remaining background regions.
FIG. 12 is an explanatory diagram showing an example in which the block size is small around a subject.
Therefore, by referring to the block sizes determined during encoding, the region of the image in which a subject is present can be narrowed down.
If object recognition is applied only to the narrowed-down region, the amount of computation required for object recognition can be greatly reduced.
In addition, because object recognition is no longer applied to unnecessary regions, erroneous recognition of objects can also be reduced.
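To make the saving concrete, the following sketch keeps only the small blocks as candidate subject regions and reports what fraction of the picture they cover. The block list, the 128 × 64 picture size, and the size limit of 16 are all made-up illustration values, and `candidate_blocks` and `area_fraction` are hypothetical helper names.

```python
def candidate_blocks(blocks, max_subject_size):
    """Keep blocks no larger than max_subject_size as candidate subject regions."""
    return [b for b in blocks if b[2] <= max_subject_size]


def area_fraction(selected, picture_width, picture_height):
    """Fraction of the picture area covered by the selected blocks."""
    covered = sum(size * size for _, _, size in selected)
    return covered / (picture_width * picture_height)


# Hypothetical partition of a 128 x 64 picture, as (x, y, size) tuples.
blocks = [
    (0, 0, 64),
    (64, 0, 32), (96, 0, 32), (64, 32, 32),
    (96, 32, 16), (112, 32, 16), (96, 48, 16), (112, 48, 16),
]

selected = candidate_blocks(blocks, max_subject_size=16)
fraction = area_fraction(selected, 128, 64)
print(len(selected), fraction)
```

In this made-up case, a recognizer restricted to the candidate blocks would scan one eighth of the picture instead of all of it.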

For example, Patent Document 1 below discloses an image encoding device that improves object recognition performance by using information obtained when an image is encoded.
One candidate for such information is the size of the coding blocks determined in the course of encoding, but the coding block size varies greatly with the bit rate.
FIG. 13 is an explanatory diagram showing how the coding block size changes with the bit rate.
As shown in FIG. 13, the coding block size becomes smaller at high bit rates and larger at low bit rates.

At low bit rates the coding blocks become larger, so the difference between the block size around a subject and the block size in the remaining background regions becomes small. For this reason, it may be difficult to obtain information about the shape of the subject from information indicating the coding block sizes.

Patent Document 1: JP 2009-271758 A (paragraphs [0006] to [0007])

Because the conventional image encoding device is configured as described above, for a high-bit-rate image the region in which a subject is present can be properly narrowed down by referring to the block sizes determined during encoding. For a low-bit-rate image, however, that region cannot be properly narrowed down, so neither the reduction in the computation required for object recognition nor the reduction in erroneous recognition can be achieved.

The present invention has been made to solve the above problem. Its object is to obtain an image encoding device that, even when the bit rate of the image varies, can properly narrow down the region in which a subject is present and thereby reduce both the computation required for object recognition and erroneous recognition.

The image encoding device according to the present invention is provided with a prediction mode selection unit, a block size determination unit, a subject region estimation unit, and a subject detection unit.
The prediction mode selection unit selects, from a plurality of available prediction modes, the prediction mode to be used when encoding each block included in the encoding target image.
The block size determination unit works on each block included in the encoding target image. If the prediction mode selection unit has selected the merge mode, which is included in the inter prediction modes, as the prediction mode for a block, the block size determination unit determines the size of that block to be smaller than when an intra prediction mode is selected. If the prediction mode selection unit has selected an inter prediction mode other than the merge mode for a block, the block size determination unit determines the size of that block to be smaller than when the merge mode is selected.
A two-dimensional array table is prepared in which the following elements are arranged in two dimensions: elements corresponding to the blocks whose selected prediction mode is an inter prediction mode other than the merge mode; elements corresponding to each of the plural regions contained in the blocks whose selected prediction mode is an intra prediction mode; and elements corresponding to each of the plural regions contained in the blocks whose selected prediction mode is the merge mode.
If an element of the two-dimensional array table corresponds to a block whose prediction mode is an inter prediction mode other than the merge mode, the subject region estimation unit assigns to that element a numerical value corresponding to the size of that block. If an element corresponds to a region contained in a block whose prediction mode is an intra prediction mode, the unit assigns to that element a numerical value corresponding to the size of the intra-predicted block. If an element corresponds to a region contained in a block whose prediction mode is the merge mode, the unit assigns to that element a numerical value corresponding to the size of the merge-mode block. The subject region estimation unit then estimates the region in which a subject is present by thresholding the numerical values assigned to the elements of the two-dimensional array table.
The subject detection unit performs object recognition on the region estimated by the subject region estimation unit to detect the subject appearing in the encoding target image.

According to the present invention, the device is provided with second block size determination means and subject region estimation means. The second block size determination means determines a predetermined block size if the prediction mode selected by the prediction mode selection means is an intra prediction mode, and a block size smaller than the predetermined block size if the prediction mode is an inter prediction mode. The subject region estimation means assigns numerical values corresponding to the block sizes determined by the second block size determination means to a two-dimensional array corresponding to the encoding target image, and estimates the region in which a subject is present by thresholding the values of that array. The subject detection means performs object recognition on the region estimated by the subject region estimation means to detect the subject appearing in the encoding target image. As a result, even when the bit rate of the encoding target image varies, the region in which a subject is present can be properly narrowed down, which reduces both the computation required for object recognition and erroneous recognition.

FIG. 1 is a block diagram showing an image encoding device according to Embodiment 1 of the present invention.
FIG. 2 is a block diagram showing the subject region estimation unit 22 of the image encoding device according to Embodiment 1 of the present invention.
FIG. 3 is an explanatory diagram showing an example of a two-dimensional array corresponding to an encoding target image.
FIG. 4 is an explanatory diagram showing noise removal applied to a two-dimensional array to which numerical values have been assigned.
FIG. 5 is an explanatory diagram showing the generation of a binary mask by the mask generation unit 33.
FIG. 6 is an explanatory diagram showing an example of erosion and dilation by the mask generation unit 33.
FIG. 7 is an explanatory diagram showing the region to which the subject detection unit 23 applies object recognition.
FIG. 8 is an explanatory diagram showing the detection result (object recognition result) for a subject appearing in the encoding target image.
FIG. 9 is a block diagram showing an image encoding device according to Embodiment 2 of the present invention.
FIG. 10 is an explanatory diagram showing examples of generating a binary mask and a multi-value mask.
FIG. 11 is an explanatory diagram showing an example of the hierarchical block structure when the maximum block size is 64 × 64 pixels and the number of recursive partitioning levels is three.
FIG. 12 is an explanatory diagram showing an example in which the block size is small around a subject.
FIG. 13 is an explanatory diagram showing how the coding block size changes with the bit rate.

Embodiment 1.
FIG. 1 is a block diagram showing an image encoding device according to Embodiment 1 of the present invention.
In FIG. 1, the encoding control unit 1 comprises an encoding efficiency verification unit 2, a prediction mode selection unit 3, a block size determination unit 4, and a prediction difference encoding parameter determination unit 5. It determines the maximum size of the coding blocks that serve as the processing unit for inter prediction (motion-compensated prediction) or intra prediction, and also determines the upper limit on the number of hierarchy levels into which a coding block of the maximum size is partitioned.
The encoding control unit 1 also selects, from two or more available prediction modes (one or more intra prediction modes and one or more inter prediction modes), the prediction mode suited to each hierarchically partitioned coding block. In Embodiment 1, the inter prediction modes are normally taken to include the merge mode; where it matters whether the merge mode is included or excluded, the following description says so explicitly.
Furthermore, the encoding control unit 1 determines a transform block size and a quantization parameter, and outputs prediction difference encoding parameters that include them.

The encoding efficiency verification unit 2 of the encoding control unit 1 divides the encoding target image into blocks of various sizes (for example, 64 × 64, 32 × 32, 16 × 16, and 8 × 8 pixels) and verifies the encoding efficiency obtained when each block is encoded in each of the available prediction modes. The encoding efficiency verification unit 2 constitutes encoding efficiency verification means.
The prediction mode selection unit 3 of the encoding control unit 1 selects, from the two or more available prediction modes, the prediction mode with the highest encoding efficiency as verified by the encoding efficiency verification unit 2, and outputs that prediction mode together with the corresponding prediction parameters (inter prediction parameters or intra prediction parameters). The prediction mode selection unit 3 constitutes prediction mode selection means.

The block size determination unit 4 of the encoding control unit 1 compares, across the blocks of various sizes, the encoding efficiency (as verified by the encoding efficiency verification unit 2) obtained when encoding is performed in the prediction mode selected by the prediction mode selection unit 3, and identifies the size with the highest encoding efficiency. The block size determination unit 4 constitutes first block size determination means.

The prediction difference encoding parameter determination unit 5 of the encoding control unit 1 determines a transform block size and a quantization parameter, and outputs prediction difference encoding parameters that include them.

The block division unit 6 divides the encoding target image into coding blocks of the block size determined by the block size determination unit 4. The block division unit 6 constitutes block division means.
The predicted image generation unit 7 includes an intra prediction unit and a motion-compensated prediction unit. When the prediction mode selected by the prediction mode selection unit 3 is an intra prediction mode, the intra prediction unit refers to the locally decoded image (reference image) of already encoded blocks stored in the memory 14 and, using the intra prediction parameters output from the prediction mode selection unit 3, performs intra prediction on the coding block produced by the block division unit 6 to generate a predicted image.
When the prediction mode selected by the prediction mode selection unit 3 is an inter prediction mode, the motion-compensated prediction unit performs a motion search by comparing the coding block produced by the block division unit 6 with the locally decoded image (reference image) of already encoded blocks stored in the memory 14, and calculates a motion vector. Using that motion vector and the inter prediction parameters output from the prediction mode selection unit 3, it performs inter prediction on the coding block to generate a predicted image.
The predicted image generation unit 7 constitutes predicted image generation means.

The subtraction unit 8 generates a difference image (= coding block − predicted image) by subtracting the predicted image generated by the predicted image generation unit 7 from the coding block produced by the block division unit 6.
The orthogonal transform unit 9 transforms the difference image generated by the subtraction unit 8 in units of the transform block size included in the prediction difference encoding parameters output from the prediction difference encoding parameter determination unit 5. The transform is an orthogonal transform such as a DCT (discrete cosine transform) or a KL transform whose basis has been designed in advance for a specific training sequence.
The quantization unit 10 quantizes the transform coefficients of the difference image using the quantization parameter included in the prediction difference encoding parameters output from the prediction difference encoding parameter determination unit 5, and outputs the quantized transform coefficients as the compressed data of the difference image.
The subtraction unit 8, the orthogonal transform unit 9, and the quantization unit 10 together constitute image compression means.
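The subtract, transform, and quantize chain can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the DCT, which is only one of the transforms the text names, and it models quantization as division by a uniform step `qstep` followed by rounding, which the text does not specify. The function names and the 4 × 4 sample block are hypothetical.

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(values)
    out = []
    for k in range(n):
        total = sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def compress_block(block, prediction, qstep):
    """Subtract the prediction, transform the difference, quantize the coefficients."""
    diff = [[b - p for b, p in zip(brow, prow)] for brow, prow in zip(block, prediction)]
    # Assumption: uniform quantization by a single step size.
    return [[round(c / qstep) for c in row] for row in dct_2d(diff)]


block = [[110] * 4 for _ in range(4)]       # hypothetical flat coding block
prediction = [[100] * 4 for _ in range(4)]  # hypothetical predicted image
levels = compress_block(block, prediction, qstep=8)
print(levels)
```

Because the difference image here is flat, all of its energy lands in the single DC coefficient, which is the property that makes the transformed difference cheap to encode.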

The inverse quantization unit 11 inversely quantizes the compressed data output from the quantization unit 10, using the quantization parameter included in the prediction difference encoding parameters output from the prediction difference encoding parameter determination unit 5.
The inverse orthogonal transform unit 12 applies an inverse transform (for example, an inverse DCT (inverse discrete cosine transform) or an inverse KL transform) to the compressed data inversely quantized by the inverse quantization unit 11, in units of the transform block size included in the prediction difference encoding parameters output from the prediction difference encoding parameter determination unit 5. It outputs the inversely transformed data as a locally decoded prediction difference signal (data representing the decompressed difference image).

The addition unit 13 generates a locally decoded image by adding the locally decoded prediction difference signal output from the inverse orthogonal transform unit 12 to the prediction signal representing the predicted image generated by the predicted image generation unit 7.
The memory 14 is a recording medium such as a RAM that stores the locally decoded image generated by the addition unit 13 as the image to be used by the predicted image generation unit 7 in the next prediction process.
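The local decoding path (inverse quantization, inverse transform, addition of the prediction) can be sketched in the same spirit. The assumptions are labeled: the inverse DCT is only one of the inverse transforms the text mentions, inverse quantization is modeled as multiplication by a uniform step `qstep`, and the function names and sample values are hypothetical.

```python
import math


def idct_1d(coeffs):
    """Inverse of the orthonormal DCT-II (an orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out


def idct_2d(coeffs):
    """Separable 2-D inverse DCT: rows first, then columns."""
    rows = [idct_1d(row) for row in coeffs]
    cols = [idct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def local_decode(levels, prediction, qstep):
    """Dequantize, inverse-transform, and add the prediction back."""
    # Assumption: inverse quantization is multiplication by the step size.
    coeffs = [[level * qstep for level in row] for row in levels]
    diff = idct_2d(coeffs)
    return [[p + d for p, d in zip(prow, drow)] for prow, drow in zip(prediction, diff)]


levels = [[5, 0, 0, 0]] + [[0, 0, 0, 0] for _ in range(3)]  # hypothetical compressed data
prediction = [[100] * 4 for _ in range(4)]                   # hypothetical predicted image
decoded = local_decode(levels, prediction, qstep=8)
print(decoded[0])
```

The encoder keeps this reconstructed block rather than the original so that its reference images match what a decoder will later reconstruct.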

The variable-length encoding unit 15 variable-length encodes the compressed data output from the quantization unit 10 together with the prediction mode, the prediction difference encoding parameters, and the prediction parameters output from the encoding control unit 1 (in the case of an inter prediction mode, the prediction parameters include the motion vector found by the motion-compensated prediction unit of the predicted image generation unit 7). It generates a bitstream in which the encoded compressed data, prediction mode, prediction difference encoding parameters, and prediction parameters are multiplexed. The variable-length encoding unit 15 constitutes encoding means.

If the prediction mode selected by the prediction mode selection unit 3 of the encoding control unit 1 is an intra prediction mode, the block size determination unit 21 determines a predetermined block size (for example, 64 × 64 pixels). If the prediction mode is the merge mode included in the inter prediction modes, it determines a block size smaller than the predetermined block size (for example, 32 × 32 pixels). If the prediction mode is an inter prediction mode other than the merge mode, it determines a block size smaller than that used for the merge mode (for example, 16 × 16 pixels). The block size determination unit 21 constitutes second block size determination means.
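The rule applied by the block size determination unit 21 amounts to a lookup from prediction mode to block size. The sketch below uses the example sizes given in the text (64, 32, and 16 pixels); the mode labels and the function name `nominal_block_size` are hypothetical.

```python
# Hypothetical labels for the three cases distinguished in the text.
INTRA = "intra"
MERGE = "merge"              # merge mode, part of the inter prediction modes
INTER_NON_MERGE = "inter"    # inter prediction modes other than merge

# Example sizes from the text: the size depends on the mode, not on the bit rate.
NOMINAL_SIZE = {INTRA: 64, MERGE: 32, INTER_NON_MERGE: 16}


def nominal_block_size(prediction_mode):
    """Return the block size assigned to a block from its prediction mode alone."""
    return NOMINAL_SIZE[prediction_mode]


for mode in (INTRA, MERGE, INTER_NON_MERGE):
    print(mode, nominal_block_size(mode))
```

Because the assigned size depends only on the selected prediction mode, it does not shrink or grow with the bit rate the way the actual coding block size does.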

The subject region estimation unit 22 assigns numerical values corresponding to the block sizes determined by the block size determination unit 21 (for example, a larger value for a smaller block size) to a two-dimensional array corresponding to the encoding target image, and estimates the region in which a subject is present by thresholding the values of that array. The subject region estimation unit 22 constitutes subject region estimation means.
The subject detection unit 23 performs object recognition on the region estimated by the subject region estimation unit 22 to detect the subject appearing in the encoding target image. The subject detection unit 23 constitutes subject detection means.

In FIG. 1, each component of the image encoding device (the encoding control unit 1, block division unit 6, predicted image generation unit 7, subtraction unit 8, orthogonal transform unit 9, quantization unit 10, inverse quantization unit 11, inverse orthogonal transform unit 12, addition unit 13, memory 14, variable-length encoding unit 15, block size determination unit 21, subject region estimation unit 22, and subject detection unit 23) is assumed to be implemented in dedicated hardware (for example, a semiconductor integrated circuit on which a CPU is mounted, or a one-chip microcomputer). The image encoding device may instead be implemented on a computer.
When the image encoding device is implemented on a computer, the memory 14 is provided in the internal or external memory of the computer. A program describing the processing of the encoding control unit 1, block division unit 6, predicted image generation unit 7, subtraction unit 8, orthogonal transform unit 9, quantization unit 10, inverse quantization unit 11, inverse orthogonal transform unit 12, addition unit 13, variable-length encoding unit 15, block size determination unit 21, subject region estimation unit 22, and subject detection unit 23 is stored in the memory of the computer, and the CPU of the computer executes that program.

FIG. 2 is a block diagram showing the subject region estimation unit 22 of the image encoding device according to Embodiment 1 of the present invention.
In FIG. 2, the block size preprocessing unit 31 assigns numerical values corresponding to the block sizes determined by the block size determination unit 21 (for example, a larger value for a smaller block size) to the two-dimensional array corresponding to the encoding target image.
The threshold memory 32 is a storage medium that stores a predetermined threshold.
The mask generation unit 33 compares each value of the two-dimensional array assigned by the block size preprocessing unit 31 with the threshold stored in the threshold memory 32. It generates a binary mask by a binarization process that replaces a value with "1" if it is larger than the threshold and with "0" if it is smaller than the threshold.
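The value assignment and binarization can be sketched as below. The text only requires that smaller blocks receive larger values, so the specific mapping in `size_to_value` (64 to 0, 32 to 1, 16 to 2) is an assumption, as are the sample array and the threshold of 1. The text does not say what happens when a value equals the threshold; this sketch maps that case to 0.

```python
def size_to_value(block_size, largest=64):
    """Hypothetical mapping: 64 -> 0, 32 -> 1, 16 -> 2 (smaller block, larger value)."""
    return (largest // block_size).bit_length() - 1


def binary_mask(size_map, threshold):
    """Replace each value with 1 if it exceeds the threshold, otherwise with 0."""
    return [
        [1 if size_to_value(size) > threshold else 0 for size in row]
        for row in size_map
    ]


# Hypothetical 3 x 3 array holding the block size assigned to each element.
size_map = [
    [64, 64, 32],
    [64, 16, 16],
    [32, 16, 64],
]

mask = binary_mask(size_map, threshold=1)
for row in mask:
    print(row)
```

Elements set to 1 mark where a subject is estimated to be present, which is the region that the subject detection unit 23 would then examine.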

Next, the operation will be described.
First, the encoding efficiency verification unit 2 of the encoding control unit 1 divides the encoding target image into blocks of various sizes (for example, 64 × 64, 32 × 32, 16 × 16, and 8 × 8 pixels) and verifies the encoding efficiency obtained when each block is encoded in each of the available prediction modes (M intra prediction modes and N inter prediction modes, where M and N are integers of 1 or more).
In this case, the encoding efficiency is verified for M + N prediction modes for the blocks of each size.

Once the encoding efficiency verification unit 2 has verified the encoding efficiency, the prediction mode selection unit 3 of the encoding control unit 1 selects, from the two or more available prediction modes (the M intra prediction modes and the N inter prediction modes), the prediction mode with the highest verified encoding efficiency, and outputs that prediction mode to the block size determination units 4 and 21, the predicted image generation unit 7, and the variable-length encoding unit 15.
The prediction mode selection unit 3 also outputs the prediction parameters corresponding to that prediction mode (inter prediction parameters or intra prediction parameters) to the predicted image generation unit 7 and the variable-length encoding unit 15.

When the prediction mode selection unit 3 has selected a prediction mode, the block size determination unit 4 of the encoding control unit 1 compares, across the various block sizes (for example, 64 × 64, 32 × 32, 16 × 16, and 8 × 8 pixel blocks), the encoding efficiency (as verified by the encoding efficiency verification unit 2) obtained when encoding is performed in that prediction mode, and identifies the size with the highest encoding efficiency.
For example, if the prediction mode selected by the prediction mode selection unit 3 is an inter prediction mode, the block size determination unit 4 compares, among the encoding efficiencies verified by the encoding efficiency verification unit 2, those of the various block sizes encoded in the inter prediction mode, and identifies the size with the highest encoding efficiency.
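
The two selection steps above (prediction mode first, then block size for that mode) can be sketched as follows. This is a minimal illustration, not the patented implementation: the table layout `table[size][mode]`, the mode names, and the scores are hypothetical, and because the text does not say how modes are compared across block sizes, the sketch assumes each mode is judged by its best score at any size.

```python
def select_mode(table):
    """Pick the mode with the highest efficiency at any block size.

    table[size][mode] holds a (hypothetical) verified encoding efficiency,
    where a higher value means a more efficient encoding.
    """
    best = {}
    for per_mode in table.values():
        for mode, score in per_mode.items():
            best[mode] = max(score, best.get(mode, float("-inf")))
    return max(best, key=best.get)


def select_block_size(table, mode):
    """Among all block sizes, pick the one most efficient for the given mode."""
    return max(table, key=lambda size: table[size][mode])


# Hypothetical efficiencies for four block sizes and two modes.
table = {
    64: {"intra": 0.50, "inter": 0.52},
    32: {"intra": 0.55, "inter": 0.66},
    16: {"intra": 0.60, "inter": 0.71},
    8: {"intra": 0.58, "inter": 0.63},
}
mode = select_mode(table)
size = select_block_size(table, mode)
```

A real encoder would typically derive these scores from a rate-distortion cost; the text only requires that some efficiency measure be comparable across modes and sizes.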

The prediction difference encoding parameter determination unit 5 of the encoding control unit 1 determines the transform block size and the quantization parameter, and outputs prediction difference encoding parameters including them to the orthogonal transform unit 9, the quantization unit 10, the inverse quantization unit 11, the inverse orthogonal transform unit 12, and the variable length encoding unit 15.
The methods for determining the transform block size and the quantization parameter are known techniques, so a detailed description is omitted.

On receiving the block size from the block size determination unit 4 of the encoding control unit 1, the block division unit 6 divides the encoding target image into coding blocks of that size.
For example, if the block size output from the block size determination unit 4 is 32 × 32 pixels, the encoding target image is divided into coding blocks of 32 × 32 pixels; if it is 16 × 16 pixels, the image is divided into coding blocks of 16 × 16 pixels.
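
As a rough illustration of the block division described above, the following sketch tiles an image (a list of pixel rows) into square blocks of a given size. The function name and the clipping of edge blocks are assumptions; the text does not describe how images whose dimensions are not multiples of the block size are handled.

```python
def split_into_blocks(image, block_size):
    """Yield (top, left, block) for each block_size x block_size tile.

    Blocks at the right and bottom edges are clipped if the image size
    is not a multiple of block_size (an assumption of this sketch).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            block = [row[left:left + block_size]
                     for row in image[top:top + block_size]]
            yield top, left, block


# A synthetic 64 x 64 image split into 32 x 32 coding blocks.
image = [[x + 64 * y for x in range(64)] for y in range(64)]
blocks = list(split_into_blocks(image, 32))
```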

The predicted image generation unit 7 consists of an intra prediction unit and a motion-compensated prediction unit. If the prediction mode selected by the prediction mode selection unit 3 is an intra prediction mode, the intra prediction unit generates the predicted image (intra predicted image).
That is, when the selected prediction mode is an intra prediction mode, the intra prediction unit refers to the locally decoded image (reference image) of already-encoded blocks stored in the memory 14 and, using the intra prediction parameter output from the prediction mode selection unit 3, performs intra prediction on the coding block produced by the block division unit 6 to generate a predicted image.
The intra prediction process of the intra prediction unit follows, for example, the algorithm defined in the AVC/H.264 standard (ISO/IEC 14496-10), but is not limited to that algorithm.

If the prediction mode selected by the prediction mode selection unit 3 is an inter prediction mode, the motion-compensated prediction unit generates the predicted image (inter predicted image).
That is, when the selected prediction mode is an inter prediction mode, the motion-compensated prediction unit performs a motion search by comparing the coding block produced by the block division unit 6 with the locally decoded image (reference image) of already-encoded blocks stored in the memory 14, calculates a motion vector, and performs inter prediction on the coding block using that motion vector and the inter prediction parameter output from the prediction mode selection unit 3 to generate a predicted image.
After generating a predicted image, the predicted image generation unit 7 outputs it to the subtraction unit 8 and the addition unit 13; when the selected prediction mode is an inter prediction mode, it also outputs the calculated motion vector to the variable length encoding unit 15.
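
The motion search mentioned above is not specified in detail. One common approach, shown here purely as an assumed example, is exhaustive block matching that minimizes the sum of absolute differences (SAD) within a small search window. The function names, the search range, and the synthetic reference image are all hypothetical.

```python
def sad(block, ref, top, left):
    """Sum of absolute differences between block and ref at (top, left)."""
    return sum(abs(block[y][x] - ref[top + y][left + x])
               for y in range(len(block))
               for x in range(len(block[0])))


def motion_search(block, ref, top, left, search_range):
    """Return the (dy, dx) displacement that minimizes SAD.

    The search covers +/- search_range pixels around (top, left) and
    skips candidate positions that fall outside the reference image.
    """
    n, m = len(block), len(block[0])
    best, best_cost = (0, 0), sad(block, ref, top, left)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= len(ref) - n and 0 <= x <= len(ref[0]) - m:
                cost = sad(block, ref, y, x)
                if cost < best_cost:
                    best, best_cost = (dy, dx), cost
    return best


# Synthetic reference image in which every pixel value is unique.
ref = [[16 * y + x for x in range(16)] for y in range(16)]
# The current 4 x 4 block equals the reference content at (5, 6),
# while the block itself is located at (4, 4) in the current picture.
block = [row[6:10] for row in ref[5:9]]
mv = motion_search(block, ref, top=4, left=4, search_range=3)
```

Practical encoders usually replace the exhaustive loop with a faster search pattern, but the cost function and the resulting motion vector play the same role.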

On receiving the predicted image from the predicted image generation unit 7, the subtraction unit 8 subtracts it from the coding block produced by the block division unit 6 to generate a difference image (= coding block − predicted image), and outputs the difference image to the orthogonal transform unit 9.
On receiving the difference image from the subtraction unit 8, the orthogonal transform unit 9 transforms it in units of the transform block size included in the prediction difference encoding parameters output from the prediction difference encoding parameter determination unit 5 (for example, by an orthogonal transform such as a DCT (discrete cosine transform) or a KL transform whose basis has been designed in advance for a specific training sequence), and outputs the transform coefficients of the difference image to the quantization unit 10.
On receiving the transform coefficients of the difference image from the orthogonal transform unit 9, the quantization unit 10 quantizes them using the quantization parameter included in the prediction difference encoding parameters output from the prediction difference encoding parameter determination unit 5, and outputs the quantized transform coefficients, as the compressed data of the difference image, to the inverse quantization unit 11 and the variable length encoding unit 15.

On receiving the compressed data of the difference image from the quantization unit 10, the inverse quantization unit 11 inversely quantizes it using the quantization parameter included in the prediction difference encoding parameters output from the prediction difference encoding parameter determination unit 5.
On receiving the inversely quantized compressed data from the inverse quantization unit 11, the inverse orthogonal transform unit 12 applies an inverse transform (for example, an inverse DCT (inverse discrete cosine transform) or an inverse KL transform) to it in units of the transform block size included in the prediction difference encoding parameters, and outputs the result to the addition unit 13 as a locally decoded prediction difference signal (data representing the decompressed difference image).
The addition unit 13 adds the locally decoded prediction difference signal output from the inverse orthogonal transform unit 12 to the prediction signal representing the predicted image generated by the predicted image generation unit 7 to generate a locally decoded image, and stores that locally decoded image in the memory 14 in preparation for the next predicted image generation.
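
The data flow from the subtraction unit 8 through the addition unit 13 can be sketched as below. This is a simplified illustration under stated assumptions: the orthogonal transform and its inverse are deliberately omitted, quantization is modeled as uniform scalar quantization with a hypothetical step size `step`, and the function name is invented for the example.

```python
def encode_decode_block(block, prediction, step):
    """Residual -> quantization -> inverse quantization -> local decoding.

    The transform stage (e.g. DCT) is omitted here for brevity, so the
    residual itself is quantized. Returns the quantized levels and the
    locally decoded block that would be stored as a reference.
    """
    # Subtraction unit: difference image = coding block - predicted image.
    residual = [[b - p for b, p in zip(brow, prow)]
                for brow, prow in zip(block, prediction)]
    # Quantization unit: uniform quantization with the given step size.
    levels = [[round(r / step) for r in row] for row in residual]
    # Inverse quantization unit: scale the levels back.
    dequantized = [[q * step for q in row] for row in levels]
    # Addition unit: prediction + decoded difference = locally decoded image.
    local_decoded = [[p + d for p, d in zip(prow, drow)]
                     for prow, drow in zip(prediction, dequantized)]
    return levels, local_decoded


block = [[11, 13], [20, 7]]
prediction = [[8, 8], [8, 8]]
levels, local_decoded = encode_decode_block(block, prediction, step=4)
```

Because the encoder reconstructs the block from the quantized levels rather than from the original pixels, its reference image matches what a decoder would obtain, which is the reason for the local decoding loop.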

The variable length encoding unit 15 variable-length encodes the compressed data output from the quantization unit 10 together with the prediction mode, the prediction difference encoding parameters, and the prediction parameter output from the encoding control unit 1 (including, in the case of an inter prediction mode, the motion vector found by the motion-compensated prediction unit of the predicted image generation unit 7). It generates a bitstream in which the encoded data of the compressed data, the prediction mode, the prediction difference encoding parameters, and the prediction parameter are multiplexed, and outputs the bitstream to an image decoding device or the like (not shown).

The processing described so far is the processing by which the image encoding device encodes the encoding target image. The processing for detecting a subject shown in the encoding target image is described below.
When the prediction mode selection unit 3 of the encoding control unit 1 has selected a prediction mode, the block size determination unit 21 determines, according to that prediction mode, the block size to be used in the subject detection process.
Unlike the block size determination unit 4 of the encoding control unit 1, the block size determination unit 21 determines the block size without considering the bit rate of the encoding target image, so its result is not affected by bit rate fluctuations.

The block size determination process of the block size determination unit 21 is described concretely below.
Inter prediction is a method that performs prediction by focusing on the motion of the subject between frames, whereas intra prediction is a method that performs prediction using neighboring pixels of the encoding target within the same frame.
Of the two, inter prediction readily reflects the motion and shape of the subject, and can therefore be regarded as the prediction mode better suited to subject region estimation.
Inter prediction is also more likely than intra prediction to be applied to regions where a subject exists (in particular, near the contour of the subject), whereas intra prediction is more likely to be applied to background regions where no subject exists.
The merge mode included in the inter prediction modes is an inter prediction process that does not use a motion vector searched for the coding block itself but reuses the vectors of surrounding blocks as they are. The merge mode is likely to be selected frequently in the interior of a subject and in background regions, where blocks tend to share the same motion vector.

Accordingly, when the prediction mode selection unit 3 selects an intra prediction mode, the block size determination unit 21 determines a large block size, for example 64 × 64 pixels, because the block is likely to belong to a background region.
When the prediction mode selection unit 3 selects an inter prediction mode other than the merge mode, the block size determination unit 21 determines a small block size, for example 16 × 16 pixels, because the block is likely to belong to a region where a subject exists (in particular, near the contour of the subject).
When the prediction mode selection unit 3 selects the merge mode included in the inter prediction modes, the block size determination unit 21 determines an intermediate block size, for example 32 × 32 pixels, because the block is likely to belong to the interior of a subject or to a background region.
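
The three rules above amount to a lookup from prediction mode to detection block size. A minimal sketch follows, using the example sizes from the text; the mode labels `"intra"`, `"inter"`, and `"merge"` are hypothetical names chosen for the illustration.

```python
# Example sizes taken from the text; other sizes could be configured.
DETECTION_BLOCK_SIZE = {
    "intra": 64,  # likely background: large block
    "inter": 16,  # inter prediction other than merge: likely near a contour
    "merge": 32,  # merge mode: likely subject interior or background
}


def detection_block_size(mode):
    """Return the block size used for subject detection for a given mode."""
    return DETECTION_BLOCK_SIZE[mode]
```

Because the mapping depends only on the selected mode and not on the bit rate, the resulting sizes stay stable when the bit rate changes, which is the property the text relies on.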

When the block size determination unit 21 has determined the block size, the subject region estimation unit 22 assigns, to a two-dimensional array corresponding to the encoding target image, a numerical value corresponding to that block size (for example, a larger value for a smaller block size), and estimates the region where the subject exists by applying threshold processing to the values of the two-dimensional array.
The process by which the subject region estimation unit 22 estimates the region where the subject exists is described concretely below.

On receiving the encoding target image, the block size preprocessing unit 31 of the subject region estimation unit 22 prepares a two-dimensional array corresponding to that image.
Possible two-dimensional arrays include, for example, arrays that can store a numerical value per minimum size unit produced by the block division unit 6 (for example, 8 × 8 pixels), per size unit slightly larger than the minimum (for example, 16 × 16 pixels), or per pixel.
FIG. 3 is an explanatory diagram showing an example of a two-dimensional array corresponding to the encoding target image; the example in FIG. 3 shows a two-dimensional array in units of 16 × 16 pixels.

The block size preprocessing unit 31 assigns, to the two-dimensional array corresponding to the encoding target image, the numerical value corresponding to the block size determined by the block size determination unit 21.
The smaller the block size, the larger this value: for example, “0” for 64 × 64 pixels, “1” for 32 × 32 pixels, “2” for 16 × 16 pixels, and “3” for 8 × 8 pixels.
In Embodiment 1, the block size determination unit 21 controls the block size so that it becomes small near the boundary of the subject; however, the block size may also become small in regions other than the subject because of, for example, noise or the background.
The block size preprocessing unit 31 may therefore apply noise removal processing to the two-dimensional array to which the values have been assigned.
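
The value assignment can be sketched as follows, assuming the two-dimensional array holds one cell per 8 × 8 pixel unit and that each block is described by a hypothetical `(top, left, size)` tuple in pixels. The size-to-value table uses the example values given in the text.

```python
SIZE_TO_VALUE = {64: 0, 32: 1, 16: 2, 8: 3}  # smaller block -> larger value


def build_size_map(width, height, blocks, cell=8):
    """Fill a 2-D array (one entry per cell x cell pixels) with size values.

    blocks is an iterable of (top, left, size) tuples in pixel units.
    """
    rows, cols = height // cell, width // cell
    size_map = [[0] * cols for _ in range(rows)]
    for top, left, size in blocks:
        value = SIZE_TO_VALUE[size]
        for r in range(top // cell, min((top + size) // cell, rows)):
            for c in range(left // cell, min((left + size) // cell, cols)):
                size_map[r][c] = value
    return size_map


# A hypothetical partition of a 64 x 64 image.
blocks = [
    (0, 0, 32), (0, 32, 32), (32, 0, 32),
    (32, 32, 16), (32, 48, 16), (48, 32, 16),
    (48, 48, 8), (48, 56, 8), (56, 48, 8), (56, 56, 8),
]
size_map = build_size_map(64, 64, blocks)
```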

FIG. 4 is an explanatory diagram showing noise removal processing applied to the two-dimensional array to which numerical values have been assigned.
One possible noise removal method is to apply a smoothing filter to the two-dimensional array to which the values have been assigned.
Applying a smoothing filter makes it possible to remove, as noise, small regions with a small block size that lie outside the region where the subject exists.
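
The text does not specify which smoothing filter is used. As one assumed possibility, a 3 × 3 mean filter is sketched below; the kernel size and the handling of border cells (averaging only over in-bounds neighbors) are choices made for this example.

```python
def smooth(grid):
    """Apply a 3 x 3 mean filter; border cells average in-bounds neighbors."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [grid[y][x]
                      for y in range(max(0, r - 1), min(rows, r + 2))
                      for x in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = sum(window) / len(window)
    return out


# An isolated "3" (a single small block in the background) is attenuated
# well below an example threshold of 2.5 after smoothing.
noisy = [[0] * 5 for _ in range(5)]
noisy[2][2] = 3
smoothed = smooth(noisy)
```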

When the block size preprocessing unit 31 has assigned the values corresponding to the block sizes to the two-dimensional array, the mask generation unit 33 compares the values of the array with the threshold stored in the threshold memory 32. Although the threshold stored in the threshold memory 32 is used here, the threshold may instead be supplied from outside.
A value such as 2.5 is used as the threshold, but this is only an example, and the threshold is not limited to 2.5.
The mask generation unit 33 generates a binary mask by binarization: a value of the two-dimensional array larger than the threshold is replaced with “1”, and a value smaller than the threshold is replaced with “0”.
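
The binarization step can be sketched in a few lines. The default threshold of 2.5 is the example value from the text; the text does not say what happens when a value equals the threshold, so this sketch assumes such a value maps to “0”.

```python
def binarize(grid, threshold=2.5):
    """Replace values above the threshold with 1 and all others with 0.

    Values exactly equal to the threshold map to 0 (an assumption; the
    text leaves the equality case unspecified).
    """
    return [[1 if value > threshold else 0 for value in row] for row in grid]


mask = binarize([[0, 1, 2, 3], [3, 2.5, 2.6, 0]])
```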

FIG. 5 is an explanatory diagram showing the binary mask generation process of the mask generation unit 33.
In Embodiment 1, because the block size determination unit 21 controls the block size so that it becomes small near the boundary of the subject, the values near the subject boundary exceed the threshold and are replaced with “1”, while the values in other regions fall below the threshold and are replaced with “0”, as shown in FIG. 5(a). In the example of FIG. 5(a), the white portions are “1” and the black portions are “0”.
Because the interior of the subject is often replaced with “0”, the mask generation unit 33 applies contour tracking to the data replaced with “1” and detects closed curves representing the subject boundary in the image.
It then replaces all values inside the region enclosed by each closed curve with “1”.
This yields a binary mask in which only the region where the subject exists has the value “1” and all other regions have the value “0” (see FIG. 5(b)).
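
Contour tracking itself is not detailed in the text. The sketch below reaches the same end result for simple cases by a different, swapped-in technique: it flood-fills the “0” cells connected to the image border and treats every cell not reached as lying on or inside a closed boundary. The function name and the 4-connectivity are assumptions of the example.

```python
from collections import deque


def fill_enclosed(mask):
    """Set to 1 every cell that is not connected to the border through 0s."""
    rows, cols = len(mask), len(mask[0])
    outside = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the search with all 0-valued cells on the image border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and mask[r][c] == 0:
                outside[r][c] = True
                queue.append((r, c))
    # Breadth-first search through 4-connected 0-valued cells.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            y, x = r + dr, c + dc
            if (0 <= y < rows and 0 <= x < cols
                    and not outside[y][x] and mask[y][x] == 0):
                outside[y][x] = True
                queue.append((y, x))
    return [[0 if outside[r][c] else 1 for c in range(cols)]
            for r in range(rows)]


ring = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
filled = fill_enclosed(ring)
```

Unlike contour tracking, this approach does not produce an explicit boundary curve, and an open boundary with a gap is not filled.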

In the binary mask generated by the mask generation unit 33, noise that the noise removal processing of the block size preprocessing unit 31 could not remove may remain. In addition, part of a region where the subject actually exists may have been removed.
The mask generation unit 33 may therefore apply erosion to the generated binary mask to remove the remaining noise, and then apply dilation to restore the part of the subject region that was removed.
FIG. 6 is an explanatory diagram showing an example of the erosion and dilation performed by the mask generation unit 33.
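
Erosion followed by dilation (morphological opening) can be sketched as below. The 3 × 3 square structuring element and the treatment of cells outside the image (ignored rather than counted as 0) are assumptions; the text and FIG. 6 may use a different element.

```python
def _morph(mask, reduce_fn):
    """Apply reduce_fn (min or max) over each cell's in-bounds 3 x 3 window."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [mask[y][x]
                      for y in range(max(0, r - 1), min(rows, r + 2))
                      for x in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = reduce_fn(window)
    return out


def erode(mask):
    """A cell stays 1 only if every cell in its window is 1."""
    return _morph(mask, min)


def dilate(mask):
    """A cell becomes 1 if any cell in its window is 1."""
    return _morph(mask, max)


def opening(mask):
    """Erosion removes small specks; dilation restores the surviving shapes."""
    return dilate(erode(mask))


# A 3 x 3 subject region plus an isolated noise cell at (5, 5).
mask = [[0] * 7 for _ in range(7)]
for r in range(1, 4):
    for c in range(1, 4):
        mask[r][c] = 1
mask[5][5] = 1
opened = opening(mask)
```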

The mask generation unit 33 may also detect the subject boundary more accurately by using the boundary between “0” and “1” in the mask generated by the above process as the initial boundary for another boundary detection method, such as an active contour model. An active contour model can estimate the boundary between the subject and the background with high accuracy, but it requires an appropriate initial boundary (one close to the true boundary). The initial boundary is usually specified manually by the user; using the boundary of the mask generated by the above process makes it possible to supply an appropriate initial boundary without user intervention.

When the subject region estimation unit 22 has estimated the region where the subject exists, the subject detection unit 23 performs object recognition processing for the region indicated by the binary mask and detects the subject shown in the encoding target image.
FIG. 7 is an explanatory diagram showing the region to which the subject detection unit 23 applies object recognition processing.
Detecting a subject in the encoding target image requires processes such as scanning the image with multiple window sizes and extracting feature points from the image, which generally involve an enormous amount of computation and therefore a long processing time. The subject detection unit 23, however, limits the region to which object recognition is applied to the region where the subject exists as estimated by the subject region estimation unit 22 (the region whose binary mask value is “1”), so the amount of computation required for object recognition can be greatly reduced. In addition, because recognition processing on unnecessary regions such as the background can be skipped, erroneous recognition of objects can also be reduced.

The object recognition process performed by the subject detection unit 23 is itself a known technique, so a detailed description is omitted. When the subject detection unit 23 detects a subject in the encoding target image, it outputs as the object recognition result, for example, as shown in FIG. 8, information such as coordinates indicating the position of the subject in the encoding target image (the coordinates (X1, Y1) and (X2, Y2) of the upper-left corner of the rectangle enclosing each subject) and the widths W1, W2 and heights H1, H2 of those rectangles.
If names of persons, vehicle model names, and the like are registered in a database together with photographs of the persons or vehicles to be detected, the detected subject may be matched against the registered photographs and, if a matching person or vehicle is found, the name of the person, the vehicle model name, or the like may be output as the object recognition result.
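
The rectangle format described above (upper-left coordinate plus width and height) can be illustrated by computing the bounding rectangle of each connected “1” region of a binary mask. This is not the object recognition process itself, which the text leaves to known techniques; it only shows how candidate regions could be expressed as `(X, Y, W, H)` tuples. The function name, the 4-connectivity, and the use of mask cells rather than pixels as units are assumptions.

```python
from collections import deque


def bounding_boxes(mask):
    """Return (x, y, width, height) for each 4-connected region of 1s."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search over one connected region.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, bottom, left, right = r, r, c, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((left, top, right - left + 1, bottom - top + 1))
    return boxes


mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
boxes = bounding_boxes(mask)
```

Multiplying each tuple by the cell size (for example, 8 or 16 pixels) would convert it to pixel coordinates in the encoding target image.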

The object recognition result of the subject detection unit 23 may be output to the variable length encoding unit 15 so that it is included in the bitstream generated by the variable length encoding unit 15, or it may be output separately from the bitstream.
The receiving side of the object recognition result can then realize, for example, automatic keyword tagging of images and classification or retrieval based on the semantic content of images. It also becomes possible to automate monitoring work using surveillance images and to manage room entry and exit by identifying persons.

As is apparent from the above, Embodiment 1 provides the block size determination unit 21, which determines a predetermined block size if the prediction mode selected by the prediction mode selection unit 3 is an intra prediction mode and a block size smaller than the predetermined block size if it is an inter prediction mode, and the subject region estimation unit 22, which assigns numerical values corresponding to the block size determined by the block size determination unit 21 to a two-dimensional array corresponding to the encoding target image and estimates the region where the subject exists by applying threshold processing to those values. The subject detection unit 23 performs object recognition processing on the region estimated by the subject region estimation unit 22 to detect the subject shown in the encoding target image. As a result, even if the bit rate of the encoding target image fluctuates, the region where the subject exists can be narrowed down appropriately, which reduces both the amount of computation required for object recognition and the number of erroneous recognitions.

Providing the block size determination unit 21 in addition to the block size determination unit 4 duplicates the block size determination unit; however, this unit accounts for only a small proportion of the circuit scale of the image encoding device, so duplicating it increases the circuit scale only slightly. Moreover, compared with implementing an entirely different subject region estimation method, it is only necessary to replicate a circuit used in an ordinary image encoding device, so the design effort can also be kept substantially smaller.

Embodiment 2.
FIG. 9 is a block diagram showing an image encoding device according to Embodiment 2 of the present invention. In the figure, the same reference numerals as in FIG. 1 denote the same or corresponding parts, and their description is omitted.
The prediction mode selection unit 40 multiplies, among the encoding efficiencies verified by the encoding efficiency verification unit 2, those obtained in the inter prediction modes by a constant A larger than 1, and then selects the prediction mode with the highest encoding efficiency from the available prediction modes. The prediction mode selection unit 40 constitutes second prediction mode selection means, and in this embodiment the prediction mode selection unit 3 of the encoding control unit 1 constitutes first prediction mode selection means.
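
The biased selection performed by the prediction mode selection unit 40 can be sketched as follows. The function name, the `("intra", m)` / `("inter", n)` labels, and the sample efficiencies are hypothetical; the only behavior taken from the text is that inter prediction efficiencies are multiplied by a constant A larger than 1 before the best mode is chosen.

```python
def select_mode_with_inter_bias(intra_eff, inter_eff, a=1.2):
    """Select the best mode after scaling inter efficiencies by a (> 1).

    intra_eff and inter_eff are lists of efficiencies for the M intra and
    N inter prediction modes. Returns ("intra", m) or ("inter", n), with
    m and n counted from 1.
    """
    if a <= 1:
        raise ValueError("the constant A must be larger than 1")
    candidates = {("intra", m): e for m, e in enumerate(intra_eff, 1)}
    candidates.update(
        {("inter", n): e * a for n, e in enumerate(inter_eff, 1)})
    return max(candidates, key=candidates.get)


# Without the bias, intra mode 1 (0.70) would win; with A = 1.2 the
# second inter mode is preferred.
choice = select_mode_with_inter_bias([0.70, 0.65], [0.60, 0.62], a=1.2)
```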

FIG. 9 assumes that each of the components of the image encoding device, namely the encoding control unit 1, the block division unit 6, the predicted image generation unit 7, the subtraction unit 8, the orthogonal transform unit 9, the quantization unit 10, the inverse quantization unit 11, the inverse orthogonal transform unit 12, the addition unit 13, the memory 14, the variable length encoding unit 15, the prediction mode selection unit 40, the block size determination unit 21, the subject region estimation unit 22, and the subject detection unit 23, is implemented as dedicated hardware (for example, a semiconductor integrated circuit incorporating a CPU, or a one-chip microcomputer); however, the image encoding device may instead be implemented on a computer.
When the image encoding device is implemented on a computer, the memory 14 is configured in the internal or external memory of the computer, a program describing the processing of the encoding control unit 1, the block division unit 6, the predicted image generation unit 7, the subtraction unit 8, the orthogonal transform unit 9, the quantization unit 10, the inverse quantization unit 11, the inverse orthogonal transform unit 12, the addition unit 13, the variable length encoding unit 15, the prediction mode selection unit 40, the block size determination unit 21, the subject region estimation unit 22, and the subject detection unit 23 is stored in the memory of the computer, and the CPU of the computer executes the program stored in that memory.

Next, the operation will be described.
The device is the same as in Embodiment 1 above except for the addition of the prediction mode selection unit 40, so only the processing of the prediction mode selection unit 40 is described here.
Of inter prediction and intra prediction, inter prediction more readily reflects the motion and shape of the subject, as described above, and is therefore the prediction mode better suited to estimating the subject region.
Accordingly, in the process of estimating the region where the subject exists, it is preferable that an inter prediction mode be selected rather than an intra prediction mode.

When the encoding efficiency verification unit 2 has divided the encoding target image into blocks of various sizes (for example, blocks of 64×64, 32×32, 16×16, and 8×8 pixels) and has verified the encoding efficiency obtained when each block is encoded in each of the available prediction modes (M intra prediction modes and N inter prediction modes), the prediction mode selection unit 40 classifies the results into the encoding efficiencies INTRAE_m (m = 1, 2, ..., M) obtained in the intra prediction modes and the encoding efficiencies INTERE_n (n = 1, 2, ..., N) obtained in the inter prediction modes.

Then, to make the inter prediction modes more likely to be selected, the prediction mode selection unit 40 updates each encoding efficiency INTERE_n obtained in an inter prediction mode by multiplying it by a constant A greater than 1 (for example, A = 1.2 or A = 1.5). The constant A may be varied according to the block size (for example, A = 1.2 for a block size of 8×8 and A = 1.5 for a block size of 8×4).
INTERE_n = INTERE_n × A

After updating the encoding efficiencies INTERE_n obtained in the inter prediction modes, the prediction mode selection unit 40 identifies the highest encoding efficiency among the updated INTERE_n (n = 1, 2, ..., N) and the encoding efficiencies INTRAE_m (m = 1, 2, ..., M) obtained in the intra prediction modes, and selects the prediction mode corresponding to that encoding efficiency.
Having selected the prediction mode corresponding to the highest encoding efficiency, the prediction mode selection unit 40 outputs that prediction mode to the block size determination unit 21.
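The biased selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the list-based representation of the efficiencies, the default value of A, and the tie-breaking behaviour (the first maximum wins, so an intra mode wins an exact tie) are hypothetical and are not specified in the description.

```python
# Hypothetical table of the constant A per block size (width, height).
# The two entries mirror the examples in the text; the default is an assumption.
A_BY_BLOCK_SIZE = {(8, 8): 1.2, (8, 4): 1.5}
DEFAULT_A = 1.2


def constant_a(block_size):
    """Return the constant A to use for a given block size."""
    return A_BY_BLOCK_SIZE.get(block_size, DEFAULT_A)


def select_prediction_mode(intra_eff, inter_eff, a):
    """Pick the mode with the highest efficiency after biasing inter modes.

    intra_eff holds INTRAE_1..INTRAE_M and inter_eff holds INTERE_1..INTERE_N.
    Each INTERE_n is multiplied by A (> 1) before the comparison.
    Returns a pair (kind, index) with a 1-based index.
    """
    if a <= 1.0:
        raise ValueError("A must be greater than 1")
    candidates = [("intra", m, e) for m, e in enumerate(intra_eff, start=1)]
    candidates += [("inter", n, e * a) for n, e in enumerate(inter_eff, start=1)]
    kind, index, _ = max(candidates, key=lambda c: c[2])
    return kind, index


if __name__ == "__main__":
    intra = [0.90, 0.70]
    inter = [0.80, 0.60]
    print(select_prediction_mode(intra, inter, constant_a((8, 8))))
```

With these sample numbers, intra mode 1 (0.90) would win an unbiased comparison, whereas after scaling by A = 1.2 inter mode 1 (0.80 × 1.2 = 0.96) is selected.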

In Embodiment 2, the prediction mode selection unit 40 has been described as updating the encoding efficiencies INTERE_n obtained in the inter prediction modes by multiplying them by a constant A greater than 1, so that the inter prediction modes are more likely to be selected. Conversely, the encoding efficiencies INTRAE_m obtained in the intra prediction modes may be multiplied by a constant B smaller than 1 (for example, B = 0.8 or B = 0.5), so that the intra prediction modes become less likely to be selected.
INTRAE_m = INTRAE_m × B

Alternatively, the prediction mode selection unit 40 may update both sets of encoding efficiencies by multiplying the encoding efficiencies INTERE_n obtained in the inter prediction modes by a positive constant C and multiplying the encoding efficiencies INTRAE_m obtained in the intra prediction modes by a positive constant D smaller than C.

That is, to make the inter prediction modes more likely to be selected, or the intra prediction modes less likely to be selected, the prediction mode selection unit 40 may multiply at least one of the encoding efficiencies INTERE_n obtained in the inter prediction modes and the encoding efficiencies INTRAE_m obtained in the intra prediction modes by a constant, and then select the prediction mode with the highest encoding efficiency from among the available prediction modes.
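The variants described so far (A alone, B alone, or C together with D) can be viewed as a single rule with two weights, where the inter weight is larger than the intra weight. The Python sketch below is a hypothetical illustration of that view; the function and parameter names are assumptions. Because dividing every candidate by the same positive number does not change which candidate is largest, only the ratio of the two weights affects the selection.

```python
def select_with_weights(intra_eff, inter_eff, inter_weight=1.0, intra_weight=1.0):
    """Select the mode with the highest weighted encoding efficiency.

    A alone:  inter_weight = A (> 1), intra_weight = 1
    B alone:  inter_weight = 1,       intra_weight = B (< 1)
    C and D:  inter_weight = C,       intra_weight = D (0 < D < C)
    Returns a pair (kind, index) with a 1-based index.
    """
    if inter_weight <= 0 or intra_weight <= 0:
        raise ValueError("weights must be positive")
    if inter_weight <= intra_weight:
        raise ValueError("inter weight must exceed intra weight to favour inter modes")
    candidates = [("intra", m, e * intra_weight)
                  for m, e in enumerate(intra_eff, start=1)]
    candidates += [("inter", n, e * inter_weight)
                   for n, e in enumerate(inter_eff, start=1)]
    kind, index, _ = max(candidates, key=lambda c: c[2])
    return kind, index


if __name__ == "__main__":
    intra = [0.90, 0.70]
    inter = [0.80, 0.60]
    # C = 2.4 with D = 2.0 has the same ratio as A = 1.2 alone.
    print(select_with_weights(intra, inter, inter_weight=1.2))
    print(select_with_weights(intra, inter, inter_weight=2.4, intra_weight=2.0))
```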

As is apparent from the above, according to Embodiment 2, the prediction mode selection unit 40 multiplies those encoding efficiencies verified by the encoding efficiency verification unit 2 that were obtained in the inter prediction modes by a constant A greater than 1, and then selects the prediction mode with the highest encoding efficiency from among the available prediction modes. As a result, the inter prediction modes, which are suited to estimating the subject region, are more likely to be selected, and the region where the subject exists can be estimated more accurately than in Embodiment 1.
Likewise, according to Embodiment 2, the prediction mode selection unit 40 multiplies those encoding efficiencies verified by the encoding efficiency verification unit 2 that were obtained in the intra prediction modes by a constant B smaller than 1, and then selects the prediction mode with the highest encoding efficiency from among the available prediction modes. As a result, the inter prediction modes, which are suited to estimating the subject region, are more likely to be selected, and the region where the subject exists can be estimated more accurately than in Embodiment 1.

Embodiment 3.
In Embodiments 1 and 2 above, the mask generation unit 33 of the subject region estimation unit 22 generates a binary mask by binarizing the values of the two-dimensional array. Instead, a multi-value mask may be generated by converting the values of the two-dimensional array into multiple levels.
The specific processing is as follows.

When the block size preprocessing unit 31 has assigned values corresponding to the block sizes to the two-dimensional array, the mask generation unit 33 compares the values of the two-dimensional array with the threshold stored in the threshold memory 32, as in Embodiments 1 and 2. The threshold stored in the threshold memory 32 is used here, but the threshold may instead be supplied from outside.
As in Embodiments 1 and 2, the mask generation unit 33 then generates a binary mask by a binarization process that replaces a value of the two-dimensional array with "1" if it is larger than the threshold and with "0" if it is smaller than the threshold.
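The binarization step can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the mapping from block size to array value is hypothetical (it only follows the rule stated in the claims that a smaller block receives a larger value), the threshold is arbitrary, and a value exactly equal to the threshold is mapped to "0" because the text does not specify that case.

```python
# Hypothetical mapping from block size (side length in pixels) to array value.
# Smaller blocks receive larger values; the concrete numbers are assumptions.
SIZE_TO_VALUE = {64: 1, 32: 2, 16: 3, 8: 4}


def to_values(block_sizes):
    """Convert a 2-D grid of block sizes into a 2-D array of numeric values."""
    return [[SIZE_TO_VALUE[size] for size in row] for row in block_sizes]


def binarize(values, threshold):
    """Replace values above the threshold with 1 and all others with 0.

    Values equal to the threshold become 0; this tie rule is an assumption.
    """
    return [[1 if v > threshold else 0 for v in row] for row in values]


if __name__ == "__main__":
    # Each entry is the size of the block covering that grid position.
    block_sizes = [
        [64, 64, 64, 64],
        [64, 16, 8, 64],
        [64, 8, 16, 64],
        [64, 64, 64, 64],
    ]
    for row in binarize(to_values(block_sizes), threshold=2):
        print(row)
```

In this example, only the four central positions, which are covered by small blocks, end up as "1" in the binary mask.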

After generating the binary mask, the mask generation unit 33 applies a contour tracing process to the data that has been replaced with "1", thereby detecting a closed curve that represents the boundary of the subject in the image, and calculates the center of gravity of the region inside that closed curve. The center of gravity of the region inside the closed curve is considered to represent the center of the subject.
The mask generation unit 33 then converts the "1" values in the generated binary mask so that values located closer to the center of the subject (closer to the center of gravity of the region inside the closed curve) become larger.

FIG. 10 is an explanatory diagram showing an example of generating a binary mask and a multi-value mask. In this example, the values located near the center of the subject are converted to the maximum value "5".
That is, the values are converted as "1" → "2" → "3" → "4" → "5" from the boundary of the subject toward its center.
Values closer to the center of the subject (closer to the center of gravity of the region inside the closed curve) are made larger because the area near the boundary of the subject is easily affected by errors and is likely to include the background region.
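One possible way to turn a binary mask into a multi-value mask is sketched below in Python. It is a simplified illustration, not the procedure of the embodiment: it assumes a single subject region, takes the center of gravity of the "1" cells directly instead of tracing a contour first, and maps the distance to that center linearly onto the levels 1 to 5. The function name and the linear mapping are assumptions.

```python
import math


def multilevel_mask(binary, max_level=5):
    """Convert the 1-cells of a binary mask into levels 1..max_level.

    Cells nearer to the center of gravity of the 1-cells get larger levels.
    Cells that are 0 in the binary mask stay 0.
    """
    cells = [(r, c) for r, row in enumerate(binary)
             for c, v in enumerate(row) if v == 1]
    out = [[0] * len(row) for row in binary]
    if not cells:
        return out
    # Center of gravity of the 1-cells (stands in for the subject center).
    center_r = sum(r for r, _ in cells) / len(cells)
    center_c = sum(c for _, c in cells) / len(cells)
    dist = {(r, c): math.hypot(r - center_r, c - center_c) for r, c in cells}
    d_max = max(dist.values())
    for (r, c), d in dist.items():
        closeness = 1.0 if d_max == 0 else 1.0 - d / d_max
        out[r][c] = 1 + round(closeness * (max_level - 1))
    return out


if __name__ == "__main__":
    binary = [[1] * 5 for _ in range(5)]
    for row in multilevel_mask(binary):
        print(row)
```

For a 5×5 region of ones, the central cell receives the maximum level and the corner cells receive level 1, which matches the qualitative behaviour described for FIG. 10.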

When the mask generation unit 33 of the subject region estimation unit 22 has generated the multi-value mask, the subject detection unit 23 performs object recognition processing on the multi-value mask and detects the subject appearing in the encoding target image.
In Embodiment 3, the subject detection unit 23 limits the region to which the object recognition processing is applied to the region where the value of the multi-value mask is other than "0", which reduces both the amount of computation required for object recognition and the number of false recognitions.

When performing the object recognition processing, the subject detection unit 23 carries out processing such as extracting feature points in the image and calculates an evaluation value for each feature point. Normally, if the evaluation value exceeds a predetermined value, the position of the feature point is determined to be a position where the subject exists.
In Embodiment 3, however, before the evaluation value is compared with the predetermined value, the evaluation value is updated by multiplying it by the value of the multi-value mask at the position of the corresponding feature point.
Evaluation value = evaluation value × (value of multi-value mask / a)
Here, a is a normalization constant for keeping the value of (value of multi-value mask / a) within, for example, the range of 1 to 2.
If the updated evaluation value exceeds the predetermined value, the subject detection unit 23 determines that the position of the feature point is a position where the subject exists.
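The weighting of the evaluation value by the multi-value mask can be sketched in Python as follows. The function name, the sample numbers, and the choice of a are hypothetical. Returning False for a mask value of 0 reflects the restriction of object recognition to the region where the mask is non-zero.

```python
def is_subject_position(evaluation, mask_value, a, threshold):
    """Decide whether a feature point lies on the subject.

    evaluation: evaluation value computed for the feature point
    mask_value: multi-value mask value at the feature point position
    a:          normalization constant (must be positive)
    threshold:  predetermined value the updated evaluation must exceed
    """
    if a <= 0:
        raise ValueError("a must be positive")
    if mask_value == 0:
        # Outside the estimated subject region: recognition is not applied.
        return False
    updated = evaluation * (mask_value / a)
    return updated > threshold


if __name__ == "__main__":
    # Same raw evaluation value, different positions in the mask.
    print(is_subject_position(0.6, mask_value=5, a=2.5, threshold=1.0))
    print(is_subject_position(0.6, mask_value=1, a=2.5, threshold=1.0))
```

With the same raw evaluation value, a feature point near the subject center (mask value 5) passes the threshold, while one near the boundary (mask value 1) does not.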

As is apparent from the above, according to Embodiment 3, the mask generation unit 33 of the subject region estimation unit 22 generates a multi-value mask by converting the values of the two-dimensional array into multiple levels, and the subject detection unit 23 performs object recognition processing on the multi-value mask to detect the subject appearing in the encoding target image. This prevents the recognition performance from degrading near the boundary of the subject, where the influence of noise and the background is strong.

Within the scope of the invention, the embodiments may be freely combined, any component of any embodiment may be modified, and any component of any embodiment may be omitted.

In Embodiment 1 above, the inter prediction modes were described as normally including the merge mode. However, the image encoding device of the invention can also be configured using, as prediction modes, the intra prediction mode and the inter prediction modes excluding the merge mode.
For example, the block size determination unit 21 may determine a predetermined block size (for example, 64×64 pixels) if the prediction mode selected by the prediction mode selection unit 3 of the encoding control unit 1 is an intra prediction mode, and may determine a block size smaller than that (for example, 16×16 pixels) if the prediction mode is an inter prediction mode.
In this case as well, the block size determination unit is duplicated. However, as explained in Embodiment 1, the block size determination unit accounts for only a small part of the circuit scale of the image encoding device, so duplicating it increases the circuit scale only slightly. Moreover, compared with implementing an entirely different subject region estimation technique, only a circuit already used in an ordinary image encoding device needs to be replicated, so the design effort is also kept much smaller.
Similarly, in Embodiments 2 and 3 above, the same effects can be obtained by configuring the image encoding device of the invention with the intra prediction mode and the inter prediction modes excluding the merge mode as the prediction modes.
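The block size rule for the configuration without the merge mode can be sketched in Python as follows. The function name and the string labels for the modes are hypothetical; the default sizes are the examples given in the text (64×64 for intra prediction, 16×16 for inter prediction).

```python
def determine_block_size(mode, intra_size=64, inter_size=16):
    """Return the block side length used for subject region estimation.

    An intra prediction mode yields the predetermined (larger) size and an
    inter prediction mode yields a smaller size.
    """
    if inter_size >= intra_size:
        raise ValueError("inter block size must be smaller than intra block size")
    if mode == "intra":
        return intra_size
    if mode == "inter":
        return inter_size
    raise ValueError(f"unknown prediction mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("intra", "inter"):
        size = determine_block_size(mode)
        print(mode, f"{size}x{size}")
```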

DESCRIPTION OF SYMBOLS
1: encoding control unit; 2: encoding efficiency verification unit (encoding efficiency verification means); 3: prediction mode selection unit (prediction mode selection means, first prediction mode selection means); 4: block size determination unit (first block size determination means); 5: prediction difference encoding parameter determination unit; 6: block division unit (block division means); 7: predicted image generation unit (predicted image generation means); 8: subtraction unit (image compression means); 9: orthogonal transform unit (image compression means); 10: quantization unit (image compression means); 11: inverse quantization unit; 12: inverse orthogonal transform unit; 13: addition unit; 14: memory; 15: variable length coding unit (encoding means); 21: block size determination unit (second block size determination means); 22: subject region estimation unit (subject region estimation means); 23: subject detection unit (subject detection means); 31: block size preprocessing unit; 32: threshold memory; 33: mask generation unit; 40: prediction mode selection unit (second prediction mode selection means).

Claims

An image encoding device comprising:
a prediction mode selection unit that selects, from a plurality of available prediction modes, a prediction mode to be used when performing encoding processing on each block included in an encoding target image;
a block size determination unit that, if the merge mode included in the inter prediction modes is selected by the prediction mode selection unit as the prediction mode for a given block among the blocks included in the encoding target image, determines the size of that block to be smaller than when the intra prediction mode is selected by the prediction mode selection unit, and, if an inter prediction mode other than the merge mode is selected by the prediction mode selection unit as the prediction mode for a given block, determines the size of that block to be smaller than when the merge mode is selected by the prediction mode selection unit;
a subject region estimation unit that prepares a two-dimensional array table containing, for the blocks included in the encoding target image, elements corresponding to the blocks whose prediction mode selected by the prediction mode selection unit is an inter prediction mode other than the merge mode, elements corresponding to each of a plurality of regions included in the blocks whose selected prediction mode is the intra prediction mode, and elements corresponding to each of a plurality of regions included in the blocks whose selected prediction mode is the merge mode; that assigns to an element of the two-dimensional array table a numerical value corresponding to the size of the block whose prediction mode is an inter prediction mode other than the merge mode if the element corresponds to such a block, a numerical value corresponding to the size of the block whose prediction mode is the intra prediction mode if the element corresponds to a region included in such a block, and a numerical value corresponding to the size of the block whose prediction mode is the merge mode if the element corresponds to a region included in such a block; and that estimates a region where a subject exists by performing threshold processing on the numerical values assigned to the elements of the two-dimensional array table; and
a subject detection unit that detects the subject appearing in the encoding target image by performing object recognition processing on the region estimated by the subject region estimation unit.

The image encoding device according to claim 1, wherein the subject region estimation unit assigns a larger numerical value to an element of the two-dimensional array table as the size determined by the block size determination unit is smaller for the block corresponding to that element or for the block including the region corresponding to that element, and generates a binary mask indicating the region where the subject exists by binarizing the numerical values assigned to the elements of the two-dimensional array table.

The image encoding device according to claim 1, wherein the subject region estimation unit assigns a larger numerical value to an element of the two-dimensional array table as the size determined by the block size determination unit is smaller for the block corresponding to that element or for the block including the region corresponding to that element, and generates a multi-value mask indicating the region where the subject exists by converting the numerical values assigned to the elements of the two-dimensional array table into multiple levels.

The image encoding device according to claim 3, wherein the subject detection unit detects the subject appearing in the encoding target image by performing object recognition processing on the multi-value mask generated by the subject region estimation unit.