JP2024056441A - Image processing device, method and program for controlling the image processing device

Info

Publication number: JP2024056441A
Application number: JP2022163309A
Authority: JP
Inventors: 浩靖形川; 貴弘宇佐美; 侑弘小貝; 寧司大輪; 友貴植草; 浩之谷口
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2022-10-11
Filing date: 2022-10-11
Publication date: 2024-04-23

Abstract

[Problem] To provide an image processing device, a control method for an image processing device, and a program capable of improving the accuracy of subject tracking when the subject moves quickly. [Solution] An image processing device 101 that tracks a specific subject across successive input images includes: a reference image registration circuit 202 that registers a reference image corresponding to the specific subject; a correlation degree calculation circuit 203 that calculates, for each of a plurality of partial regions 304 set in the input image, the degree of correlation with the reference image; a distance calculation circuit 204 that calculates, for each partial region 304, the distance from a predetermined reference position in the input image; an evaluation value calculation circuit 205 that calculates an evaluation value for each partial region 304 using the degree of correlation and the distance; and a tracking processing control circuit 206 that determines, based on the evaluation values, which of the partial regions 304 contains the specific subject. When calculating the evaluation value, the evaluation value calculation circuit 205 changes the degree to which the distance contributes to the evaluation value according to the frame rate at which the correlation degree calculation circuit 203 calculates the degree of correlation. [Selected Figure] FIG. 2

Description

The present invention relates to an image processing device, a control method for an image processing device, and a program.

Technology that detects a specific subject in frame images supplied sequentially in a time series and tracks the detected subject is very useful, and is used, for example, to identify human face areas and human body areas in moving images. Such technology can be used in many fields, such as teleconferencing, man-machine interfaces, security, monitoring systems for tracking arbitrary subjects, and image compression.

In addition, for digital still cameras, digital video cameras, and the like, a technique is known for extracting and tracking an arbitrary subject included in a captured image in order to optimize the focus state and exposure state for that subject. For example, Patent Document 1 discloses an image processing device that detects (extracts) and tracks the position of a face included in a captured image, focuses on the face, and captures the image with optimal exposure. Tracking the detected face in this way enables control that is stable over time. Patent Document 2 discloses a tracking process in which a face detected in a certain frame is detected again in subsequent frames. As a method for tracking a specific subject in subsequent frames, a method that uses template matching, as disclosed in Patent Document 2, is known. Template matching is a technique in which a partial image obtained by cutting out an image area containing the specific subject to be tracked is registered as a reference image (template image), and the specific subject is tracked by finding the area that has the highest correlation with the reference image.

Patent Document 1: JP 2005-318554 A
Patent Document 2: JP 2001-060269 A

In a subject tracking method using template matching, a subject is tracked based on the degree of correlation between a frame image in which the subject is to be tracked and a reference image (template image). If the frame image contains an area that is similar to the reference image but is not the subject to be tracked, that area (hereinafter referred to as a "similar area") may be erroneously detected as the subject. This problem is particularly likely to occur when the appearance of the subject in the frame image has changed from the reference image. One possible countermeasure is to assume that the position of the subject does not change significantly between two chronologically consecutive frame images, and to regard any high-correlation area whose movement distance between the images is large as not containing the subject. Hereinafter, an area containing the subject is referred to as a "subject area". When the subject moves slowly, this assumption should reduce the possibility of erroneously detecting a similar area as the subject area. However, when the subject moves quickly, the position of the subject area changes significantly between two chronologically consecutive frame images, so introducing the above assumption may actually increase the possibility of erroneous detection.

The present invention has been made in view of the above problem. An object of the present invention is to provide an image processing device, a control method for an image processing device, and a program that can improve the accuracy of subject tracking when the subject moves quickly.

In order to achieve the above object, the image processing device of the present invention is an image processing device that tracks a specific subject across a plurality of sequentially supplied input images, and is characterized in that it comprises: a registration means for registering a reference image corresponding to the specific subject; a correlation calculation means for calculating, for each of a plurality of partial regions set in the input image, a degree of correlation with the reference image; a distance calculation means for calculating, for each of the plurality of partial regions, a distance from a predetermined reference position in the input image; an evaluation calculation means for calculating an evaluation value for each of the plurality of partial regions using the degree of correlation and the distance; a determination means for determining, based on the evaluation values, the region among the plurality of partial regions that contains the specific subject; and a change means for changing the degree to which the distance contributes to the evaluation value when the evaluation calculation means calculates the evaluation value, according to the frame rate at which the correlation calculation means calculates the degree of correlation, the type of the specific subject, or the speed of the specific subject.

According to the present invention, the accuracy of subject tracking can be improved when the subject moves quickly.

FIG. 1 is a block diagram showing a schematic configuration of an image processing device according to a first embodiment.
FIG. 2 is a block diagram showing a configuration of a subject tracking circuit in the first embodiment.
FIG. 3 is a diagram for explaining template matching.
FIG. 4 is a diagram showing the relationship between distance and gain (GAIN) in the first embodiment.
FIG. 5 is a flowchart showing a subject tracking process in the first embodiment.
FIG. 6 is a diagram showing the relationship between distance and gain (GAIN) in a second embodiment.
FIG. 7 is a flowchart showing a subject tracking process in the second embodiment.
FIG. 8 is a block diagram showing a configuration of a subject tracking circuit in a third embodiment.
FIG. 9 is a flowchart showing a subject tracking process in the third embodiment.

Each embodiment of the present invention will be described in detail below with reference to the drawings. However, the configurations described in the following embodiments are merely examples, and the scope of the present invention is not limited by them. For example, each part constituting the present invention can be replaced with any configuration that can perform a similar function. Any component may also be added. Any two or more configurations (features) of the embodiments may also be combined. Note that the same components are given the same reference numerals throughout the drawings, and their descriptions may be simplified or omitted.

First Embodiment
The first embodiment will be described below with reference to FIGS. 1 to 5. FIG. 1 is a block diagram showing a schematic configuration of an image processing device 101 according to the first embodiment. The image processing device 101 is embodied as a digital still camera or a digital video camera that captures an image of a subject. The image processing device 101 also functions as a subject tracking device that tracks a subject included in images that are sequentially supplied in a time series. The image processing device 101 has an optical system 102 such as a lens, an image sensor 103, an analog signal processing circuit 104, an A/D converter 105, a control circuit 106, an image processing circuit 107, a display 108, a recording medium 109, a subject designation unit 110, and a subject tracking circuit 111.

Light representing the image of the subject is collected by the optical system 102 and enters the image sensor 103, which is composed of a CCD image sensor, a CMOS image sensor, or the like. The image sensor 103 outputs, pixel by pixel, an electrical signal corresponding to the intensity of the incident light. That is, the image sensor 103 photoelectrically converts the image of the subject formed by the optical system 102. The electrical signal output from the image sensor 103 is an analog video signal representing the captured image of the subject. This video signal is subjected to analog signal processing such as correlated double sampling (CDS) in the analog signal processing circuit 104. The video signal output from the analog signal processing circuit 104 is converted into digital data by the A/D converter 105 and input to the control circuit 106 and the image processing circuit 107.

The control circuit 106 is a CPU (Central Processing Unit), a microcontroller, or the like, and controls the operation of the image processing device 101. Specifically, the control circuit 106 controls each part of the image processing device 101 by loading a program stored in a ROM (Read Only Memory) into a working area of a RAM (Random Access Memory) and executing it sequentially.

The control circuit 106 controls shooting conditions, such as the focus state and exposure state, when an image is captured with the image sensor 103. Specifically, the control circuit 106 controls the focus control mechanism and exposure control mechanism (neither shown) of the optical system 102 based on the video signal output from the A/D converter 105. For example, the focus control mechanism is an actuator that drives a lens included in the optical system 102 in the optical axis direction, and the exposure control mechanism is an actuator that drives an aperture and a shutter. The control circuit 106 also controls the readout of the image sensor 103, such as its output timing and output pixels.

The image processing circuit 107 performs image processing such as gamma correction and white balance processing on the video signal output from the A/D converter 105. In addition to this normal image processing, the image processing circuit 107 also has a function of performing image processing that uses information on the subject area in the image, supplied from the subject tracking circuit 111 described later. The video signal output from the image processing circuit 107 is sent to the display 108. The display 108 is composed of, for example, an LCD or an organic EL display, and displays the video signal. Images captured sequentially in a time series by the image sensor 103 are therefore displayed sequentially on the display 108, which allows the display 108 to function as an electronic viewfinder (EVF). The display 108 also indicates the subject area containing the subject being tracked by the subject tracking circuit 111, for example with a rectangle. The video signal output from the image processing circuit 107 is also recorded on the recording medium 109 (for example, a removable memory card). The video signal may instead be recorded in a built-in memory of the image processing device 101 or in an external device (not shown) communicably connected via a communication interface.

The subject designation unit 110 is an input interface, such as a touch panel provided on the display 108, or keys and buttons provided on the housing of the image processing device 101. The user (photographer) can designate the subject to be tracked by using the subject designation unit 110 to designate the area of the desired subject in, for example, the video signal displayed on the display 108. There are no particular limitations on the method of designating an arbitrary area within an image using a touch panel, keys, buttons, or the like, and any well-known method can be used.

The subject tracking circuit 111 tracks a subject included in images supplied sequentially in a time series (i.e., images captured at different times) from the image processing circuit 107. The subject tracking circuit 111 has a subject detection circuit that detects a specific subject, for example by face detection, and tracks the detected subject. The subject tracking circuit 111 may also estimate the subject area in the sequentially supplied images based on the pixel pattern of a subject designated with the subject designation unit 110. Details of the subject tracking circuit 111 will be described later. The control circuit 106 can use the subject area information supplied from the subject tracking circuit 111 to control the focus control mechanism and exposure control mechanism described above. Specifically, the control circuit 106 performs focus control using the contrast value of the subject area and exposure control using the luminance value of the subject area. This allows the image processing device 101 to perform imaging that takes a specific subject area in the captured image into account.

The subject tracking circuit 111 will now be described in detail. The subject tracking circuit 111 functions as a matching means. That is, it uses a partial image showing the subject to be tracked as a template, compares the template with a partial region of the supplied image, changes the partial region being compared, and estimates the region with a high degree of correlation. This matching process is hereinafter referred to as "template matching". FIG. 2 is a block diagram showing the configuration of the subject tracking circuit 111. The subject tracking circuit 111 is composed of a subject detection circuit 201, a reference image registration circuit 202, a correlation degree calculation circuit 203, a distance calculation circuit 204, an evaluation value calculation circuit 205, and a tracking processing control circuit 206. The blocks from the subject detection circuit 201 to the tracking processing control circuit 206 are connected by a bus and can exchange data.

The subject detection circuit 201 detects and identifies the subject to be tracked in the supplied image. A typical subject to be tracked is, for example, a person's face. In this case, the subject detection circuit 201 identifies the face area of the person as the subject area and sets that face area as the tracking target. When the detection target is a person's face, a publicly known face detection method may be used as the detection method in the subject detection circuit 201. Publicly known face detection techniques include methods that use knowledge about the face (skin color information and parts such as the eyes, nose, and mouth) and methods that configure a classifier for face detection with a learning algorithm such as a neural network. In face detection, it is common to combine these methods to perform face recognition in order to improve the recognition rate. One such method detects faces using a wavelet transform and image features. The reference image registration circuit 202 (registration means) registers a partial image showing the subject to be tracked as a reference image (template). The correlation degree calculation circuit 203 (correlation calculation means) compares the template registered by the reference image registration circuit 202 with a partial region of the supplied image, changes the partial region being compared, and estimates the region with a high degree of correlation (template matching).

Template matching will be described in detail with reference to FIG. 3. FIG. 3(a) is a diagram showing an example of a reference image in template matching. A template 301 is a partial image (reference image) showing the subject to be tracked, and the pixel pattern of this partial image is treated as a feature amount. A feature amount 302 expresses the feature amount at each coordinate of the template 301; in the first embodiment, the luminance signal of the pixel data is used as the feature amount. The feature amount T(i, j) is expressed by Equation (1), where (i, j) are the coordinates within the template 301, W is the number of horizontal pixels, and H is the number of vertical pixels.
T(i, j) = {T(0, 0), T(1, 0), ..., T(W-1, H-1)} ... (1)

FIG. 3(b) is a diagram showing information on the image in which the tracking target is searched for. A search image 303 is the range over which template matching is performed. In the following, the search image 303 may also be referred to as the "input image". Coordinates in the search image 303 are expressed as (x, y). A partial region 304 is a region for which an evaluation value of template matching is obtained. A feature amount 305 expresses the feature amount of the partial region 304; as with the template 301, the luminance signal of the image data is used as the feature amount. The feature amount S(i, j) is expressed by Equation (2), where (i, j) are the coordinates within the partial region 304, W is the number of horizontal pixels, and H is the number of vertical pixels.
S(i, j) = {S(0, 0), S(1, 0), ..., S(W-1, H-1)} ... (2)

In the first embodiment, the sum of absolute differences, the so-called SAD (Sum of Absolute Differences) value, is used to evaluate the degree of difference between the template 301 and the partial region 304. The SAD value V(x, y) for the partial region 304 located at (x, y) is calculated by Equation (3).
V(x, y) = Σ_{j=0}^{H-1} Σ_{i=0}^{W-1} |T(i, j) - S(i, j)| ... (3)

The correlation degree calculation circuit 203 calculates the SAD value V(x, y) while shifting the partial region 304 one pixel at a time, starting from the top left of the search image 303. The coordinates (x, y) at which V(x, y) is smallest indicate the position most similar to the template 301. In other words, the position of the minimum is the position in the search image 303 where the tracking target is most likely to be. Although the first embodiment uses one-dimensional information (the luminance signal) as the feature amount, three-dimensional information such as lightness, hue, and saturation signals may also be used. Likewise, although the SAD value has been described as the evaluation value of template matching, a different calculation method such as normalized cross-correlation (NCC) may be used.
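The exhaustive search described above can be sketched as follows. This is a minimal pure-Python illustration, not the circuit's implementation: the function names `sad` and `sad_map` and the tiny test image are assumptions made for the example, and images are represented as lists of rows of luminance values.

```python
def sad(template, image, x, y):
    """Sum of absolute differences between the template and the window whose
    top-left corner is at (x, y) in the image (illustrative helper)."""
    return sum(
        abs(template[j][i] - image[y + j][x + i])
        for j in range(len(template))       # j: vertical index, 0..H-1
        for i in range(len(template[0]))    # i: horizontal index, 0..W-1
    )


def sad_map(template, image):
    """Return {(x, y): V(x, y)} for every window that fits inside the image,
    shifting the window one pixel at a time from the top left."""
    h, w = len(template), len(template[0])
    rows, cols = len(image), len(image[0])
    return {
        (x, y): sad(template, image, x, y)
        for y in range(rows - h + 1)
        for x in range(cols - w + 1)
    }


# Hypothetical 2x2 template embedded at (1, 1) in a 4x4 search image.
template = [[10, 20],
            [30, 40]]
image = [[0, 0, 0, 0],
         [0, 10, 20, 0],
         [0, 30, 40, 0],
         [0, 0, 0, 0]]

scores = sad_map(template, image)
best = min(scores, key=scores.get)   # position with the smallest SAD value
print(best, scores[best])            # prints "(1, 1) 0"
```

A real implementation would restrict the search range and vectorize the inner loops; the nested loops here are kept only to mirror the double sum over (i, j).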

Among the SAD values obtained by template matching, the smallest SAD value corresponds to the highest correlation. In the first embodiment, the degree of correlation is therefore calculated as the reciprocal of the SAD value, as in Equation (4).
Correlation degree(x, y) = 1 / V(x, y) ... (4)
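Taken literally, the reciprocal in Equation (4) is undefined when a partial region matches the template exactly and the SAD value is 0. The sketch below shows one way to handle that case; the `eps` guard is an assumption added for the example and is not described in the source.

```python
def correlation_degree(sad_value, eps=1e-6):
    """Degree of correlation as the reciprocal of the SAD value.

    eps is a hypothetical lower bound on the denominator so that a perfect
    match (SAD == 0) yields a very large but finite correlation degree.
    """
    return 1.0 / max(sad_value, eps)


print(correlation_degree(4))   # prints 0.25
```

With this guard, a smaller SAD value always yields a correlation degree at least as large, so ranking regions by correlation degree is equivalent to ranking them by ascending SAD value.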

Returning to FIG. 2, the distance calculation circuit 204 (distance calculation means) calculates the distance from a predetermined reference position in the input image. The predetermined reference position is the position of the subject area, designated with the subject designation unit 110 or identified by the subject detection circuit 201, in the immediately preceding input image on which subject tracking was performed. Alternatively, the predetermined reference position may be the position of the subject area in the current input image as predicted from the movement of the subject area in past tracking results. The point used to represent the position of the subject area can be chosen arbitrarily; for example, it may be the center of gravity of the subject area or one of its vertices. In the following, the distance from the predetermined reference position in the input image may be referred to as the distance calculated by the distance calculation circuit 204, or simply as the distance.
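Both options for the reference position can be sketched as below. The source does not state which distance metric is used or how the position is predicted, so the Euclidean distance and the constant-velocity extrapolation here are assumptions, as are the function names.

```python
import math


def distance_to_reference(position, reference):
    """Distance in pixels between a partial region's position (x, y) and the
    reference position (bx, by). Euclidean distance is an assumption."""
    x, y = position
    bx, by = reference
    return math.hypot(x - bx, y - by)


def predict_reference(prev, prev_prev):
    """Hypothetical predictor: extrapolate the subject position linearly from
    its positions in the two most recent frames (constant velocity)."""
    return (2 * prev[0] - prev_prev[0], 2 * prev[1] - prev_prev[1])


print(distance_to_reference((3, 4), (0, 0)))   # prints 5.0
print(predict_reference((10, 10), (6, 8)))     # prints (14, 12)
```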

The evaluation value calculation circuit 205 (evaluation calculation means) calculates an evaluation value for each partial region 304 in the search image 303 as a function of the degree of correlation obtained by the correlation degree calculation circuit 203 and the distance obtained by the distance calculation circuit 204. As described above, if the evaluation value were based only on the degree of correlation, a region whose features resemble those of the subject area (reference image), that is, a similar area, could be erroneously detected as the subject area. In particular, if the appearance of the subject has changed between the time the subject area used as the reference image was registered and the time the input image being tracked was captured, a similar area may have a higher degree of correlation with the reference image than the subject area itself. In other words, when a similar area is likely to exist, the reliability of the degree of correlation obtained by the correlation degree calculation circuit 203 decreases.

For this reason, in the first embodiment, the degree to which the distance contributes to the evaluation value is determined using the frame rate at which the correlation degree calculation circuit 203 calculates the degree of correlation. This frame rate may be the imaging rate of the image sensor 103, or a rate obtained by periodically thinning out the input images sequentially supplied from the image sensor 103. This allows for cases in which the frame rate used for the correlation calculation is switched depending on shooting conditions. For example, while a moving image is being recorded or still images are being shot continuously, the frame rate used for the correlation calculation is the same as the imaging rate; when nothing is being recorded and images are only being shown on the display 108, the imaging rate may be thinned out to reduce power consumption.

The higher the frame rate at which the degree of correlation is calculated, the smaller the movement of the subject between frames, so the degree to which the distance contributes to the evaluation value is increased. This makes it more likely that the subject area is chosen from regions close to the most recent subject area, and reduces the possibility of erroneous detection even when a similar area is likely to exist. Conversely, when the frame rate at which the degree of correlation is calculated is low, the movement of the subject between frames is relatively large, so the degree to which the distance contributes to the evaluation value is reduced. This makes it more likely that a region with a high degree of correlation is judged to be the subject area even if it is far from the most recent subject area, so that the subject area can be detected accurately even when the subject moves quickly.

In this way, in the first embodiment, the evaluation value of each partial region 304 is calculated taking into account both the degree of correlation and the distance from the most recent subject area, and the degree to which that distance is taken into account is changed dynamically according to the frame rate at which the degree of correlation is calculated. When the frame rate is high, fast-moving subjects can therefore be tracked while erroneous detection of similar areas is suppressed. When the frame rate is low, the contribution of the distance to the evaluation value is reduced, so fast-moving subjects can be tracked just as they are when the frame rate is high.
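The text up to this point does not give a formula for how the contribution of the distance changes with frame rate. The sketch below is therefore purely hypothetical. It assumes the distance weighting is controlled by two distance thresholds and a slope, and that the thresholds scale in inverse proportion to the frame rate, on the reasoning that a subject moving at constant speed covers half as many pixels per frame when the frame rate doubles. The function name, the scaling rule, and the `floor` parameter are all assumptions.

```python
def scale_gain_parameters(alpha_base, beta_base, base_fps, fps, floor=0.1):
    """Hypothetical rule: scale the distance thresholds by base_fps / fps.

    alpha_base, beta_base: thresholds (pixels) tuned for base_fps, beta > alpha.
    Returns (alpha, beta, gamma), where gamma is the slope that lowers the
    weighting from 1.0 at alpha to `floor` at beta.
    """
    if fps <= 0 or base_fps <= 0:
        raise ValueError("frame rates must be positive")
    scale = base_fps / fps
    alpha = alpha_base * scale
    beta = beta_base * scale
    gamma = (1.0 - floor) / (beta - alpha)
    return alpha, beta, gamma


# Thresholds tuned for 60 fps, re-scaled for other correlation frame rates.
for fps in (30, 60, 120):
    print(fps, scale_gain_parameters(20, 60, base_fps=60, fps=fps))
```

Under this rule, a higher frame rate narrows the range of distances that receive full weight, which strengthens the influence of distance, and a lower frame rate widens it, matching the qualitative behaviour described above.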

評価値算出回路２０５は、相関度算出回路２０３が求める相関度と距離算出回路２０４が求める距離の関数値として、例えば以下の式（５）によって部分領域３０４ごとの評価値を算出する。
評価値＝相関度×ＧＡＩＮ・・・（５）
The evaluation value calculation circuit 205 calculates an evaluation value for each partial region 304 as a function of the correlation degree obtained by the correlation degree calculation circuit 203 and the distance obtained by the distance calculation circuit 204, for example, by the following equation (5).
Evaluation value = correlation degree × GAIN ... (5)

ここで、ＧＡＩＮは、下記の３条件で算出される。
距離≦αの場合、ＧＡＩＮ＝１．０
α＜距離≦βの場合、ＧＡＩＮ＝１．０－（直近の被写体領域からの距離－α）×γ
β＜距離の場合、ＧＡＩＮ＝０．１
Here, GAIN is calculated under the following three conditions:
If distance ≦ α, GAIN = 1.0
If α < distance ≦ β, GAIN = 1.0 − (distance from the most recent subject region − α) × γ
If β < distance, GAIN = 0.1

図４は、第１実施形態における距離とゲイン（ＧＡＩＮ）の関係を示す図である。図４では、横軸に距離をとり、縦軸をＧＡＩＮとしている。図４（ａ）は、初期状態の距離とゲイン（ＧＡＩＮ）の関係を示す図である。初期状態では、相関度を求めるときのフレームレートは、６０ｆｐｓを想定している。α、β、γは係数であり、α＝２０、β＝６０、γ＝０．０２２５としている。距離は、被写体領域の位置（ｘ，ｙ）と基準位置（ｂｘ，ｂｙ）の差分値である。距離の単位は、［ｐｉｘ］である。距離≦αの場合は、ＧＡＩＮは１．０としている。初期状態では、被写体はフレーム単位で距離が２０［ｐｉｘ］までは動く可能性があるということを考慮している。距離がα＜距離≦βの場合は、線形的にＧＡＩＮを小さくすることで遠くの被写体の評価値を下げるような処理をしている。距離がβ＜距離の場合は、予期しない高速に移動する被写体の可能性もあるため、ＧＡＩＮは、０にはせず、０．１という下限値をもうけている。 Figure 4 is a diagram showing the relationship between distance and gain (GAIN) in the first embodiment. In Figure 4, the horizontal axis is distance, and the vertical axis is GAIN. Figure 4 (a) is a diagram showing the relationship between distance and gain (GAIN) in the initial state. In the initial state, the frame rate when calculating the correlation is assumed to be 60 fps. α, β, and γ are coefficients, and α = 20, β = 60, and γ = 0.0225. The distance is the difference value between the position (x, y) of the subject area and the reference position (bx, by). The unit of distance is [pix]. If distance ≦ α, GAIN is set to 1.0. In the initial state, it is considered that the subject may move up to a distance of 20 [pix] per frame. If the distance is α < distance ≦ β, the GAIN is linearly reduced to reduce the evaluation value of the distant subject. If the distance is β < distance, there is a possibility that the subject is moving unexpectedly at high speed, so GAIN is not set to 0 but has a lower limit of 0.1.
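The three GAIN conditions and equation (5) can be sketched in a few lines of Python. This is only an illustrative sketch, not the circuit implementation: the function names `gain` and `evaluation_value` and the `floor` parameter are assumptions introduced here, while the coefficient values are the initial-state (60 fps) values of FIG. 4(a).

```python
def gain(distance, alpha, beta, gamma, floor=0.1):
    """Piecewise GAIN: flat at 1.0, then a linear ramp, then a lower limit."""
    if distance <= alpha:
        return 1.0                                # subject may plausibly move this far
    if distance <= beta:
        return 1.0 - (distance - alpha) * gamma   # linear reduction with distance
    return floor                                  # never 0: subject may be unexpectedly fast


def evaluation_value(correlation, distance, alpha, beta, gamma):
    """Equation (5): evaluation value = correlation degree x GAIN."""
    return correlation * gain(distance, alpha, beta, gamma)


# Initial-state coefficients of FIG. 4(a), assumed frame rate 60 fps.
ALPHA, BETA, GAMMA = 20.0, 60.0, 0.0225
near = evaluation_value(0.8, 10.0, ALPHA, BETA, GAMMA)   # inside the flat range
far = evaluation_value(0.8, 100.0, ALPHA, BETA, GAMMA)   # beyond beta, lower limit applies
```

With these coefficients the ramp meets the lower limit at the boundary: 1.0 − (60 − 20) × 0.0225 = 0.1, so GAIN is continuous at distance = β.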

図４（ｂ）は、相関度を求めるときのフレームレートが高い場合の距離とゲイン（ＧＡＩＮ）の関係を示す図である。図４（ｂ）では、相関度を求めるときのフレームレートは、１２０ｆｐｓを想定している。α＝１０、β＝３０、γ＝０．０４５としている。α＝１０とすることで、図４（ａ）の初期状態と比較して、ＧＡＩＮが１．０になる距離範囲を１／２にしている。また、β＝３０とすることで、図４（ａ）の初期状態と比較して、ＧＡＩＮが０．１になる距離範囲も１／２にしている。これは、フレームレートが高いため、被写体の移動量が相対的に小さくなると考えられるためである。 Figure 4(b) is a diagram showing the relationship between distance and gain (GAIN) when the frame rate when calculating the correlation is high. In Figure 4(b), the frame rate when calculating the correlation is assumed to be 120 fps. α = 10, β = 30, and γ = 0.045. By setting α = 10, the distance range where GAIN is 1.0 is halved compared to the initial state of Figure 4(a). Also, by setting β = 30, the distance range where GAIN is 0.1 is also halved compared to the initial state of Figure 4(a). This is because it is believed that the amount of movement of the subject will be relatively small due to the high frame rate.

図４（ｃ）は、相関度を求めるときのフレームレートが低い場合の距離とゲイン（ＧＡＩＮ）の関係を示す図である。図４（ｃ）では、相関度を求めるときのフレームレートは、３０ｆｐｓを想定している。α＝３０、β＝９０、γ＝０．０１５としている。α＝３０とすることで、図４（ａ）の初期状態と比較して、ＧＡＩＮが１．０になる距離範囲を１．５倍にしている。また、β＝９０とすることで、図４（ａ）の初期状態と比較して、ＧＡＩＮが０．１になる距離範囲も１．５倍にしている。これは、フレームレートが低いため、被写体の移動量が相対的に大きくなると考えられるためである。 Figure 4(c) is a diagram showing the relationship between distance and gain (GAIN) when the frame rate when calculating the correlation is low. In Figure 4(c), the frame rate when calculating the correlation is assumed to be 30 fps. α = 30, β = 90, and γ = 0.015. By setting α = 30, the distance range where GAIN is 1.0 is increased by 1.5 times compared to the initial state of Figure 4(a). Also, by setting β = 90, the distance range where GAIN is 0.1 is also increased by 1.5 times compared to the initial state of Figure 4(a). This is because it is considered that the amount of movement of the subject will be relatively large due to the low frame rate.

なお、相関度を求めるときのフレームレートと係数α、β、γの具体的な関係は、あらかじめ実験的に求めておくことが可能である。評価値算出回路２０５は、フレームレートと係数α、β、γとを対応付けたテーブルを有していてもよいし、フレームレートを代入すれば係数α、β、γが得られる関数式を有していてもよい。 The specific relationship between the frame rate and the coefficients α, β, and γ when calculating the correlation degree can be experimentally determined in advance. The evaluation value calculation circuit 205 may have a table that associates the frame rate with the coefficients α, β, and γ, or may have a function formula that obtains the coefficients α, β, and γ by substituting the frame rate.
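As one possible reading of the table option, the sketch below maps a frame rate to the coefficients given for FIG. 4(a) to 4(c). The names `COEFFS_BY_FPS`, `gamma_for`, and `coefficients_for_frame_rate` are hypothetical, and choosing the nearest tabulated frame rate is an assumption; the source only says that the relationship can be determined experimentally. In all three examples γ equals 0.9 / (β − α), which is the slope that brings GAIN from 1.0 at α down to 0.1 at β, so the sketch derives γ rather than storing it.

```python
# (alpha, beta) per frame rate, taken from FIG. 4(c), 4(a), and 4(b).
COEFFS_BY_FPS = {30: (30.0, 90.0), 60: (20.0, 60.0), 120: (10.0, 30.0)}


def gamma_for(alpha, beta, floor=0.1):
    """Slope that makes the linear ramp reach `floor` exactly at distance = beta."""
    return (1.0 - floor) / (beta - alpha)


def coefficients_for_frame_rate(fps):
    """Return (alpha, beta, gamma) for the nearest tabulated frame rate (assumed policy)."""
    nearest = min(COEFFS_BY_FPS, key=lambda k: abs(k - fps))
    alpha, beta = COEFFS_BY_FPS[nearest]
    return alpha, beta, gamma_for(alpha, beta)
```

A function formula, the other option mentioned above, would replace the dictionary lookup with an expression in `fps`; the source does not give such a formula, so none is sketched here.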

図２に戻る。追跡処理制御回路２０６は、ＣＰＵ、ＲＯＭ、およびＲＡＭなどで構成され、被写体追跡処理の制御を行う。具体的には、追跡処理制御回路２０６は、ＲＯＭに記憶されたプログラムをＲＡＭの作業領域に展開して順次実行することで、被写体追跡処理の制御を行う。これにより、被写体検出回路２０１から評価値算出回路２０５では、追跡処理制御回路２０６を介して処理が実行される。追跡処理制御回路２０６（決定手段）は、評価値算出回路２０５が算出する評価値が最も高くなる部分領域３０４を被写体領域に決定する。 Returning to FIG. 2, the tracking process control circuit 206 is composed of a CPU, ROM, RAM, etc., and controls the subject tracking process. Specifically, the tracking process control circuit 206 controls the subject tracking process by expanding a program stored in the ROM into the working area of the RAM and executing it sequentially. As a result, processing is executed via the tracking process control circuit 206 from the subject detection circuit 201 to the evaluation value calculation circuit 205. The tracking process control circuit 206 (determination means) determines the partial area 304 with the highest evaluation value calculated by the evaluation value calculation circuit 205 as the subject area.

図５は、第１実施形態における被写体追跡処理を示すフローチャートである。図５の処理（画像処理装置の制御方法）は、追跡処理制御回路２０６のＣＰＵ（コンピュータ）がＲＯＭに記憶されたプログラムをＲＡＭに展開して実行し、被写体検出回路２０１から評価値算出回路２０５を制御することで実現される。ステップＳ５０１において、被写体検出回路２０１は、フレームｔ＝０における入力画像を読み込み、例えば顔検出処理といった被写体検出処理を行って、被写体領域を抽出し、被写体検出結果を得る。ステップＳ５０２において、追跡処理制御回路２０６は、ステップＳ５０１の被写体検出結果から初期の基準画像を生成し、基準画像登録回路２０２に登録する（登録工程）。 Figure 5 is a flowchart showing the subject tracking process in the first embodiment. The process in Figure 5 (control method of the image processing device) is realized by the CPU (computer) of the tracking process control circuit 206 expanding a program stored in ROM into RAM, executing it, and controlling the subject detection circuit 201 to the evaluation value calculation circuit 205. In step S501, the subject detection circuit 201 reads the input image in frame t = 0, performs subject detection processing such as face detection processing, extracts the subject area, and obtains the subject detection result. In step S502, the tracking process control circuit 206 generates an initial reference image from the subject detection result of step S501, and registers it in the reference image registration circuit 202 (registration process).

ステップＳ５０３において、評価値算出回路２０５（変化手段）は、相関度を求めるときのフレームレートに応じて評価値に距離が寄与する度合いを決定する（変化工程）。具体的には、評価値算出回路２０５は、式（５）のＧＡＩＮを算出するために、係数α、β、γを決定する。ステップＳ５０４において、相関度算出回路２０３は、次のフレームｔ＝１における入力画像を読み込む。さらに、相関度算出回路２０３は、入力画像の部分領域３０４と、フレームｔ＝０の入力画像において登録された基準画像とのテンプレートマッチング処理を行い、基準画像との相関度を算出する（相関算出工程）。ステップＳ５０５において、距離算出回路２０４は、相関度を求めた位置と基準位置との距離を算出する（距離算出工程）。基準位置は、直近に判定された被写体領域の位置（すなわち、基準画像を抽出した入力画像における基準画像の位置）とする。 In step S503, the evaluation value calculation circuit 205 (changing means) determines the degree to which the distance contributes to the evaluation value according to the frame rate when the correlation degree is calculated (changing step). Specifically, the evaluation value calculation circuit 205 determines the coefficients α, β, and γ to calculate GAIN in equation (5). In step S504, the correlation degree calculation circuit 203 reads the input image in the next frame t=1. Furthermore, the correlation degree calculation circuit 203 performs template matching processing between the partial area 304 of the input image and the reference image registered in the input image of frame t=0, and calculates the correlation degree with the reference image (correlation calculation step). In step S505, the distance calculation circuit 204 calculates the distance between the position where the correlation degree is calculated and the reference position (distance calculation step). The reference position is the position of the most recently determined subject area (i.e., the position of the reference image in the input image from which the reference image is extracted).

ステップＳ５０６において、評価値算出回路２０５は、ステップＳ５０４で算出された相関度、ステップＳ５０５で算出された距離、およびステップＳ５０３で決定された係数α、β、γを用い、式（５）に基づいて評価値を算出する（評価算出工程）。ステップＳ５０７において、追跡処理制御回路２０６は、入力画像のうち、評価値が最大となった部分領域３０４に対応する画像を被写体領域として判定し、抽出する（決定工程）。また、追跡処理制御回路２０６は、抽出した画像を基準画像登録回路２０２へ出力する。また、追跡処理制御回路２０６は、判定した被写体領域に関する情報を、制御回路１０６、画像処理回路１０７、距離算出回路２０４へ出力する。ステップＳ５０８において、基準画像登録回路２０２は、ステップＳ５０７で抽出された被写体領域を基準として基準画像を更新する。更新された基準画像は、後続する次のフレームのテンプレートマッチング処理（ステップＳ５０４）において利用される。 In step S506, the evaluation value calculation circuit 205 uses the correlation calculated in step S504, the distance calculated in step S505, and the coefficients α, β, and γ determined in step S503 to calculate an evaluation value based on formula (5) (evaluation calculation step). In step S507, the tracking processing control circuit 206 determines and extracts the image corresponding to the partial area 304 with the maximum evaluation value from the input image as the subject area (determination step). The tracking processing control circuit 206 also outputs the extracted image to the reference image registration circuit 202. The tracking processing control circuit 206 also outputs information about the determined subject area to the control circuit 106, the image processing circuit 107, and the distance calculation circuit 204. In step S508, the reference image registration circuit 202 updates the reference image based on the subject area extracted in step S507. The updated reference image is used in the template matching process (step S504) of the following next frame.
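Steps S505 to S507 can be illustrated with the following sketch, which scores candidate partial regions and picks the one with the maximum evaluation value. Several points are assumptions made for illustration: the `Candidate` type, the use of Euclidean distance between positions, and the example correlation values. The correlation degrees are taken as given, since the template matching itself is described elsewhere.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    x: float            # position of the partial region [pix]
    y: float
    correlation: float  # correlation degree from template matching (step S504)


def gain(distance, alpha, beta, gamma, floor=0.1):
    if distance <= alpha:
        return 1.0
    if distance <= beta:
        return 1.0 - (distance - alpha) * gamma
    return floor


def select_subject_region(candidates, reference, alpha, beta, gamma):
    """Return the candidate with the highest evaluation value (equation (5))."""
    bx, by = reference

    def score(c):
        distance = math.hypot(c.x - bx, c.y - by)   # step S505 (Euclidean, assumed)
        return c.correlation * gain(distance, alpha, beta, gamma)  # step S506

    return max(candidates, key=score)               # step S507


reference = (100.0, 100.0)                          # most recent subject region
subject = Candidate(105.0, 100.0, 0.80)             # close, slightly lower correlation
lookalike = Candidate(130.0, 100.0, 0.95)           # 30 pix away, higher correlation

picked_60fps = select_subject_region([subject, lookalike], reference, 20.0, 60.0, 0.0225)
picked_30fps = select_subject_region([subject, lookalike], reference, 30.0, 90.0, 0.015)
```

With the 60 fps coefficients the distant candidate is penalized (0.95 × 0.775 ≈ 0.736 against 0.80), so the nearby region is picked. With the 30 fps coefficients a 30-pixel move is still inside the flat range, so the higher-correlation region is picked instead.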

ステップＳ５０９において、追跡処理制御回路２０６は、被写体追跡処理を終了するかどうかを判定する。この判定は、画像処理装置１０１の電源がオフにされたかどうかに基づいて行われる。画像処理装置１０１の電源がオフにされていない場合、つまり追跡処理制御回路２０６が被写体追跡処理を終了しないと判定した場合、処理はステップＳ５０３に戻る。これにより、処理はステップＳ５０３から実行され、被写体追跡処理は継続される。これに対して、画像処理装置１０１の電源がオフにされた場合、つまり追跡処理制御回路２０６が被写体追跡処理を終了すると判定した場合、図５のフローチャートは終了する。 In step S509, the tracking process control circuit 206 determines whether or not to end the subject tracking process. This determination is made based on whether the power supply of the image processing device 101 has been turned off. If the power supply of the image processing device 101 has not been turned off, that is, if the tracking process control circuit 206 has determined not to end the subject tracking process, the process returns to step S503. As a result, the process is executed from step S503, and the subject tracking process continues. On the other hand, if the power supply of the image processing device 101 has been turned off, that is, if the tracking process control circuit 206 has determined to end the subject tracking process, the flowchart in FIG. 5 ends.

このように、画像処理装置１０１は、抽出された被写体領域に基づいて基準画像を順次更新していくことで、被写体の向きが変化するなど時系列的に被写体の見えが変化する場合においても、適切に被写体追跡を行うことができる。もっとも、画像処理装置１０１は、時系列的な被写体の見えの変化を考慮しない場合などは、基準画像を更新せず、初期に登録された基準画像を維持してもよい。 In this way, the image processing device 101 can appropriately track the subject even when the subject's appearance changes over time, such as when the subject's orientation changes, by sequentially updating the reference image based on the extracted subject area. However, in cases where the image processing device 101 does not take into account changes in the subject's appearance over time, it may maintain the initially registered reference image without updating the reference image.

なお、第１実施形態では、説明および理解を容易にするため、被写体追跡処理の各処理ステップが直列的に実行されるように説明したが、並行して処理可能な処理ステップは同時に実行されてもよい。例えば、係数α、β、γを決定する処理（ステップＳ５０３）と、相関度および距離を算出する処理（ステップＳ５０４～Ｓ５０５）は並列処理されてもよい。 In the first embodiment, for ease of explanation and understanding, the processing steps of the subject tracking process are described as being executed serially, but processing steps that can be executed in parallel may be executed simultaneously. For example, the process of determining the coefficients α, β, and γ (step S503) and the processes of calculating the correlation degree and distance (steps S504 to S505) may be executed in parallel.

以上説明したように、第１実施形態によれば、被写体追跡装置としても機能する画像処理装置１０１は、追跡すべき被写体を表す基準画像と、入力画像との相関度とを用いて被写体を追跡する。その際、画像処理装置１０１は、相関度に加え、直近に判定された被写体領域からの距離を加味した評価値を部分領域３０４ごとに算出する。さらに、画像処理装置１０１は、相関度を求めるときのフレームレートが高い場合、評価値に距離が寄与する割合を大きくし、距離が短い部分領域３０４が被写体領域と判定されやすくする。また、画像処理装置１０１は、相関度を求めるときのフレームレートが低い場合、評価値に距離が寄与する割合を小さくし、相関度の高い部分領域３０４が被写体領域と判定されやすくする。そのため、画像処理装置１０１では、被写体の動きが速い場合にも精度よく被写体を検出することができ、安定した被写体追跡が可能となる。このようにして、画像処理装置１０１は、被写体追跡の精度を向上させる。 As described above, according to the first embodiment, the image processing device 101, which also functions as a subject tracking device, tracks the subject using a reference image representing the subject to be tracked and the degree of correlation with the input image. At that time, the image processing device 101 calculates an evaluation value for each partial region 304 that takes into account the degree of correlation as well as the distance from the most recently determined subject region. Furthermore, when the frame rate when calculating the degree of correlation is high, the image processing device 101 increases the proportion of distance that contributes to the evaluation value, making it easier to determine a partial region 304 with a short distance as a subject region. Furthermore, when the frame rate when calculating the degree of correlation is low, the image processing device 101 reduces the proportion of distance that contributes to the evaluation value, making it easier to determine a partial region 304 with a high degree of correlation as a subject region. Therefore, the image processing device 101 can accurately detect the subject even when the subject moves quickly, and can stably track the subject. In this way, the image processing device 101 improves the accuracy of subject tracking.

＜第２実施形態＞ <Second Embodiment>
以下、図６および図７を参照して、第２実施形態について説明する。ここでは、第１実施形態との差異を中心に説明する。なお、第１実施形態で説明した、図１の画像処理装置１０１の概略構成、図２の被写体追跡回路１１１は、第２実施形態に係わる画像処理装置１０１においても同様である。 The second embodiment will be described below with reference to FIG. 6 and FIG. 7. The description focuses on the differences from the first embodiment. Note that the schematic configuration of the image processing device 101 in FIG. 1 and the subject tracking circuit 111 in FIG. 2, both described in the first embodiment, are the same in the image processing device 101 according to the second embodiment.

第２実施形態における被写体検出回路２０１は、既知の機械学習で獲得した辞書データを用いて被写体の検出を行う被写体検出回路である。被写体検出回路２０１が機械学習で獲得した辞書データを用いて被写体の検出を行うために画像中の被写体を学習、認識する際は、深層学習と呼ばれる手法が使用される。深層学習の代表的な手法としては、コンボリューショナル・ニューラル・ネットワーク（以下、ＣＮＮと記す）と呼ばれる手法がある。一般的なＣＮＮは、多段階の演算からなる。ＣＮＮは、各段階において、畳み込み演算を行って画像の局所の特徴を空間的に統合し、次の段階の中間層のニューロンへ入力する。さらに、ＣＮＮは、プーリングやサブサンプリングと呼ばれる、特徴量を空間方向へ圧縮する操作を行う。ＣＮＮは、このような多段階の特徴変換を通じて複雑な特徴表現を獲得することができる。そのため、ＣＮＮは、特徴量に基づいて画像中の被写体のカテゴリ認識や被写体検出を高精度に行うことができる。ＣＮＮに代表される機械学習では、画像信号と教師信号がセットとして学習される。学習の結果、被写体検出の処理パラメータである辞書データが生成される。辞書データは、例えば、人物や、車、飛行機などの乗り物、犬、鳥などの動物など種々存在する。ＣＮＮは、辞書データを他の辞書データに切り替えることで、目的とする被写体を検出する。 The subject detection circuit 201 in the second embodiment detects subjects using dictionary data acquired by known machine learning. To learn and recognize subjects in an image for this purpose, a technique called deep learning is used. A representative deep learning technique is the convolutional neural network (hereinafter referred to as CNN). A typical CNN consists of multi-stage operations. In each stage, the CNN performs a convolution operation to spatially integrate local features of the image and inputs the result to the neurons of the intermediate layer in the next stage. The CNN also performs an operation called pooling or subsampling, which compresses the features in the spatial direction. Through such multi-stage feature transformation, the CNN can acquire complex feature representations. The CNN can therefore perform category recognition and detection of subjects in an image with high accuracy based on the features. In machine learning as represented by the CNN, an image signal and a teacher signal are learned as a set. As a result of the learning, dictionary data, which are the processing parameters for subject detection, are generated. Various kinds of dictionary data exist, for example for people, for vehicles such as cars and airplanes, and for animals such as dogs and birds. The CNN detects the target subject by switching from one set of dictionary data to another.
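The two operations named above, convolution and pooling, can be illustrated on a tiny single-channel image. This is a generic textbook sketch in plain Python, not the detector of the embodiment: `conv2d_valid` and `max_pool` are hypothetical helper names, the kernel is not flipped (as is common in CNN frameworks), and max pooling is assumed as the pooling variant.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def max_pool(feature_map, size=2):
    """Compress the feature map spatially by keeping the maximum of each block."""
    oh, ow = len(feature_map) // size, len(feature_map[0]) // size
    return [[max(feature_map[i * size + u][j * size + v]
                 for u in range(size) for v in range(size))
             for j in range(ow)]
            for i in range(oh)]


image = [[5 * r + c for c in range(5)] for r in range(5)]   # 5x5 ramp image
kernel = [[1, 1], [1, 1]]                                   # 2x2 summing kernel
features = conv2d_valid(image, kernel)                      # 4x4 feature map
pooled = max_pool(features)                                 # 2x2 after pooling
```

In a trained network the kernel weights are what the learning step produces; in the terms used above, they would be part of the dictionary data.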

図６は、第２実施形態における距離とゲイン（ＧＡＩＮ）の関係を示す図である。図６（ａ）は、初期状態の距離とゲイン（ＧＡＩＮ）の関係を示す図である。初期状態では、検出した被写体が人の場合を想定している。図６（ａ）は、図４（ａ）と同じであるため、説明は省略する。図６（ｂ）は、検出した被写体が車や飛行機の場合の距離とゲイン（ＧＡＩＮ）の関係を示す図である。α＝３０、β＝９０、γ＝０．０１５としている。α＝３０とすることで、図６（ａ）の初期状態と比較して、ＧＡＩＮが１．０になる距離範囲を１．５倍にしている。また、β＝９０とすることで、図６（ａ）の初期状態と比較して、ＧＡＩＮが０．１になる距離範囲も１．５倍にしている。これは、車や飛行機という被写体が人と比較して相対的に高速に移動することが考えられるためである。 Figure 6 is a diagram showing the relationship between distance and gain (GAIN) in the second embodiment. Figure 6 (a) is a diagram showing the relationship between distance and gain (GAIN) in the initial state. In the initial state, it is assumed that the detected subject is a person. Since Figure 6 (a) is the same as Figure 4 (a), the description will be omitted. Figure 6 (b) is a diagram showing the relationship between distance and gain (GAIN) when the detected subject is a car or an airplane. α = 30, β = 90, γ = 0.015. By setting α = 30, the distance range where GAIN is 1.0 is 1.5 times larger than the initial state of Figure 6 (a). Also, by setting β = 90, the distance range where GAIN is 0.1 is 1.5 times larger than the initial state of Figure 6 (a). This is because it is considered that subjects such as cars and airplanes move relatively faster than people.

なお、検出した被写体と係数α、β、γの具体的な関係は、あらかじめ実験的に求めておくことが可能である。評価値算出回路２０５は、検出した被写体と係数α、β、γとを対応付けたテーブルを有していてもよいし、検出した被写体に紐づく値を代入すれば係数α、β、γが得られる関数式を有していてもよい。 The specific relationship between the detected object and the coefficients α, β, and γ can be experimentally determined in advance. The evaluation value calculation circuit 205 may have a table that associates the detected object with the coefficients α, β, and γ, or may have a function formula that obtains the coefficients α, β, and γ by substituting values associated with the detected object.
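A table that associates the detected subject type with the coefficients might look like the sketch below. The values are those given for FIG. 6(a) (person) and FIG. 6(b) (car or airplane). The dictionary name, the string keys, and the fallback to the initial-state coefficients for unlisted types are assumptions for illustration.

```python
# (alpha, beta, gamma) per detected subject type, from FIG. 6(a) and FIG. 6(b).
COEFFS_BY_SUBJECT = {
    "person": (20.0, 60.0, 0.0225),
    "car": (30.0, 90.0, 0.015),
    "airplane": (30.0, 90.0, 0.015),
}


def coefficients_for_subject(subject_type):
    """Look up the coefficients; unlisted types fall back to the initial state (assumed)."""
    return COEFFS_BY_SUBJECT.get(subject_type, COEFFS_BY_SUBJECT["person"])
```

Types that the source mentions as detectable but gives no coefficients for, such as dogs and birds, are deliberately left out of the table rather than guessed.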

図７は、第２実施形態における被写体追跡処理を示すフローチャートである。図７の処理（画像処理装置の制御方法）は、追跡処理制御回路２０６のＣＰＵ（コンピュータ）がＲＯＭに記憶されたプログラムをＲＡＭに展開して実行し、被写体検出回路２０１から評価値算出回路２０５を制御することで実現される。ステップＳ７０１において、被写体検出回路２０１（検出手段）は、フレームｔ＝０における入力画像を読み込み、上述したように機械学習に基づく辞書データを用いて被写体およびその種類の検出を行い、被写体領域を抽出し、被写体検出結果を得る。ステップＳ７０２において、追跡処理制御回路２０６は、ステップＳ７０１の被写体検出結果から初期の基準画像を生成し、基準画像登録回路２０２に登録する（登録工程）。 Figure 7 is a flowchart showing the subject tracking process in the second embodiment. The process in Figure 7 (control method of the image processing device) is realized by the CPU (computer) of the tracking process control circuit 206 expanding a program stored in ROM to RAM, executing it, and controlling the subject detection circuit 201 to the evaluation value calculation circuit 205. In step S701, the subject detection circuit 201 (detection means) reads the input image in frame t = 0, detects the subject and its type using dictionary data based on machine learning as described above, extracts the subject area, and obtains the subject detection result. In step S702, the tracking process control circuit 206 generates an initial reference image from the subject detection result of step S701 and registers it in the reference image registration circuit 202 (registration process).

ステップＳ７０３において、評価値算出回路２０５（変化手段）は、ステップＳ７０１で検出した被写体の種類に応じて評価値に距離が寄与する度合いを決定する（変化工程）。具体的には、評価値算出回路２０５は、式（５）のＧＡＩＮを算出するために、係数α、β、γを決定する。ステップＳ７０４において、相関度算出回路２０３は、次のフレームｔ＝１における入力画像を読み込む。さらに、相関度算出回路２０３は、入力画像の部分領域３０４と、フレームｔ＝０の入力画像において登録された基準画像とのテンプレートマッチング処理を行い、基準画像との相関度を算出する（相関算出工程）。ステップＳ７０５において、距離算出回路２０４は、相関度を求めた位置と基準位置との距離を算出する（距離算出工程）。基準位置は、直近に判定された被写体領域の位置（すなわち、基準画像を抽出した入力画像における基準画像の位置）とする。 In step S703, the evaluation value calculation circuit 205 (changing means) determines the degree to which the distance contributes to the evaluation value according to the type of subject detected in step S701 (changing process). Specifically, the evaluation value calculation circuit 205 determines the coefficients α, β, and γ to calculate GAIN in equation (5). In step S704, the correlation calculation circuit 203 reads the input image in the next frame t=1. Furthermore, the correlation calculation circuit 203 performs template matching processing between the partial area 304 of the input image and the reference image registered in the input image of frame t=0, and calculates the correlation with the reference image (correlation calculation process). In step S705, the distance calculation circuit 204 calculates the distance between the position where the correlation was calculated and the reference position (distance calculation process). The reference position is the position of the subject area determined most recently (i.e., the position of the reference image in the input image from which the reference image was extracted).

ステップＳ７０６において、評価値算出回路２０５は、ステップＳ７０４で算出された相関度、ステップＳ７０５で算出された距離、およびステップＳ７０３で決定された係数α、β、γを用い、式（５）に基づいて評価値を算出する（評価算出工程）。ステップＳ７０７において、追跡処理制御回路２０６は、入力画像のうち、評価値が最大となった部分領域３０４に対応する画像を被写体領域として判定し、抽出する（決定工程）。また、追跡処理制御回路２０６は、抽出した画像を基準画像登録回路２０２へ出力する。また、追跡処理制御回路２０６は、判定した被写体領域に関する情報を、制御回路１０６、画像処理回路１０７、距離算出回路２０４へ出力する。ステップＳ７０８において、基準画像登録回路２０２は、ステップＳ７０７で抽出された被写体領域を基準として基準画像を更新する。更新された基準画像は、後続する次のフレームのテンプレートマッチング処理（ステップＳ７０４）において利用される。 In step S706, the evaluation value calculation circuit 205 uses the correlation calculated in step S704, the distance calculated in step S705, and the coefficients α, β, and γ determined in step S703 to calculate an evaluation value based on formula (5) (evaluation calculation step). In step S707, the tracking processing control circuit 206 determines and extracts the image corresponding to the partial area 304 with the maximum evaluation value from the input image as the subject area (determination step). The tracking processing control circuit 206 also outputs the extracted image to the reference image registration circuit 202. The tracking processing control circuit 206 also outputs information about the determined subject area to the control circuit 106, the image processing circuit 107, and the distance calculation circuit 204. In step S708, the reference image registration circuit 202 updates the reference image based on the subject area extracted in step S707. The updated reference image is used in the template matching process (step S704) of the following next frame.

ステップＳ７０９において、追跡処理制御回路２０６は、被写体追跡処理を終了するかどうかを判定する。この判定は、画像処理装置１０１の電源がオフにされたかどうかに基づいて行われる。画像処理装置１０１の電源がオフにされていない場合、つまり追跡処理制御回路２０６が被写体追跡処理を終了しないと判定した場合、処理はステップＳ７０３に戻る。これにより、処理はステップＳ７０３から実行され、被写体追跡処理は継続される。これに対して、画像処理装置１０１の電源がオフにされた場合、つまり追跡処理制御回路２０６が被写体追跡処理を終了すると判定した場合、図７のフローチャートは終了する。 In step S709, the tracking process control circuit 206 determines whether or not to end the subject tracking process. This determination is made based on whether the power supply of the image processing device 101 has been turned off. If the power supply of the image processing device 101 has not been turned off, that is, if the tracking process control circuit 206 has determined not to end the subject tracking process, the process returns to step S703. As a result, the process is executed from step S703, and the subject tracking process continues. On the other hand, if the power supply of the image processing device 101 has been turned off, that is, if the tracking process control circuit 206 has determined to end the subject tracking process, the flowchart in FIG. 7 ends.

なお、第２実施形態では、説明および理解を容易にするため、被写体追跡処理の各処理ステップが直列的に実行されるように説明したが、並行して処理可能な処理ステップは同時に実行してもよい。例えば、係数α、β、γを決定する処理（ステップＳ７０３）と、相関度および距離を算出する処理（ステップＳ７０４～Ｓ７０５）は並列処理されてもよい。 In the second embodiment, for ease of explanation and understanding, the processing steps of the subject tracking process are described as being executed serially, but processing steps that can be executed in parallel may be executed simultaneously. For example, the process of determining the coefficients α, β, and γ (step S703) and the processes of calculating the correlation degree and distance (steps S704 to S705) may be executed in parallel.

以上説明したように、第２実施形態によれば、被写体追跡装置としても機能する画像処理装置１０１は、追跡すべき被写体を表す基準画像と、入力画像との相関度とを用いて被写体を追跡する。その際、画像処理装置１０１は、相関度に加え、直近に判定された被写体領域からの距離を加味した評価値を部分領域３０４ごとに算出する。さらに、画像処理装置１０１は、機械学習に基づく辞書データを用いた被写体およびその種類の検出を行い、検出された被写体が例えば人の場合、評価値に距離が寄与する割合を大きくし、距離が短い部分領域３０４が被写体領域と判定されやすくする。また、画像処理装置１０１は、検出された被写体が例えば車や飛行機の場合、評価値に距離が寄与する割合を小さくし、相関度の高い部分領域３０４が被写体領域と判定されやすくする。そのため、画像処理装置１０１では、被写体の動きが速い場合にも精度よく被写体を検出することができ、安定した被写体追跡が可能となる。このようにして、画像処理装置１０１は、被写体追跡の精度を向上させる。 As described above, according to the second embodiment, the image processing device 101, which also functions as a subject tracking device, tracks the subject using a reference image representing the subject to be tracked and the degree of correlation with the input image. At that time, the image processing device 101 calculates an evaluation value for each partial region 304 that takes into account the degree of correlation as well as the distance from the most recently determined subject region. Furthermore, the image processing device 101 detects the subject and its type using dictionary data based on machine learning, and when the detected subject is, for example, a person, the image processing device 101 increases the proportion of distance that contributes to the evaluation value, making it easier to determine a partial region 304 with a short distance as the subject region. Furthermore, when the detected subject is, for example, a car or an airplane, the image processing device 101 reduces the proportion of distance that contributes to the evaluation value, making it easier to determine a partial region 304 with a high degree of correlation as the subject region. Therefore, the image processing device 101 can accurately detect the subject even when the subject moves quickly, and can stably track the subject. In this way, the image processing device 101 improves the accuracy of subject tracking.

＜第３の実施形態＞ <Third Embodiment>
以下、図８および図９を参照して、第３実施形態について説明する。ここでは、第１実施形態との差異を中心に説明する。なお、第１実施形態で説明した、図１の画像処理装置１０１の概略構成は、第３実施形態に係わる画像処理装置１０１においても同様である。第３実施形態において、画像処理装置１０１は、過去の被写体の速度を記憶しておき、その被写体の速度に応じて評価値に距離が寄与する度合いを決定する。 The third embodiment will be described below with reference to FIG. 8 and FIG. 9. The description focuses on the differences from the first embodiment. The schematic configuration of the image processing device 101 in FIG. 1, described in the first embodiment, is the same in the image processing device 101 according to the third embodiment. In the third embodiment, the image processing device 101 stores the past speed of the subject and determines the degree to which the distance contributes to the evaluation value according to that speed.

図８は、被写体追跡回路１１１のブロック図である。被写体追跡回路１１１は、被写体検出回路８０１、基準画像登録回路８０２、相関度算出回路８０３、距離算出回路８０４、評価値算出回路８０５、追跡処理制御回路８０６、および速度記憶回路８０７により構成される。ここで、８０１から８０６は、図２の２０１から２０６と同じであるため、説明は省略する。速度記憶回路８０７（速度記憶手段）は、直前に決定された被写体の速度を記憶しておく回路である。なお、被写体の速度は、例えば、現在のフレームをｎとすると、ｎ－２の被写体領域（基準位置）からｎ－１の被写体領域（基準位置）の差である距離と、フレームレートから算出することができる。 Figure 8 is a block diagram of the object tracking circuit 111. The object tracking circuit 111 is composed of an object detection circuit 801, a reference image registration circuit 802, a correlation calculation circuit 803, a distance calculation circuit 804, an evaluation value calculation circuit 805, a tracking processing control circuit 806, and a speed memory circuit 807. Here, 801 to 806 are the same as 201 to 206 in Figure 2, so the explanation will be omitted. The speed memory circuit 807 (speed memory means) is a circuit that stores the speed of the object determined immediately before. Note that the speed of the object can be calculated from the distance, which is the difference between the object area (reference position) of n-2 and the object area (reference position) of n-1, and the frame rate, for example, if the current frame is n.
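The speed calculation described above can be sketched as follows. The function name `subject_speed`, the use of Euclidean distance, and the unit of pixels per second are assumptions; the source only states that the speed is obtained from the distance between the subject regions of frames n−2 and n−1 and from the frame rate.

```python
import math


def subject_speed(pos_n_minus_2, pos_n_minus_1, fps):
    """Speed in pix/s: displacement over one frame interval multiplied by the frame rate."""
    dx = pos_n_minus_1[0] - pos_n_minus_2[0]
    dy = pos_n_minus_1[1] - pos_n_minus_2[1]
    return math.hypot(dx, dy) * fps


# A subject that moved 5 pix between consecutive frames at 60 fps.
speed = subject_speed((100.0, 100.0), (103.0, 104.0), 60.0)
```

Expressing the speed per second rather than per frame keeps the stored value comparable if the frame rate changes between frames; whether the embodiment does this is not stated.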

また、第２実施形態における距離とゲイン（ＧＡＩＮ）の関係は、第３実施形態における距離とゲイン（ＧＡＩＮ）の関係に相当する。つまり、図６（ａ）は、算出した被写体の速度が小さい場合の距離とゲイン（ＧＡＩＮ）の関係を示す図に相当する。図６（ｂ）は、算出した被写体の速度が大きい場合の距離とゲイン（ＧＡＩＮ）の関係を示す図に相当する。 The relationship between distance and gain (GAIN) in the second embodiment corresponds to the relationship between distance and gain (GAIN) in the third embodiment. That is, FIG. 6(a) corresponds to a diagram showing the relationship between distance and gain (GAIN) when the calculated speed of the subject is low. FIG. 6(b) corresponds to a diagram showing the relationship between distance and gain (GAIN) when the calculated speed of the subject is high.

なお、算出した被写体の速度と係数α、β、γの具体的な関係は、あらかじめ実験的に求めておくことが可能である。評価値算出回路２０５は、算出した被写体の速度と係数α、β、γとを対応付けたテーブルを有していてもよいし、算出した被写体の速度に紐づく値を代入すれば係数α、β、γが得られる関数式を有していてもよい。 The specific relationship between the calculated subject speed and the coefficients α, β, and γ can be experimentally determined in advance. The evaluation value calculation circuit 205 may have a table that associates the calculated subject speed with the coefficients α, β, and γ, or may have a function formula that obtains the coefficients α, β, and γ by substituting a value associated with the calculated subject speed.

図９は、第３実施形態における被写体追跡処理を示すフローチャートである。図９の処理（画像処理装置の制御方法）は、追跡処理制御回路８０６のＣＰＵ（コンピュータ）がＲＯＭに記憶されたプログラムをＲＡＭに展開して実行し、被写体検出回路８０１から評価値算出回路８０５、速度記憶回路８０７を制御することで実現される。ステップＳ９０１において、被写体検出回路８０１は、フレームｔ＝０における入力画像を読み込み、例えば顔検出処理といった被写体検出処理を行って、被写体領域を抽出し、被写体検出結果を得る。ステップＳ９０２において、追跡処理制御回路８０６は、ステップＳ９０１の被写体検出結果から初期の基準画像を生成し、基準画像登録回路８０２に登録する（登録工程）。 FIG. 9 is a flowchart showing the subject tracking process in the third embodiment. The process in FIG. 9 (the control method of the image processing device) is realized by the CPU (computer) of the tracking process control circuit 806 loading a program stored in the ROM into the RAM, executing it, and controlling the subject detection circuit 801 through the evaluation value calculation circuit 805 as well as the speed storage circuit 807. In step S901, the subject detection circuit 801 reads the input image of frame t = 0, performs subject detection processing such as face detection, extracts the subject region, and obtains a subject detection result. In step S902, the tracking process control circuit 806 generates an initial reference image from the subject detection result of step S901 and registers it in the reference image registration circuit 802 (registration step).

ステップＳ９０３において、評価値算出回路８０５（変化手段）は、速度記憶回路８０７に記憶された被写体の速度に応じて、評価値に距離が寄与する度合いを決定する（変化工程）。具体的には、評価値算出回路８０５は、式（５）のＧＡＩＮを算出するために、係数α、β、γを決定する。なお、被写体の速度は、上述したように、現在のフレームをｎとすると、ｎ－２の被写体領域（基準位置）からｎ－１の被写体領域（基準位置）の差である距離と、フレームレートとから算出される。但し、ｔ＝１のときの、被写体の速度は、所定値（例えば、０）とされる。ステップＳ９０４において、相関度算出回路８０３は、次のフレームｔ＝１における入力画像を読み込む。さらに、相関度算出回路８０３は、入力画像の部分領域３０４と、フレームｔ＝０の入力画像において登録された基準画像とのテンプレートマッチング処理を行い、基準画像との相関度を算出する（相関算出工程）。 In step S903, the evaluation value calculation circuit 805 (changing means) determines the degree to which the distance contributes to the evaluation value according to the speed of the subject stored in the speed memory circuit 807 (changing process). Specifically, the evaluation value calculation circuit 805 determines the coefficients α, β, and γ to calculate GAIN in equation (5). As described above, the speed of the subject is calculated from the distance, which is the difference between the n-2 subject area (reference position) and the n-1 subject area (reference position), and the frame rate, assuming that the current frame is n. However, the speed of the subject at t=1 is set to a predetermined value (for example, 0). In step S904, the correlation calculation circuit 803 reads the input image in the next frame t=1. Furthermore, the correlation calculation circuit 803 performs template matching processing between the partial area 304 of the input image and the reference image registered in the input image of frame t=0, and calculates the correlation with the reference image (correlation calculation process).

In step S905, the distance calculation circuit 804 calculates the distance between the position at which the degree of correlation was obtained and the reference position (distance calculation step). The reference position is the position of the most recently determined subject region (that is, the position of the reference image in the input image from which the reference image was extracted). In step S906, the evaluation value calculation circuit 805 calculates an evaluation value based on equation (5), using the degree of correlation calculated in step S904, the distance calculated in step S905, and the coefficients α, β, and γ determined in step S903 (evaluation calculation step). In step S907, the tracking processing control circuit 806 determines the image corresponding to the partial region 304 with the maximum evaluation value in the input image to be the subject region, and extracts it (determination step). The tracking processing control circuit 806 outputs the extracted image to the reference image registration circuit 802. It also outputs information on the determined subject region to the control circuit 106, the image processing circuit 107, and the distance calculation circuit 804.
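Steps S905 to S907 can be sketched as a scoring pass over the candidate partial regions. Equation (5) is not reproduced in this passage, so the scoring rule below, correlation divided by (1 + gain × distance), is a placeholder assumption and not the formula of this publication. The function name `select_subject_region`, the single `gain` parameter (standing in for whatever α, β, and γ produce), and the assumption that correlation values are non-negative are likewise hypothetical.

```python
import math


def select_subject_region(candidates, reference_pos, gain):
    """Pick the candidate partial region with the highest evaluation value.

    candidates: list of ((x, y), correlation) pairs, one per partial region;
                correlation is assumed to be non-negative.
    reference_pos: (x, y) of the most recently determined subject region.
    gain: how strongly distance lowers the score (placeholder for GAIN).
    Returns (best_position, best_score).
    """
    best_pos, best_score = None, -math.inf
    for pos, correlation in candidates:
        # S905: distance from the reference position.
        distance = math.hypot(pos[0] - reference_pos[0],
                              pos[1] - reference_pos[1])
        # S906: placeholder evaluation value, NOT equation (5).
        score = correlation / (1.0 + gain * distance)
        # S907: keep the region with the maximum evaluation value.
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos, best_score
```

With a large `gain`, a nearby region with moderate correlation wins over a distant region with higher correlation; with `gain` set to 0, the choice depends on correlation alone.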

In step S908, the tracking processing control circuit 806 calculates the speed of the subject region determined and extracted in step S907, that is, the speed of the subject, and stores it in the speed storage circuit 807. The stored speed can be used in step S903 for the next frame. In step S909, the reference image registration circuit 802 updates the reference image based on the subject region extracted in step S907. The updated reference image is used in the template matching (step S904) for the next frame.

In step S910, the tracking processing control circuit 806 determines whether to end the subject tracking process. This determination is based on whether the image processing device 101 has been powered off. If the image processing device 101 has not been powered off, that is, if the tracking processing control circuit 806 determines not to end the subject tracking process, the process returns to step S903 and subject tracking continues. If the image processing device 101 has been powered off, that is, if the tracking processing control circuit 806 determines to end the subject tracking process, the flowchart of FIG. 9 ends.
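The per-frame loop of FIG. 9 (steps S903 to S910) carries two pieces of state from one frame to the next: the reference position and the stored speed. The sketch below shows only that feedback. It makes several assumptions: `select_region` is a hypothetical callable standing in for steps S903 to S907, exhausting the `frames` iterable stands in for the power-off check of step S910, updating the reference position stands in for updating the reference image in step S909, and speed is taken as displacement times frame rate.

```python
import math


def track(frames, initial_pos, frame_rate_hz, select_region):
    """Yield (position, speed) for each frame after the initial one.

    frames: iterable of input images for t = 1, 2, ...
    initial_pos: (x, y) of the subject region found at t = 0 (S901, S902).
    select_region(frame, reference_pos, stored_speed) -> (x, y):
        hypothetical stand-in for steps S903 to S907.
    """
    reference_pos = initial_pos
    stored_speed = 0.0  # predetermined value used at t = 1
    for frame in frames:  # S910 stand-in: stop when no frames remain
        pos = select_region(frame, reference_pos, stored_speed)
        # S908: compute and store the speed for use with the next frame.
        stored_speed = math.hypot(pos[0] - reference_pos[0],
                                  pos[1] - reference_pos[1]) * frame_rate_hz
        # S909 stand-in: the new subject region becomes the reference.
        reference_pos = pos
        yield pos, stored_speed
```

Writing the loop as a generator keeps the state handling separate from the matching logic, which is passed in as a callable.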

In the third embodiment, the steps of the subject tracking process have been described as executing serially for ease of explanation and understanding, but steps that can be processed in parallel may be executed simultaneously. For example, determining the coefficients α, β, and γ (step S903) and calculating the degree of correlation and the distance (steps S904 to S905) may be performed in parallel.

As described above, according to the third embodiment, the image processing device 101, which also functions as a subject tracking device, tracks the subject using the degree of correlation between the input image and a reference image representing the subject to be tracked. In doing so, the image processing device 101 calculates, for each partial region 304, an evaluation value that takes into account the distance from the most recently determined subject region in addition to the degree of correlation. Furthermore, the image processing device 101 stores the past speed of the subject. When that speed is low, it increases the degree to which the distance contributes to the evaluation value, so that a partial region 304 at a short distance is more likely to be determined as the subject region. When the past speed is high, it decreases the degree to which the distance contributes to the evaluation value, so that a partial region 304 with a high degree of correlation is more likely to be determined as the subject region. As a result, the image processing device 101 can detect the subject accurately even when the subject moves quickly, enabling stable subject tracking. In this way, the image processing device 101 improves the accuracy of subject tracking.
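The relationship summarized above (low stored speed gives distance a larger contribution, high stored speed gives it a smaller one) can be illustrated with any monotonically decreasing mapping from speed to a distance weight. The mapping below is a hypothetical example, not the coefficients α, β, and γ of equation (5); the function name `distance_gain` and the default values of `max_gain` and `speed_scale` are arbitrary assumptions chosen for illustration.

```python
def distance_gain(stored_speed, max_gain=0.05, speed_scale=100.0):
    """Return a distance weight that shrinks as the stored speed grows.

    stored_speed: past subject speed (assumed non-negative, e.g. pixels/s).
    max_gain: weight applied when the subject is stationary (assumed value).
    speed_scale: speed at which the weight falls to half of max_gain
                 (assumed value).
    """
    if stored_speed < 0:
        raise ValueError("stored_speed must be non-negative")
    # Hypothetical monotonically decreasing mapping, not equation (5).
    return max_gain / (1.0 + stored_speed / speed_scale)
```

A stationary subject gets the full weight `max_gain`, so nearby regions are favored; a fast subject gets a weight close to 0, so the degree of correlation dominates.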

Note that when the proportion of the input image occupied by the subject (hereinafter, the "subject area") is large, for example when the subject is photographed at high magnification, the subject may move a large distance between frames, so the subject area can be treated in the same way as the speed of the subject in the third embodiment. In other words, the magnitude of the subject speed in the third embodiment can be treated as a concept that includes the magnitude of the subject area.

<Other embodiments>
Although preferred embodiments of the present invention have been described above, the present invention is not limited to these embodiments, and various modifications and changes are possible within the scope of its gist. The present invention can also be realized by supplying a program that implements one or more functions of the above embodiments to a system or device via a network or a recording medium, and having one or more processors of a computer in that system or device read and execute the program. The present invention can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

The functional blocks in the figures can be realized by hardware, software, or a combination of the two, but the functional blocks need not correspond one-to-one to the components that realize them. Multiple functional blocks may be realized by a single software or hardware module.

In each of the above embodiments, the image processing device 101 has been described as an example of a device that performs subject tracking. However, as noted above, the present invention can be applied to a variety of devices other than image processing devices. For example, when the present invention is applied to a device for reproducing and displaying image data, that device can use information on the subject region in the image data (such as the position and size of the subject in the image) to set reproduction conditions and display conditions for the image data. Specifically, the reproduction and display device can superimpose information indicating the subject, such as a frame, at the position of the subject in the image, and can control display conditions such as luminance and hue according to the luminance and color information of the subject portion so that the subject portion is displayed appropriately.

The disclosure of each embodiment includes the following configurations, methods, and programs.
(Configuration 1) An image processing device for tracking a specific subject across a plurality of input images that are sequentially supplied, comprising:
a registration means for registering a reference image corresponding to the specific subject;
a correlation calculation means for calculating a degree of correlation between the input image and the reference image for each of a plurality of partial regions set in the input image;
a distance calculation means for calculating a distance from a predetermined reference position in the input image to each of the plurality of partial regions;
an evaluation calculation means for calculating an evaluation value for each of the plurality of partial regions using the degree of correlation and the distance;
a determining means for determining an area including the specific subject among the plurality of partial areas based on the evaluation value;
and a change means for changing the degree to which the distance contributes to the evaluation value when the evaluation calculation means calculates the evaluation value, according to the frame rate at which the correlation calculation means obtains the degree of correlation, the type of the specific subject, or the speed of the specific subject.
(Configuration 2) The image processing device according to configuration 1, wherein the change means treats the rate at which the input images are sequentially supplied as the frame rate at which the correlation calculation means obtains the degree of correlation.
(Configuration 3) The image processing device according to configuration 1, wherein the change means treats the rate obtained by periodically thinning out the sequentially supplied input images as the frame rate at which the correlation calculation means obtains the degree of correlation.
(Configuration 4) The image processing device according to Configuration 1, further comprising a detection means for detecting the specific subject and a type of the specific subject from the initial input image by referring to dictionary data acquired by machine learning.
(Configuration 5) The image processing device according to configuration 1, further comprising a speed storage means for storing the speed of the specific subject in the input image, wherein the change means changes, in accordance with the speed of the specific subject stored in the speed storage means, the degree to which the distance contributes to the evaluation value when the evaluation calculation means calculates the evaluation value for each of the plurality of partial regions set in a subsequently input image.
(Method 1) A method for controlling an image processing device that tracks a specific subject across a plurality of input images that are sequentially supplied, comprising the steps of:
a registration step of registering a reference image corresponding to the specific subject;
a correlation calculation step of calculating a correlation degree between the reference image and each of a plurality of partial regions set in the input image;
a distance calculation step of calculating a distance from a predetermined reference position in the input image to each of the plurality of partial regions;
an evaluation calculation step of calculating an evaluation value for each of the plurality of partial regions using the correlation degree and the distance;
a determining step of determining, based on the evaluation value, a region including the specific subject from among the plurality of partial regions; and
a change step of changing the degree to which the distance contributes to the evaluation value calculated in the evaluation calculation step, according to the frame rate at which the degree of correlation is obtained in the correlation calculation step, the type of the specific subject, or the speed of the specific subject.
(Program 1) A program for causing a computer to execute each unit of the image processing device according to any one of configurations 1 to 5.

101 Image processing device
202, 802 Reference image registration circuit (registration means)
203, 803 Correlation degree calculation circuit (correlation calculation means)
204, 804 Distance calculation circuit (distance calculation means)
205, 805 Evaluation value calculation circuit (evaluation calculation means) (change means)
206, 806 Tracking processing control circuit (determination means)
301 Template (reference image)
303 Search image (input image)
304 Partial region

Claims

1. An image processing device for tracking a particular subject across a plurality of input images provided sequentially, comprising:
a registration means for registering a reference image corresponding to the specific subject;
a correlation calculation means for calculating a degree of correlation between the input image and the reference image for each of a plurality of partial regions set in the input image;
a distance calculation means for calculating a distance from a predetermined reference position in the input image to each of the plurality of partial regions;
an evaluation calculation means for calculating an evaluation value for each of the plurality of partial regions using the degree of correlation and the distance;
a determining means for determining an area including the specific subject among the plurality of partial areas based on the evaluation value;
and a change means for changing the degree to which the distance contributes to the evaluation value when the evaluation calculation means calculates the evaluation value, according to the frame rate at which the correlation calculation means obtains the degree of correlation, the type of the specific subject, or the speed of the specific subject.

2. The image processing device according to claim 1, wherein the change means treats the rate at which the input images are sequentially supplied as the frame rate at which the correlation calculation means obtains the degree of correlation.

3. The image processing device according to claim 1, wherein the change means treats the rate obtained by periodically thinning out the sequentially supplied input images as the frame rate at which the correlation calculation means obtains the degree of correlation.

4. The image processing device according to claim 1, further comprising a detection means for detecting the specific subject and the type of the specific subject from the initial input image by referring to dictionary data acquired by machine learning.

5. The image processing device according to claim 1, further comprising a speed storage means for storing the speed of the specific subject in the input image, wherein the change means changes, in accordance with the speed of the specific subject stored in the speed storage means, the degree to which the distance contributes to the evaluation value when the evaluation calculation means calculates the evaluation value for each of the plurality of partial regions set in a subsequently input image.

6. A method for controlling an image processing device for tracking a specific subject across a plurality of input images that are sequentially supplied, comprising:
a registration step of registering a reference image corresponding to the specific subject;
a correlation calculation step of calculating a degree of correlation between the input image and the reference image for each of a plurality of partial regions set in the input image;
a distance calculation step of calculating a distance from a predetermined reference position in the input image to each of the plurality of partial regions;
an evaluation calculation step of calculating an evaluation value for each of the plurality of partial regions using the correlation degree and the distance;
a determining step of determining, based on the evaluation value, a region including the specific subject from among the plurality of partial regions; and
a change step of changing the degree to which the distance contributes to the evaluation value calculated in the evaluation calculation step, according to the frame rate at which the degree of correlation is obtained in the correlation calculation step, the type of the specific subject, or the speed of the specific subject.

7. A program for causing a computer to execute each of the means of the image processing device according to claim 1.