JP2024014516A

JP2024014516A - Information processing device, information processing method and program

Info

Publication number: JP2024014516A
Application number: JP2022117403A
Authority: JP
Inventors: 律子大竹
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2022-07-22
Filing date: 2022-07-22
Publication date: 2024-02-01

Abstract

[Problem] According to the present invention, object detection accuracy can be improved. [Solution] An information processing apparatus according to the present invention, which solves the above problem, includes: a specifying means for specifying an important position in an image; an extraction means for extracting, from the image, a region of interest that contains the important position; a setting means for setting a plurality of partial images so as to cover the entire area of the image; and a detection means for detecting an object from the partial image of the image corresponding to the region of interest and from the plurality of partial images. [Selected Drawing] FIG. 2

Description

The present invention relates to a technique for detecting an object from an image.

In Patent Document 1, a target image is divided into a plurality of regions, which are cut out and converted into images with a predetermined number of pixels so that they can be used as input images to a learning device.

Patent Document 1: JP 2021-144589 A

With the technique disclosed in Patent Document 1, objects can become difficult to detect depending on how the positions where detection targets frequently appear relate to the boundaries of the divided regions.

The present invention aims to improve object detection accuracy.

An information processing apparatus according to the present invention, which solves the above problem, includes: a specifying means for specifying an important position in an image; an extraction means for extracting, from the image, a region of interest that contains the important position; a setting means for setting a plurality of partial images so as to cover the entire area of the image; and a detection means for detecting an object from the partial image of the image corresponding to the region of interest and from the plurality of partial images.

According to the present invention, object detection accuracy can be improved.

FIG. 1 is a diagram illustrating a configuration example of an information processing apparatus.
FIG. 2 is a diagram illustrating an example of the functional configuration of the information processing apparatus.
FIG. 3 is a diagram illustrating examples of setting a region of interest.
FIG. 4 is a flowchart illustrating processing executed by the information processing apparatus.
FIG. 5 is a diagram illustrating an example of an operation screen for setting a region of interest.
FIG. 6 is a flowchart illustrating processing executed by the information processing apparatus.
FIG. 7 is a diagram illustrating an example of an operation screen for setting a region of interest.
FIG. 8 is a diagram illustrating examples of setting a region of interest.

Hereinafter, embodiments for carrying out the present invention will be described with reference to the drawings.

[First Embodiment]
In recent years, image analysis that uses images captured by imaging devices such as surveillance cameras to detect and track objects, estimate their attributes, and so on, as well as estimation of the number of objects based on the results of such analysis, has been performed in a wide variety of scenes. Here, object detection refers to information processing that outputs, for example, the position and size of a detection-target object in an image, the attributes of the object, and the reliability of the detection. In a machine learning model used for object detection, the size of the data input to the model is fixed in advance, so preprocessing such as reducing or dividing the image is performed to match that input size. However, when an image is divided into images to be input to the machine learning model, objects can become difficult to detect depending on the scene. This embodiment therefore describes an information processing apparatus (and information processing method) that can improve object detection accuracy even when object detection is performed by dividing an image into a plurality of partial images.

FIG. 1 is a block diagram showing a configuration example of the information processing apparatus 100 according to this embodiment. The information processing apparatus 100 in this embodiment has an object detection function that detects a detection-target object in an image captured by an imaging device such as a surveillance camera. The following describes, as an example, the case of detecting human faces. The detection target is not limited to faces: vehicles, animals, and various other objects can be targeted, and the embodiment can be applied to any system that analyzes images to detect a predetermined object.

The information processing apparatus 100 according to this embodiment includes a CPU 101, a memory 102, a communication interface (I/F) unit 103, a display unit 104, an operation unit 105, and a storage unit 106, which are communicably connected to one another via a system bus 107. The information processing apparatus 100 according to this embodiment may further include other components.

The CPU (Central Processing Unit) 101 controls the entire information processing apparatus 100. For example, the CPU 101 controls the operation of each functional unit connected via the system bus 107. The memory 102 stores data, programs, and the like that the CPU 101 uses for processing, and also serves as the main memory, work area, and so on of the CPU 101. When the CPU 101 executes processing based on a program stored in the memory 102, the functional configuration of the information processing apparatus 100 shown in FIG. 2 and the processing of the flowchart shown in FIG. 4, both described later, are realized.

The communication I/F unit 103 is an interface that connects the information processing apparatus 100 to a network. The display unit 104 includes a display member such as a liquid crystal display and displays the results of processing by the CPU 101 and the like. The operation unit 105 includes operation members such as a mouse, a touch panel, and buttons, and inputs user operations to the information processing apparatus 100. The storage unit 106 stores, for example, various data required when the CPU 101 performs processing according to a program, as well as various data obtained as a result of that processing. The data, programs, and the like that the CPU 101 uses for processing may also be stored in the storage unit 106.

FIG. 2 is a block diagram showing an example of the functional configuration of the information processing apparatus 100. The information processing apparatus 100 includes an image acquisition unit 201, an object detection unit 202, an image extraction unit 203, a correction unit 204, an output unit 205, and a storage unit 206.

The image acquisition unit 201 acquires the image on which object detection is to be performed. In this embodiment, that image is acquired from an external device (for example, an imaging device) through the communication I/F unit 103. Hereinafter, the image data acquired by the image acquisition unit 201 for object detection is also simply called the "input image". In the following description, the input image is, as an example, a 720×480 pixel RGB image, 720 pixels wide (horizontal) and 480 pixels high (vertical). The input image is not limited to a 720×480 pixel RGB image; any image can be used as the input image, and its width and height may differ from these values.

The object detection unit 202 uses a predetermined analysis process to perform object detection on each of the partial images obtained by dividing the input image. In this embodiment, the object detection unit 202 detects information indicating the positions of human faces, as specific objects, from a plurality of partial images obtained by dividing the input image acquired by the image acquisition unit 201. The object detection unit 202 outputs detection results using a machine learning model trained to detect human faces in images. That is, the predetermined analysis process detects a specific object in an image of a predetermined size and outputs information indicating the position and size of the object in the image. This can be realized, for example, by applying the technique described in Non-Patent Document 1 below.

(Non-Patent Document 1) J. Redmon and A. Farhadi, "YOLO9000: Better, Faster, Stronger," Computer Vision and Pattern Recognition (CVPR), 2017.
The predetermined analysis process in the object detection unit 202 is not limited to the technique disclosed in Non-Patent Document 1; any technique capable of detecting the target object can be applied.

In this embodiment, as an example, the object detection unit 202 performs detection on a 240×240 pixel RGB image (partial image), 240 pixels wide (horizontal) and 240 pixels high (vertical). If an image of another size is input, it may be resized or transformed using any method, such as the commonly known bicubic method.
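Because each partial image is resized to the fixed detector input, any box the detector returns must be scaled and offset back into input-image coordinates before results from different partial images can be compared. A minimal sketch of that bookkeeping, assuming square partial images, boxes given as (left, top, width, height), and a 240-pixel detector input; `Box` and `to_input_coords` are hypothetical names, and the resize itself (e.g., bicubic) is not shown.

```python
from dataclasses import dataclass

MODEL_INPUT = 240  # detector input side length assumed from this embodiment


@dataclass(frozen=True)
class Box:
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def to_input_coords(box: Box, tile_x: int, tile_y: int, tile_size: int,
                    model_input: int = MODEL_INPUT) -> Box:
    """Map a box found in the resized detector input back to input-image coordinates.

    The partial image of side `tile_size` at (tile_x, tile_y) was resized to
    `model_input` pixels, so the resize is undone by the factor tile_size / model_input.
    """
    s = tile_size / model_input
    return Box(tile_x + box.x * s, tile_y + box.y * s, box.w * s, box.h * s)
```

For example, a face found at (120, 120) with size 24×24 inside a 300-pixel partial image whose top-left corner is at (420, 0) maps to a 30×30 box at (570, 150) in the 720×480 input image.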

The image extraction unit 203 extracts, from the input image acquired by the image acquisition unit 201, the partial images to be input to the object detection unit 202. The image extraction unit 203 extracts partial images by two broad methods. The first method extracts from the image a region of interest that contains an important position in the image. A region of interest is a closed region, expressed in the coordinate system of the input image, that is set when there is an area of the input image in which objects particularly need to be detected, in order to improve detection accuracy for objects in that area. The second method sets a plurality of partial images over the entire area of the input image, either evenly or according to a predetermined rule. These partial images are suited to the predetermined analysis process described above, or to resizing the input image into images of a predetermined size. The region of interest and the partial images extracted here may be any rectangle, but to simplify the following explanation they are assumed to be square.

The predetermined rule is set based on position and size information for extracting the partial images. If no such position and size information is given, then, as shown in FIG. 3(a) for example, six 240×240 pixel square areas are set by dividing the 720×480 pixel input image evenly into six. Dotted lines 301 indicate the boundaries of the six partial images extracted in this case. If instead a side length of 300 pixels is specified as the size information of the partial images, six partial images are extracted by arranging 300×300 pixel square areas uniformly so that they cover the entire image, as shown in FIG. 3(b). In this case, the extraction areas may overlap one another. Dotted lines 302 indicate the boundaries of the partial images extracted in this case.
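The two tilings of FIG. 3(a) and FIG. 3(b) can be reproduced with a short sketch. The text does not state exactly how overlapping tiles are spaced; this assumes the tiles in each row and column are spread evenly from one image edge to the other, so they overlap only when the image size is not a multiple of the tile size. `grid_tiles` is a hypothetical helper.

```python
import math


def grid_tiles(img_w: int, img_h: int, tile: int) -> list[tuple[int, int, int]]:
    """Return (x, y, size) for square tiles that together cover the whole image.

    Assumption: tiles along each axis are evenly spaced between the two edges,
    so neighbouring tiles overlap when the image length is not a multiple of `tile`.
    A tile larger than the image is placed at 0 and extends past the edge.
    """
    def starts(length: int) -> list[int]:
        n = max(1, math.ceil(length / tile))  # fewest tiles that span the axis
        if n == 1:
            return [0]
        step = (length - tile) / (n - 1)      # first tile at 0, last tile flush with the edge
        return [round(i * step) for i in range(n)]

    return [(x, y, tile) for y in starts(img_h) for x in starts(img_w)]
```

For a 720×480 image, `grid_tiles(720, 480, 240)` gives the six non-overlapping tiles of FIG. 3(a), and `grid_tiles(720, 480, 300)` gives six overlapping tiles starting at x = 0, 210, 420 and y = 0, 180, consistent with the six partial images of FIG. 3(b).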

If, for example, information is given that the center coordinates of the region of interest are (500, 200) and that its side length is 240 pixels, a single partial image is extracted as shown in FIG. 3(c). The square indicated by the dotted line 303 is the partial image corresponding to the 240×240 pixel region of interest centered at (500, 200), with the upper-left corner of the image as the origin (0, 0).
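Converting a center and side length into crop bounds takes only a few lines. The text does not specify what happens when the square would extend past the image edge; this sketch assumes the square is shifted inward rather than shrunk, so the detector always receives a full-size partial image. `roi_bounds` is a hypothetical name.

```python
def roi_bounds(cx: int, cy: int, size: int, img_w: int, img_h: int) -> tuple[int, int, int, int]:
    """(left, top, right, bottom) of a square region of interest centred on (cx, cy).

    Coordinates use the upper-left corner of the image as the origin (0, 0).
    Assumption: size <= img_w and size <= img_h; a square that would cross the
    image border is shifted (not shrunk) so that it stays fully inside the image.
    """
    half = size // 2
    left = min(max(cx - half, 0), img_w - size)
    top = min(max(cy - half, 0), img_h - size)
    return left, top, left + size, top + size
```

With the values of FIG. 3(c), `roi_bounds(500, 200, 240, 720, 480)` returns (380, 80, 620, 320), which lies entirely inside the 720×480 image.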

The image extraction unit 203 also acquires information on important positions in the image and on the size of the region of interest, both described later, and calculates the center coordinates of the region of interest and the region-of-interest size corresponding to those center coordinates. The image extraction unit 203 further determines the position and size of each region used to arrange partial images automatically so that they cover the entire image, and extracts each partial image.

The correction unit 204 corrects the detection results obtained by the object detection unit 202 for each partial image. For example, when multiple detection results are output for a single object, or when a single object is detected across multiple regions of interest, the correction unit 204 performs result correction processing that integrates those detection results.
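The text does not say how detection results are integrated. One common technique, shown here as a swapped-in example rather than the method of this embodiment, is greedy non-maximum suppression (NMS) over all detections after they have been mapped into input-image coordinates. Boxes are assumed to be (left, top, right, bottom); `merge_detections` and the 0.5 overlap threshold are assumptions.

```python
Box = tuple[float, float, float, float]  # (left, top, right, bottom) in input-image pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_detections(dets: list[tuple[float, Box]], iou_thresh: float = 0.5) -> list[tuple[float, Box]]:
    """Greedy NMS over detections pooled from every partial image.

    dets: (score, box) pairs. Boxes are visited from highest to lowest score,
    and a box is kept only if it overlaps every already-kept box by less than iou_thresh.
    """
    kept: list[tuple[float, Box]] = []
    for score, box in sorted(dets, key=lambda d: d[0], reverse=True):
        if all(iou(box, k_box) < iou_thresh for _, k_box in kept):
            kept.append((score, box))
    return kept
```

NMS removes duplicate boxes that two overlapping partial images report for the same face. An object cut in two by a partial-image boundary may instead produce two partial boxes that overlap little, which would need an additional rule beyond this sketch.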

The output unit 205 outputs the region of interest to a display means. Displaying information indicating the regions of interest and important positions on the display means makes it easier for the user to make the intended settings, which can improve detection accuracy. The output unit 205 also outputs the output of the correction unit 204, that is, the result of the analysis process. The output information includes coordinate information indicating the positions of detected objects in the image, an image in which detection frames (the circumscribed rectangles of the detected objects) are superimposed on the input image, and classification information of the detected objects.

The storage unit 206 stores data used in processing by the functional units 201 to 205 of the information processing apparatus 100, data obtained as processing results, and the like.

Next, the processing performed by the information processing apparatus 100 will be described with reference to FIGS. 4 and 5. FIG. 4 is a flowchart for explaining the processing executed by the information processing apparatus. The processing shown in the flowchart of FIG. 4 is executed by the CPU 101 of FIG. 1, which is a computer, according to a computer program stored in the storage unit 106. In the following description, each step is denoted by its number prefixed with "S". FIG. 5 shows an example of an operation screen suitable for this embodiment.

In S401, the image acquisition unit 201 acquires the input image (the image on which object detection is to be performed). In this embodiment, the input image is a 720×480 pixel image, as described above.

In S402, the image extraction unit 203 identifies important positions in the input image. Specifically, the image extraction unit 203 acquires information on important positions, which indicate areas of the 720×480 pixel image where the detection-target object frequently appears. As one way of defining an important position, a predetermined position such as the center of the input image can be set in advance. Alternatively, an important position is identified based on the positions of objects detected by executing a predetermined analysis process on an image acquired at a predetermined time. For example, the CPU 101 executes commonly known moving-object detection or person detection processing on an input image acquired at a predetermined timing, and defines the centroid of the circumscribed rectangle of each person detected at that time as an important position. As yet another method, the image extraction unit 203 identifies a position on the input image specified by the user as an important position. For example, the image extraction unit 203 acquires information on important positions that the user specifies through the operation screen shown in FIG. 5, displayed on the display unit 104, and the operation unit 105.

The region-of-interest setting screen 500 in FIG. 5(a) includes an image display section 510, an OK button 501, a cancel button 502, a reset button 503, an automatic placement button 504, and an add region-of-interest button 505. The image display section 510 displays the input image described above; in FIG. 5(a), it shows the area around an elevator entrance. In FIGS. 5(b) to 5(g), reference numerals are omitted for screen elements common to FIG. 5(a). When the user presses the add region-of-interest button 505, a square 240×240 pixel region-of-interest frame 520 is displayed (FIG. 5(b)). In FIG. 5(b), the region-of-interest frame 520 is initially displayed at the upper left of the image display section 510, but the position at which a newly added frame is displayed is not particularly limited. The size of the region of interest added in FIG. 5(b) is 240×240 pixels, which is suitable for executing object detection on this image, but the size of an added region of interest may be changed according to conditions, or the user may be allowed to change it freely. The user can move the region-of-interest frame 520 on the screen. The user moves the frame 520 to a position where object detection should be performed or where the detection-target object frequently appears, that is, to an important position. FIG. 5(c) shows the frame 520 after it has been moved to overlap the elevator entrance. The image extraction unit 203 acquires the position information of the region-of-interest frame 520: since the region of interest 520 is square here, this is either the coordinates of its upper-left and lower-right vertices or its center coordinates and side length.

FIGS. 5(d) to 5(f) show other examples of operation screens for specifying important positions on the screen. Whereas in FIG. 5(c) the region-of-interest frame itself is placed on the screen, in FIG. 5(d) an important position is specified as a point with a pointer 530. The image extraction unit 203 acquires the coordinates specified with the mouse pointer or tapped on the touch panel. In FIG. 5(e), a predetermined figure resembling the detection-target object, here a human-shaped figure 540, is placed at the important position, and the image extraction unit 203 acquires the position information of this figure. In FIG. 5(f), the user specifies the vicinity of an important position by placing a predetermined figure or drawing one freehand. In this case too, the image extraction unit 203 acquires position information of the drawn closed region 550 (such as the position of its centroid).

In S403, the image extraction unit 203 determines the center coordinates of the region of interest based on the important position acquired in S402. When a square region of interest has already been defined, as in FIG. 5(c), the coordinates of the center point of that square are acquired. In the example of FIG. 5(c), the region-of-interest frame 520 is 240×240 pixels and is displayed with its center at (510, 210), so the center coordinates determined in S403 are (510, 210). In the case of FIG. 5(d), the coordinates of the point specified with the pointer 530 are used directly as the center coordinates determined in S403. In the cases of FIGS. 5(e) and 5(f), a representative point of the human-shaped figure 540 or the irregularly shaped figure 550 is defined as its centroid or as the center of its circumscribed rectangle, and the coordinates of that point are taken as the center coordinates of the corresponding region of interest.
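For an irregular figure such as the closed region 550, the two representative points named above generally differ. This sketch computes the area centroid of a simple polygon with the shoelace formula and, for comparison, the center of its circumscribed rectangle. Representing the drawn region as a list of vertices is an assumption, and both function names are hypothetical.

```python
Point = tuple[float, float]


def polygon_centroid(pts: list[Point]) -> Point:
    """Area centroid of a simple (non-self-intersecting) polygon via the shoelace formula."""
    a = cx = cy = 0.0
    n = len(pts)
    for i in range(n):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % n]   # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5                        # signed area; the sign cancels in the division below
    if a == 0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (6 * a), cy / (6 * a)


def bbox_center(pts: list[Point]) -> Point:
    """Centre of the circumscribed (axis-aligned bounding) rectangle."""
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
```

For the square frame of FIG. 5(c), with corners (390, 90) and (630, 330), both functions return (510, 210). For a right triangle (0, 0), (6, 0), (0, 6), the centroid is (2, 2) while the bounding-rectangle center is (3, 3), so the choice of representative point matters for elongated or lopsided freehand shapes.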

In S404, the image extraction unit 203 determines the size of the square representing the region of interest based on the important position acquired in S402. In the example of FIG. 5(c), the region-of-interest frame 520 is predefined as 240×240 pixels, so in S404 the image extraction unit 203 determines the size of the region of interest to be 240×240 pixels.

The image extraction unit 203 may also extract regions of interest of different sizes depending on the important position in the input image. With the operation methods of FIGS. 5(d) to 5(f), the size is not yet determined, so it is determined according to the center coordinates (or the important position) of the region of interest acquired in S403. If the input image shows a scene with depth, the size of the detection-target object changes with the position in the image that corresponds to the depth direction, so the region-of-interest size can be changed accordingly. Specifically, the nearer the important position is to the foreground, the larger the rectangle, and the farther it is toward the back of the scene, the smaller the rectangle. This is because the analysis means detects more accurately when the ratio of the rectangle size to the size of the object to be detected is constant. If the size difference due to position in the image is small, a constant region-of-interest size may be applied regardless of position. In this embodiment, the region-of-interest size determined in S404 is 240×240 pixels.
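A minimal sketch of position-dependent sizing, assuming a camera that looks down into the scene so that objects lower in the frame are nearer and appear larger. Linear interpolation between a "far" size at the top edge and a "near" size at the bottom edge is an assumption; the text only states that the size grows toward the foreground so that the object-to-rectangle ratio stays roughly constant. `roi_size_for`, `size_far`, and `size_near` are hypothetical.

```python
def roi_size_for(y: float, img_h: int, size_far: int, size_near: int) -> int:
    """Side length of the region of interest for an important position at height y.

    Assumption: apparent object size grows linearly from the top (far) edge of the
    image to the bottom (near) edge, so the ROI side is interpolated the same way
    to keep the ratio of object size to ROI size roughly constant.
    """
    t = min(max(y / img_h, 0.0), 1.0)   # 0 at the top edge, 1 at the bottom edge
    return round(size_far + t * (size_near - size_far))
```

With `size_far=160` and `size_near=320` on a 480-pixel-high image, an important position at the vertical center (y = 240) receives the 240-pixel region of interest used in this embodiment, while positions at the top and bottom edges receive 160 and 320 pixels respectively.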

In S405, the image extraction unit 203 extracts from the input image a partial image corresponding to the region of interest of the size determined in S404, centered on the coordinates determined in S403. Because the partial image corresponding to the region of interest is likely to capture the whole object, the accuracy of object detection can be improved.

In S406, the output unit 205 outputs the partial image corresponding to the region of interest to the display device. That is, the output unit 205 superimposes the region of interest determined in S405 on the image display section 510 in a visible form. Specifically, the region of interest is displayed on the display means in a specific manner (color, shape, transparency, line style, and so on) that makes it identifiable. The superimposed display has the same or a similar appearance to the region of interest 520 shown in FIG. 5(c). Showing where in the input image the region of interest has been set makes the apparatus more convenient for the user.

In S407, the operation unit 105 determines whether the user has instructed a predetermined process, that is, whether the user has pressed the automatic placement button 504. The automatic placement button 504 triggers the processing for setting a plurality of partial images on the input image. If the automatic placement button 504 has been pressed, the processing proceeds to S408. Otherwise, S408 and S409 are not executed.

In S408, the image extraction unit 203 sets a plurality of partial images so as to cover the entire area of the input image. Specifically, the input image is divided as shown in FIGS. 3(a) and 3(b). By inputting these partial images to the predetermined analysis process, the positions of specific objects in the input image are detected.

In S409, the output unit 205 outputs the plurality of partial images set on the input image. Specifically, the output unit 205 superimposes the one or more partial images set in S408 on the image display section 510 in a visible form. At this time, the partial images set on the input image and the region of interest drawn in S406 may be displayed in different manners. FIG. 5(g) shows six 240×240 pixel partial images displayed in S409 on the image in the image display section 510, with dotted lines 560 indicating the region boundaries. When the user presses the OK button 501, the image extraction unit 203 extracts each partial image determined by the processing up to this point and inputs it to the object detection unit 202.

In S410, the object detection unit 202 detects, as the specific object, information indicating the position of a person's face from the plurality of partial images obtained by dividing the input image acquired by the image acquisition unit 201.
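The source does not describe how detections on individual partial images are combined into positions in the input image. A minimal sketch, assuming the image is a NumPy array indexed as `image[row, col]` and that `detector` (a hypothetical callable) returns `(x0, y0, x1, y1)` boxes relative to the crop it receives, is to shift each box by its crop origin. The crop of the region of interest can be passed in the same `regions` list as the grid partial images.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def detect_on_regions(image: np.ndarray,
                      regions: Sequence[Box],
                      detector: Callable[[np.ndarray], List[Box]]) -> List[Box]:
    """Run `detector` on each crop and return boxes in full-image coordinates."""
    results: List[Box] = []
    for rx0, ry0, rx1, ry1 in regions:
        crop = image[ry0:ry1, rx0:rx1]  # rows are y, columns are x
        for bx0, by0, bx1, by1 in detector(crop):
            # Boxes are relative to the crop; shift them by the crop origin.
            results.append((bx0 + rx0, by0 + ry0, bx1 + rx0, by1 + ry0))
    return results
```

Because a region of interest usually overlaps some grid partial images, the same face can be reported more than once; a duplicate-suppression step such as non-maximum suppression is one way to merge such results, but the source does not specify one.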

As described above, according to the first embodiment, when partial images are extracted for object detection on an input image, each important position is contained in at least one partial image, which improves the accuracy of detecting objects at the important positions.

[Second embodiment]
In the first embodiment, the process of determining a region of interest corresponding to an important position on an image and the process of setting one or more partial images so as to cover the entire area of the image were described. The second embodiment describes a process of setting the other partial images with reference to one partial image determined from an important position on the image. In the following description, components common to the first embodiment are denoted by the same reference numerals, and their description is omitted. The hardware configuration is the same as in the first embodiment, shown in FIG. 1.

FIG. 6 is a flowchart illustrating the processing performed by the information processing apparatus 100 in this embodiment. The processing shown in FIG. 6 is executed by the CPU 101 of FIG. 1, which is a computer, in accordance with a computer program stored in the storage device 106. Steps common to the flowchart of FIG. 4 are denoted by the same reference numerals as in FIG. 4, and their description is omitted. FIG. 7 shows example operation screens for explaining the partial image setting method according to this embodiment.

In S601, the CPU 101 determines whether the user has performed an operation to add an important position, that is, whether the important position addition button 701 in FIG. 7(a) has been pressed. If not, no important position has been specified, so the process advances to S408, where the image extraction unit 203 sets one or more partial images so as to cover the entire area of the image.

The image extraction unit 203 sets the arrangement of the partial images based on one of the following functions: (1) a function specified in advance that relates position in the input image to object size; (2) a function calculated from the coordinates and sizes of one or more objects specified by the user on the input image; or (3) a function calculated from the coordinates and sizes of objects detected by executing a predetermined analysis process on images acquired within a predetermined period. Each function is described below.

FIG. 7(b) shows an example operation screen. It illustrates a partial-image arrangement suited to an input image with depth, in which subjects appear at different sizes in the foreground (bottom of the image) and in the background (top of the image). The boundaries of the partial images are indicated by dotted lines 710. The size of the partial images, which changes in the depth direction, is (1) specified by a function based on information, prepared in advance, indicating how the size of a subject changes with its coordinates in the image. Alternatively, (2) the function can be generated from size information obtained when the user specifies the size of the object to be analyzed at two or more locations, for example in the foreground and in the background, on an operation screen (not shown). As a further alternative, (3) the function can be generated from information about objects detected by a simple analysis process applied to the input image at a predetermined timing. Letting (x, y) be the center coordinates of a square analysis processing region on the image and w the length of one side in pixels, the function is defined as
w = a × x + b × y + c ... (Equation 1)
In the example of FIG. 7, the size at which subjects appear hardly changes in the horizontal (x-axis) direction, so a in Equation 1 is 0. Because the size does change in the vertical (y-axis) direction, the side length w (pixels) is assumed to be defined as a function of the y-coordinate of the center of the analysis processing region:
w = 0.48 × y + 68.6 ... (Equation 2)
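Options (2) and (3) both reduce to estimating a, b and c of Equation 1 from observed object centers and sizes. A minimal sketch, assuming an ordinary least-squares fit with NumPy (the source does not name a fitting method) and hypothetical helper names, is shown below. With `use_x=False` the fit is restricted to w = b × y + c, as in the FIG. 7 example where a is 0; two objects at different heights are then enough.

```python
import numpy as np


def fit_size_function(samples, use_x=True):
    """Least-squares fit of Equation 1, w = a*x + b*y + c.

    samples: iterable of (x, y, w), where (x, y) is an object's centre and w
    its side length in pixels (user-specified objects or detector output).
    With use_x=False, a is fixed to 0 and only w = b*y + c is fitted.
    Returns (a, b, c) as floats.
    """
    pts = np.asarray(samples, dtype=float)
    ones = np.ones(len(pts))
    if use_x:
        A = np.column_stack([pts[:, 0], pts[:, 1], ones])
    else:
        A = np.column_stack([pts[:, 1], ones])
    coef, *_ = np.linalg.lstsq(A, pts[:, 2], rcond=None)
    if use_x:
        a, b, c = coef
    else:
        a, (b, c) = 0.0, coef
    return float(a), float(b), float(c)


def side_length(a, b, c, x, y):
    """Equation 1: side length of the square region centred at (x, y)."""
    return a * x + b * y + c
```

As a check, Equation 2 gives 0.48 × 230 + 68.6 = 179 pixels at y = 230, which matches the side length of the region of interest 714 used in S604.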

If it is determined in S601 that an important position addition operation has been performed, the processes of S402 to S406 are executed as described in the first embodiment, and the output unit 205 outputs the region of interest corresponding to the specified important position to the display device.

In S602, the CPU 101 determines whether the user performs an operation to add or change an important position. If so, the processes of S402 to S406 are repeated. FIG. 7(c) shows an example operation screen in which two important positions have been specified and the corresponding regions of interest 711 and 712 have been drawn. If the determination in S602 is negative, the process passes through the check in S407 for whether automatic placement has been requested and then advances to S603.

In S603, the CPU 101 determines whether the user has specified a reference region of interest. For example, the reference region of interest is regarded as specified if one of the two regions of interest in FIG. 7(c) is in a selected state. If no reference region of interest is specified, the process advances to S408, where the image extraction unit 203 arranges one or more regions of interest so as to cover the entire area of the image and extracts the image corresponding to each region of interest. FIG. 7(d) shows FIG. 7(c) with the dotted lines 710 added, indicating the boundaries of the regions of interest arranged in S408. FIG. 7(e), on the other hand, shows an example operation screen in which the region of interest 714 is selected. The boundary of the selected region of interest 714 is drawn as a double line, which distinguishes it at a glance from the solid line of the other region-of-interest frame 711. If the user presses the automatic placement button 504 while one of the regions of interest is selected in this way, the CPU 101 determines in S603 that a reference region of interest has been specified, and the process advances to S604.

In S604, the image extraction unit 203 sets one or more partial images so as to cover the entire area of the image, using the region of interest as a reference. Here, the image extraction unit 203 uses the function described above to determine the positions and sizes of the other partial images based on the selected region of interest 714. FIG. 7(f) shows an example in which the regions of interest set in S604 are drawn in S409 and displayed on the operation screen with dotted lines 720. Here, the region of interest 714 has center coordinates (480, 230) and a side length of 179 pixels, consistent with Equation 2 (0.48 × 230 + 68.6 = 179). In the row sharing the y-coordinate of the region of interest 714, regions of interest of the same size are placed without overlapping (721).

Next, the position and size of the regions of interest 722 in the row immediately above the regions of interest 721 (the row sharing the y-coordinate of the region of interest 714) are calculated. First, the y-coordinate of the top side of the region of interest 714 is calculated from its position and size as 141 (230 - 179/2 = 140.5, rounded). This y-coordinate becomes the bottom side of the region of interest one row up. Letting y1 be the y-coordinate of the center of that region and w1 the side length of the square representing its size, Equation 2 gives w1 = 0.48 × y1 + 68.6. Since the bottom side lies at y = 141, y1 + w1/2 = 141. Solving these two equations gives y1 = 86 and w1 = 110. A region of interest 715 with this y-coordinate and size is placed at the same x-coordinate as the reference region of interest 714. Regions of interest of the same size are then placed to its left and right without overlapping (722).
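Eliminating w1 from the two conditions gives a closed form for the center of the next row up. The derivation below only writes out the algebra already implied by the text, with the Equation 2 coefficients b = 0.48, c = 68.6 and the bottom-edge coordinate e = 141:

$$
\begin{aligned}
w_1 &= b\,y_1 + c, \qquad y_1 + \frac{w_1}{2} = e \\
\Rightarrow\quad y_1 &= \frac{e - c/2}{1 + b/2} = \frac{141 - 34.3}{1.24} \approx 86.0 \\
w_1 &= 0.48 \times 86.0 + 68.6 \approx 110
\end{aligned}
$$

For rows below the reference row the condition becomes y - w/2 = e instead, which gives y = (e + c/2)/(1 - b/2).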

The remaining rows are determined in the same way. The regions of interest 723 in the top row of the image share the y-coordinate -3 and the size 67, and the regions of interest 724 in the bottom row share the y-coordinate 465 and the size 292 (by Equation 2, 0.48 × 465 + 68.6 ≈ 292).
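Repeating the one-row step upward and downward yields all the rows. The sketch below assumes a = 0 as in FIG. 7 and uses a hypothetical 640×480 input (the image size is not stated in the source); the function names are illustrative. Starting from the reference row at y = 230 it gives rows at y ≈ -3 (w ≈ 67) and y ≈ 86 (w ≈ 110) above, and one row at y ≈ 465.5 (w ≈ 292) below; small differences from the text's figures come from where intermediate values are rounded.

```python
def row_above(top_edge, b, c):
    """Row whose bottom edge is `top_edge`: solve y + w/2 = top_edge, w = b*y + c."""
    y = (top_edge - c / 2) / (1 + b / 2)
    return y, b * y + c


def row_below(bottom_edge, b, c):
    """Row whose top edge is `bottom_edge`: solve y - w/2 = bottom_edge, w = b*y + c."""
    y = (bottom_edge + c / 2) / (1 - b / 2)
    return y, b * y + c


def layout_rows(y_ref, height, b, c, min_side=1.0):
    """(centre y, side) of every row, top to bottom, until [0, height) is covered."""
    if not 0 <= b < 2:
        raise ValueError("expected 0 <= b < 2")
    rows = [(y_ref, b * y_ref + c)]
    y, w = rows[0]
    if w < min_side:
        raise ValueError("reference row has a side length below min_side")
    while y - w / 2 > 0:                 # add rows above the reference row
        y, w = row_above(y - w / 2, b, c)
        if w < min_side:                 # sizes shrink upward; stop degenerate cases
            raise ValueError("size function gives a side length below min_side")
        rows.insert(0, (y, w))
    y, w = rows[-1]
    while y + w / 2 < height:            # add rows below (sizes grow downward)
        y, w = row_below(y + w / 2, b, c)
        rows.append((y, w))
    return rows


def row_columns(x_ref, w, width):
    """Centres of same-size, non-overlapping squares in one row, starting at
    x_ref and extending left and right until [0, width) is covered."""
    xs = [x_ref]
    while xs[0] - w / 2 > 0:
        xs.insert(0, xs[0] - w)
    while xs[-1] + w / 2 < width:
        xs.append(xs[-1] + w)
    return xs
```

For example, `row_columns(480, 179, 640)` places the reference row's squares at x = -57, 122, 301, 480 and 659; squares at the borders extend past the image, which is the protrusion the third embodiment tries to reduce.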

In this example operation screen, the boundary of the reference region of interest 714 is drawn as a double line so that it can be distinguished from the other regions of interest. In addition, the regions of interest placed by the user through the important position addition operation (solid line 711, double line 714) and those placed by automatic placement (dotted lines 720) use different display formats, so they are easy to tell apart. The display format is not limited to this; all regions may use the same format. Alternatively, the boundary format may be changed only for regions that need to be identified during editing, such as when a region of interest is changed, deleted, or added, or regions may be indicated by coloring that includes their interiors.

As described above, according to the second embodiment, a plurality of partial regions can be extracted for object detection on an input image with reference to an important position on the image specified by the user. This improves detection accuracy at the important position and also makes the extraction of the other partial regions more efficient.

[Third embodiment]
This embodiment describes, among the processing steps of the first and second embodiments, the process of setting one or more partial images so as to cover the entire area of the input image (S408).

Because the images extracted from the input image in units of partial images are input to the object detection unit 202, one or more square partial images are set on the input image. As shown in the second embodiment, if a reference region of interest has been specified among the regions of interest corresponding to important positions on the image, the other partial images can simply be arranged starting from it. When no reference region of interest is specified, however, there are many possible ways to arrange the partial images. This embodiment therefore uses FIG. 8 to describe how the reference region of interest is determined when no information about important positions on the image is available.

Regions of interest 811, 821, 831, and 841, drawn with double lines in FIGS. 8(a) to 8(d), each indicate the partial image placed first as the reference. The positions and sizes of the other partial images are calculated from the position of this partial image, and their boundaries are indicated by dotted lines 810, 820, 830, and 840. FIG. 8(a) shows an example in which all partial images have the same size regardless of their position in the image and the reference region of interest is placed at the center of the image. FIG. 8(b) shows partial images of the same size regardless of position, with the reference region of interest placed at the bottom center of the image. FIG. 8(c) shows partial images whose size depends on their position in the image: specifically, the size changes with the y-coordinate in the input image, and the reference region of interest is placed at the center of the image. FIG. 8(d) likewise shows partial images whose size changes with the y-coordinate, with the reference region of interest placed at the bottom center of the image.
These four arrangement methods should be selected according to how subjects appear in the image. To estimate this simply, camera setting information is used together with the coefficients of Equation 1 from the second embodiment, which serve as information about depth.
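The four patterns differ in two choices: where the reference square sits and whether its size follows Equation 1. A minimal sketch, assuming that "bottom center" means the reference square's bottom edge is flush with the bottom of the image (the source does not define the exact placement) and that size varies only with y (a = 0), could compute the reference square as follows; `Pattern` and `reference_region` are hypothetical names.

```python
from enum import Enum


class Pattern(Enum):
    # value = (anchor, whether the size varies with y)
    A = ("center", False)   # FIG. 8(a)
    B = ("bottom", False)   # FIG. 8(b)
    C = ("center", True)    # FIG. 8(c)
    D = ("bottom", True)    # FIG. 8(d)


def reference_region(pattern, width, height, fixed_side=None, b=0.0, c=0.0):
    """Return (x, y, side) of the first, reference square for `pattern`."""
    anchor, varies = pattern.value
    x = width / 2
    if not varies:
        if fixed_side is None:
            raise ValueError("fixed-size patterns need fixed_side")
        side = fixed_side
        y = height / 2 if anchor == "center" else height - side / 2
    elif anchor == "center":
        y = height / 2
        side = b * y + c            # Equation 1 with a = 0
    else:
        # Bottom edge on the image border: y + (b*y + c)/2 = height.
        y = (height - c / 2) / (1 + b / 2)
        side = b * y + c
    return x, y, side
```

The remaining partial images can then be stacked outward from this square row by row, in the same way as in the second embodiment.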

FIG. 8(e) is an example input image of a person 850 captured from a high position looking almost straight down. In such an image, the object to be detected is expected to appear often at the center, so the reference region of interest is placed at the center of the image. In addition, since the depth is small, the size is assumed not to change with position in the image, and the arrangement pattern of FIG. 8(a) is selected.

FIG. 8(f) is an example input image of people captured almost from the side in a location with little depth. There is no significant difference in image size between the person 860, who is slightly in front, and the person 861 behind. For such an image, the reference region of interest is placed at the bottom center rather than at the center of the input image, because this reduces the number of regions of interest that extend beyond the image area. That is, the arrangement pattern of FIG. 8(b) is selected.

FIG. 8(g) is an example input image captured from a high position looking diagonally down. The scene has depth: a person 871 in the background (top of the image) appears smaller than a person 870 near the center. People also appear in the foreground (bottom of the image), but once a person comes closer than a certain distance, only part of the person, like the person 872, appears large at the bottom of the image, and such a person is considered to fall outside the target of interest. For such an image, therefore, the reference region of interest is placed at the center of the input image. Because the scene has depth, the size of the regions of interest must change with position in the image, so the arrangement pattern of FIG. 8(c) is selected.

FIG. 8(h) is an example input image captured from a lower position than in FIG. 8(g). The scene has depth, and the person 860 in the foreground is captured from the front and appears at the bottom of the image. For such an image, the arrangement pattern of FIG. 8(d), in which the reference region of interest is placed at the bottom center of the input image, is selected.

Camera setting information is used as the first element for simply estimating how subjects appear, as in FIGS. 8(e) to 8(h). Here, for example, the depression angle, that is, the angle between the horizontal plane and the optical axis, is used to represent the installation state of the camera. The depression angle can be obtained by the user entering it into the information processing apparatus 100 through the operation unit 105 at installation time, or information generated by a sensor function of the camera or the like may be input to the information processing apparatus 100 through the communication I/F 103. The information may further include the height of the camera's mounting position above the floor or ground, which is likewise obtained by the user entering it through the operation unit 105 at installation time.

Information about depth is used as the second element for simply estimating how subjects appear. Here, the coefficient b of Equation 1 described in the second embodiment (w = a × x + b × y + c, where w is the side length in pixels of the analysis processing region centered at (x, y)) is used. The larger b is, the more the subject size in the image changes along the y-axis (depth) direction. Table 1 shows an example of determining the arrangement pattern, that is, the combination of the position of the reference analysis processing region and whether the region size changes, according to two conditions: the value of b and the camera depression angle. In Table 1, b is classified, for example, as "small" for 0 ≤ b < 0.5 and "large" for 0.5 ≤ b ≤ 1, and as "unknown" if the value cannot be obtained. Likewise, the camera depression angle is classified, for example, as "small" below 20 degrees and "large" at 20 degrees or more, and as "unknown" if no depression angle information is available.
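The two thresholds above can be written as small classifiers whose outputs together form the lookup key for Table 1. Table 1 itself is not reproduced in this text, so the sketch stops at building the key; treating a b value outside [0, 1] as "unknown" is an added assumption, and the function names are hypothetical.

```python
from typing import Optional, Tuple


def classify_b(b: Optional[float]) -> str:
    """Depth cue from coefficient b of Equation 1."""
    if b is None:
        return "unknown"            # b could not be obtained
    if 0 <= b < 0.5:
        return "small"
    if 0.5 <= b <= 1:
        return "large"
    return "unknown"                # assumption: outside the ranges given in the text


def classify_depression(angle_deg: Optional[float], threshold: float = 20.0) -> str:
    """Camera depression angle in degrees (angle between horizontal and optical axis)."""
    if angle_deg is None:
        return "unknown"            # no depression-angle information
    return "small" if angle_deg < threshold else "large"


def table1_key(b: Optional[float], angle_deg: Optional[float]) -> Tuple[str, str]:
    """(b class, depression-angle class) used to look up the arrangement pattern."""
    return classify_b(b), classify_depression(angle_deg)
```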

Predefining the arrangement pattern of the analysis processing regions according to these conditions makes it possible to set analysis processing regions better suited to the input image.

［その他の実施形態］
本発明は、前述の実施形態の１以上の機能を実現するプログラムを、ネットワーク又は記憶媒体を介してシステム又は装置に供給し、そのシステム又は装置のコンピュータにおける１つ以上のプロセッサがプログラムを読み出し実行する処理でも実現可能である。また、１以上の機能を実現する回路（例えば、ＡＳＩＣ）によっても実現可能である。 [Other embodiments]
The present invention provides a system or device with a program that implements one or more functions of the embodiments described above via a network or a storage medium, and one or more processors in a computer of the system or device reads and executes the program. This can also be achieved by processing. It can also be realized by a circuit (for example, ASIC) that realizes one or more functions.

The embodiments described above are merely examples of implementing the present invention, and the technical scope of the present invention should not be interpreted restrictively on their basis. That is, the present invention can be implemented in various forms without departing from its technical idea or its main features.

100 Information processing apparatus
201 Image acquisition unit
202 Object detection unit
203 Image extraction unit
204 Correction unit
205 Output unit
206 Storage unit

Claims

1. An information processing apparatus comprising:
identification means for identifying an important position in an image;
extraction means for extracting a region of interest including the important position from the image;
setting means for setting a plurality of partial images so as to encompass the entire area of the image; and
detection means for detecting an object from a partial image of the image corresponding to the region of interest and from the plurality of partial images.

2. The information processing apparatus according to claim 1, wherein the setting means sets the plurality of partial images based on the region of interest.

3. The information processing apparatus according to claim 1, wherein the identification means identifies the important position based on the position of the object detected by executing a predetermined analysis process on an image acquired at a predetermined time.

4. The information processing apparatus according to claim 1, further comprising operation means for accepting a position in the image specified by a user, wherein the identification means identifies the position on the image specified by the user as the important position.

5. The information processing apparatus according to claim 4, wherein the operation means accepts the position through a predetermined figure placed on the image displayed on display means.

6. The information processing apparatus according to claim 1, wherein the extraction means extracts the region of interest with a size that differs depending on the important position in the image.

7. The information processing apparatus according to claim 1, further comprising output means for outputting the region of interest to display means.

8. The information processing apparatus according to claim 7, wherein the output means displays the plurality of set partial images and the region of interest in different manners.

9. The information processing apparatus according to claim 7, wherein the output means displays the region of interest and a region including an object detected from the plurality of partial images in different manners.

10. The information processing apparatus according to claim 1, wherein the detection means detects the object by performing a predetermined analysis process on each of the partial images of a predetermined size.

11. The information processing apparatus according to claim 2, wherein the identification means identifies a predetermined position on the image as the important position, and the setting means sets the partial images so that one of the plurality of partial images includes the important position.

12. The information processing apparatus according to claim 1, wherein the setting means sets the partial images to have different sizes depending on their positions in the image.

13. The information processing apparatus according to claim 12, wherein the setting means sets the arrangement of the partial images based on one of: (1) a function specified in advance regarding the position and size of the object in the image; (2) a function calculated based on the coordinates and sizes of one or more of the objects specified on the image by a user; and (3) a function calculated based on the coordinates and sizes of the objects detected by executing a predetermined analysis process on images acquired within a predetermined time.

14. An information processing method comprising:
an identification step of identifying an important position in an image;
an extraction step of extracting a region of interest including the important position from the image;
a setting step of setting a plurality of partial images so as to encompass the entire area of the image; and
a detection step of detecting an object from a partial image of the image corresponding to the region of interest and from the plurality of partial images.

15. A program for causing a computer to function as an information processing apparatus comprising:
identification means for identifying an important position in an image;
extraction means for extracting a region of interest including the important position from the image;
setting means for setting a plurality of partial images so as to encompass the entire area of the image; and
detection means for detecting an object from a partial image of the image corresponding to the region of interest and from the plurality of partial images.