JP2018120283A

JP2018120283A - Information processing device, information processing method and program

Info

Publication number: JP2018120283A
Application number: JP2017009453A
Authority: JP
Inventors: Kotaro Yano (矢野 光太郎); Hiroyuki Uchiyama (内山 寛之); Ichiro Umeda (梅田 一郎)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-01-23
Filing date: 2017-01-23
Publication date: 2018-08-02

Abstract

PROBLEM TO BE SOLVED: To improve the detection accuracy of an object.
SOLUTION: An information processing device comprises: estimation means that estimates an imaging direction of an object in a second image, in which the object is captured by a second imaging device, on the basis of a first image of the object captured by a first imaging device and the correspondence between the coordinate system of the first image and the coordinate system of the second image; determination means that determines, on the basis of the imaging direction estimated by the estimation means, a recognition model to be learned, which is used to detect the object in the second image; and learning means that learns the recognition model determined by the determination means on the basis of an image of the object contained in the second image.
SELECTED DRAWING: Figure 4

Description

The present invention relates to an information processing apparatus, an information processing method, and a program.

In recent years, imaging devices such as surveillance cameras have rapidly come into widespread use in facilities such as stores for security purposes. It has been proposed to use such imaging devices not only to acquire images but also to measure the degree of congestion by detecting objects such as people in the images, or to analyze the flow lines of objects for store marketing research. There is also demand for searching for a specific object in images captured by imaging devices installed on streets.
When such image analysis is performed, no useful information can be obtained unless objects are detected with sufficient accuracy. As a method for detecting an object such as a person in an image, the method described in Non-Patent Document 1 has been proposed. However, depending on the scene being captured, sufficient detection accuracy may not be achieved: an object standing far from the camera may not be captured at sufficient resolution, or objects may overlap so that parts of them are hidden.

To address this problem, Non-Patent Document 2 discloses that a plurality of imaging devices with overlapping fields of view each detect objects, and their detection results are integrated to obtain more accurate detection results.
On the other hand, the appearance of a person varies greatly depending on the shooting direction and posture. Non-Patent Document 3 therefore discloses a technique that improves detection accuracy by dividing the human body into parts and using, for each part, a plurality of recognition models corresponding to different shooting directions.

Non-Patent Document 1: Dalal and Triggs. Histograms of Oriented Gradients for Human Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2005).
Non-Patent Document 2: Leistner et al. Visual On-line Learning in Distributed Camera Networks. Proceedings of the International Conference on Distributed Smart Cameras (2008).
Non-Patent Document 3: Yang and Ramanan. Articulated Pose Estimation with Flexible Mixtures-of-Parts. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2011).
Non-Patent Document 4: Gang Xu and Saburo Tsuji. "3D Vision" (3次元ビジョン). Kyoritsu Shuppan (1998).
Non-Patent Document 5: Bagarinao et al. Adapting SVM Image Classifiers to Changes in Imaging Conditions Using Incremental SVM: An Application to Car Detection. Proceedings of the 9th Asian Conference on Computer Vision (2009).
Non-Patent Document 6: Oza and Russell. Online Bagging and Boosting. Proceedings of Artificial Intelligence and Statistics (2001).
Non-Patent Document 7: Szeliski, translated by Tamaki et al. "Computer Vision: Algorithms and Applications" (Japanese edition). Kyoritsu Shuppan (2013).

Non-Patent Document 2 further discloses the following technique. The detection result for an object in an image captured by one camera is used to estimate the position of the object in an image captured by another camera. The image at the estimated position in the other camera's image is then used as a learning sample to learn an object recognition model for images captured by that other camera. This is said to improve accuracy by adapting the object recognition model to the installation scene.
On the other hand, when detecting objects such as people walking inside facilities such as stores or on streets, changes in posture are not very large, but appearance changes greatly depending on the shooting direction. It is therefore desirable to have a plurality of recognition models for different shooting directions, and to adapt those recognition models to the installation scene.
However, when a plurality of recognition models are used for object recognition, the technique described in Non-Patent Document 2 cannot identify which recognition model a learning sample, obtained from the object detection result for an image captured by one camera, is suitable for. As a result, the recognition models cannot be learned with suitable learning samples, and object detection accuracy cannot be improved.

An object of the present invention is to enable objects to be detected accurately in images captured by a plurality of imaging devices.

An information processing apparatus according to the present invention includes: estimation means for estimating a shooting direction of an object in a second image, in which the object is captured by a second imaging device, on the basis of a first image, in which the object is captured by a first imaging device, and a correspondence between the coordinate system of the first image and the coordinate system of the second image; determination means for determining, on the basis of the shooting direction estimated by the estimation means, a recognition model to be learned, the recognition model being used to detect the object in the second image; and learning means for learning the recognition model determined by the determination means on the basis of an image of the object contained in the second image.

According to the present invention, object detection accuracy can be improved.

FIG. 1 is a diagram illustrating an example of the hardware configuration of the information processing apparatus.
FIG. 2 is a diagram illustrating an example of the functional configuration and the like of the information processing apparatus.
FIG. 3 is a diagram illustrating an example of a shooting scene.
FIG. 4 is a flowchart illustrating an example of the processing of the information processing apparatus.
FIG. 5 is a diagram illustrating details of an example of the detection unit.
FIG. 6 is a flowchart illustrating an example of the processing of the detection unit.
FIG. 7 is a diagram illustrating examples of images corresponding to the recognition models.
FIG. 8 is a diagram illustrating an example of a calibration pattern.
FIG. 9 is a diagram for explaining an example of the correspondence between regions in images.
FIG. 10 is a diagram illustrating an example of the functional configuration and the like of the information processing apparatus.
FIG. 11 is a flowchart illustrating an example of the processing of the information processing apparatus.
FIG. 12 is a diagram illustrating details of an example of the field-of-view control unit.
FIG. 13 is a flowchart illustrating an example of the processing of the field-of-view control unit.
FIG. 14 is a diagram for explaining an example of a method of evaluating an overlapping region.
FIG. 15 is a diagram illustrating an example of a captured image.

Embodiments of the present invention are described below with reference to the drawings.

<Embodiment 1>
FIG. 1 is a diagram illustrating an example of the hardware configuration of an information processing apparatus 10 according to the present embodiment. The information processing apparatus 10 includes a CPU (Central Processing Unit) 11, a storage device 12, an input device 13, and an output device 14. These components are communicably connected to one another via a bus or the like.
The CPU 11 is a central processing unit that controls the operation of the information processing apparatus 10 and executes programs stored in the storage device 12. The storage device 12 is a storage device such as a magnetic storage device or a semiconductor memory, and stores programs read in accordance with the operation of the CPU 11, data that must be retained for a long time, and the like. In the present embodiment, the CPU 11 executes processing according to the procedures of the programs stored in the storage device 12, thereby realizing the functions of the information processing apparatus 10 described later with reference to FIG. 2 and the processing of the flowcharts described later with reference to FIGS. 4 and 6. The storage device 12 also stores the images to be processed by the information processing apparatus 10 according to the present embodiment and the detection results.

The input device 13 is an input device such as a mouse, a keyboard, a touch panel device, or buttons, and accepts various inputs from a user. The input device 13 also includes an interface for exchanging information with imaging devices such as the cameras 111 and 112 described later with reference to FIG. 2. The output device 14 is a display device such as a liquid crystal panel or an external monitor, an audio output device, or the like, and outputs various kinds of information.
Note that the hardware configuration of the information processing apparatus 10 is not limited to the configuration shown in FIG. 1. For example, the information processing apparatus 10 may include an I/O device for communicating with various other devices. Examples of such an I/O device include an input/output device for a memory card, a USB cable, or the like, and a wired or wireless transmitter/receiver.

FIG. 2 is a diagram illustrating an example of the functional configuration and the like of the information processing apparatus 10. The processing and functions of the information processing apparatus 10 are realized by the CPU 11.
The information processing apparatus 10 includes shooting information acquisition units 121 and 122, image acquisition units 131 and 132, detection units 141 and 142, a positional relationship calculation unit 150, a position estimation unit 160, an extraction unit 170, a learning unit 180, a selection unit 190, and an integration unit 200.
The shooting information acquisition units 121 and 122 acquire shooting information from the cameras 111 and 112, respectively. The image acquisition units 131 and 132 acquire images captured by the cameras 111 and 112, respectively. The detection units 141 and 142 detect people in the images captured by the cameras 111 and 112, respectively. The positional relationship calculation unit 150 calculates the correspondence between position coordinates in the image captured by the camera 111 and position coordinates in the image captured by the camera 112. The position estimation unit 160 estimates the position coordinates, in the image captured by the camera 112, that correspond to the position of a person detected by the detection unit 141. The extraction unit 170 extracts a partial image from the image captured by the camera 112. The learning unit 180 learns the recognition models used to detect people. The selection unit 190 determines the recognition model to be learned. The integration unit 200 integrates the detection results of the detection units 141 and 142.

The cameras 111 and 112 are imaging devices that capture the scene to be monitored.
The shooting information acquisition units 121 and 122 acquire the shooting information of the cameras 111 and 112, respectively. Shooting information is information that associates position coordinates in an image captured by an imaging device with three-dimensional spatial coordinates of the captured scene; it is determined by the imaging device's shooting magnification and shooting direction.
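The source says only that shooting information ties image coordinates to three-dimensional scene coordinates and is determined by the shooting magnification and shooting direction. The following is a minimal pinhole-camera sketch of such a mapping, assuming the magnification can be expressed as a focal length in pixels and the direction as pan and tilt angles. The world frame and all names and parameters (`camera_axes`, `project`, `focal_px`, `cx`, `cy`) are hypothetical and not taken from the patent.

```python
import math
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def camera_axes(pan_deg: float, tilt_deg: float) -> Tuple[Vec3, Vec3, Vec3]:
    """Right, down and forward unit vectors of the camera in world coordinates.

    Assumed world frame: X east, Y north, Z up. Pan turns the optical axis
    from +Y toward +X about the vertical axis; tilt lowers it below the horizon.
    """
    p, t = math.radians(pan_deg), math.radians(tilt_deg)
    forward = (math.sin(p) * math.cos(t), math.cos(p) * math.cos(t), -math.sin(t))
    right = (math.cos(p), -math.sin(p), 0.0)
    # down = forward x right, so (right, down, forward) is a right-handed frame
    down = (forward[1] * right[2] - forward[2] * right[1],
            forward[2] * right[0] - forward[0] * right[2],
            forward[0] * right[1] - forward[1] * right[0])
    return right, down, forward


def project(point: Sequence[float], cam_pos: Sequence[float], pan_deg: float,
            tilt_deg: float, focal_px: float, cx: float, cy: float
            ) -> Optional[Tuple[float, float]]:
    """Map a 3-D world point to pixel coordinates (u, v); None if behind the camera."""
    right, down, forward = camera_axes(pan_deg, tilt_deg)
    v = [p - c for p, c in zip(point, cam_pos)]  # camera-to-point vector

    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    depth = dot(forward, v)
    if depth <= 0:
        return None
    # focal_px plays the role of the shooting magnification (assumption)
    return (cx + focal_px * dot(right, v) / depth,
            cy + focal_px * dot(down, v) / depth)
```

A point on the optical axis lands on the principal point (cx, cy), and a larger `focal_px` spreads off-axis points further from it, which is how a higher magnification behaves.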
The image acquisition units 131 and 132 acquire images captured by the cameras 111 and 112, respectively, and store information on each acquired image in the storage device 12.
The detection units 141 and 142 detect objects to be detected in the images acquired by the image acquisition units 131 and 132, respectively. In the present embodiment, the object to be detected is a person, but it may instead be an object such as luggage, a vehicle, or a drone, or an animal such as a dog, a cat, or livestock.
The positional relationship calculation unit 150 calculates, on the basis of the shooting information acquired by the shooting information acquisition units 121 and 122, coordinate conversion parameters for converting the coordinate system of an image captured by the camera 111 into the coordinate system of an image captured by the camera 112. The calculated coordinate conversion parameters are an example of correspondence information that associates the coordinate system of an image captured by the camera 111 with the coordinate system of an image captured by the camera 112.
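The text does not say what form the coordinate conversion parameters take. One common choice for two cameras viewing the same floor is a 3x3 planar homography, which maps points lying on that plane (for example, a person's foot position) exactly from one image to the other. The sketch below assumes that choice and estimates the homography from four point correspondences, such as marks visible to both cameras. The function names are hypothetical, and a practical system would typically use more points with a least-squares or robust fit.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("degenerate correspondences (e.g. three collinear points)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def estimate_homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """3x3 homography H (with h33 = 1) mapping four src points onto four dst points."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("this sketch expects exactly four correspondences")
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h31 x + h32 y + 1) = h11 x + h12 y + h13, and likewise for v
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * u, -y * u])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -x * v, -y * v])
        b.append(v)
    h = _solve(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H: List[List[float]], x: float, y: float) -> Point:
    """Map (x, y) in the first camera's image into the second camera's image."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

Under this assumption, the nine entries of `H` would be the "coordinate conversion parameters" computed once in advance, and position estimation reduces to calling `apply_homography` on each detected point.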

The position estimation unit 160 uses the coordinate conversion parameters calculated by the positional relationship calculation unit 150 to estimate the coordinates, in the image captured by the camera 112, that correspond to the position of a person detected by the detection unit 141.
The extraction unit 170 extracts a partial image from the image captured by the camera 112 and acquired by the image acquisition unit 132.
The learning unit 180 uses the partial image extracted by the extraction unit 170 to learn a recognition model used by the detection unit 142 for person detection.
The selection unit 190 selects, from the plurality of recognition models used by the detection unit 142 for person detection, the recognition model to be learned by the learning unit 180.
The integration unit 200 integrates the results of the detection processing by the detection units 141 and 142.
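As a hedged illustration of how the position estimation unit 160 and the extraction unit 170 could work together, the sketch below maps the four vertices of a detection rectangle from the first image into the second image with a 3x3 homography, takes the enclosing axis-aligned box, clips it to the second image, and crops that region. It assumes a planar homography as the coordinate conversion parameters, which is exact only for points on the reference plane, so the mapped box is an approximation for an upright person. All names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), maxima exclusive


def apply_homography(H: List[List[float]], x: float, y: float) -> Tuple[float, float]:
    """Map a point from the first image into the second image."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def project_box(H: List[List[float]], corners: Sequence[Tuple[float, float]],
                width: int, height: int) -> Optional[Box]:
    """Enclosing box, in the second image, of the four mapped rectangle vertices."""
    pts = [apply_homography(H, x, y) for x, y in corners]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    # Round outward, then clip to the bounds of the second image
    x0, y0 = max(0, math.floor(min(xs))), max(0, math.floor(min(ys)))
    x1, y1 = min(width, math.ceil(max(xs))), min(height, math.ceil(max(ys)))
    if x1 <= x0 or y1 <= y0:
        return None  # the estimated position falls outside the second image
    return (x0, y0, x1, y1)


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Partial image (rows of pixel values) inside box."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]
```

Under these assumptions, the cropped partial image is what would be handed to the learning unit 180 as a learning sample.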

The processing of the information processing apparatus 10 is described below. As shown in FIG. 3, the cameras 111 and 112 are installed with overlapping fields of view so that both can capture the same person in the scene. In the present embodiment, the fields of view of the cameras 111 and 112 are assumed to be fixed, as is their shooting information, which is set in advance. The coordinate conversion parameters used to convert position coordinates in an image captured by the camera 111 into position coordinates in an image captured by the camera 112 are also assumed to have been calculated in advance by the positional relationship calculation unit 150.
FIG. 4 is a flowchart illustrating an example of the processing of the information processing apparatus 10.
In S401, the image acquisition units 131 and 132 acquire images captured by the cameras 111 and 112, respectively. The images acquired in S401 are, for example, bitmap data with 8 bits for each of R, G, and B. The cameras 111 and 112 are synchronized so that their capture timings coincide, so the image acquisition units 131 and 132 acquire images captured at the same time by the cameras 111 and 112, respectively. Alternatively, the image acquisition units 131 and 132 may acquire images captured by the cameras 111 and 112 within a set period (for example, a 0.1-second period centered on a certain time). The image acquisition units 131 and 132 store the acquired images in the storage device 12. Hereinafter, the image captured by the camera 111 and acquired by the image acquisition unit 131 is referred to as the first image, and the image captured by the camera 112 and acquired by the image acquisition unit 132 is referred to as the second image.
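When the cameras are not hardware-synchronized, the alternative in S401 (taking images captured within a set period) can be approximated by pairing frames by timestamp. A minimal sketch, assuming each camera supplies a sorted list of capture times in seconds and that the 0.1-second period is centered on each frame time of the camera 111; `pair_frames` is a hypothetical name.

```python
import bisect
from typing import List, Sequence, Tuple


def pair_frames(times_a: Sequence[float], times_b: Sequence[float],
                window: float = 0.1) -> List[Tuple[int, int]]:
    """Pair each camera-A frame with the nearest camera-B frame.

    times_b must be sorted in ascending order. A pair is kept only if the two
    capture times fall within a period of length `window` centered on the
    camera-A time, i.e. |t_a - t_b| <= window / 2 (assumed reading).
    """
    pairs = []
    for i, ta in enumerate(times_a):
        k = bisect.bisect_left(times_b, ta)  # first B frame not earlier than ta
        best = None
        for j in (k - 1, k):  # the nearest B frame is on one side or the other
            if 0 <= j < len(times_b):
                if best is None or abs(times_b[j] - ta) < abs(times_b[best] - ta):
                    best = j
        if best is not None and abs(times_b[best] - ta) <= window / 2:
            pairs.append((i, best))
    return pairs
```

For example, with camera-A times 0.0, 0.5, 1.0 and camera-B times 0.02, 0.56, 0.97, the middle frame is dropped because its nearest partner is 0.06 s away, outside the assumed 0.05 s half-window.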

In S402, the detection unit 141 detects a person, which is the object to be detected, in the first image. Details of the detection unit 141 are shown in FIG. 5.
The detection unit 141 includes a partial image acquisition unit 1411, a feature extraction unit 1412, a pattern identification unit 1413, a parameter acquisition unit 1414, and a detection result output unit 1415. The partial image acquisition unit 1411 acquires partial images from the first image for detecting a person. The feature extraction unit 1412 extracts feature quantities from the partial images acquired by the partial image acquisition unit 1411. The pattern identification unit 1413 identifies whether a partial image acquired by the partial image acquisition unit 1411 is an image of a person, on the basis of the feature quantity extracted by the feature extraction unit 1412 and the identification parameters acquired by the parameter acquisition unit 1414. The parameter acquisition unit 1414 acquires identification parameters from a recognition model for recognizing a person. When the pattern identification unit 1413 identifies a partial image as a person, the detection result output unit 1415 outputs the position coordinates of the four vertices of the rectangle representing that partial image.

FIG. 6 is a flowchart illustrating an example of the processing of the detection unit 141. The person detection processing of S402 is described in detail below with reference to FIG. 6.
In S601, the parameter acquisition unit 1414 sets one of the recognition models for recognizing a person and acquires the parameters corresponding to the set recognition model, such as identification parameters and a search range. The identification parameters are used in the identification processing of objects corresponding to that recognition model; they are, for example, feature quantities extracted from the recognition model. In the present embodiment, three recognition models are used for person recognition: a front model M1, a side model M2, and a plane model M3. Information on each model is stored in advance in the storage device 12 and managed by the parameter acquisition unit 1414. FIG. 7 shows examples of the person images recognized by each model. The front model M1 is a recognition model for recognizing images of a person captured from the front, as shown in FIG. 7(a). In the present embodiment, the information processing apparatus 10 performs identification using feature quantities that capture the outline of a person. The front model M1 is therefore obtained by prior learning so that it recognizes not only images of a person captured from the front but also images of a person captured from behind, whose outline is similar. Likewise, the side model M2 and the plane model M3 are recognition models for recognizing the person images shown in FIGS. 7(b) and 7(c), respectively.

Thus, the appearance of a person varies with the direction from which the person is captured. In the present embodiment, the information processing apparatus 10 copes with this variation by using a plurality of recognition models corresponding to different shooting directions. As shown in FIG. 7(c), the plane model M3 is learned to recognize not only images captured from directly above a person but also images captured from directions around the overhead direction.
For example, the plane model M3 is learned to recognize a person captured at a depression angle within a set range (for example, 70 to 90 degrees from the horizontal). Similarly, the front model M1 is learned to recognize not only images captured from directly in front of a person but also images captured from directions around the frontal direction. For example, with the person's forward direction as the reference azimuth, the front model M1 is learned to recognize a person captured at an azimuth within a set range (for example, -10 to 10 degrees and 170 to 190 degrees). Similarly, the side model M2 is learned to recognize not only images captured from exactly the side of a person but also images captured from directions around the side direction. For example, with the person's side direction as the reference azimuth, the side model M2 is learned to recognize a person captured at an azimuth within a set range (for example, -10 to 10 degrees and 170 to 190 degrees).
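The claimed determination means selects the recognition model to be learned from the estimated shooting direction. Using the example ranges given for M1 to M3, that selection could look like the sketch below. It assumes the depression angle is measured from the horizontal, the azimuth is measured from the person's forward direction (so the side direction lies at 90 degrees), and the plane-model test takes precedence; directions outside every example range return None. The function name is hypothetical.

```python
from typing import Optional


def select_model(depression_deg: float, azimuth_deg: float) -> Optional[str]:
    """Return "M1", "M2", "M3" or None for a shooting direction.

    depression_deg: angle of the line of sight below the horizontal (assumed).
    azimuth_deg: shooting azimuth measured from the person's forward
                 direction, with the person's side at 90 degrees (assumed).
    """
    # Plane model M3: depression angle of 70 to 90 degrees (checked first; assumption)
    if 70.0 <= depression_deg <= 90.0:
        return "M3"
    # Fold the azimuth into [0, 180): front and back both map near 0,
    # and the two sides both map near 90.
    a = azimuth_deg % 180.0
    # Front model M1: -10 to 10 and 170 to 190 degrees from the forward direction
    if min(a, 180.0 - a) <= 10.0:
        return "M1"
    # Side model M2: -10 to 10 and 170 to 190 degrees from the side direction
    if abs(a - 90.0) <= 10.0:
        return "M2"
    return None  # between the example ranges: no model is chosen for learning
```

Folding the azimuth modulo 180 degrees mirrors the source's pairing of each range with its opposite (for example, -10 to 10 together with 170 to 190), which reflects the outline-based features treating front and back alike.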

In addition, as shown in FIG. 7, depending on the shooting conditions, not only the appearance of a person but also the shape of the region enclosing the person differs. In the present embodiment, the information processing apparatus 10 performs pattern matching on rectangular regions in the image to identify whether each rectangular region is a person, and changes the height and width of the rectangular region to be matched according to the recognition model used for matching. Furthermore, in images captured by a camera shooting a specific scene, the direction from which a person is seen is biased according to the position in the image. The information processing apparatus 10 therefore changes the search range, that is, the range in the image in which a person is searched for, according to the recognition model. To perform this processing, each person recognition model (recognition models M1 to M3) includes, in addition to the identification parameters used in the pattern identification processing, parameters such as the height and width of the rectangular region to be matched and information on the search range in the image. These parameters are acquired by the parameter acquisition unit 1414.
In the present embodiment, the information processing apparatus 10 switches sequentially among the three recognition models and repeats the following processing of S602 to S605 for each. Dividing the corresponding shooting directions more finely and building the person model from more recognition models improves identification accuracy but increases the amount of processing. The processing of the present embodiment can also be applied to such a configuration, but in the present embodiment the person model consists of three recognition models.
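The per-model parameters and the loop over models can be pictured with a small record type. This is a sketch only: the field names, the `(x_min, y_min, x_max, y_max)` convention for the search range, and the `detect_one` callback standing in for S602 to S605 are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) of a matched rectangle


@dataclass(frozen=True)
class RecognitionModel:
    """Hypothetical container for one recognition model's parameters."""
    name: str                                # "M1" front, "M2" side, "M3" plane
    weights: Tuple[float, ...]               # identification parameters w
    window_size: Tuple[int, int]             # (width, height) of the rectangle to match
    search_range: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), maxima exclusive


def detect_with_all_models(
        models: Iterable[RecognitionModel],
        detect_one: Callable[[RecognitionModel], List[Rect]]) -> List[Tuple[str, Rect]]:
    """Switch through the models in turn, tagging each detection with its model."""
    detections: List[Tuple[str, Rect]] = []
    for model in models:               # S601: set one model and fetch its parameters
        for rect in detect_one(model):  # S602 to S605 for that model
            detections.append((model.name, rect))
    return detections
```

Keeping the window size and search range inside each model record means the scanning code never needs to know which direction a model covers.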

In S602, the partial image acquisition unit 1411 acquires a partial region image from the first image for detecting a person. A common way to detect an object in an image is to extract the image of a predetermined partial region, called a search window, and match it against a model representing the object to be detected; the search window is then moved step by step across the image and the matching is repeated, so that objects are detected throughout the image. The information processing apparatus 10 of the present embodiment also detects objects in this way: it repeatedly acquires a partial region image in S602 and performs the processing of S603 to S605, described below, on it.
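A minimal single-scale sketch of the search-window scan, assuming the window size and search range come from the recognition model set in S601 and that the window moves by a fixed stride in pixels. The stride value is hypothetical; the source does not give one, nor does it say whether multiple scales are searched.

```python
from typing import Iterator, Tuple


def search_windows(window_size: Tuple[int, int],
                   search_range: Tuple[int, int, int, int],
                   stride: int = 8) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (x, y, width, height) for each window lying fully inside search_range.

    search_range is (x_min, y_min, x_max, y_max) with exclusive maxima (assumed).
    Only one scale is scanned; stride is a hypothetical step size in pixels.
    """
    w, h = window_size
    x_min, y_min, x_max, y_max = search_range
    for y in range(y_min, y_max - h + 1, stride):
        for x in range(x_min, x_max - w + 1, stride):
            yield (x, y, w, h)  # one partial region to pass on to S603 to S605
```

Each yielded rectangle would be cropped from the first image and passed to feature extraction; on a match, its four vertices are what the detection result output unit 1415 reports.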
In S603, the feature extraction unit 1412 extracts a feature amount for detecting a person from the partial region image acquired in S602. In the present embodiment, the feature extraction unit 1412 extracts Histograms of Oriented Gradients (HOG) features, as described in Non-Patent Document 1. The HOG feature expresses edge shape by summing the luminance gradients of the image for each direction, and in images of people it is useful mainly for capturing the outline. The feature amount used by the information processing apparatus 10 to identify a person is not limited to the HOG feature. Besides the HOG feature, the information processing apparatus 10 may use, for example, Haar-like features or Local Binary Pattern Histogram (LBPH) features, or a feature amount that combines them.
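As a rough illustration of what a gradient-orientation histogram contains, the following NumPy sketch does three things. It bins gradient magnitudes by unsigned orientation within each cell, concatenates the per-cell histograms, and L2-normalizes the result. It deliberately omits the block normalization and interpolation of the full method in Non-Patent Document 1. `hog_like`, `cell`, and `nbins` are names and defaults chosen for this example.

```python
import numpy as np


def hog_like(patch: np.ndarray, cell: int = 8, nbins: int = 9) -> np.ndarray:
    """Simplified HOG-style descriptor: per-cell histograms of unsigned
    gradient orientation (0-180 degrees) weighted by gradient magnitude,
    concatenated and L2-normalized. No block normalization."""
    img = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(img)                  # derivatives along rows, then columns
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((ang / (180.0 / nbins)).astype(int), nbins - 1)
    rows, cols = img.shape
    hists = []
    for r in range(0, rows - cell + 1, cell):
        for c in range(0, cols - cell + 1, cell):
            b = bins[r:r + cell, c:c + cell].ravel()
            m = mag[r:r + cell, c:c + cell].ravel()
            hists.append(np.bincount(b, weights=m, minlength=nbins))
    v = np.concatenate(hists)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

A vertical edge, for instance, puts all of its weight into the bin around 0 degrees, because its gradient points horizontally.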

In S604, the pattern identification unit 1413 identifies whether or not the partial region image acquired by the partial image acquisition unit 1411 in S602 is a person. It does so based on the feature amount extracted by the feature extraction unit 1412 in S603 and the identification parameter acquired by the parameter acquisition unit 1414 in S601. Non-Patent Document 1 discloses a method of training a person classifier with a support vector machine and using it for identification, and the information processing apparatus 10 of this embodiment also applies that method. However, Non-Patent Document 1 identifies images of people seen from various directions with a single recognition model. In the present embodiment, by contrast, the information processing apparatus 10 uses a separate recognition model for each shooting direction. The partial image to be matched is compared with each recognition model, and a person photographed from the corresponding direction is identified. In S604, the pattern identification unit 1413 performs matching against one recognition model, using the identification parameter acquired by the parameter acquisition unit 1414 in S601. For example, the pattern identification unit 1413 performs pattern identification with a linear support vector machine according to Equation 1 below.
$$y = \operatorname{sign}\left(\langle x, w \rangle\right) \qquad \text{(Equation 1)}$$
In Equation 1, x is the HOG feature extracted by the feature extraction unit 1412 in S603. w is the identification parameter acquired by the parameter acquisition unit 1414 in S601, a vector with the same number of dimensions as x. ⟨x, w⟩ denotes the inner product of the vectors x and w. sign is the sign function, which returns +1 for a positive value and -1 for a negative value. y is the identification result: +1 indicates a person and -1 indicates something other than a person. The pattern identification unit 1413 identifies whether or not the image is a person based on the computed value of y. Instead of the identification method using Equation 1, the pattern identification unit 1413 may use, in S604, an identification method based on a kernel support vector machine, an AdaBoost classifier, randomized classification trees (Randomized Trees), or the like.
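Equation 1 amounts to the following check. This sketch assumes NumPy vectors. The text does not specify the result when the inner product is exactly 0, so mapping that case to -1 is an assumption. Equation 1 has no bias term; if one is needed, it can be absorbed by appending a constant 1 to x and the bias to w.

```python
import numpy as np


def linear_svm_identify(x: np.ndarray, w: np.ndarray) -> int:
    """Equation 1: y = sign(<x, w>); +1 means person, -1 means non-person.

    A score of exactly 0 is treated as -1 (non-person); this is an
    assumption, since the text only defines the positive and negative cases.
    """
    if x.shape != w.shape:
        raise ValueError("x and w must have the same number of dimensions")
    score = float(np.dot(x, w))
    return 1 if score > 0.0 else -1
```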

In S605, if the partial image acquired in S602 was identified as a person by the pattern identification unit 1413 in S604, the detection result output unit 1415 outputs the position coordinates, in the first image, of the four vertices of the rectangle of that partial image.
In S606, the partial image acquisition unit 1411 determines whether the processing of S603 to S605 has been performed for all partial images that can be acquired from the part of the first image indicated by the search range information acquired in S601. If it has, the process proceeds to S607. If it has not, the process returns to S602, and a partial image that has not yet been acquired is acquired.
In S607, the parameter acquisition unit 1414 determines whether the processing of S601 to S606 has been performed for all recognition models. If it has, the processing of FIG. 6 ends and the process proceeds to S403. If it has not, the process returns to S601, and the identification parameters and other parameters are acquired from a recognition model that has not yet been used.
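Taken together, S601 to S607 form two nested loops: an outer loop over recognition models and an inner scan over search windows. The sketch below shows that control flow under several assumptions:
- The `RecognitionModel` container and its field names are hypothetical.
- `feature_fn` stands in for the feature extraction of S603.
- The vertex order A (top-left), B (top-right), C (bottom-left), D (bottom-right) is chosen to be consistent with Equation 2.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

import numpy as np


@dataclass
class RecognitionModel:
    """Hypothetical container for one recognition model (e.g. M1 to M3)."""
    name: str
    w: np.ndarray                              # identification parameter of Equation 1
    win_w: int                                 # width of the rectangle to be matched
    win_h: int                                 # height of the rectangle to be matched
    search_range: Tuple[int, int, int, int]    # (x0, y0, x1, y1), assumed inside the image


Detection = Tuple[str, List[Tuple[int, int]]]


def detect_people(image: np.ndarray, models: List[RecognitionModel],
                  feature_fn: Callable[[np.ndarray], np.ndarray],
                  stride: int = 4) -> List[Detection]:
    detections: List[Detection] = []
    for m in models:                                          # S601 / S607: one model at a time
        x0, y0, x1, y1 = m.search_range
        for top in range(y0, y1 - m.win_h + 1, stride):       # S602 / S606: scan the search range
            for left in range(x0, x1 - m.win_w + 1, stride):
                patch = image[top:top + m.win_h, left:left + m.win_w]
                x = feature_fn(patch)                         # S603: feature extraction
                if float(np.dot(x, m.w)) > 0.0:               # S604: Equation 1
                    right, bottom = left + m.win_w, top + m.win_h
                    # S605: output the four vertices A, B, C, D of the rectangle
                    corners = [(left, top), (right, top), (left, bottom), (right, bottom)]
                    detections.append((m.name, corners))
    return detections
```

Keeping the window size and search range inside each model means the same loop serves all three models. This mirrors the per-model parameters described for M1 to M3.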

In S403, the position estimation unit 160 estimates the position coordinates in the second image that correspond to the person detected by the detection unit 141 in S402. It uses the coordinate conversion parameters calculated by the positional relationship calculation unit 150. First, the position estimation unit 160 takes the four vertices of the rectangle of the partial image that the detection unit 141 output as a person. It applies the coordinate conversion parameters to their position coordinates in the first image and estimates the corresponding epipolar line in the second image for each vertex. The position estimation unit 160 then performs matching by a correlation method along each estimated epipolar line and calculates the corresponding position coordinates in the second image.
The position estimation unit 160 performs, for example, the epipolar line estimation using coordinate conversion parameters and the correlation-based matching disclosed in Non-Patent Document 4. However, if the person looks very different from camera 111 and from camera 112, matching by the correlation method may be difficult. In such a case, the image acquisition units 131 and 132 acquire distance images in synchronization with the acquisition of the first image and the second image. The position estimation unit 160 then estimates the position coordinates of the person region in the second image based on the coordinate conversion parameters and the acquired distance images. Distance images can be acquired with, for example, a time-of-flight (TOF) distance sensor or a pattern-projection distance sensor.
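A sketch of the S403 matching for one vertex follows. It assumes that the coordinate conversion parameters are expressed as a fundamental matrix F with x2^T F x1 = 0. This is a common representation of two-view geometry, but the text does not specify the form. Normalized cross-correlation is used as the correlation measure. The function names and the `half` window size are illustrative, and this is not claimed to be the exact method of Non-Patent Document 4.

```python
import numpy as np


def epipolar_line(F: np.ndarray, pt) -> np.ndarray:
    """Line l' = F x, i.e. a*u + b*v + c = 0 in the second image,
    scaled so that a^2 + b^2 = 1."""
    line = F @ np.array([pt[0], pt[1], 1.0])
    return line / np.hypot(line[0], line[1])


def ncc(p: np.ndarray, q: np.ndarray) -> float:
    """Normalized cross-correlation of two equally sized patches."""
    p = p - p.mean()
    q = q - q.mean()
    denom = np.sqrt((p * p).sum() * (q * q).sum())
    return float((p * q).sum() / denom) if denom > 0 else -1.0


def match_along_epipolar(img1, img2, pt, F, half=5):
    """Search the epipolar line of `pt` in img2 for the (2*half+1)^2 patch
    that best correlates with the patch around `pt` in img1.
    Assumes `pt` is at least `half` pixels away from the border of img1."""
    a, b, c = epipolar_line(F, pt)
    x, y = int(round(pt[0])), int(round(pt[1]))
    template = img1[y - half:y + half + 1, x - half:x + half + 1]
    h, w = img2.shape
    if abs(b) >= abs(a):   # line closer to horizontal: step along u
        candidates = ((u, -(a * u + c) / b) for u in range(w))
    else:                  # line closer to vertical: step along v
        candidates = ((-(b * v + c) / a, v) for v in range(h))
    best, best_score = None, -np.inf
    for u, v in candidates:
        u, v = int(round(u)), int(round(v))
        if half <= u < w - half and half <= v < h - half:
            score = ncc(template, img2[v - half:v + half + 1, u - half:u + half + 1])
            if score > best_score:
                best, best_score = (u, v), score
    return best, best_score
```

Restricting the search to the epipolar line turns a 2D search into a 1D one. This is the point of estimating the line before correlating.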

Here, an example of how the positional relationship calculation unit 150 calculates the coordinate conversion parameters in advance will be described.
First, the image acquisition unit 131 acquires an image, captured by camera 111, of a calibration pattern whose shape is known. The calibration pattern is, for example, a board on which dots are drawn in a grid as shown in FIG. 8, and the positional relationship between the dots is known. The calibration pattern of FIG. 8 is placed in the scene to be captured, and the image acquisition unit 131 acquires the image captured by camera 111. The imaging information acquisition unit 121 extracts the position coordinates, in the image, of each dot of the calibration pattern from the image acquired by the image acquisition unit 131. For example, the imaging information acquisition unit 121 displays the acquired image on the output device 14 and acquires the position coordinates when the user points at each dot via the input device 13. The position coordinates of a dot are the position of its center of gravity in the image. The imaging information acquisition unit 121 also expresses the position of each dot as coordinates relative to point O shown in FIG. 8. In this way, the imaging information acquisition unit 121 acquires imaging information that associates position coordinates in the image captured by camera 111 with three-dimensional coordinates in the captured scene.
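Once a dot has been located, for instance from the user's pointing, its center of gravity can be computed as an intensity-weighted mean. The sketch assumes a small patch around the dot in which dot pixels have larger values than the background. A dark-on-light pattern would be inverted first, and the background brought near zero. `dot_centroid` and `top_left` are hypothetical names.

```python
import numpy as np


def dot_centroid(patch, top_left=(0, 0)):
    """Intensity-weighted center of gravity (x, y) of one calibration dot.

    Assumes dot pixels have larger values than the background in `patch` and
    that the background is near zero (a non-zero background pulls the result
    toward the patch center). `top_left` is the (x, y) position of the patch
    in the full image, so the result is in full-image coordinates.
    """
    weights = np.asarray(patch, dtype=float)
    total = weights.sum()
    if total <= 0:
        raise ValueError("patch contains no dot pixels")
    rows, cols = np.indices(weights.shape)
    cx = (cols * weights).sum() / total + top_left[0]
    cy = (rows * weights).sum() / total + top_left[1]
    return float(cx), float(cy)
```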

With the calibration pattern left in place, the imaging information acquisition unit 122 performs the same processing as the imaging information acquisition unit 121, based on the image of the calibration pattern captured by camera 112. That is, the imaging information acquisition unit 122 acquires imaging information that associates position coordinates in the image captured by camera 112 with three-dimensional coordinates in the captured scene.
Next, the positional relationship calculation unit 150 calculates the coordinate conversion parameters that associate the coordinate system of the first image with that of the second image. It calculates them from the imaging information acquired by the imaging information acquisition units 121 and 122. For example, it uses the binocular (stereo) camera calibration method disclosed in Non-Patent Document 4. In the present embodiment, the positional relationship calculation unit 150 converts between the coordinate systems on the assumption that three-dimensional coordinates and position coordinates in the image are linearly related. When the optical geometric distortion of a camera is large, however, the imaging information and the positional relationship may be calculated with the distortion taken into account.
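Non-Patent Document 4 is not reproduced here. As one standard stand-in, assume that the coordinate conversion parameters take the form of a fundamental matrix. The normalized eight-point algorithm can then estimate it from corresponding image points seen by the two cameras. One caveat, also noted in the code: the correspondences must not all lie on one plane. Points from a single flat calibration board are therefore degenerate for this estimate and would need to be pooled from several board placements.

```python
import numpy as np


def _normalize(pts: np.ndarray):
    """Hartley normalization: move the centroid to the origin and scale
    so the mean distance from it is sqrt(2). Returns homogeneous points and T."""
    centre = pts.mean(axis=0)
    mean_dist = np.sqrt(((pts - centre) ** 2).sum(axis=1)).mean()
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0.0, -s * centre[0]],
                  [0.0, s, -s * centre[1]],
                  [0.0, 0.0, 1.0]])
    homog = np.column_stack([pts, np.ones(len(pts))]) @ T.T
    return homog, T


def eight_point(pts1, pts2) -> np.ndarray:
    """Estimate F with x2^T F x1 = 0 from >= 8 correspondences (N x 2 arrays).

    Assumption: the 3D points behind the correspondences are not all coplanar;
    coplanar points (one flat board) make the estimate degenerate.
    """
    pts1, pts2 = np.asarray(pts1, dtype=float), np.asarray(pts2, dtype=float)
    if len(pts1) < 8 or len(pts1) != len(pts2):
        raise ValueError("need at least 8 matched points")
    p1, T1 = _normalize(pts1)
    p2, T2 = _normalize(pts2)
    x1, y1 = p1[:, 0], p1[:, 1]
    x2, y2 = p2[:, 0], p2[:, 1]
    # One row per correspondence: coefficients of F's entries in x2^T F x1.
    A = np.column_stack([x2 * x1, x2 * y1, x2, y2 * x1, y2 * y1, y2,
                         x1, y1, np.ones(len(x1))])
    _, _, vt = np.linalg.svd(A)
    F = vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value.
    u, s, vt = np.linalg.svd(F)
    s[2] = 0.0
    F = u @ np.diag(s) @ vt
    F = T2.T @ F @ T1          # undo the normalization
    return F / np.linalg.norm(F)
```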

In S404, the selection unit 190 selects the recognition model to be trained by the learning unit 180, based on the position coordinates estimated by the position estimation unit 160 in S403. The recognition model selected here is one of the plurality of recognition models managed by the detection unit 142. In the present embodiment, the detection unit 142 has the same detailed configuration as the detection unit 141 shown in FIG. 5. The detection unit 142 manages three recognition models: a front model, a side model, and a planar model (for people seen from above).
The method of selecting the recognition model in the present embodiment is described below with reference to FIG. 9. Points A, B, C, and D in FIG. 9 are the vertices of the person region detected in the first image by the detection unit 141. Points A′, B′, C′, and D′ are the vertices in the second image that correspond to points A, B, C, and D, respectively, as estimated by the position estimation unit 160. In the present embodiment, the selection unit 190 estimates from which direction the person was photographed based on the aspect ratio of the region defined by points A′, B′, C′, and D′. It then selects a recognition model based on the estimated direction. The direction from which an object such as a person in an image was photographed is referred to as the shooting direction of the object.
When a person is photographed from the front, the region the person occupies in the image is taller relative to its width than when the person is photographed from above. When the person is photographed from the side, the region is taller still relative to its width than when photographed from the front. The shooting direction of a person in an image is therefore correlated with the aspect ratio of the region the person occupies in that image.

Let the position coordinates of points A′, B′, C′, and D′ in the second image be (u_A, v_A), (u_B, v_B), (u_C, v_C), and (u_D, v_D), respectively. In the present embodiment, the selection unit 190 calculates a pseudo aspect ratio R of the quadrilateral A′B′C′D′ using Equation 2 below.
$$R = \left| \frac{(v_A + v_B) - (v_C + v_D)}{(u_B + u_D) - (u_A + u_C)} \right| \qquad \text{(Equation 2)}$$
In Equation 2, |·| denotes the absolute value. The selection unit 190 then selects the recognition model corresponding to the calculated R. In the present embodiment, the storage device 12 stores in advance information indicating the range of R that corresponds to each recognition model. The selection unit 190 uses this information to determine which recognition model the calculated value of R corresponds to. For example, the information may specify the following ranges, where the second threshold is larger than the first:
- R less than a first threshold corresponds to the planar model.
- R equal to or greater than the first threshold and less than a second threshold corresponds to the front model.
- R equal to or greater than the second threshold corresponds to the side model.
In the example of FIG. 9, the detection unit 141 detected the person using the front model. When this person region corresponds to the quadrilateral A′B′C′D′ in FIG. 9, it is inferred that camera 112 photographed the person from above. This is because the aspect ratio R obtained from the quadrilateral A′B′C′D′ falls within the range of the planar model.
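Equation 2 and the threshold rule can be written as follows. The point order assumed is the one the formula implies: A and B on one horizontal edge, C and D on the other; A and C on one vertical edge, B and D on the other. The model names are taken from the text. The threshold values used below are placeholders, because the text does not give the stored ranges of R.

```python
from typing import Tuple

Point = Tuple[float, float]  # (u, v)


def pseudo_aspect_ratio(a: Point, b: Point, c: Point, d: Point) -> float:
    """Equation 2: R = |((vA + vB) - (vC + vD)) / ((uB + uD) - (uA + uC))|."""
    (ua, va), (ub, vb), (uc, vc), (ud, vd) = a, b, c, d
    width_sum = (ub + ud) - (ua + uc)
    if width_sum == 0:
        raise ValueError("degenerate quadrilateral: zero width")
    return abs(((va + vb) - (vc + vd)) / width_sum)


def select_model(r: float, t1: float, t2: float) -> str:
    """Map R to a recognition model with thresholds t1 < t2."""
    if not t1 < t2:
        raise ValueError("the first threshold must be smaller than the second")
    if r < t1:
        return "planar"
    if r < t2:
        return "front"
    return "side"
```

For example, `select_model(pseudo_aspect_ratio((10, 50), (30, 50), (10, 10), (30, 10)), 1.2, 2.5)` computes R = 80 / 40 = 2.0 and returns `"front"` for these made-up thresholds.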

The selection unit 190 selects the recognition model corresponding to the calculated value of R and determines it as the recognition model to be trained by the learning unit 180.
In the present embodiment, the selection unit 190 selects the recognition model based on the aspect ratio of the converted person region in the second image. Alternatively, the selection unit 190 may work from three inputs: the person region detected by the detection unit 141, the recognition model used for that detection, and the coordinate conversion parameters calculated by the positional relationship calculation unit 150. It then proceeds as follows:
- From the recognition model used by the detection unit 141, estimate the direction from which the person was photographed in the first image.
- From the position at which the detection unit 141 detected the person and the coordinate conversion parameters, determine how the orientation of the person in the first image changes in the second image.
- From the shooting direction in the first image and this change in orientation, directly estimate the direction from which the person was photographed in the second image.
- Select the recognition model corresponding to the estimated direction and determine it as the recognition model to be trained by the learning unit 180.

In S405, the extraction unit 170 extracts a partial image from the second image acquired by the image acquisition unit 132 in S401, based on the vertex coordinates estimated by the position estimation unit 160 in S403. The extracted image is the partial image of the second image that corresponds to the partial image detected as a person region in the first image in S402. It is therefore an image of a region that may contain a person. The aspect ratio of the extracted partial image is set to the value corresponding to the recognition model selected in S404.
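One way to realize S405 is to take the bounding box of A′, B′, C′, D′ and grow it around its center until its height-to-width ratio matches the selected model. The sketch below assumes the ratio is stored as a single `aspect` number per model. It also assumes that enlarging the box is preferred to shrinking it, so that no part of the person is cut off. Both are assumptions, as is the function name.

```python
import numpy as np


def crop_with_aspect(image: np.ndarray, corners, aspect: float) -> np.ndarray:
    """Crop the bounding box of `corners` ((u, v) vertices A', B', C', D'),
    grown about its center so that height / width == aspect."""
    pts = np.asarray(corners, dtype=float)
    u0, v0 = pts.min(axis=0)
    u1, v1 = pts.max(axis=0)
    cu, cv = (u0 + u1) / 2.0, (v0 + v1) / 2.0
    w, h = max(u1 - u0, 1.0), max(v1 - v0, 1.0)
    # Grow whichever side is too short for the target ratio.
    if h / w < aspect:
        h = w * aspect
    else:
        w = h / aspect
    rows, cols = image.shape[:2]
    x0 = max(int(round(cu - w / 2.0)), 0)
    x1 = min(int(round(cu + w / 2.0)), cols)
    y0 = max(int(round(cv - h / 2.0)), 0)
    y1 = min(int(round(cv + h / 2.0)), rows)
    # Near the image border, clipping can make the crop deviate from the ratio.
    return image[y0:y1, x0:x1]
```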

In S406, the learning unit 180 uses the partial image extracted by the extraction unit 170 in S405 to train the recognition model that the selection unit 190 determined, in S404, as the model to be trained. In the present embodiment, the learning unit 180 applies the same processing as the feature extraction unit 1412 to the extracted partial image, extracts a HOG feature, and uses it as an additional training sample. The learning unit 180 then performs additional (incremental) learning on the recognition model selected by the selection unit 190. For example, it can use the incremental learning method for support vector machines disclosed in Non-Patent Document 5. The learning method applied in S406 depends on the identification method used by the detection unit 142. For example, when the detection unit 142 uses an AdaBoost classifier, the learning unit 180 learns by the online boosting disclosed in Non-Patent Document 6. As described above, in the present embodiment, the learning unit 180 selects the recognition model corresponding to the partial image extracted from the image captured by camera 112 and performs additional learning on that model.
The learning unit 180 therefore trains each recognition model with samples whose appearance is similar for the corresponding shooting direction. This improves identification accuracy without losing the advantage of a person model composed of multiple recognition models.
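Non-Patent Document 5's incremental SVM method is not reproduced here. As a simpler and plainly different stand-in, the sketch refines an existing linear-SVM weight vector w with new samples. It uses stochastic subgradient descent on the L2-regularized hinge loss, starting from the current w instead of from zero. The learning rate `lr`, the regularization weight `lam`, and `epochs` are illustrative values. Labels are +1 (person) and -1 (non-person).

```python
import numpy as np


def additional_learning(w: np.ndarray, samples: np.ndarray, labels: np.ndarray,
                        lr: float = 0.1, lam: float = 0.01,
                        epochs: int = 100) -> np.ndarray:
    """Refine linear-SVM weights w with new (x, y) samples, y in {+1, -1}.

    Stochastic subgradient descent on lam/2 * |w|^2 + max(0, 1 - y * <x, w>),
    initialized from the current weights (a stand-in, not Non-Patent Document 5).
    """
    w = np.array(w, dtype=float)  # copy so the caller's model is not modified
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            grad = lam * w
            if y * float(np.dot(x, w)) < 1.0:   # margin violated: hinge term active
                grad = grad - y * x
            w = w - lr * grad
    return w
```

Starting from the existing w keeps what the model has already learned, while the new samples nudge it toward the scene captured by camera 112.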

In S407, the detection unit 142 detects a person in the second image. The detection unit 142 has the same detailed configuration as the detection unit 141 shown in FIG. 5 and performs the same processing as the detection unit 141 shown in FIG. 6. The only difference is that in S407 it uses the recognition model additionally trained in S406. The detection unit 142 thus adapts person detection to the scene captured by camera 112, taking into account the person images in the second image that correspond to the person regions detected by the detection unit 141, so an improvement in accuracy can be expected.

In S408, the integration unit 200 integrates the results of the detection processing by the detection units 141 and 142. Two sets of results are integrated:
- the person regions that the detection unit 141 detected in the first image, mapped into the second image by the position estimation unit 160;
- the person regions that the detection unit 142 detected in the second image.
When no person region or only one person region has been detected, the integration unit 200 outputs the result as it is without further processing. When there are multiple person-region detections, the integration unit 200 merges them by applying non-maximum suppression, the processing disclosed in Non-Patent Document 1 that combines overlapping regions into one.
The integration unit 200 outputs the result of the integration processing in one of several ways. For example, it displays information indicating the result on the output device 14 of the information processing apparatus 10, stores that information in the storage device 12, or transmits it to a preset destination.
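The merging step of S408 is commonly implemented as greedy non-maximum suppression, sketched below. The text does not say which score ranks the detections. Using a classifier score such as ⟨x, w⟩ from Equation 1 is an assumption, as is the 0.5 overlap threshold. Boxes are (x0, y0, x1, y1).

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(boxes: Sequence[Box], scores: Sequence[float],
                            iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, discard boxes that overlap
    an already kept box by more than iou_threshold, and repeat.
    Returns the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

With zero boxes or a single box, the function returns the input unchanged, which matches the pass-through case described for S408.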

In S409, the integration unit 200 determines whether shooting by camera 111 and camera 112 has ended. If it has, the integration unit 200 ends the processing of FIG. 4. If it has not, the process returns to S401.
Through the above processing, the information processing apparatus 10 repeats S401 to S408 each time images are captured by camera 111 and camera 112.

In the present embodiment, the information processing apparatus 10 uses the person detection results for images captured by camera 111 to train the recognition model used for images captured by camera 112. This improves detection accuracy, and the apparatus then integrates the detection results of both cameras. Conversely, the information processing apparatus 10 may use the person detection results for images captured by camera 112 to train the recognition model used for images captured by camera 111. The information processing apparatus 10 may also use the person detection results of the two cameras for each other, training the recognition model used for each camera.
In the present embodiment, the recognition models consist of one model for each direction from which a person is photographed. The processing of the present embodiment can also be applied with other sets of recognition models. Examples are separate models for high-resolution and low-resolution images of a person, or separate models for each part of a person that may be hidden by other objects. In such cases, the information processing apparatus 10 performs the processing of the present embodiment and, during additional learning, selects the recognition model appropriate to each training sample.

As described above, in the present embodiment, the information processing apparatus 10 determines which recognition model for images captured by camera 112 is to be trained. It bases this on the person detection results for images captured by camera 111 and on the correspondence between the fields of view of camera 111 and camera 112. The information processing apparatus 10 then trains the determined recognition model using the partial image of the camera 112 image that corresponds to the person detection result for the camera 111 image. In this way, it can train the recognition model for images captured by camera 112 more appropriately and thereby improve the accuracy of person detection in those images.

<Embodiment 2>
In the first embodiment, the fields of view of camera 111 and camera 112 were assumed to overlap in advance so that both could photograph the same object. However, when camera 112 is newly installed where camera 111 is already installed, the field of view of camera 112 does not necessarily overlap that of camera 111 enough for both to photograph the same object.
Therefore, the information processing apparatus 10 of the present embodiment controls the field of view of camera 111 so that the fields of view of camera 111 and camera 112 overlap appropriately.
The hardware configuration of the information processing apparatus 10 of the present embodiment is the same as in the first embodiment. In the present embodiment, the CPU 11 executes processing according to the procedure of a program stored in the storage device 12. This realizes the functions of the information processing apparatus 10 described later with reference to FIG. 10 and the processing of the flowcharts described later with reference to FIGS. 11, 12, and 14.

FIG. 10 is a diagram illustrating an example of the functional configuration of the information processing apparatus 10 according to the present embodiment. It differs from the functional configuration of the first embodiment shown in FIG. 2 in that it includes a PTZ control unit 300.
The PTZ control unit 300 controls the drive system of camera 111. The functional components in FIG. 10 that also appear in FIG. 2 are the same as in FIG. 2.

Processing of the information processing apparatus 10 in the present embodiment is described below. FIG. 11 is a flowchart illustrating an example of this processing. The following describes the case where camera 112 is newly installed while camera 111 is already installed: the detection unit 142 of camera 112 is optimized for the scene that camera 112 captures, and its results are integrated with the detection results of camera 111.
In S1101, the PTZ control unit 300 controls the field of view of camera 111 by driving its pan, tilt, and zoom mechanisms so that the fields of view of cameras 111 and 112 overlap.
In S1102, the positional relationship calculation unit 150 calculates coordinate conversion parameters representing the positional relationship between cameras 111 and 112. S1101 and S1102 are described in detail later with reference to FIG. 13 and other figures. In the subsequent steps S401 to S409, the information processing apparatus 10 operates on images captured by camera 111 after its field of view has been controlled in S1101. Steps S401 to S409 are the same as in the first embodiment.

FIG. 12 shows the PTZ control unit 300 in detail.
The PTZ control unit 300 includes a corresponding point extraction unit 310, an overlapping region evaluation unit 320, and a control signal generation unit 330. The corresponding point extraction unit 310 extracts corresponding points between an image captured by camera 111 and an image captured by camera 112. Based on the points extracted by the corresponding point extraction unit 310, the overlapping region evaluation unit 320 evaluates the size of the region where the two images overlap. Based on that evaluation, the control signal generation unit 330 generates a control signal for controlling the field of view of camera 111.
FIG. 13 shows the PTZ control process performed by the PTZ control unit 300. The PTZ control process of S1101 is described in detail below with reference to FIG. 13.
In S1301, the corresponding point extraction unit 310 acquires an image captured by camera 112 from the image acquisition unit 132.

In S1302, the corresponding point extraction unit 310 extracts local features from the image acquired in S1301. A local feature is a feature, such as an edge, that is extracted by focusing on a local region of the image and that can be distinguished from other parts of the image. In the present embodiment, the corresponding point extraction unit 310 extracts SIFT features from the image acquired in S1301. A SIFT feature is a histogram of brightness gradients, computed per direction, over the neighborhood of a position where the brightness distribution of the image takes an extreme value; it is highly invariant to image shift, scaling, and rotation. The corresponding point extraction unit 310 temporarily stores the local features extracted in S1302 in the storage device 12 for the duration of the PTZ control process.
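The central idea in the SIFT description above, a magnitude-weighted histogram of brightness-gradient directions around a point, can be illustrated in a few lines of Python. This is a simplified sketch and not SIFT: real SIFT also detects scale-space extrema, assigns a dominant orientation, and concatenates histograms from a 4x4 grid of sub-regions. The function name `orientation_histogram` and the choice of 8 bins are assumptions for illustration.

```python
import math


def orientation_histogram(patch, bins=8):
    """Simplified gradient-direction histogram for a grayscale patch (NOT full SIFT).

    patch: list of rows of pixel intensities; border pixels are skipped.
    Returns an L2-normalised list of `bins` magnitude-weighted counts.
    """
    hist = [0.0] * bins
    height, width = len(patch), len(patch[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate the brightness gradient.
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)  # direction in [0, 2*pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


# A left-to-right brightness ramp puts all the weight in bin 0 (gradient along +x).
ramp = [[x for x in range(5)] for _ in range(5)]
print(orientation_histogram(ramp))
```

Normalising the histogram makes the descriptor insensitive to uniform contrast changes, which is one reason gradient histograms compare well across cameras.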
In S1303, the corresponding point extraction unit 310 acquires an image captured by camera 111 from the image acquisition unit 131. Hereinafter, the image captured by camera 111 and acquired in S1303 is called the third image, and the image captured by camera 112 and acquired in S1301 is called the fourth image.
In S1304, the corresponding point extraction unit 310 extracts local features from the image acquired in S1303, using the same processing as in S1302.

In S1305, the corresponding point extraction unit 310 extracts corresponding points between the third image and the fourth image based on the local features extracted in S1302 and S1304. For example, it calculates the similarity between the local feature at a point in the third image and the local feature at a point in the fourth image, and associates the two points if the similarity is equal to or greater than a predetermined threshold. Points associated in this way are called corresponding points. As the similarity between local features, the corresponding point extraction unit 310 uses, for example, the inner product of the feature vectors or histogram intersection. It can also remove low-reliability correspondences from the extracted set by applying the RANSAC algorithm.
Details of the processing of the corresponding point extraction unit 310 described above are disclosed in Non-Patent Document 7.
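A minimal sketch of S1305 in Python, assuming each feature is given as a point with a descriptor vector. The two similarity measures named above, inner product and histogram intersection, are implemented directly. Keeping only each third-image point's best-scoring partner, and using a pure 2-D translation as the RANSAC model, are simplifying assumptions (an affine or homography model would be more typical in practice); the function names are hypothetical and the code is not taken from Non-Patent Document 7.

```python
import math
import random


def inner_product(a, b):
    """Similarity of two feature vectors (cosine similarity if both are L2-normalised)."""
    return sum(x * y for x, y in zip(a, b))


def histogram_intersection(a, b):
    """Similarity of two histograms: sum of bin-wise minima."""
    return sum(min(x, y) for x, y in zip(a, b))


def match_points(feats3, feats4, threshold, similarity=inner_product):
    """Pair each third-image feature with its most similar fourth-image feature.

    feats3, feats4: lists of ((x, y), descriptor). A pair is kept only if its
    similarity is at least `threshold`. Returns [((x3, y3), (x4, y4)), ...].
    """
    pairs = []
    for p3, d3 in feats3:
        scored = [(similarity(d3, d4), p4) for p4, d4 in feats4]
        if not scored:
            continue
        best_score, best_p4 = max(scored, key=lambda s: s[0])
        if best_score >= threshold:
            pairs.append((p3, best_p4))
    return pairs


def ransac_translation(pairs, iterations=100, tol=3.0, seed=0):
    """Drop unreliable pairs: keep the largest subset consistent with one translation."""
    if not pairs:
        return []
    rng = random.Random(seed)
    best = []
    for _ in range(iterations):
        (x3, y3), (x4, y4) = rng.choice(pairs)  # one pair defines a candidate shift
        dx, dy = x4 - x3, y4 - y3
        inliers = [(p3, p4) for p3, p4 in pairs
                   if math.hypot(p4[0] - p3[0] - dx, p4[1] - p3[1] - dy) <= tol]
        if len(inliers) > len(best):
            best = inliers
    return best


feats3 = [((10, 10), [1, 0, 0]), ((20, 20), [0, 1, 0]), ((30, 5), [0, 0, 1])]
feats4 = [((15, 12), [1, 0, 0]), ((25, 22), [0, 1, 0]), ((90, 90), [0, 0, 1])]
pairs = match_points(feats3, feats4, threshold=0.9)
print(ransac_translation(pairs))  # the (30, 5) -> (90, 90) pair is rejected as an outlier
```

The third pair passes the similarity test but disagrees with the shift implied by the other two, which is exactly the kind of low-reliability correspondence RANSAC is meant to remove.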

In S1306, the overlapping region evaluation unit 320 evaluates the size of the region where the third and fourth images overlap, based on the set of corresponding points between the two images extracted in S1305. The method for evaluating the overlapping region is described below.
Suppose that a person is captured by cameras 111 and 112. FIGS. 14(a) and 14(b) show examples of the resulting third and fourth images, respectively. In the example of FIG. 14, (P11, P21), (P12, P22), (P13, P23), (P14, P24), and (P15, P25) are pairs of corresponding points.
Region R1 in FIG. 14(c) is an example of the overlapping region between the third and fourth images. Based on the set of corresponding points, the overlapping region evaluation unit 320 acquires the overlapping region R1, for example, as follows. From the set of corresponding points, it calculates conversion parameters that map position coordinates in the fourth image to position coordinates in the third image. It then converts the coordinates of the four corner points of the fourth image into coordinates in the third image, and takes the region where the area enclosed by the four converted points overlaps the third image as the overlapping region of the third and fourth images. Alternatively, the overlapping region evaluation unit 320 may take a region defined so as to contain all the corresponding points in the third image as the overlapping region. In other words, the overlapping region evaluation unit 320 estimates, from the coordinates of the corresponding points, the region of the third image that camera 112 can also capture, and obtains its area.
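The first method above (map the four corners of the fourth image into the third image, then intersect with the third image) can be sketched as follows. Assumptions: the conversion parameters are modelled as a least-squares affine transform fitted to the corresponding points (the text does not specify the model; a homography is a common alternative), the overlap is clipped with the Sutherland-Hodgman algorithm, and its size is measured as polygon area with the shoelace formula. All function names are hypothetical.

```python
def _det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m, rhs):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = _det3(m)
    if d == 0:
        raise ValueError("degenerate (e.g. collinear) corresponding points")
    result = []
    for k in range(3):
        mk = [row[:] for row in m]
        for i in range(3):
            mk[i][k] = rhs[i]
        result.append(_det3(mk) / d)
    return result


def fit_affine(pairs):
    """Least-squares affine map from fourth-image to third-image coordinates.

    pairs: [((x4, y4), (x3, y3)), ...], at least 3 non-collinear points.
    Returns (a, b, c, d, e, f) with x3 = a*x4 + b*y4 + c, y3 = d*x4 + e*y4 + f.
    """
    src = [p for p, _ in pairs]
    normal = [[sum(x * x for x, _ in src), sum(x * y for x, y in src), sum(x for x, _ in src)],
              [sum(x * y for x, y in src), sum(y * y for _, y in src), sum(y for _, y in src)],
              [sum(x for x, _ in src), sum(y for _, y in src), len(src)]]
    params = []
    for k in range(2):  # k = 0 fits x3, k = 1 fits y3
        rhs = [sum(p[0] * q[k] for p, q in pairs),
               sum(p[1] * q[k] for p, q in pairs),
               sum(q[k] for _, q in pairs)]
        params.extend(_solve3(normal, rhs))
    return tuple(params)


def clip_to_image(poly, width, height):
    """Sutherland-Hodgman: clip a polygon to the rectangle [0, width] x [0, height]."""
    def cross_x(xc):
        return lambda p, q: (xc, p[1] + (q[1] - p[1]) * (xc - p[0]) / (q[0] - p[0]))

    def cross_y(yc):
        return lambda p, q: (p[0] + (q[0] - p[0]) * (yc - p[1]) / (q[1] - p[1]), yc)

    edges = [(lambda p: p[0] >= 0, cross_x(0)), (lambda p: p[0] <= width, cross_x(width)),
             (lambda p: p[1] >= 0, cross_y(0)), (lambda p: p[1] <= height, cross_y(height))]
    for inside, intersect in edges:
        clipped = []
        for i, cur in enumerate(poly):
            prev = poly[i - 1]
            if inside(cur):
                if not inside(prev):
                    clipped.append(intersect(prev, cur))
                clipped.append(cur)
            elif inside(prev):
                clipped.append(intersect(prev, cur))
        poly = clipped
    return poly


def polygon_area(poly):
    """Shoelace formula; 0 for an empty polygon."""
    return abs(sum(p[0] * q[1] - q[0] * p[1] for p, q in zip(poly, poly[1:] + poly[:1]))) / 2


def overlap_area(pairs, size4, size3):
    """Area of region R1: the fourth image's outline mapped into, and clipped to, the third image."""
    a, b, c, d, e, f = fit_affine(pairs)
    w4, h4 = size4
    corners = [(a * x + b * y + c, d * x + e * y + f) for x, y in [(0, 0), (w4, 0), (w4, h4), (0, h4)]]
    return polygon_area(clip_to_image(corners, *size3))
```

With a pure shift of (100, 50) pixels between a 320x240 fourth image and a 640x480 third image, `overlap_area` returns 76800.0, the whole fourth image; a larger shift pushes part of the outline outside the third image and the clipped area shrinks accordingly.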
If the size of the acquired overlapping region between the third and fourth images is larger than a set threshold, the overlapping region evaluation unit 320 determines that the fields of view of cameras 111 and 112 overlap sufficiently and ends the PTZ control process of FIG. 13. If the size is equal to or less than the threshold, it determines that the fields of view do not overlap sufficiently, and the processing proceeds to S1307.
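The alternative estimate mentioned in S1306 (a region that contains all corresponding points in the third image) and the threshold test can be sketched as below. Taking that region to be the axis-aligned bounding box, and expressing the threshold as a fraction of the third image's area, are assumptions; `min_fraction` is a hypothetical parameter, and the text gives no threshold value.

```python
def bbox_overlap_area(points3):
    """Area of the axis-aligned box enclosing all corresponding points in the third image."""
    if len(points3) < 2:
        return 0
    xs = [x for x, _ in points3]
    ys = [y for _, y in points3]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def views_overlap_enough(area, image_size, min_fraction=0.3):
    """True: end the PTZ control process. False: proceed to S1307."""
    width, height = image_size
    return area > min_fraction * width * height


points3 = [(100, 100), (300, 100), (200, 400)]
area = bbox_overlap_area(points3)              # 200 * 300 = 60000 pixels
print(views_overlap_enough(area, (640, 480)))  # 60000 <= 0.3 * 307200, so False
```

The bounding box is cheap and needs no transform model, but it can underestimate the overlap when the matched features cluster on one part of the scene, such as a single person.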

In S1307, based on the overlapping region acquired in S1306, the control signal generation unit 330 obtains PTZ drive amounts for camera 111 such that the overlap between the fields of view of cameras 111 and 112 increases. For example, in the situation of FIG. 14(c), the control signal generation unit 330 obtains pan and tilt drive amounts for the imaging system of camera 111 that bring region R1 to the center of the field of view of camera 111. It also obtains a zoom drive amount for camera 111 such that region R1 fills the entire field of view.
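One way to turn region R1 into the pan, tilt, and zoom amounts of S1307 is sketched below. It assumes an ideal pinhole camera with square pixels whose current horizontal field of view `hfov_deg` is known, treats pan and tilt as independent rotations (an approximation), and chooses the largest zoom factor that still keeps all of R1 in view. None of these details come from the text, and `ptz_drive_amounts` is a hypothetical name.

```python
import math


def ptz_drive_amounts(region, image_size, hfov_deg):
    """Pan/tilt (degrees) that centre region R1, and a zoom factor so R1 fills the view.

    region: (x, y, w, h) of R1 in the current camera-111 image, in pixels.
    image_size: (W, H) of that image. hfov_deg: current horizontal field of view.
    Positive pan = turn right; positive tilt = turn down (image y grows downward).
    """
    x, y, rw, rh = region
    width, height = image_size
    focal = (width / 2) / math.tan(math.radians(hfov_deg) / 2)  # focal length in pixels
    dx = x + rw / 2 - width / 2    # offset of R1's centre from the image centre
    dy = y + rh / 2 - height / 2
    pan = math.degrees(math.atan2(dx, focal))
    tilt = math.degrees(math.atan2(dy, focal))
    zoom = min(width / rw, height / rh)  # magnification that fits R1 to the frame
    return pan, tilt, zoom


print(ptz_drive_amounts((270, 190, 100, 100), (640, 480), hfov_deg=60))  # already centred
```

Using the smaller of the two width and height ratios keeps R1 entirely inside the frame; a real PTZ head would also clamp these values to its mechanical limits.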
In S1308, the control signal generation unit 330 sends camera 111 a control signal for driving its imaging system by the PTZ drive amounts obtained in S1307. Camera 111 drives the pan, tilt, and zoom mechanisms of its imaging system according to the received control signal. The control signal generation unit 330 then returns the processing to S1301. Because the subject may move while the orientation of camera 111 is being changed, the processing returns to S1301, which re-acquires the image from camera 112, rather than to S1303.
Through the processing of FIG. 13, the information processing apparatus 10 thus controls the field of view of camera 111 by driving its pan, tilt, and zoom mechanisms so that the fields of view of cameras 111 and 112 overlap. An image captured by camera 111 after the processing of FIG. 13 shows a scene such as that in FIG. 15: the whole person is captured, and the scene overlaps with that of FIG. 14(b).
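The control flow of FIG. 13 can be summarised as a loop. In the sketch below every step is passed in as a callable, so `grab112`, `grab111`, `extract`, `match`, `evaluate`, `plan`, and `send` are hypothetical stand-ins for the units described above, and the `max_iters` cap is an added safeguard that the text does not mention.

```python
def ptz_control(grab112, grab111, extract, match, evaluate, plan, send,
                threshold, max_iters=10):
    """Sketch of the FIG. 13 loop. Returns True once the views overlap enough."""
    for _ in range(max_iters):
        feats4 = extract(grab112())        # S1301-S1302: camera 112 (fourth image)
        feats3 = extract(grab111())        # S1303-S1304: camera 111 (third image)
        pairs = match(feats3, feats4)      # S1305: corresponding points
        region, size = evaluate(pairs)     # S1306: overlap region R1 and its size
        if size > threshold:
            return True                    # fields of view overlap sufficiently
        send(plan(region))                 # S1307-S1308, then back to S1301
    return False


# Stand-in callables: the overlap grows 10 -> 20 -> 50 as the camera moves.
sizes = iter([10, 20, 50])
sent = []
done = ptz_control(lambda: "img112", lambda: "img111", lambda img: [img],
                   lambda f3, f4: list(zip(f3, f4)), lambda pairs: ("R1", next(sizes)),
                   lambda region: ("drive", region), sent.append, threshold=30)
print(done, len(sent))  # True 2
```

Because each iteration starts again from the camera 112 image, the loop reflects the text's choice of returning to S1301 rather than S1303.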

In S1102, after the processing of FIG. 13, the positional relationship calculation unit 150 obtains coordinate conversion parameters from the coordinate system of images captured by camera 111 to the coordinate system of images captured by camera 112, using the same processing as in the first embodiment. Alternatively, the positional relationship calculation unit 150 may extract corresponding points from an image captured by camera 111 and an image captured by camera 112, using the same processing as in S1305, and calculate conversion parameters from position coordinates in the camera 111 image to position coordinates in the camera 112 image based on the position coordinates of the extracted corresponding points.
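Once conversion parameters are available, they can carry an object region from camera 111's image into camera 112's image, which is the kind of region mapping the claims rely on. The sketch assumes the parameters form an affine map (a, b, c, d, e, f), which the text does not state, and returns the axis-aligned box enclosing the four mapped corners; `map_box` is a hypothetical name.

```python
def map_box(box, params):
    """Map an (x, y, w, h) box from camera-111 to camera-112 image coordinates.

    params: hypothetical affine parameters (a, b, c, d, e, f) with
    u = a*x + b*y + c and v = d*x + e*y + f. Returns the axis-aligned box
    (x, y, w, h) enclosing the four mapped corners.
    """
    a, b, c, d, e, f = params
    x, y, w, h = box
    corners = [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
    us = [a * px + b * py + c for px, py in corners]
    vs = [d * px + e * py + f for px, py in corners]
    return min(us), min(vs), max(us) - min(us), max(vs) - min(vs)


# Camera 112 sees the scene at half scale, shifted by (10, 20) pixels.
person = map_box((100, 200, 40, 80), (0.5, 0.0, 10, 0.0, 0.5, 20))
print(person, person[3] / person[2])  # mapped box and its height/width ratio
```

The height/width ratio of the mapped person box is the kind of quantity the claims use to estimate the shooting direction in the second image.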

As described above, the processing of this embodiment enables the information processing apparatus 10 to make the fields of view of cameras 111 and 112 overlap appropriately. For example, when camera 112 is newly installed where camera 111 is already installed, it is unknown whether the fields of view of the new camera 112 and the existing camera 111 overlap. Even in such a case, the information processing apparatus 10 can make their fields of view overlap appropriately so that both cameras can capture the same object.
In the present embodiment, camera 111 is PTZ-controlled so that its field of view overlaps that of camera 112. However, when several cameras are already installed near camera 112, the information processing apparatus 10 can select the most suitable camera among them and apply the processing of this embodiment to it. The information processing apparatus 10 may also control a plurality of cameras so that their fields of view overlap and then perform additional learning.

<Other embodiments>
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more of the functions.
Although preferred embodiments of the present invention have been described in detail above, the present invention is not limited to these specific embodiments.
For example, a part or all of the functional configuration of the information processing apparatus 10 described above may be implemented in the information processing apparatus 10 as hardware.

10 Information processing apparatus
11 CPU
111 Camera
112 Camera

Claims

An information processing apparatus comprising:
estimation means for estimating a shooting direction of an object in a second image, in which the object is captured by a second imaging device, based on a first image in which the object is captured by a first imaging device and on a correspondence between the coordinate system of the first image and the coordinate system of the second image;
determination means for determining, based on the shooting direction estimated by the estimation means, a recognition model to be learned that is used for detecting the object in the second image; and
learning means for learning the recognition model determined by the determination means based on an image of the object included in the second image.

The information processing apparatus according to claim 1, further comprising acquisition means for acquiring a region of the object in the second image based on a region of the object in the first image and the correspondence,
wherein the estimation means estimates the shooting direction of the object in the second image based on the region acquired by the acquisition means.

The information processing apparatus according to claim 2, wherein the object is a person, and the estimation means estimates the shooting direction of the object in the second image based on the aspect ratio of the region of the person in the second image.

The information processing apparatus according to claim 2, wherein the learning means learns the recognition model determined by the determination means based on the portion of the second image within the region acquired by the acquisition means.

The information processing apparatus according to any one of claims 1 to 4, wherein the determination means determines the recognition model to be learned by selecting it, based on the shooting direction estimated by the estimation means, from a plurality of preset recognition models.

The information processing apparatus according to claim 1, further comprising control means for controlling the field of view of the first imaging device, based on a third image captured by the first imaging device and a fourth image captured by the second imaging device, such that the region where the field of view of the first imaging device overlaps the field of view of the second imaging device increases.

The information processing apparatus according to claim 6, wherein the control means controls the field of view of the first imaging device, based on local features in the third image and local features in the fourth image, such that the region where the fields of view of the first and second imaging devices overlap increases.

An information processing method executed by an information processing apparatus, the method comprising:
an estimation step of estimating a shooting direction of an object in a second image, in which the object is captured by a second imaging device, based on a first image in which the object is captured by a first imaging device and on a correspondence between the coordinate system of the first image and the coordinate system of the second image;
a determination step of determining, based on the shooting direction estimated in the estimation step, a recognition model to be learned that is used for detecting the object in the second image; and
a learning step of learning the recognition model determined in the determination step based on an image of the object included in the second image.

A program for causing a computer to function as each means of the information processing apparatus according to any one of claims 1 to 7.