JP6659095B2

JP6659095B2 - Image processing apparatus, image processing method, and program

Info

Publication number: JP6659095B2
Application number: JP2015131841A
Authority: JP
Inventors: 内山　寛之; 矢野　光太郎; 一郎梅田; 睦凌郭
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2015-06-30
Filing date: 2015-06-30
Publication date: 2020-03-04
Anticipated expiration: 2035-06-30
Also published as: JP2017016356A

Description

The present invention relates to an image processing apparatus, an image processing method, and a program.

For surveillance camera systems, methods have been proposed that estimate the three-dimensional position of an object by capturing it with a plurality of cameras whose fields of view overlap. In such a method, the subject is captured by cameras whose positions are known, and the three-dimensional position of the subject is estimated from its positions in the camera images according to the principle of stereo vision.

A problem with this approach is that an object may be estimated to exist at a three-dimensional position where no object actually exists. Hereinafter, a three-dimensional position where an object exists, that is, a correctly estimated three-dimensional position, is called a real image, and a three-dimensional position where no object exists, that is, an incorrectly estimated three-dimensional position, is called a virtual image. Virtual images arise from the positional relationship between the cameras and the persons. FIG. 2 shows a positional relationship between cameras and persons in which a virtual image arises. In FIG. 2, camera 1 captures person B, and camera 2 captures both person A and person B. Because the intersection of the straight lines connecting the cameras and the persons is estimated as the three-dimensional position of an object, a real image arises at the intersection of the line connecting camera 1 and person B with the line connecting camera 2 and person B. At the same time, a virtual image arises at the intersection of the line connecting camera 1 and person B with the line connecting camera 2 and person A.
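The geometry described for FIG. 2 can be reproduced with a small top-view sketch. The camera and person coordinates below are hypothetical values chosen for illustration, and `ray_intersection_2d` is an illustrative helper, not part of the patent.

```python
import numpy as np

def ray_intersection_2d(c1, p1, c2, p2):
    """Intersect the line through c1 and p1 with the line through c2 and p2."""
    c1, p1, c2, p2 = (np.asarray(v, dtype=float) for v in (c1, p1, c2, p2))
    d1, d2 = p1 - c1, p2 - c2
    # Solve c1 + s*d1 = c2 + u*d2 for the parameters (s, u).
    a = np.column_stack((d1, -d2))
    if abs(np.linalg.det(a)) < 1e-12:
        return None  # parallel lines have no single intersection
    s, _ = np.linalg.solve(a, c2 - c1)
    return c1 + s * d1

# Hypothetical top-view layout (units are arbitrary).
camera1, camera2 = (0.0, 0.0), (10.0, 0.0)
person_a, person_b = (7.0, 6.0), (4.0, 4.0)

# Camera 1 captures only person B; camera 2 captures persons A and B.
real = ray_intersection_2d(camera1, person_b, camera2, person_b)
virtual = ray_intersection_2d(camera1, person_b, camera2, person_a)
```

Here `real` coincides with person B, while `virtual` is a point where neither person stands, even though both camera observations are consistent with it.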

As a solution to this problem, Patent Literature 1 proposes a method of reducing virtual images based on how images move in conjunction with one another. Specifically, exploiting the property that many images move in conjunction with a virtual image, the method counts the images that move in conjunction with a given image and determines that image to be a virtual image if the count is large. Patent Literature 2 discloses a method that obtains three-dimensional movement trajectories of persons and connects fragments of these trajectories to calculate the complete movement trajectory of each person. According to the method of Patent Literature 2, virtual images can be reduced by connecting trajectories that each span a certain length of time.

Patent Literature 1: Japanese Patent No. 5454573
Patent Literature 2: JP 2010-063001 A

The method of Patent Literature 1 has the problem that virtual images increase when the number of cameras or objects is large, which makes it difficult to identify virtual images based on conjoined movement. The method of Patent Literature 2 connects trajectories that each span a certain length of time, so it is difficult to use when an immediate response is required, and it also incurs a large computational cost.

An object of the present invention is to effectively reduce virtual images when estimating the three-dimensional position of an object.

To achieve this object, an image processing apparatus according to the present invention has, for example, the following configuration. That is, the apparatus comprises:
acquisition means for acquiring a group of continuously captured frame images from each of a plurality of imaging units whose fields of view overlap;
tracking means for tracking an object in each frame image group;
generating means for generating candidates for the three-dimensional position of the object at a time of interest based on the position of the object in the frame images and the positional relationship between the plurality of imaging units;
associating means for associating a candidate three-dimensional position at the time of interest with a candidate three-dimensional position of the object at a past time before the time of interest;
calculation means for calculating, based on a local correspondence accuracy indicating the likelihood that the candidate three-dimensional position at the time of interest corresponds to the object and an integrated correspondence accuracy indicating the likelihood that the candidate three-dimensional position at the past time corresponds to the object, an integrated correspondence accuracy indicating the likelihood that the candidate three-dimensional position at the time of interest corresponds to the object; and
determination means for acquiring, for each candidate three-dimensional position at the time of interest, the integrated correspondence accuracy recorded for an object tracked in one frame image group, obtaining a reliability of the candidate in accordance with the integrated correspondence accuracies acquired for each of the plurality of frame image groups, and determining the three-dimensional position of the object at the time of interest based on the reliability.

According to the present invention, virtual images can be effectively reduced when estimating the three-dimensional position of an object.

The drawings are as follows, in order:
A block diagram showing an example of the functional configuration of an image processing apparatus according to an embodiment.
A diagram explaining a virtual image.
A diagram showing an example of the hardware configuration of the image processing apparatus according to an embodiment.
A flowchart of processing according to an embodiment.
A flowchart of processing for generating candidate three-dimensional positions.
A diagram explaining the processing for generating candidate three-dimensional positions.
A diagram explaining the processing for generating candidate three-dimensional positions.
A flowchart of processing for calculating the correspondence accuracy.
A diagram explaining the distance between a candidate three-dimensional position projected onto an image and an object.
A flowchart of processing for associating candidate three-dimensional positions with objects.
A diagram showing an example of a display image in an embodiment.

Embodiments of the present invention are described below. The following describes a case in which persons are the detection target, but the detection target may be another kind of object.

FIG. 3 shows an example of the hardware configuration of the image processing apparatus 100 according to the present embodiment. The image sensor 301 corresponds to an imaging unit and converts an optical subject image into an electric signal. A CCD or CMOS sensor, for example, can be used as the image sensor 301. The signal processing circuit 302 converts the time-series electric signal obtained from the image sensor 301 into a digital signal. Together, the image sensor 301 and the signal processing circuit 302 generate an image containing the subject image. In the following description, the image processing apparatus 100 is assumed to have a plurality of cameras, each of which includes an image sensor 301.

The CPU 303 controls the entire image processing apparatus 100 by executing a control program stored in the ROM 304. More specifically, when the CPU 303 executes the control program, the functions shown in FIG. 1 are realized and the processes shown in the flowcharts described later are performed. The ROM 304 is a storage medium that stores the control program executed by the CPU 303 and various parameter data. The RAM 305 stores images and various kinds of information. The RAM 305 also functions as a work area for the CPU 303 or as a temporary save area for data. The display 306 displays images.

The configuration of the image processing apparatus 100 is not limited to the one shown in FIG. 3. For example, the image processing apparatus 100 can be realized using a general-purpose PC. In this case, the image processing apparatus 100 need not include the image sensor 301, the signal processing circuit 302, or the display 306. In one embodiment, the image sensor 301 and the signal processing circuit 302 are included in an imaging device such as a camera, and this imaging device can communicate with the image processing apparatus 100. In such an embodiment, the image processing apparatus 100 can acquire the electric signals obtained by each of two or more image sensors 301.

In a processing device such as a computer, a processor such as the CPU 303 may realize the functions shown in FIG. 1 and the processes shown in the flowcharts described later by executing software (a program) acquired via a network or a storage medium. Alternatively, one or more of these functions and processes can be realized using dedicated hardware such as an electronic circuit. For example, the image processing apparatus 100 may be a dedicated object detection device built from hardware dedicated to object detection.

FIG. 1 shows the functional configuration of the image processing apparatus 100. The image processing apparatus 100 includes an information holding unit 103, an object tracking unit 102, a position storage unit 123, an accuracy calculation unit 109, and a position determination unit 115. The image processing apparatus 100 further includes an image acquisition unit 101, a position generation unit 104, a position merging unit 124, a correspondence storage unit 125, a position update unit 119, a position deletion unit 122, and a display control unit 126.

In the following description, the image processing apparatus 100 is assumed to have two cameras whose fields of view overlap. As described above, each camera includes an image sensor 301. The number of cameras is not otherwise limited, provided that there are at least two.

The information holding unit 103 holds the intrinsic parameters and the position and orientation of each camera, obtained through calibration. A known method can be used for calibration. For example, the intrinsic parameters of each camera can be obtained from images of a calibration board placed in the environment. One specific method is described in Zhengyou Zhang, "A Flexible New Technique for Camera Calibration", IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11):1330-1334, 2000. The position and orientation of each camera can then be estimated from images of a calibration marker that is placed in the environment and whose position and orientation are known.

Alternatively, the position and orientation of the cameras can be estimated by extracting feature points such as corners or SIFT features from the images captured by the cameras and matching the feature points across the images. One specific method is described in Pierre Moulon, Pascal Monasse, and Renaud Marlet, "Adaptive structure from motion with a contrario model estimation", Computer Vision - ACCV 2012: 11th Asian Conference on Computer Vision, Part IV, pp. 257-270, 2013. The intrinsic parameters and the position and orientation of the cameras may also be obtained simultaneously.
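To illustrate how calibrated intrinsic parameters and a camera pose are typically used, the following minimal sketch projects a world point into an image with a standard pinhole model. The matrix `K`, the pose `R`, `t`, and the function name `project` are assumptions for illustration; the text does not specify a particular camera model, and lens distortion is ignored here.

```python
import numpy as np

def project(point_3d, K, R, t):
    """Project a world point to pixel coordinates using x ~ K (R X + t)."""
    cam = R @ np.asarray(point_3d, dtype=float) + t
    if cam[2] <= 0:
        return None  # the point lies behind the camera
    uvw = K @ cam
    return uvw[:2] / uvw[2]

# Hypothetical intrinsic parameters: focal length 800 px, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)    # camera axes aligned with the world axes
t = np.zeros(3)  # camera centre at the world origin

pixel = project([0.5, -0.25, 5.0], K, R, t)
```

With these values, the point projects to pixel (400, 200).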

The image acquisition unit 101 acquires a group of continuously captured frame images from each of the plurality of imaging units. For example, the image acquisition unit 101 can acquire, from each camera, a video consisting of a group of continuously captured frame images. Because two cameras are used in the present embodiment, the image acquisition unit 101 acquires two videos. Each unit of the image processing apparatus performs its processing with reference to a plurality of frame images captured by the cameras at approximately the same time.

The object tracking unit 102 tracks objects in the frame image groups acquired by the image acquisition unit 101. For example, the object tracking unit 102 can perform tracking processing on each frame image acquired by the image acquisition unit 101. As a result, the image region of each tracking target object is detected in each frame image. Because persons are tracked in the present embodiment, the object tracking unit 102 extracts person regions from an image and associates each person region extracted from the image at the time of interest with the person region extracted for the same person from past images. In this way, the same person can be tracked over a plurality of times. A known method can be used for the tracking processing. For example, persons can be tracked by performing person detection on each of the successive images and associating the persons detected in the respective images based on a matching score. One specific method is described in M. D. Breitenstein et al., "Robust tracking-by-detection using a detector confidence particle filter", 2009 IEEE 12th International Conference on Computer Vision (ICCV), pp. 1515-1522.
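As a rough illustration of associating detections across successive images by a matching score, the sketch below uses greedy matching on bounding-box overlap (IoU). This is a deliberately simple stand-in, not the particle-filter method of the cited paper; the box format `(x_min, y_min, x_max, y_max)`, the threshold `min_score`, and the function names are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def associate(tracks, detections, min_score=0.3):
    """Greedily assign new detections to existing track labels by IoU.

    tracks: dict mapping label -> box from the previous frame.
    detections: list of boxes from the current frame.
    Returns a dict mapping label -> box for the current frame; detections
    that match no track receive fresh labels.
    """
    pairs = sorted(
        ((iou(box, det), label, k)
         for label, box in tracks.items()
         for k, det in enumerate(detections)),
        reverse=True,
    )
    updated, used = {}, set()
    for score, label, k in pairs:
        if score < min_score:
            break  # remaining pairs overlap too little to be the same person
        if label in updated or k in used:
            continue
        updated[label] = detections[k]
        used.add(k)
    next_label = max(tracks, default=-1) + 1
    for k, det in enumerate(detections):
        if k not in used:
            updated[next_label] = det
            next_label += 1
    return updated

previous = {0: (0, 0, 10, 20), 1: (30, 0, 40, 20)}
current = [(31, 0, 41, 20), (1, 0, 11, 20), (60, 0, 70, 20)]
tracked = associate(previous, current)
```

In this example the two existing labels follow their slightly shifted boxes, and the third detection is treated as a newly appearing person.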

The tracking processing by the object tracking unit 102 yields, for each image, sets consisting of the two-dimensional position of a tracking target object and an identifier of that object. Hereinafter, the image region of a tracking target object is called a tracking result or simply an object, and the identifier of a tracking target object is called the identifier of the tracking result. In the present embodiment, sets consisting of the two-dimensional position of a person in each image and an identifier of the person are obtained. More specifically, sets consisting of the two-dimensional position of a person region in each image and an identifier of the person region are obtained. In the present embodiment, a person region is a rectangular region, and its two-dimensional position is given by the representative point coordinates (x, y), the height h, and the width w of the rectangular region. The representative point coordinates can be, for example, the center coordinates of the detected face region or the center coordinates of the rectangular region. A 2D tracking label i is obtained as the identifier of the tracking result. The 2D tracking label i is a code that identifies, for each camera, a person being tracked in the image. The 2D tracking labels i are determined so that different persons in images obtained by the same camera are given different labels.
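One possible in-memory representation of a tracking result, given as a sketch: the class name `TrackingResult`, the `camera_id` field, and the helper `labels_unique_per_camera` are hypothetical names introduced for illustration, while the fields x, y, h, w and the 2D tracking label i follow the description above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrackingResult:
    camera_id: int  # which camera produced the image (assumed field)
    label: int      # 2D tracking label i, unique per person within one camera
    x: float        # representative point, horizontal coordinate
    y: float        # representative point, vertical coordinate
    h: float        # height of the rectangular person region
    w: float        # width of the rectangular person region

def labels_unique_per_camera(results):
    """Check that no two results from the same camera share a 2D tracking label."""
    seen = set()
    for r in results:
        key = (r.camera_id, r.label)
        if key in seen:
            return False
        seen.add(key)
    return True

frame_results = [
    TrackingResult(camera_id=1, label=0, x=120.0, y=80.0, h=60.0, w=25.0),
    TrackingResult(camera_id=1, label=1, x=300.0, y=90.0, h=58.0, w=24.0),
    TrackingResult(camera_id=2, label=0, x=210.0, y=75.0, h=62.0, w=26.0),
]
```

Note that the same label value may appear for different cameras, since the labels only need to be distinct within one camera.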

The position storage unit 123 stores information on three-dimensional positions. Details are described later.

The position generation unit 104 generates candidates for the three-dimensional position of an object at the time of interest based on the position of the object in the frame images and the positional relationship between the plurality of imaging units. For example, the position generation unit 104 can newly generate candidate three-dimensional positions based on the two-dimensional positions of persons obtained by the tracking processing of the object tracking unit 102. Specifically, the position generation unit 104 creates candidate three-dimensional positions by associating the two-dimensional positions of persons detected in the respective images. A three-dimensional position indicates the position of a person in three-dimensional space and is expressed by coordinates (x, y, z). The position generation unit 104 also assigns an identifier and attribute information to each candidate three-dimensional position. Hereinafter, a candidate three-dimensional position may be called simply a three-dimensional position.
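A minimal sketch of turning one pair of associated two-dimensional positions into a candidate three-dimensional position. It assumes a pinhole model x ~ K (R X + t) for each camera and uses midpoint triangulation, which is one common technique and not necessarily the one used in the embodiment; all function names and numeric values are hypothetical.

```python
import numpy as np

def back_project(pixel, K, R, t):
    """Return the camera centre and the unit viewing ray (world frame) of a pixel."""
    centre = -R.T @ t
    direction = R.T @ np.linalg.solve(K, np.array([pixel[0], pixel[1], 1.0]))
    return centre, direction / np.linalg.norm(direction)

def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between two rays, plus the gap length."""
    # Least-squares solution of c1 + s*d1 = c2 + u*d2.
    a = np.column_stack((d1, -d2))
    (s, u), *_ = np.linalg.lstsq(a, c2 - c1, rcond=None)
    p1, p2 = c1 + s * d1, c2 + u * d2
    return (p1 + p2) / 2.0, float(np.linalg.norm(p1 - p2))

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R1, t1 = np.eye(3), np.zeros(3)                 # camera 1 at the origin
R2, t2 = np.eye(3), np.array([-2.0, 0.0, 0.0])  # camera 2 centred at (2, 0, 0)

# Representative points of the same person in the two images (hypothetical).
c1, d1 = back_project((480.0, 320.0), K, R1, t1)
c2, d2 = back_project((160.0, 320.0), K, R2, t2)
candidate, gap = triangulate_midpoint(c1, d1, c2, d2)
```

The returned `gap` is the distance between the two rays at their closest approach; a large gap suggests that the two observations do not belong to the same person.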

The identifier is a code for identifying a generated candidate three-dimensional position; for example, a 3D tracking label j can be used. The 3D tracking labels j are determined so that different three-dimensional positions are given different labels. The attribute information is a value or code representing a property of the three-dimensional position. In the present embodiment, the attribute of a three-dimensional position is determined using the attributes of the tracking results (the attributes of the objects) corresponding to that three-dimensional position. For example, the color of a person (a color attribute) can be used as the attribute. In another embodiment, the age, sex, height, orientation, or size of the person, or a feature quantity extracted from the tracking result, can be used as the attribute instead of the color attribute. As a specific example, the position generation unit 104 can extract a feature quantity from the tracking result in each frame image referred to when generating a candidate three-dimensional position, and calculate a statistic of the feature quantities extracted from the frame images, for example their mean, as the attribute of the three-dimensional position.
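The specific example of averaging per-image feature quantities can be sketched as follows. The mean RGB color of the person region is used here as the feature quantity; this choice, the function names, and the synthetic images are assumptions for illustration.

```python
import numpy as np

def mean_color(image, x, y, w, h):
    """Mean color inside the rectangle of size (w, h) centred on (x, y)."""
    rows, cols = image.shape[:2]
    y0, y1 = max(int(round(y - h / 2)), 0), min(int(round(y + h / 2)), rows)
    x0, x1 = max(int(round(x - w / 2)), 0), min(int(round(x + w / 2)), cols)
    return image[y0:y1, x0:x1].reshape(-1, image.shape[2]).mean(axis=0)

def position_attribute(features):
    """Attribute of a 3D position: the mean of the per-image feature quantities."""
    return np.mean(np.stack(features), axis=0)

# Two synthetic 10x10 RGB frames, one per camera, showing the same person.
frame_cam1 = np.zeros((10, 10, 3))
frame_cam1[..., 0] = 100.0
frame_cam2 = np.zeros((10, 10, 3))
frame_cam2[..., 0] = 200.0

features = [
    mean_color(frame_cam1, x=5, y=5, w=4, h=6),
    mean_color(frame_cam2, x=5, y=5, w=4, h=6),
]
attribute = position_attribute(features)
```

Other statistics, such as a median or a histogram, could be substituted in `position_attribute` without changing the surrounding flow.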

The position generation unit 104 includes an association unit 105, a position estimation unit 106, and an attribute extraction unit 107. The association unit 105 associates persons tracked by the object tracking unit 102 across images. The position estimation unit 106 generates candidate three-dimensional positions for the persons associated by the association unit 105. The attribute extraction unit 107 acquires the attributes of the persons associated by the association unit 105. Details are described later.

The position storage unit 123 stores information on three-dimensional positions. For example, the position storage unit 123 can store a set consisting of a candidate three-dimensional position generated by the position generation unit 104, the 3D tracking label j of that three-dimensional position, and the attribute of that three-dimensional position.

The position merging unit 124 associates candidate three-dimensional positions at the time of interest with candidate three-dimensional positions of objects at a past time before the time of interest. In the present embodiment, the position merging unit 124 merges a candidate three-dimensional position at the time of interest, generated based on the position of the object in the frame images and the positional relationship between the plurality of imaging units, with a candidate three-dimensional position at the past time. In this way, the position merging unit 124 associates the candidate at the time of interest with the candidate at the past time.

For example, the position merging unit 124 can merge a plurality of candidate three-dimensional positions generated by the position generation unit 104 into one. Through this processing, three-dimensional positions determined to correspond to the same person are merged. This processing also associates candidate three-dimensional positions at the time of interest with candidate three-dimensional positions of objects at a past time before the time of interest. Performing the association by merging three-dimensional positions is not essential, however; it is also possible to record information indicating that a candidate at the time of interest and a candidate at the past time have been determined to correspond to the same person.

The processing by the position generation unit 104 may generate a plurality of three-dimensional positions for the same person. A plurality of three-dimensional positions for the same person may also exist because of estimation errors arising in the overall processing shown in FIG. 4. To limit each person to a single three-dimensional position, the position merging unit 124 therefore searches for three-dimensional positions that can be regarded as belonging to the same person and merges them into one. The position merging unit 124 also merges candidate three-dimensional positions generated from the frame images at the time of interest currently being processed with candidate three-dimensional positions generated from frame images at a past time before the time of interest. The correspondence storage unit 125 stores information indicating that three-dimensional positions merged by the position merging unit 124 belong to the same person. Details are described later.
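The merging step can be sketched as follows, under the assumption that two three-dimensional positions are regarded as the same person when they lie within a distance threshold. The criterion, the threshold `max_dist`, the rule of keeping the older label, and the function name are all assumptions; the criterion actually used is described later in the document.

```python
import numpy as np

def merge_candidates(past, current, max_dist=0.5):
    """Merge candidates that lie within max_dist of an already known position.

    past, current: dicts mapping 3D tracking label j -> (x, y, z); labels are
    assumed to be distinct across both dicts.
    Returns (merged, same_person): merged maps surviving labels to positions,
    and same_person maps each absorbed label to the label it was merged into,
    playing the role of the stored correspondence information.
    """
    merged = {label: np.asarray(p, dtype=float) for label, p in past.items()}
    same_person = {}
    for label, pos in current.items():
        pos = np.asarray(pos, dtype=float)
        nearest = min(merged, key=lambda k: np.linalg.norm(merged[k] - pos),
                      default=None)
        if nearest is not None and np.linalg.norm(merged[nearest] - pos) <= max_dist:
            merged[nearest] = pos          # keep the older label, adopt the new position
            same_person[label] = nearest   # remember that both labels are one person
        else:
            merged[label] = pos            # a position not seen before
    return merged, same_person

past = {0: (1.0, 1.0, 0.0)}
current = {1: (1.1, 1.0, 0.0), 2: (5.0, 5.0, 0.0), 3: (5.2, 5.0, 0.0)}
merged, same_person = merge_candidates(past, current)
```

In this example candidate 1 is merged into the past position 0, and candidates 2 and 3, both generated at the time of interest, are merged with each other.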

The accuracy calculation unit 109 obtains the integrated correspondence accuracy with which a candidate three-dimensional position at the time of interest corresponds to an object. This processing is based on both the local correspondence accuracy, which indicates the likelihood that the candidate at the time of interest, obtained from the frame images at the time of interest, corresponds to the object, and the integrated correspondence accuracy, which indicates the likelihood that the candidate at the past time corresponds to the object. In the present embodiment, the accuracy calculation unit 109 calculates the correspondence accuracy based on the correspondence accuracy between three-dimensional position j and tracking result i that considers only the time of interest t (the local correspondence accuracy) and the correspondence accuracy between three-dimensional position j and tracking result i that is based on past frame images (the integrated correspondence accuracy). The method of calculating the correspondence accuracy is described in detail later.

These correspondence accuracies are indices of the possibility that the object indicated by a candidate three-dimensional position is the same as the object being tracked; they indicate the degree to which a three-dimensional position and a tracking result correspond to the same person. The accuracy calculation unit 109 thus calculates the correspondence accuracy between three-dimensional positions and tracking results. Here, a three-dimensional position means a three-dimensional position obtained through the merging processing of the position merging unit 124, and each three-dimensional position is called three-dimensional position j, using its 3D tracking label j. A tracking result means a person being tracked in each image through the processing of the object tracking unit 102, and each tracking result is called tracking result i, using its 2D tracking label i.

The accuracy calculation unit 109 includes a distance calculation unit 110, a difference calculation unit 111, a local accuracy calculation unit 112, an integrated accuracy calculation unit 113, and an integrated accuracy storage unit 114. The distance calculation unit 110 calculates an object distance from the three-dimensional position, the result of the object tracking unit, and the positional relationship of the cameras. Specifically, it calculates the distance on the image between the three-dimensional position projected onto the image and the tracking result. The difference calculation unit 111 calculates the degree of difference between the color attribute of the three-dimensional position and the color attribute extracted from the person detection region of the tracking result. The local accuracy calculation unit 112 calculates the correspondence accuracy between a three-dimensional position and a tracking result that considers only the time of interest t (the local correspondence accuracy); details are described later. The integrated accuracy calculation unit 113 updates the integrated correspondence accuracy stored in the integrated accuracy storage unit 114 using the local correspondence accuracy obtained by the local accuracy calculation unit 112; details are described later. The integrated accuracy storage unit 114 records the resulting integrated correspondence accuracy as the integrated correspondence accuracy indicating the likelihood that the candidate three-dimensional position at the time of interest corresponds to the object. The recorded integrated correspondence accuracy corresponds to a value obtained by accumulating local correspondence accuracies.

The position determination unit 115 determines the three-dimensional position of the object at the time of interest based on the integrated correspondence accuracy. In the present embodiment, the position determination unit 115 determines that a three-dimensional position candidate at the time of interest with a higher recorded integrated correspondence accuracy corresponds to the three-dimensional position of the object at the time of interest. In this way, the position determination unit 115 determines the three-dimensional position from the correspondence accuracy. In the present embodiment, a real image reliability representing how likely a three-dimensional position is to be a real image is calculated and used to determine that the position is a real image. The correspondence between three-dimensional positions and tracking results is also obtained. The position determination unit 115 includes a reliability calculation unit 116, a position selection unit 117, and a result selection unit 118. The reliability calculation unit 116 calculates the real image reliability representing how likely a three-dimensional position is to be a real image. The position selection unit 117 finds a three-dimensional position with a high real image reliability and determines it to be a real image. The result selection unit 118 determines the tracking result corresponding to the three-dimensional position. Details of these processes are described later.

The position updating unit 119 updates the three-dimensional position and its attribute using the correspondence between the three-dimensional position and the tracking result obtained by the position determination unit 115. The position updating unit 119 includes a coordinate updating unit 120 and an attribute updating unit 121, which update the three-dimensional position and the attribute it holds, respectively. The position deleting unit 122 deletes unnecessary three-dimensional positions. The display control unit 126 causes a display to show the object detection results and the three-dimensional positions together with the camera images.

The operation of the image processing apparatus 100 is described below with reference to the flowchart of FIG. 4. In step S401, calibration is performed as described above to estimate the internal parameters and the position and orientation of each camera, and this information is held in the information holding unit 103. In step S402, the image acquisition unit 101 acquires a frame image from each camera as described above. In the following, the image acquisition unit 101 is assumed to acquire the frame images at time t. In step S403, the object tracking unit performs the person tracking process as described above and assigns representative point coordinates (x, y), a height h, a width w, and a 2D tracking label i to each tracking result.

In step S404, the position generation unit 104 generates candidates for the three-dimensional position of a person as described above and stores the three-dimensional position information in the position storage unit 123. Details of the processing in step S404 are described with reference to FIG. 5. In step S501, the position estimation unit 106 searches for which tracking result in another image corresponds to a tracking result detected from a given image. The method of associating tracking results between images is not particularly limited; one example is a method using epipolar geometry. A method of associating tracking results using epipolar geometry is described below. For the sake of explanation, two cameras, camera 1 and camera 2, are assumed to be used; the image captured by one camera is called camera image 1, and the image captured by the other camera is called camera image 2.

FIG. 6(A) shows camera image 1, in which person A appears. Because the information holding unit 103 holds information indicating the relative position and orientation and the internal parameters of each camera, this information can be used to determine where in camera image 2 person A, who appears at a given position in camera image 1, will appear. Specifically, if person A appears in camera image 2, the position lies somewhere on a straight line called the epipolar line.

Let F be the fundamental matrix, that is, the matrix obtained from the positions, orientations, and internal parameters of cameras 1 and 2 that contains the information on the positional relationship between camera image 1 and camera image 2. Further, let x be the vector representing the two-dimensional coordinates of person A in camera image 1. Then, as is conventionally known, the epipolar line l is expressed by the following equation.

l = Fx

The associating unit 105 determines that a person in camera image 2 whose representative point lies at a distance from the epipolar line equal to or less than a threshold is a person corresponding to person A. For example, in FIG. 6, persons B and C correspond to person A. However, since at most one person in camera image 2 is identical to person A, at least one of person B and person C has been erroneously detected as a person corresponding to person A.
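The epipolar test described above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function names, the plain nested-list representation of F, and the use of homogeneous coordinates (u, v, 1) for x are assumptions made for the example.

```python
import math

def epipolar_line(F, x):
    """Return the line l = F x for a 3x3 matrix F and a homogeneous point x = (u, v, 1)."""
    return tuple(sum(F[r][c] * x[c] for c in range(3)) for r in range(3))

def point_line_distance(line, point):
    """Perpendicular distance from image point (u, v) to the line a*u + b*v + c = 0."""
    a, b, c = line
    u, v = point
    return abs(a * u + b * v + c) / math.hypot(a, b)

def match_candidates(F, x1, points2, threshold):
    """Indices of representative points in camera image 2 that lie within
    `threshold` of the epipolar line induced by point x1 of camera image 1."""
    line = epipolar_line(F, (x1[0], x1[1], 1.0))
    return [k for k, p in enumerate(points2)
            if point_line_distance(line, p) <= threshold]
```

Because several people can lie near the same epipolar line, the returned list may contain more than one index, which is exactly the situation that produces virtual images later in the pipeline.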

In step S502, the position estimation unit 106 estimates the three-dimensional position of the person based on the set of persons associated in step S501. This estimation can be performed according to the principle of triangulation. For example, in three-dimensional space, the intersection of the straight line from the optical center of camera A toward the representative point of person A and the straight line from the optical center of camera B toward the representative point of person B can be obtained as the estimated three-dimensional position of persons A and B, who are presumed to be the same person. As a specific example, as shown in FIG. 7, the position estimation unit 106 obtains, for each camera image, the straight line in three-dimensional space connecting the coordinates of the person's representative point and the center coordinates of the camera. The position estimation unit 106 then obtains the coordinates at which these straight lines obtained from the plurality of cameras intersect. Strictly speaking, these straight lines rarely intersect, so instead of the intersection coordinates the position estimation unit 106 can obtain the midpoint of the common perpendicular of these straight lines as the estimated three-dimensional position of the person.
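The midpoint of the common perpendicular can be computed in closed form. The sketch below assumes each viewing ray is given by a camera center p and a direction vector d in world coordinates; these names and the tuple representation are illustrative and do not come from the embodiment.

```python
def midpoint_of_common_perpendicular(p1, d1, p2, d2, eps=1e-12):
    """Midpoint of the shortest segment between the lines p1 + s*d1 and p2 + t*d2.

    Returns None when the lines are (nearly) parallel, since the common
    perpendicular is then not unique.
    """
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    w0 = tuple(a - b for a, b in zip(p1, p2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None
    s = (b * e - c * d) / denom   # parameter of the closest point on line 1
    t = (a * e - b * d) / denom   # parameter of the closest point on line 2
    q1 = tuple(p + s * v for p, v in zip(p1, d1))
    q2 = tuple(p + t * v for p, v in zip(p2, d2))
    return tuple((u + v) / 2.0 for u, v in zip(q1, q2))
```

When the two rays do intersect, q1 and q2 coincide and the function returns the intersection itself, so the same routine covers both cases.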

In step S503, the attribute extraction unit 107 obtains an attribute for the set of persons associated in step S501. In the present embodiment, the attribute extraction unit 107 obtains the attribute of the tracking result contained in each camera image and, based on the obtained attribute values, obtains the attribute for the set of persons. As a specific example, the attribute extraction unit 107 can calculate the color attribute of the tracking result of person A in camera image 1 and the color attribute of the tracking result of person B in camera image 2. This color attribute can be, for example, the average color of the tracking result. The attribute extraction unit 107 can then use the average of the color attributes of the respective tracking results as the color attribute for persons A and B.
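As a small illustration of the color attribute described above, the following sketch averages RGB triples twice: once over the pixels of each tracking result and once over the per-view results. Representing a region as a list of (R, G, B) tuples is an assumption made for the example.

```python
def mean_color(colors):
    """Component-wise mean of a non-empty list of (R, G, B) tuples."""
    n = len(colors)
    return tuple(sum(channel) / n for channel in zip(*colors))

def pair_color_attribute(pixels_view1, pixels_view2):
    """Color attribute of an associated pair: the mean of the two per-view mean colors."""
    return mean_color([mean_color(pixels_view1), mean_color(pixels_view2)])
```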

In another embodiment, the associating unit 105 can associate persons by referring to the attributes of the tracking results instead of, or in addition to, epipolar geometry. As a specific example, persons can be associated when the error between the attributes of their tracking results is equal to or less than a threshold. In this case, the attribute extraction unit 107 can obtain the attribute (for example, the color attribute) of each tracking result before step S501. Then, in step S503, the attribute extraction unit 107 can obtain the color attribute of the person by averaging the already obtained color attributes of the tracking results contained in the corresponding camera images.

Step S404 has been described above for association between two cameras. However, association can be performed in the same way when three or more cameras are used. For example, association and generation of three-dimensional position candidates can be performed for each pair of cameras selected from the plurality of cameras. Association and generation of three-dimensional position candidates can also be performed using multi-view geometric constraints.

In step S404, new three-dimensional positions are created based on the tracking results detected in step S403. This process makes it possible to find new real image candidates that were not detected in previous frame images.

In step S405, the position merging unit 124 merges three-dimensional positions determined to correspond to the same person into one, as described above. In the present embodiment, three-dimensional positions that are a short distance apart are regarded as representing the same person, and the position merging unit 124 merges them. As a specific example, the position merging unit 124 searches for three-dimensional positions whose mutual distance is equal to or less than a fixed value and creates groups of three-dimensional positions that are close to one another. The three-dimensional positions placed in one group are merged. The position merging unit 124 then obtains the average coordinates of the three-dimensional positions in each group and uses them as the coordinates of the merged three-dimensional position.
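One way to realize this grouping is sketched below. The text does not specify how groups are formed, so the sketch assumes single-linkage grouping (two positions share a group whenever a chain of pairs within the distance limit connects them), implemented with a small union-find; the function name and parameters are hypothetical.

```python
import math

def merge_positions(positions, max_dist):
    """Group 3D positions whose mutual distance is <= max_dist (single linkage,
    an assumption) and return the average coordinates of each group."""
    parent = list(range(len(positions)))

    def find(k):
        # Follow parent links to the group representative, compressing the path.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    for a in range(len(positions)):
        for b in range(a + 1, len(positions)):
            if math.dist(positions[a], positions[b]) <= max_dist:
                parent[find(b)] = find(a)

    groups = {}
    for k, p in enumerate(positions):
        groups.setdefault(find(k), []).append(p)
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups.values()]
```

The pairwise loop is quadratic in the number of candidates, which is usually acceptable for the handful of people visible in a surveillance scene.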

In the present embodiment, three-dimensional positions are also merged when the distance between a three-dimensional position generated by the position generation unit 104 based on the frame images at the time of interest and a three-dimensional position generated by the position generation unit 104 based on past frame images is short. As a specific example, the position merging unit 124 can merge the merged three-dimensional position obtained in the previous step S405 by referring to past frame images with the three-dimensional position candidates generated in the current step S404 by referring to the frame images at the time of interest. In another embodiment, the three-dimensional position candidates generated in a past step S404 based on past frame images can be merged with the three-dimensional position candidates generated in the current step S404 based on the frame images at the time of interest.

The attribute and the 3D tracking label j of one of the three-dimensional positions before merging are adopted as the attribute and the 3D tracking label j of the merged three-dimensional position. In the present embodiment, the attribute and the 3D tracking label j of the three-dimensional position that has existed since the earliest time are adopted. Such a configuration can be realized by attaching to each three-dimensional position information indicating the time at which it was generated. However, other selection criteria can also be used. Further, when the position merging unit 124 merges two or more three-dimensional position candidates from past times that had each been determined to correspond to different objects, it can record information indicating that the two or more three-dimensional positions correspond to the same object. For example, the position merging unit 124 can record information indicating that a three-dimensional position whose 3D tracking label was not adopted and the three-dimensional position whose 3D tracking label was adopted represent the same person. As a specific example, the position merging unit 124 can record information indicating that these 3D tracking labels are identical in the correspondence storage unit 125.

In the present embodiment, three-dimensional positions that are a short distance apart are merged, but the merging method is not limited to this, and the three-dimensional positions to be merged can be determined according to other criteria. For example, the distance between a three-dimensional position candidate at the time of interest and a three-dimensional position candidate at a past time can be used. The degree of difference between the attribute of the object corresponding to the three-dimensional position candidate at the time of interest and the attribute of the object corresponding to the three-dimensional position candidate at the past time can also be used. Based on at least one of these, the three-dimensional position candidate at the time of interest and the three-dimensional position candidate at the past time can be merged. For example, three-dimensional positions with similar attributes can be merged, or the three-dimensional positions to be merged can be determined based on both the distance between the three-dimensional positions and the difference in attributes. Here, the object corresponding to a three-dimensional position candidate at the current time or a past time may be the object (tracking result) on each frame image that was referred to when the three-dimensional position candidate was generated. Alternatively, the object corresponding to a three-dimensional position candidate at a past time may be the object (tracking result) selected for each frame image as the object whose integrated correspondence accuracy for that candidate is equal to or greater than a predetermined value and is the highest.

Step S405 narrows a plurality of real image candidates that may be the same person down to one. This process is expected to improve the accuracy of identifying real images. In addition, removing redundant three-dimensional positions is expected to reduce the amount of calculation.

In step S406, the accuracy calculation unit 109 calculates the correspondence accuracy between three-dimensional positions and tracking results, as described above. Details of step S406 are described with reference to the flowchart of FIG. 8. In the present embodiment, the correspondence accuracy is calculated for each three-dimensional position j obtained by merging three-dimensional position candidates at the time of interest with three-dimensional position candidates at past times. However, the correspondence accuracy may instead be calculated for each three-dimensional position candidate at the time of interest, or for each three-dimensional position candidate obtained by merging a plurality of three-dimensional position candidates at the time of interest.

In step S801, the distance calculation unit 110 projects each three-dimensional position j onto each camera image, as shown in FIG. 9(a). Next, in step S802, the distance calculation unit 110 calculates the distance on the image between the three-dimensional position j projected onto the camera image and the tracking result i, as shown in FIG. 9(b). For example, consider the case where persons A and B are tracked on camera image 1 and persons C and D are tracked on camera image 2. In this case, the distance between the three-dimensional position j projected onto camera image 1 and the tracking result of person A, and the distance between the three-dimensional position j projected onto camera image 1 and the tracking result of person B, are calculated. The distance between the three-dimensional position j projected onto camera image 2 and the tracking result of person C, and the distance between the three-dimensional position j projected onto camera image 2 and the tracking result of person D, are also calculated. These calculations are performed for each three-dimensional position j.
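Steps S801 and S802 can be sketched with a standard pinhole projection. The 3x4 projection matrix P (combining the internal parameters with the position and orientation of a camera) and the use of the tracking result's representative point as its image position are assumptions made for this illustration.

```python
import math

def project(P, X):
    """Project the 3D point X = (x, y, z) with a 3x4 projection matrix P
    and return image coordinates (u, v)."""
    Xh = (X[0], X[1], X[2], 1.0)
    u, v, w = (sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3))
    if w == 0:
        raise ValueError("point projects to infinity")
    return (u / w, v / w)

def image_distance(P, X, representative_point):
    """Distance on the image between projected 3D position X and a tracking
    result's representative point (u, v)."""
    return math.dist(project(P, X), representative_point)
```

In use, `image_distance` would be evaluated for every pair of three-dimensional position j and tracking result i, one camera at a time, with that camera's own P.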

In step S803, the difference calculation unit 111 calculates the degree of difference between the attribute held by the three-dimensional position j and the attribute of the tracking result i. Here, the attribute of the tracking result i refers to the attribute extracted from the person detection region corresponding to the tracking result i. In the present embodiment, the attribute held by the three-dimensional position j and the attribute of the tracking result i are both color attributes. Also in the present embodiment, the squared error between the RGB value of the color attribute held by the three-dimensional position j and the RGB value of the color attribute of the tracking result i is used as the degree of difference between the two color attributes. In the above case, where persons A and B are tracked on camera image 1 and persons C and D are tracked on camera image 2, the degree of difference is calculated between the three-dimensional position j and each of persons A, B, C, and D. These calculations are performed for each three-dimensional position j.

In step S804, the local accuracy calculation unit 112 obtains the local correspondence accuracy, which represents the accuracy with which the three-dimensional position j corresponds to the tracking result i. At this time, at least one of the following can be used: the distance, on the frame image at the time of interest, between the position corresponding to the three-dimensional position candidate at the time of interest and the position of the object; and the degree of difference between the attribute of the object corresponding to the three-dimensional position candidate at the time of interest and the attribute of the object. As a specific example, the local accuracy calculation unit 112 can obtain the local correspondence accuracy based on at least one of the distance calculated in step S802 and the degree of attribute difference calculated in step S803. Here, the object corresponding to a three-dimensional position candidate can be the object on each frame image that was referred to when the three-dimensional position candidate was generated.

For example, the local accuracy calculation unit 112 calculates the local correspondence accuracy s^t_{i,j} based on the distance calculated in step S802 and the degree of attribute difference calculated in step S803. The local accuracy calculation unit 112 can calculate the local correspondence accuracy so that it becomes larger as the distance and the degree of difference become smaller. As a specific example, the local accuracy calculation unit 112 can obtain a weighted average of the distance calculated in step S802 and the degree of difference calculated in step S803 using arbitrary weights, and then take the reciprocal of the resulting weighted average as the local correspondence accuracy. In the present embodiment, the local accuracy calculation unit 112 obtains the local correspondence accuracy from both the distance and the degree of difference, but it may be obtained using only one of them. In step S804, the local correspondence accuracy is calculated between the three-dimensional position j and each of persons A, B, C, and D. These calculations are performed for each three-dimensional position j. In this way, the local correspondence accuracy is obtained based on the frame images at the time of interest.
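The computation of steps S803 and S804 can be sketched as follows. The weight parameter `w_dist` and the small constant `eps`, which guards against division by zero when both the distance and the difference are zero, are assumptions added for the example and are not part of the described embodiment.

```python
def squared_rgb_error(color_a, color_b):
    """Degree of difference between two color attributes: sum of squared RGB differences."""
    return sum((a - b) ** 2 for a, b in zip(color_a, color_b))

def local_accuracy(distance, difference, w_dist=0.5, eps=1e-9):
    """Local correspondence accuracy s^t_{i,j}: reciprocal of the weighted average
    of image distance and attribute difference (smaller inputs give a larger value)."""
    weighted = w_dist * distance + (1.0 - w_dist) * difference
    return 1.0 / (weighted + eps)
```

Since the image distance is in pixels and the squared RGB error is in squared intensity units, the weight also acts as a scale factor between the two quantities and would need tuning for a given setup.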

In step S805, the integrated accuracy calculation unit 113 obtains the current integrated correspondence accuracy between the three-dimensional position j and the tracking result i from the local correspondence accuracy s^t_{i,j} obtained in step S804 and the previously obtained integrated correspondence accuracy a^{t-1}_{i,j}. The current integrated correspondence accuracy a^t_{i,j} obtained in this way corresponds to the integrated correspondence accuracy calculated using the frame images at the time of interest t. The previously obtained integrated correspondence accuracy a^{t-1}_{i,j} refers to the integrated correspondence accuracy calculated using the frame images at time t-1.

The integrated accuracy calculation unit 113 can obtain the current integrated correspondence accuracy a^t_{i,j} so that it becomes larger as the local correspondence accuracy s^t_{i,j} and the past integrated correspondence accuracy a^{t-1}_{i,j} become larger. For example, the integrated accuracy calculation unit 113 can obtain, as the current integrated correspondence accuracy a^t_{i,j}, a weighted average of the local correspondence accuracy s^t_{i,j} and the past integrated correspondence accuracy a^{t-1}_{i,j} using arbitrary weights. A specific example is a method using the following equation.

In the above equation, w represents an update weight, and normally 0 < w < 1.

In step S805, the integrated accuracy calculation unit 113 further records each current integrated correspondence accuracy a^t_{i,j} whose value is equal to or greater than a threshold in the integrated accuracy storage unit 114. When the integrated correspondence accuracy a^{t+1}_{i,j} is calculated in the next step S407 using the frame images at time t+1, the integrated correspondence accuracy a^t_{i,j} recorded in this way is referred to as the past integrated correspondence accuracy. On the other hand, when the current integrated correspondence accuracy is less than the predetermined value, the integrated accuracy calculation unit 113 does not record it. That is, when the integrated correspondence accuracy a^t_{i,j} between the three-dimensional position j and the tracking result i is less than the threshold, the integrated accuracy calculation unit 113 judges that the relationship between the three-dimensional position j and the tracking result i is weak and discards a^t_{i,j} without recording it.
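The update and pruning of step S805 can be sketched as below. The exact update equation is not reproduced in this text, so the sketch assumes one weighted-average form consistent with the description, a^t_{i,j} = w * s^t_{i,j} + (1 - w) * a^{t-1}_{i,j}; which term the update weight w multiplies is an assumption. Treating a pair with no stored value as having a past accuracy of 0, and the dictionary layout keyed by (i, j), are likewise illustrative choices.

```python
def update_integrated(stored, local, w, threshold):
    """Return the new contents of the integrated-accuracy store.

    stored: dict mapping (i, j) -> a^{t-1}_{i,j} recorded at the previous time
    local:  dict mapping (i, j) -> s^t_{i,j} computed at the time of interest
    Pairs whose updated value falls below `threshold` are not recorded.
    """
    updated = {}
    for key, s in local.items():
        a_prev = stored.get(key, 0.0)          # assumption: unseen pairs start at 0
        a_now = w * s + (1.0 - w) * a_prev     # assumed weighted-average form
        if a_now >= threshold:
            updated[key] = a_now
    return updated
```

With this form, a pair that keeps receiving low local accuracies decays geometrically and is eventually dropped from the store, while a consistently supported pair converges toward its local accuracy.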

In step S407, for each three-dimensional position j, the position determination unit 115 uses the integrated correspondence accuracy a^t_{i,j} between the three-dimensional position j and the tracking result i to obtain a real image reliability L_j representing how likely the position is to be a real image. The position determination unit 115 then determines, based on the real image reliability L_j, that the three-dimensional position j is a three-dimensional position representing a real image. Specifically, the position determination unit 115 determines some of the three-dimensional positions j to be three-dimensional positions representing real images in such a way that the real image reliability of the positions determined to be real images is high. The position determination unit 115 then assigns the corresponding tracking results to each three-dimensional position determined to be a real image and records the correspondence between the three-dimensional position j determined to be a real image and the tracking result i. Details of step S407 are shown in FIG. 10. In the process of FIG. 10, steps S1001 to S1004 are executed repeatedly until it is determined in step S1004 that the process is not to be continued.

In step S1001, the reliability calculation unit 116 obtains the real image reliability L_j representing how likely the three-dimensional position j is to be a real image. The reliability calculation unit 116 calculates the real image reliability L_j of the three-dimensional position j so that L_j is higher the higher the integrated correspondence accuracy a^t_{i,j} of the tracking results i associated with the position. A specific example is the following method. The following processing is performed using all three-dimensional positions j that have not been determined to be real images in step S1003 and all tracking results i that have not been determined in step S1003 to be tracking results corresponding to a real image.

まず、信頼度算出部１１６は、着目時刻における三次元位置の候補のそれぞれについて、１つのフレーム画像群上で追尾されている物体に対応して記録されている積算対応確度を取得する。例えば、信頼度算出部１１６は、それぞれのカメラ画像について、積算対応確度a^t _i,jが最も高くなる追尾結果ｉを探索する。 First, the reliability calculating unit 116 obtains, for each of the three-dimensional position candidates at the time of interest, the integrated correspondence accuracy recorded corresponding to the object tracked on one frame image group. For example, the reliability calculation unit 116 searches for a tracking result i with the highest integrated correspondence accuracy a ^t _{i, j} for each camera image.

ここでは、１つの三次元位置ｊに対応する、それぞれのカメラ画像上の追尾結果ｉは１つであるとする。例えば、カメラ画像１上で人物Ａ，Ｂが追尾されており、カメラ画像２上で人物Ｃ，Ｄが追尾されている上記の場合であって、まだ三次元位置ｊが実像として確定されていない場合について説明する。この場合、信頼度算出部１１６は、フレーム画像群上で追尾されている１以上の物体のうち積算対応確度が最も高くなる物体を選択する。そして、フレーム画像群についての積算対応確度として、選択された物体に対応して記録されている積算対応確度が選択される。例えば、人物Ａと人物Ｂとのうち、三次元位置ｊとの積算対応確度がより大きくなる人物の追尾結果が、カメラ画像１上の追尾結果として選択される。また、人物Ｃと人物Ｄとのうち、三次元位置ｊとの積算対応確度がより大きくなる人物の追尾結果が、カメラ画像２上の追尾結果として選択される。 Here, it is assumed that the tracking result i on each camera image corresponding to one three-dimensional position j is one. For example, in the above case where the persons A and B are tracked on the camera image 1 and the persons C and D are tracked on the camera image 2, the three-dimensional position j has not yet been determined as a real image. The case will be described. In this case, the reliability calculation unit 116 selects an object having the highest integrated correspondence accuracy among one or more objects tracked on the frame image group. Then, as the integration correspondence accuracy for the frame image group, the integration correspondence accuracy recorded corresponding to the selected object is selected. For example, of the person A and the person B, the tracking result of the person whose integration correspondence accuracy with the three-dimensional position j is larger is selected as the tracking result on the camera image 1. In addition, a tracking result of a person having a larger integration correspondence accuracy between the three-dimensional position j and the person C and the person D is selected as a tracking result on the camera image 2.

そして、信頼度算出部１１６は、複数のフレーム画像群のそれぞれについて取得された積算対応確度に応じて三次元位置の候補の実像信頼度を求める。例えば、信頼度算出部１１６は、三次元位置ｊの実像信頼度L_jとして、それぞれのカメラ画像について選択された追尾結果と三次元位置ｊとの間の積算対応確度を合計する。具体的には、三次元位置ｊの実像信頼度L_jは下式で表すことができる。 Then, the reliability calculation unit 116 obtains the real image reliability of the three-dimensional position candidate according to the integrated correspondence accuracy acquired for each of the plurality of frame image groups. For example, as the real image reliability L_j of the three-dimensional position j, the reliability calculation unit 116 sums the integrated correspondence accuracies between the three-dimensional position j and the tracking result selected for each camera image. Specifically, the real image reliability L_j of the three-dimensional position j can be expressed by the following equation.

L_j = \sum_{i \in A_j} a^t_{i,j}

上式において、A_jは三次元位置ｊに対応する追尾結果の集合である。１つの三次元位置ｊに対応するそれぞれのカメラ画像上の追尾結果ｉは１つであるから、A_jに含まれる追尾結果はカメラ１つにつき最大１つであり、A_jの要素数の最大値はカメラ数である。このように、実像信頼度L_jは、それぞれの三次元位置ｊについて、実像信頼度L_jが高くなるように追尾結果ｉを仮に対応づけることにより求められる。以上のように、信頼度算出部１１６は、それぞれの三次元位置ｊについて実像信頼度L_jを求める。 In the above equation, A_j is the set of tracking results corresponding to the three-dimensional position j. Since each camera image has one tracking result i corresponding to a given three-dimensional position j, A_j contains at most one tracking result per camera, and the maximum number of elements of A_j is the number of cameras. In this way, the real image reliability L_j is obtained for each three-dimensional position j by provisionally associating tracking results i with it so that L_j becomes high. As described above, the reliability calculation unit 116 obtains the real image reliability L_j for each three-dimensional position j.
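The calculation in step S1001 can be sketched as follows. This is a minimal illustration, not the apparatus's actual implementation: the function name, the dictionary layout of the recorded accuracies, and the `camera_of` lookup are assumptions introduced here. For each candidate position j, the sketch keeps the tracking result with the highest integrated correspondence accuracy on each camera (forming A_j) and sums those accuracies to obtain L_j.

```python
from collections import defaultdict


def real_image_reliability(accuracy, camera_of):
    """Compute L_j and A_j for every candidate three-dimensional position.

    accuracy:  dict mapping (tracking_id, position_id) to the integrated
               correspondence accuracy a^t_{i,j} (hypothetical layout).
    camera_of: dict mapping tracking_id to the camera it was tracked on.
    Returns (reliability, selected) where reliability[j] is L_j and
    selected[j] maps each camera to the tracking result chosen for A_j.
    """
    best = defaultdict(dict)  # best[j][camera] = (tracking_id, accuracy)
    for (i, j), a in accuracy.items():
        cam = camera_of[i]
        current = best[j].get(cam)
        if current is None or a > current[1]:
            best[j][cam] = (i, a)  # keep only the top result per camera

    reliability = {j: sum(a for _, a in per_cam.values())
                   for j, per_cam in best.items()}
    selected = {j: {cam: i for cam, (i, _) in per_cam.items()}
                for j, per_cam in best.items()}
    return reliability, selected


# Persons A and B are tracked on camera 1, persons C and D on camera 2.
camera_of = {"A": 1, "B": 1, "C": 2, "D": 2}
accuracy = {("A", "j0"): 0.75, ("B", "j0"): 0.25,
            ("C", "j0"): 0.125, ("D", "j0"): 0.5}
reliability, selected = real_image_reliability(accuracy, camera_of)
print(reliability["j0"], selected["j0"])  # 1.25 {1: 'A', 2: 'D'}
```

Because at most one tracking result per camera is kept, the number of terms in each sum is bounded by the number of cameras, as stated above.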

ステップＳ１００２において、位置選択部１１７は、より高い実像信頼度を有する三次元位置の候補を、着目時刻における物体の三次元位置に対応すると判定する。例えば、位置選択部１１７は、ステップＳ１００１で求めた実像信頼度L_jが最大となる三次元位置ｊを選択する。選択された三次元位置ｊは、実像を表す三次元位置として確定される。こうして確定された三次元位置は、続くステップＳ１００１では選択されない。 In step S1002, the position selection unit 117 determines that a three-dimensional position candidate having a higher real image reliability corresponds to the three-dimensional position of the object at the time of interest. For example, the position selection unit 117 selects the three-dimensional position j for which the real image reliability L_j obtained in step S1001 is the maximum. The selected three-dimensional position j is determined as a three-dimensional position representing a real image. A three-dimensional position determined in this way is not selected in subsequent executions of step S1001.

位置選択部１１７は、実像信頼度が最も高い三次元位置の候補であって、三次元位置の候補に対応する物体が所定数以上存在する三次元位置の候補を選択して、着目時刻における物体の三次元位置に対応すると判定することができる。ここで、所定数は２以上であれば特に限定されない。三次元位置の候補に対応する物体が１つしかない、すなわち三次元位置の候補に対応する物体が写っているフレーム画像が１つしかない場合、この三次元位置は正確に算出されていない可能性が高い。 The position selection unit 117 can select, from among the three-dimensional position candidates for which at least a predetermined number of corresponding objects exist, the candidate with the highest real image reliability, and determine that it corresponds to the three-dimensional position of the object at the time of interest. The predetermined number is not particularly limited as long as it is two or more. If only one object corresponds to a three-dimensional position candidate, that is, if only one frame image shows an object corresponding to the candidate, the three-dimensional position has probably not been calculated accurately.

ステップＳ１００３において結果選択部１１８は、ステップＳ１００２で選択された三次元位置ｊについて、ステップＳ１００１でそれぞれのカメラ画像について選択された追尾結果を、三次元位置ｊに対応する追尾結果として確定する。もっとも、別の実施形態において、三次元位置ｊを生成するために用いられた追尾結果を、三次元位置ｊに対応する追尾結果として確定してもよい。このように、三次元位置の候補に対応する物体は、三次元位置の候補を生成する際に参照された各フレーム画像上の物体であってもよい。また、三次元位置の候補に対応する物体は、三次元位置の候補に対応する積算対応確度が所定値以上でありかつ積算対応確度が最も高い物体としてフレーム画像毎に選択された物体であってもよい。 In step S1003, for the three-dimensional position j selected in step S1002, the result selection unit 118 determines the tracking results selected for each camera image in step S1001 as the tracking results corresponding to the three-dimensional position j. In another embodiment, however, the tracking results used to generate the three-dimensional position j may be determined as the tracking results corresponding to the three-dimensional position j. In other words, the objects corresponding to a three-dimensional position candidate may be the objects on each frame image that were referred to when the candidate was generated. Alternatively, the objects corresponding to a three-dimensional position candidate may be the objects selected for each frame image as the object whose integrated correspondence accuracy for the candidate is both equal to or greater than a predetermined value and the highest.

ステップＳ１００３においてはさらに除外処理が行われ、選択された三次元位置の候補と、三次元位置の候補に対応する物体（追尾結果）と、はステップＳ１００２における選択処理の選択の対象から外される。例えば、こうして確定された追尾結果は、続くステップＳ１００２では選択されない。 In step S1003, exclusion processing is also performed: the selected three-dimensional position candidate and the objects (tracking results) corresponding to it are excluded from the targets of the selection processing in step S1002. For example, a tracking result determined in this way is not selected in subsequent executions of step S1002.

ステップＳ１００４で信頼度算出部１１６は、未確定の三次元位置及び追尾結果のうち、対応付け可能なものが存在するか否かを判定する。存在する場合、処理はステップＳ１００１に戻る。また、存在しない場合、図１０の処理は終了する。 In step S1004, the reliability calculation unit 116 determines whether or not there is an undetermined three-dimensional position and a tracking result that can be associated. If there is, the process returns to step S1001. If not, the process in FIG. 10 ends.

例えば、ステップＳ８０５では積算対応確度が高い三次元位置と追尾結果との組み合わせが記録される。一方で、三次元位置との対応確度が高い追尾結果が２以上のカメラ画像に存在することにより、三次元位置を精度良く求めることができる。このような観点から、三次元位置について、対応確度が高い追尾結果が存在するカメラ画像の数が１以下である場合に、この三次元位置については追尾結果を対応付けられないものと判定することができる。そして、全ての三次元位置について追尾結果を対応付けできない場合、未確定の三次元位置及び追尾結果のうち対応付け可能なものは存在しないと判定することができる。 For example, in step S805, combinations of a three-dimensional position and a tracking result that have a high integrated correspondence accuracy are recorded. Meanwhile, a three-dimensional position can be obtained accurately when tracking results with high correspondence accuracy to that position exist in two or more camera images. From this viewpoint, when the number of camera images containing a tracking result with high correspondence accuracy to a three-dimensional position is one or less, it can be determined that no tracking result can be associated with that three-dimensional position. If no tracking result can be associated with any of the three-dimensional positions, it can be determined that none of the undetermined three-dimensional positions and tracking results can be associated.
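The continuation test of step S1004 can be sketched as follows. This is an illustrative sketch only: the function names, the data layout, and the `min_accuracy` threshold that stands in for "high correspondence accuracy" are assumptions, since the text does not give a concrete threshold.

```python
def can_associate(accuracy, camera_of, position_id, min_accuracy, min_cameras=2):
    """Return True if tracking results with high accuracy for this position
    exist on at least `min_cameras` distinct camera images.

    accuracy:  dict mapping (tracking_id, position_id) to accuracy
               (hypothetical layout).
    camera_of: dict mapping tracking_id to its camera.
    """
    cameras = {camera_of[i]
               for (i, j), a in accuracy.items()
               if j == position_id and a >= min_accuracy}
    return len(cameras) >= min_cameras


def any_associable(accuracy, camera_of, min_accuracy):
    """Step S1004 analogue: continue only if some position is associable."""
    positions = {j for (_, j) in accuracy}
    return any(can_associate(accuracy, camera_of, j, min_accuracy)
               for j in positions)
```

With `min_cameras=2`, a position supported by a single camera image is treated as not associable, which matches the criterion described above.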

以上のステップＳ１００１〜Ｓ１００４においては、実像信頼度が最大となる三次元位置を選択及び確定し、この三次元位置に対応する追尾結果を確定する選択処理と、選択された三次元位置及び追尾結果を除外する除外処理と、が繰り返される。このように、三次元位置に割り当てる追尾結果の組み合わせを探索することで、実像として確定される三次元位置の実像信頼度を高くすることができるため、三次元位置と追尾結果との照合精度向上の効果が見込まれる。このような繰り返し処理を行うことは、仮に過去時刻における積算対応確度を考慮しない場合であっても、高い信頼性をもって実像を表す三次元位置を選択することができる点で有利である。一方で、三次元位置と追尾結果との照合方法は、実像として確定された三次元位置の実像信頼度が高くなるのであれば特に限定されない。本実施形態ではグリーディな方法で組み合わせを探索したが、他の組み合わせ探索の方法を用いてもよい。また、それぞれの三次元位置の実像信頼度の和が大きくなるように、総当たり又は他の探索方法で三次元位置に割り当てる追尾結果の組み合わせを探索してもよい。 In steps S1001 to S1004 above, two processes are repeated: a selection process that selects and determines the three-dimensional position with the maximum real image reliability and determines the tracking results corresponding to it, and an exclusion process that excludes the selected three-dimensional position and tracking results. Searching in this way for the combination of tracking results to assign to the three-dimensional positions raises the real image reliability of the positions determined as real images, so the accuracy of matching three-dimensional positions with tracking results is expected to improve. This iterative processing is advantageous in that a three-dimensional position representing a real image can be selected with high reliability even if the integrated correspondence accuracy at past times is not taken into account. The method of matching three-dimensional positions with tracking results is not particularly limited, as long as the real image reliability of the positions determined as real images becomes high. In this embodiment the combination is searched for by a greedy method, but other combination search methods may be used. For example, the combination of tracking results assigned to the three-dimensional positions may be searched for by brute force or another search method so that the sum of the real image reliabilities of the three-dimensional positions is large.
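The greedy loop of steps S1001 to S1004 can be sketched as below. It is a simplified illustration under stated assumptions: the function name and data layout are hypothetical, every recorded accuracy is treated as "high", and the stop condition requires support from at least `min_cameras` camera images. The example reproduces the situation of FIG. 2, where camera 1 tracks person B and camera 2 tracks persons A and B.

```python
def greedy_confirm(accuracy, camera_of, min_cameras=2):
    """Repeatedly confirm the candidate with the highest reliability.

    accuracy:  dict mapping (tracking_id, position_id) to the integrated
               correspondence accuracy (hypothetical layout).
    camera_of: dict mapping tracking_id to its camera.
    Returns a list of (position_id, {camera: tracking_id}, reliability)
    in the order in which positions were confirmed as real images.
    """
    remaining = dict(accuracy)
    confirmed = []
    while True:
        # S1001: per position and camera, keep the best unconfirmed result.
        best = {}
        for (i, j), a in remaining.items():
            per_cam = best.setdefault(j, {})
            cam = camera_of[i]
            if cam not in per_cam or a > per_cam[cam][1]:
                per_cam[cam] = (i, a)
        # S1004: stop when no position is supported by enough cameras.
        candidates = {j: pc for j, pc in best.items() if len(pc) >= min_cameras}
        if not candidates:
            break
        # S1002: select the position with the maximum reliability L_j.
        j_star = max(candidates,
                     key=lambda j: sum(a for _, a in candidates[j].values()))
        per_cam = candidates[j_star]
        reliability = sum(a for _, a in per_cam.values())
        chosen = {cam: i for cam, (i, _) in per_cam.items()}
        confirmed.append((j_star, chosen, reliability))
        # S1003: exclude the confirmed position and its tracking results.
        used = set(chosen.values())
        remaining = {(i, j): a for (i, j), a in remaining.items()
                     if j != j_star and i not in used}
    return confirmed


camera_of = {"cam1_B": 1, "cam2_A": 2, "cam2_B": 2}
accuracy = {("cam1_B", "real"): 0.75, ("cam2_B", "real"): 0.75,
            ("cam1_B", "virtual"): 0.5, ("cam2_A", "virtual"): 0.5}
print(greedy_confirm(accuracy, camera_of))
# [('real', {1: 'cam1_B', 2: 'cam2_B'}, 1.5)]
```

Once the real image has claimed the camera 1 tracking result, the virtual image is left with support from a single camera and is never confirmed. A brute-force search over all assignments would replace the `max` selection with an enumeration that maximizes the total reliability.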

以上のように、ステップＳ４０６においては、三次元位置と追尾結果とを照合することにより、これらが同一の物体であることを示す指標である積算対応確度が算出される。さらにステップＳ４０７では、この積算対応確度を用いて三次元位置が実像か否かが判定される。これらの処理により、虚像である三次元位置を効果的に削減できるため、三次元位置推定の精度向上を見込むことができる。特に、積算対応確度の算出の際には、着目時刻ｔのフレーム画像に基づいて得られる局所対応確度に加えて、過去の複数フレームの対応確度も参照される。このため、過去の長期間のフレーム画像に基づく三次元位置と追尾結果との照合が可能となるため、効果的に虚像を削減する効果が見込まれる。 As described above, in step S406, by comparing the three-dimensional position with the tracking result, the integrated correspondence accuracy, which is an index indicating that these are the same object, is calculated. Further, in step S407, it is determined whether or not the three-dimensional position is a real image using the integration correspondence accuracy. By these processes, the three-dimensional position, which is a virtual image, can be effectively reduced, so that improvement in the accuracy of three-dimensional position estimation can be expected. In particular, when calculating the integrated correspondence accuracy, reference is made to the past correspondence accuracy of a plurality of frames in addition to the local correspondence accuracy obtained based on the frame image at the time of interest t. For this reason, since the three-dimensional position based on the past long-term frame image can be compared with the tracking result, the effect of effectively reducing the virtual image is expected.

また、本実施形態によれば、三次元位置と追尾結果との組み合わせに結びつけて積算対応確度が記録され、記録された積算対応確度を用いて三次元位置と追尾結果との照合が行われる。このため、過去のカメラ画像自体を参照することは必須ではなく、過去の物体追尾で得られる追尾結果の位置及び２Ｄ追尾ラベル等を記憶し、毎フレーム参照することは必須ではない。このような構成によれば、記憶領域サイズを削減する効果及び計算量を削減する効果が見込まれる。 Further, according to the present embodiment, the integration correspondence accuracy is recorded in association with the combination of the three-dimensional position and the tracking result, and the three-dimensional position and the tracking result are collated using the recorded integration correspondence accuracy. Therefore, it is not essential to refer to the past camera image itself, and it is not essential to store the position of the tracking result obtained in the past object tracking, the 2D tracking label, and the like, and refer to each frame. According to such a configuration, an effect of reducing the storage area size and an effect of reducing the amount of calculation are expected.

さらに、ステップＳ４０６では、カメラ画像上に投影された三次元位置と追尾結果との画像上の距離と、属性の相違度と、の双方を用いて三次元位置と追尾結果との照合が行われた。このような構成によれば、距離と属性の相違度とのどちらかだけでは照合が難しい場合あっても、両方を用いることにより効果的に照合を行うことができる。もっとも、距離と属性の相違度との双方を用いることは必須ではなく、どちらか一方を用いて照合を行ってもよいし、別のさらなる情報を用いてもよい。 Further, in step S406, the three-dimensional position is compared with the tracking result using both the distance between the three-dimensional position projected on the camera image and the tracking result on the image and the degree of difference between the attributes. Was. According to such a configuration, even when it is difficult to perform matching only by using either the distance or the difference between the attributes, it is possible to perform effective matching by using both. However, it is not essential to use both the distance and the difference between the attributes, and the matching may be performed using one of them, or another additional information may be used.

また、ステップＳ４０７では、実像信頼度L_jが対応確度a^t _i,jの和として計算される。そのため、三次元位置に対応する追尾結果の数が多いほど実像信頼度L_jは大きくなる。このように、一実施形態においては、積算対応確度が取得されたフレーム画像群の数が多いほど実像信頼度が大きくなるように、三次元位置の候補の実像信頼度が求められる。多視点幾何においては、より多くのカメラに物体が映っているほど対応付けの曖昧性が減少するという性質がある。本実施形態の方法では、三次元位置に対応する追尾結果の数が多いほど大きくなるように実像信頼度L_jが計算されるため、より多くのカメラ画像上の追尾結果に対応づけられた三次元位置が実像として確定されやすくなり、精度向上の効果が見込まれる。 In step S407, the real image reliability L_j is calculated as the sum of the correspondence accuracies a^t_{i,j}. Therefore, the more tracking results correspond to a three-dimensional position, the larger the real image reliability L_j becomes. Thus, in one embodiment, the real image reliability of a three-dimensional position candidate is obtained so that it is larger when the integrated correspondence accuracy has been acquired for more frame image groups. In multi-view geometry, the ambiguity of the association decreases as the object appears in more cameras. In the method of this embodiment, the real image reliability L_j is calculated so that it increases with the number of tracking results corresponding to the three-dimensional position, so a three-dimensional position associated with tracking results on more camera images is more likely to be determined as a real image, and improved accuracy is expected.

ステップＳ４０８において位置更新部１１９は、着目時刻における物体の三次元位置に対応すると判定された三次元位置の候補に対応する物体の各フレーム画像上の位置と、複数の撮像部間の位置関係とに基づいて、物体の三次元位置を求める。例えば、位置更新部１１９は、ステップＳ４０７で求めた三次元位置と追尾結果の対応関係を用いて、三次元位置の座標を求めることができる。位置更新部１１９はさらに、着目時刻における三次元位置の候補を、対応すると判定された物体の三次元位置で更新することができる。こうして更新された三次元位置の候補は、次のステップＳ４０５における併合処理において用いることができる。 In step S408, the position update unit 119 obtains the three-dimensional position of the object based on the positional relationship between the plurality of imaging units and on the position, on each frame image, of the object corresponding to the three-dimensional position candidate that was determined to correspond to the three-dimensional position of the object at the time of interest. For example, the position update unit 119 can obtain the coordinates of the three-dimensional position using the correspondence between three-dimensional positions and tracking results obtained in step S407. The position update unit 119 can further update the three-dimensional position candidate at the time of interest with the three-dimensional position of the object determined to correspond to it. The updated three-dimensional position candidate can be used in the merging process of the next step S405.

例えば座標更新部１２０は、三次元位置に対応するそれぞれのカメラ画像上の追尾結果を用いて、三次元位置の座標を再度算出することができる。ここでは、着目時刻ｔにおけるフレーム画像から検出された追尾結果の座標を用いて三次元位置の座標が算出される。具体的には、ステップＳ５０２と同様に、カメラ画像上の追尾結果の代表点の座標とカメラ中心の座標とに基づいて、三角測量の原理で座標を求めることができる。３つ以上のカメラ画像上の追尾結果を用いて三次元位置の座標を算出する場合には、カメラ中心の座標とカメラ画像上の追尾結果の代表点の座標とを結ぶ三次元空間中の直線をそれぞれ求め、これらの直線に最も近い座標を求めることができる。具体例としては、これらの直線との距離の二乗和が最も小さくなる座標を、カメラ画像上の追尾結果に対応する三次元位置の座標として求めることができる。 For example, the coordinate updating unit 120 can recalculate the coordinates of the three-dimensional position using the tracking results on each camera image that correspond to the three-dimensional position. Here, the coordinates of the three-dimensional position are calculated using the coordinates of the tracking results detected from the frame images at the time of interest t. Specifically, as in step S502, the coordinates can be obtained by the principle of triangulation based on the coordinates of the representative point of the tracking result on the camera image and the coordinates of the camera center. When the coordinates of the three-dimensional position are calculated using tracking results on three or more camera images, a straight line in three-dimensional space connecting the camera center and the representative point of the tracking result is obtained for each camera image, and the coordinates closest to these straight lines can be obtained. As a specific example, the coordinates that minimize the sum of the squared distances to these straight lines can be obtained as the coordinates of the three-dimensional position corresponding to the tracking results on the camera images.
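The least-squares step can be sketched as follows. This is a minimal sketch, not the implementation used by the coordinate updating unit 120: it assumes each line is already given by a camera center and a direction vector toward the back-projected representative point (how the direction is obtained from the camera calibration is outside the sketch), and the function names are hypothetical. For a line through c with unit direction u, the squared distance from x is |(I - u u^T)(x - c)|^2, so the minimizer solves the 3x3 system sum(I - u u^T) x = sum((I - u u^T) c).

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def closest_point_to_lines(origins, directions):
    """Point minimizing the sum of squared distances to the given 3D lines.

    origins:    camera centers, one (x, y, z) per line.
    directions: direction vectors of the lines (need not be normalized).
    """
    M = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for c, d in zip(origins, directions):
        norm = math.sqrt(sum(v * v for v in d))
        u = [v / norm for v in d]
        for r in range(3):
            for s in range(3):
                # Entry of the projector I - u u^T onto the plane normal to u.
                p = (1.0 if r == s else 0.0) - u[r] * u[s]
                M[r][s] += p
                b[r] += p * c[s]
    d = det3(M)
    if abs(d) < 1e-12:
        raise ValueError("lines are (nearly) parallel; no unique closest point")
    point = []
    for col in range(3):  # Cramer's rule for the 3x3 system M x = b
        Mc = [row[:] for row in M]
        for r in range(3):
            Mc[r][col] = b[r]
        point.append(det3(Mc) / d)
    return point
```

When all lines pass through a common point, that point is returned; with noisy observations the result is the compromise position closest to all lines in the squared-distance sense.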

また、属性更新部１２１は、三次元位置の属性を、追尾結果の属性に基づいて更新することができる。例えば、それぞれのカメラ画像上の追尾結果の画素値に基づいて、三次元位置の属性を求めることができる。具体例としては、それぞれのカメラ画像上の追尾結果の画素値の平均値を三次元位置の属性として求めることができる。 Further, the attribute updating unit 121 can update the attribute of the three-dimensional position based on the attribute of the tracking result. For example, an attribute of a three-dimensional position can be obtained based on a pixel value of a tracking result on each camera image. As a specific example, the average value of the pixel values of the tracking result on each camera image can be obtained as the attribute of the three-dimensional position.

このように、ステップＳ４０８の処理によって、位置記憶部１２３に記録されている三次元位置の候補の座標及び属性が更新される。こうして更新された三次元位置の候補の座標及び属性は、続くステップＳ４０５における併合処理において用いることができる。もっとも、位置記憶部１２３に記録されている三次元位置の候補の座標及び属性を更新することは必ずしも必須ではない。 As described above, the coordinates and attributes of the three-dimensional position candidates recorded in the position storage unit 123 are updated by the processing in step S408. The updated coordinates and attributes of the three-dimensional position candidate can be used in the merging process in step S405. However, it is not always necessary to update the coordinates and attributes of the three-dimensional position candidates recorded in the position storage unit 123.

ステップＳ４０９において位置削除部１２２は、不要な三次元位置を削除する。例えば、位置削除部１２２は、所定期間以上物体に対応すると判定されておらず、所定期間以上三次元位置の候補と併合されていない、過去の時刻において生成された三次元位置の候補を削除することができる。具体例としては、一定フレーム数以上の期間、ステップＳ４０７で実像として確定されなかった三次元位置を削除の対象とすることができる。不要な三次元位置を削除することで、記憶領域の削減、及びステップＳ４０５からステップＳ４０８での計算量の効果が見込まれる。 In step S409, the position deletion unit 122 deletes unnecessary three-dimensional positions. For example, the position deletion unit 122 can delete a three-dimensional position candidate generated at a past time that has not been determined to correspond to an object for a predetermined period or longer and has not been merged with another three-dimensional position candidate for a predetermined period or longer. As a specific example, a three-dimensional position that has not been determined as a real image in step S407 for a certain number of frames or more can be targeted for deletion. Deleting unnecessary three-dimensional positions is expected to reduce the storage area and the amount of calculation in steps S405 to S408.
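The deletion rule of step S409 can be sketched as follows. The record fields `last_confirmed_frame` and `last_merged_frame`, the function name, and the use of a single `max_idle_frames` period for both conditions are assumptions made for illustration; the text only requires "a predetermined period".

```python
def prune_candidates(candidates, current_frame, max_idle_frames):
    """Drop candidates that were neither confirmed nor merged recently.

    candidates: dict mapping position_id to a record holding the frame
                numbers 'last_confirmed_frame' and 'last_merged_frame'
                (hypothetical fields).
    A candidate is deleted only if BOTH events are at least
    max_idle_frames old; otherwise it is kept.
    """
    kept = {}
    for pid, rec in candidates.items():
        last_active = max(rec["last_confirmed_frame"], rec["last_merged_frame"])
        if current_frame - last_active < max_idle_frames:
            kept[pid] = rec
    return kept


candidates = {
    "p1": {"last_confirmed_frame": 95, "last_merged_frame": 10},
    "p2": {"last_confirmed_frame": 50, "last_merged_frame": 60},
    "p3": {"last_confirmed_frame": 10, "last_merged_frame": 80},
}
print(sorted(prune_candidates(candidates, current_frame=100, max_idle_frames=30)))
# ['p1', 'p3']
```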

ステップＳ４１０において表示制御部１２６は、以上の処理で求められた人物追尾結果及び三次元位置推定結果を表示機器に表示させる。例えば表示制御部１２６は、ステップＳ４０７で実像として確定された三次元位置と、実像として確定された三次元位置に対応する各カメラ画像上の追尾結果の位置を、表示機器に表示させることができる。 In step S410, the display control unit 126 causes the display device to display the person tracking results and the three-dimensional position estimation results obtained by the above processing. For example, the display control unit 126 can cause the display device to display the three-dimensional positions determined as real images in step S407 and the positions of the tracking results on each camera image that correspond to those three-dimensional positions.

図１１には、表示機器に表示される画面の構成例を示す。図１１に示す画面は、１つ以上のカメラ画像１１０１と三次元マップ１１０４とを含む。図１１には、４台のカメラを用いた場合の表示例を示し、４つのカメラ画像１１０１はそれぞれのカメラで撮像された画像である。 FIG. 11 shows a configuration example of a screen displayed on the display device. The screen shown in FIG. 11 includes one or more camera images 1101 and a three-dimensional map 1104. FIG. 11 shows a display example when four cameras are used, and four camera images 1101 are images captured by the respective cameras.

カメラ画像１１０１には、ステップＳ４０３での追尾結果を示すシンボル（枠）が重畳表示される。本実施形態においては、ステップＳ４０７で実像として確定された三次元位置に対応する各カメラ画像上の追尾結果を示す枠が、それぞれのカメラ画像１１０１に重畳される。異なるカメラに映っている人物が同一であるか否かをユーザが容易に識別できるように、異なるカメラ画像においても同一の人物を示す枠は同じ色で表示される。 A symbol (frame) indicating the tracking result of step S403 is superimposed on the camera image 1101. In the present embodiment, frames indicating the tracking results on each camera image that correspond to the three-dimensional positions determined as real images in step S407 are superimposed on the respective camera images 1101. Frames indicating the same person are displayed in the same color across different camera images so that the user can easily tell whether the persons appearing in different cameras are the same.

三次元マップ１１０４には、位置記憶部１２３から取得された人物の三次元位置を示すシンボル１１０３と、カメラの位置及び向きを示すシンボル１１０２とが、床面とともに三次元画像として表示されている。本実施形態においては、ステップＳ４０７で実像として確定された三次元位置が、三次元マップ１１０４に表示される。カメラ画像１１０１上の人物と三次元マップ１１０４上の人物が同一であるか否かをユーザが容易に識別できるように、カメラ画像１１０１上の追尾結果を示す枠と三次元マップ１１０４上の三次元位置を示すシンボルとは同じ色で表示される。 In the three-dimensional map 1104, a symbol 1103 indicating the three-dimensional position of a person acquired from the position storage unit 123 and a symbol 1102 indicating the position and orientation of a camera are displayed as a three-dimensional image together with the floor surface. In the present embodiment, the three-dimensional positions determined as real images in step S407 are displayed on the three-dimensional map 1104. The frame indicating a tracking result on the camera image 1101 and the symbol indicating the corresponding three-dimensional position on the three-dimensional map 1104 are displayed in the same color so that the user can easily tell whether the person on the camera image 1101 and the person on the three-dimensional map 1104 are the same.

ステップＳ４０５では、併合された三次元位置が同一人物を表すことを示す情報が対応記憶部１２５に記憶される。このように、いくつかの三次元位置が同一の人物を表すことを示す情報が対応記憶部１２５に記憶されている場合、これらの三次元位置が同一の人物を表すことが分かるように三次元マップ１１０４を表示することができる。例えば、ステップＳ４０５においていくつかの三次元位置が併合された場合、ステップＳ４０５で併合された後の三次元位置に対応する人物を表すシンボルの近くに、併合前の三次元位置に対応する人物を表すシンボルと同じ色の印を表示することができる。このように、対応記憶部１２５を設けることで、異なる三次元位置が同じ人物を表すことを示すことが可能となる。このような構成は、単に追跡結果を表示する場合の他に、例えば映像解析処理により人物の長期間の三次元軌跡を算出する場合又は人物を同定する場合等に有効に適用可能である。 In step S405, information indicating that the merged three-dimensional positions represent the same person is stored in the correspondence storage unit 125. When information indicating that several three-dimensional positions represent the same person is stored in the correspondence storage unit 125 in this way, the three-dimensional map 1104 can be displayed so that it is apparent that these three-dimensional positions represent the same person. For example, when several three-dimensional positions have been merged in step S405, a mark of the same color as the symbol representing the person corresponding to a three-dimensional position before merging can be displayed near the symbol representing the person corresponding to the merged three-dimensional position. Providing the correspondence storage unit 125 thus makes it possible to show that different three-dimensional positions represent the same person. Besides simply displaying tracking results, such a configuration can be applied effectively when, for example, a long-term three-dimensional trajectory of a person is calculated by video analysis processing or when a person is identified.

図１１の例では、人物が同一であるか否かを表現するためにそれぞれの人物を異なる色で表示した。別の例においては、視認性を高めるために、人物固有の番号、文字又は記号等を、カメラ画像１１０１又は三次元マップ１１０４に重畳表示してもよい。また、図１１の例では、三次元マップ１１０４を用いて三次元位置を三次元空間中に表示したが、三次元位置を二次元マップ上に表示することもできる。三次元位置を三次元マップ上に表す場合には、三次元マップから人物の床面上の位置に加えて高さ方向の位置を知ることができる。特に、物体追尾を顔認識に基づいて行う場合等、人物の位置が頭部を基準として決定される場合には、三次元マップから人物の身長を知ることができる。一方で、三次元位置を二次元マップ上に表す場合には、人物の床面上での位置が視認しやすくなるという効果がある。 In the example of FIG. 11, each person is displayed in a different color to express whether or not the person is the same. In another example, a number, a character, a symbol, or the like unique to a person may be superimposed on the camera image 1101 or the three-dimensional map 1104 in order to enhance visibility. Further, in the example of FIG. 11, the three-dimensional position is displayed in the three-dimensional space using the three-dimensional map 1104, but the three-dimensional position can be displayed on the two-dimensional map. When the three-dimensional position is represented on the three-dimensional map, the position of the person in the height direction can be known from the three-dimensional map in addition to the position of the person on the floor surface. In particular, when the position of a person is determined based on the head, such as when object tracking is performed based on face recognition, the height of the person can be known from the three-dimensional map. On the other hand, when the three-dimensional position is represented on the two-dimensional map, there is an effect that the position of the person on the floor is easily recognized.

また、三次元マップ１１０４の視点は固定されていてもよいし、ユーザが視点を変更できる機構が設けられていてもよい。三次元マップ１１０４に表示できるものは以上で挙げたものには限られない。例えば、家具等の配置を示す見取り図を床面に重畳表示してもよいし、その他の物体を重畳表示してもよい。このようにすることで、人物と、人物の周囲にある物体との位置関係をわかりやすく提示することができる。 In addition, the viewpoint of the three-dimensional map 1104 may be fixed, or a mechanism that allows the user to change the viewpoint may be provided. What can be displayed on the three-dimensional map 1104 is not limited to the above. For example, a floor plan showing the arrangement of furniture and the like may be superimposed on the floor, or another object may be superimposed. By doing so, the positional relationship between the person and the object around the person can be presented in an easily understandable manner.

ステップＳ４１１で、画像取得部１０１は、処理を継続するか否かを判定する。新たなフレーム画像をカメラから取得可能である場合、処理はステップＳ４０２に戻り、新たに取得されたフレーム画像を用いて処理が繰り返される。一方で、新たなフレーム画像をカメラから取得可能ではない場合、又は終了を示すユーザ指示があった場合、図４の処理は終了する。 In step S411, the image acquisition unit 101 determines whether to continue the processing. If a new frame image can be acquired from the camera, the process returns to step S402, and the process is repeated using the newly acquired frame image. On the other hand, when a new frame image cannot be obtained from the camera, or when there is a user instruction indicating the end, the processing in FIG. 4 ends.

実像信頼度L_jを求める際には、積算対応確度a^t _i,j以外のパラメータを参照することもできる。例えば、本実施形態においては、長期間実像として確定されなかった三次元位置は削除される。このため、三次元位置の存在時間が長いことは、追尾結果が対応づけられる頻度が高い、すなわち実像である尤もらしさが高いことを意味する。このため、三次元位置ｊの存在時間が長いほど実像信頼度L_jが増加するように、実像信頼度L_jを求めてもよい。一実施形態においては、より古い時刻に生成された三次元位置の候補と併合されているほど実像信頼度が大きくなるように、三次元位置の候補の実像信頼度が求められる。より古い時刻に生成された三次元位置の候補と併合されていることは、三次元位置ｊの存在時間が長いことを意味する。例えば、三次元位置ｊの存在時間をT_jとすると、実像信頼度L_jを次式のように求めることができる。 When the real image reliability L_j is obtained, parameters other than the integrated correspondence accuracy a^t_{i,j} can also be referred to. For example, in the present embodiment, a three-dimensional position that has not been determined as a real image for a long time is deleted. A long existence time of a three-dimensional position therefore means that tracking results are frequently associated with it, that is, that it is highly likely to be a real image. Accordingly, the real image reliability L_j may be obtained so that it increases as the existence time of the three-dimensional position j becomes longer. In one embodiment, the real image reliability of a three-dimensional position candidate is obtained so that it is larger when the candidate has been merged with a three-dimensional position candidate generated at an older time. Having been merged with a candidate generated at an older time means that the existence time of the three-dimensional position j is long. For example, letting T_j be the existence time of the three-dimensional position j, the real image reliability L_j can be obtained as in the following equation.

上式において、αは定数である。このような構成によれば、実像を示す三次元位置をより正確に選択し、虚像を削減する効果が見込まれる。 In the above equation, α is a constant. Such a configuration is expected to select three-dimensional positions representing real images more accurately and to reduce virtual images.
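One way such an existence-time term could enter the reliability is sketched below. The equation itself is not reproduced in this text, so the additive form L_j = (sum of selected accuracies) + α·T_j used here is purely an assumption for illustration; the only properties taken from the description are that α is a constant and that L_j should increase with the existence time T_j. The function and parameter names are hypothetical.

```python
def reliability_with_existence_time(selected_accuracies, existence_time, alpha):
    """Assumed additive form: sum of per-camera accuracies plus alpha * T_j.

    selected_accuracies: integrated correspondence accuracies of the
                         tracking results chosen for this position.
    existence_time:      T_j, e.g. frames since the candidate was created.
    alpha:               non-negative constant weighting the time term.
    """
    return sum(selected_accuracies) + alpha * existence_time


# Two candidates with equal accuracy sums: the older one scores higher.
young = reliability_with_existence_time([0.75, 0.5], existence_time=2, alpha=0.125)
old = reliability_with_existence_time([0.75, 0.5], existence_time=8, alpha=0.125)
print(young, old)  # 1.5 2.25
```

Any other monotonically increasing dependence on T_j would satisfy the description equally well.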

本実施形態においては、積算対応確度を局所対応確度と過去の積算対応確度との和として、実像信頼度を積算対応確度の和として、それぞれ計算している。一方で、積算対応確度を局所対応確度と過去の積算対応確度との積として、実像信頼度を積算対応確度の積として、それぞれ計算してもよい。この場合、積算対応確度及び実像信頼度は尤度ではなく確率として計算できる。 In the present embodiment, the integrated correspondence accuracy is calculated as the sum of the local correspondence accuracy and the past integrated correspondence accuracy, and the real image reliability is calculated as the sum of the integrated correspondence accuracy. On the other hand, the integrated correspondence accuracy may be calculated as the product of the local correspondence accuracy and the past integrated correspondence accuracy, and the real image reliability may be calculated as the product of the integrated correspondence accuracy. In this case, the integration correspondence accuracy and the real image reliability can be calculated as probabilities instead of likelihoods.

（その他の実施例） (Other Examples)
本発明は、上述の実施形態の１以上の機能を実現するプログラムを、ネットワーク又は記憶媒体を介してシステム又は装置に供給し、そのシステム又は装置のコンピュータにおける１つ以上のプロセッサーがプログラムを読出し実行する処理でも実現可能である。また、１以上の機能を実現する回路（例えば、ＡＳＩＣ）によっても実現可能である。 The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or an apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

１０１：画像取得部、１０２：物体追尾部、１０４：位置生成部、１０９：確度算出部、１１５：位置確定部、１２３：位置記憶部、１２４：位置併合部 101: image acquisition unit, 102: object tracking unit, 104: position generation unit, 109: accuracy calculation unit, 115: position determination unit, 123: position storage unit, 124: position merging unit

Claims

An image processing apparatus comprising:
acquisition means for acquiring a group of continuously captured frame images from each of a plurality of imaging units having overlapping visual fields;
tracking means for tracking an object on the frame image group;
generating means for generating a candidate for a three-dimensional position of the object at a time of interest based on a position of the object on the frame image and a positional relationship between the plurality of imaging units;
associating means for associating the three-dimensional position candidate at the time of interest with a three-dimensional position candidate of the object at a past time before the time of interest;
calculation means for obtaining an integrated correspondence accuracy indicating the probability that the three-dimensional position candidate at the time of interest corresponds to the object, based on a local correspondence accuracy indicating the probability that the three-dimensional position candidate at the time of interest corresponds to the object and the integrated correspondence accuracy indicating the probability that the three-dimensional position candidate at the past time corresponds to the object; and
determining means for, for each three-dimensional position candidate at the time of interest, acquiring, for each of the plurality of frame image groups, the integrated correspondence accuracy recorded in correspondence with an object tracked on that frame image group, obtaining a reliability of the three-dimensional position candidate according to the acquired integrated correspondence accuracies, and determining the three-dimensional position of the object at the time of interest based on the reliability.

The image processing apparatus according to claim 1, wherein the determining means selects, from among one or more objects tracked on the frame image group, the object having the highest integrated correspondence accuracy, and acquires, as the integrated correspondence accuracy for that frame image group, the integrated correspondence accuracy recorded in correspondence with the selected object.

The image processing apparatus according to claim 1 or 2, wherein the calculation means suppresses recording of the integrated correspondence accuracy when the integrated correspondence accuracy is less than a predetermined value.

The image processing apparatus according to any one of claims 1 to 3, wherein the determining means obtains the reliability of the three-dimensional position candidate such that the reliability increases as the number of frame image groups from which the integrated correspondence accuracy is acquired increases.

The image processing apparatus according to any one of claims 1 to 4, wherein the associating means associates the three-dimensional position candidate at the time of interest with the three-dimensional position candidate at the past time by merging the two candidates.

The image processing apparatus according to claim 5, wherein the determining means obtains the reliability of the three-dimensional position candidate such that the reliability increases the older the time at which the three-dimensional position candidate merged with it was generated.

The image processing apparatus according to claim 5 or 6, wherein the associating means merges the three-dimensional position candidate at the time of interest and the three-dimensional position candidate at the past time based on at least one of: the distance between the three-dimensional position candidate at the time of interest and the three-dimensional position candidate at the past time; and the degree of difference between an attribute of the object corresponding to the three-dimensional position candidate at the time of interest and an attribute of the object corresponding to the three-dimensional position candidate at the past time.

The image processing apparatus according to any one of claims 5 to 7, wherein, when merging two or more three-dimensional position candidates at the past time that were each determined to correspond to different objects, the associating means records information indicating that the two or more three-dimensional positions correspond to the same object.

The image processing apparatus according to any one of claims 5 to 8, wherein the associating means deletes a three-dimensional position candidate generated at a past time that has not been determined to correspond to the object for a predetermined period or longer and has not been merged with a three-dimensional position candidate for a predetermined period or longer.

The image processing apparatus according to any one of claims 1 to 9, wherein the determining means repeats:
a selection process of selecting the three-dimensional position candidate that has the highest reliability and for which a predetermined number or more of corresponding objects exist, and determining that the selected candidate corresponds to the three-dimensional position of the object; and
an exclusion process of excluding the selected three-dimensional position candidate and the objects corresponding to that candidate from the targets of the selection process.

The image processing apparatus according to any one of claims 1 to 10, wherein the calculation means obtains the local correspondence accuracy based on the distance between the position on the frame image corresponding to the three-dimensional position candidate of the object at the time of interest and the position, on the frame image, of the object being tracked at the time of interest.

The image processing apparatus according to any one of claims 1 to 11, wherein the local correspondence accuracy is obtained based on the difference between an attribute of the object corresponding to the three-dimensional position candidate of the object at the time of interest and an attribute of the object tracked on the frame image at the time of interest.

The image processing apparatus according to any one of claims 7 to 12, wherein the object corresponding to the three-dimensional position candidate is an object on each frame image referred to when generating the three-dimensional position candidate, for which the integrated correspondence accuracy corresponding to the three-dimensional position candidate is equal to or greater than a predetermined value, and which is selected, for each frame image, as the object having the highest integrated correspondence accuracy.

The image processing apparatus according to any one of claims 1 to 13, further comprising means for determining the three-dimensional position of the object based on the position, on each frame image, of the object corresponding to the three-dimensional position candidate determined to correspond to the three-dimensional position of the object at the time of interest, and on the positional relationship between the plurality of imaging units.

The image processing apparatus according to claim 14, further comprising means for updating the candidate of the three-dimensional position at the time of interest with the three-dimensional position of the object determined to correspond.

The image processing apparatus according to any one of claims 1 to 15, wherein the integrated correspondence accuracy is an index indicating the possibility that the object indicated by the three-dimensional position candidate is the same as the tracked object.

An image processing method performed by an image processing apparatus, the method comprising:
an acquisition step of acquiring a group of continuously captured frame images from each of a plurality of imaging units having overlapping visual fields;
a tracking step of tracking an object on the frame image group;
a generation step of generating a candidate for a three-dimensional position of the object at a time of interest based on a position of the object on the frame image and a positional relationship between the plurality of imaging units;
an associating step of associating the three-dimensional position candidate at the time of interest with a three-dimensional position candidate of the object at a past time before the time of interest;
a calculation step of obtaining an integrated correspondence accuracy indicating the probability that the three-dimensional position candidate at the time of interest corresponds to the object, based on a local correspondence accuracy indicating the probability that the three-dimensional position candidate at the time of interest corresponds to the object and the integrated correspondence accuracy indicating the probability that the three-dimensional position candidate at the past time corresponds to the object; and
a determining step of, for each three-dimensional position candidate at the time of interest, acquiring, for each of the plurality of frame image groups, the integrated correspondence accuracy recorded in correspondence with an object tracked on that frame image group, obtaining a reliability of the three-dimensional position candidate according to the acquired integrated correspondence accuracies, and determining the three-dimensional position of the object at the time of interest based on the reliability.

A program for causing a computer to function as each means of the image processing apparatus according to any one of claims 1 to 16.
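
As a non-authoritative illustration of the selection and exclusion processes recited in the claims, the following Python sketch repeats a greedy loop over three-dimensional position candidates. The data layout (candidate, then camera, then tracked object, then integrated correspondence accuracy), the function names, the thresholds, and the sample values are all assumptions made for this sketch. Recomputing the reliability after each exclusion, and taking it as the sum of the per-camera best accuracies, is likewise one possible reading rather than the claimed implementation.

```python
def corresponding_objects(records, excluded, min_accuracy):
    # Per camera, pick the not-yet-excluded tracked object with the highest
    # integrated accuracy; keep it only if it reaches min_accuracy.
    best = {}
    for camera_id, per_object in records.items():
        remaining = {(camera_id, obj): acc
                     for obj, acc in per_object.items()
                     if (camera_id, obj) not in excluded}
        if not remaining:
            continue
        key = max(remaining, key=remaining.get)
        if remaining[key] >= min_accuracy:
            best[key] = remaining[key]
    return best

def select_real_images(candidates, min_accuracy=0.5, min_objects=2):
    # Repeat: select the most reliable candidate that still has at least
    # min_objects corresponding objects, then exclude it and its objects.
    remaining = dict(candidates)
    excluded = set()
    selected = []
    while remaining:
        scored = []
        for cand_id, records in remaining.items():
            objs = corresponding_objects(records, excluded, min_accuracy)
            if len(objs) >= min_objects:
                # Assumed reliability: sum of per-camera best accuracies.
                scored.append((sum(objs.values()), cand_id, objs))
        if not scored:
            break
        _, cand_id, objs = max(scored, key=lambda s: s[0])
        selected.append(cand_id)   # selection process
        excluded.update(objs)      # exclusion process (objects)
        del remaining[cand_id]     # exclusion process (candidate)
    return selected

# Hypothetical data modelled on FIG. 2: camera 1 tracks person B only,
# camera 2 tracks persons A and B. Values are made-up integrated accuracies.
candidates = {
    "real":    {"cam1": {"B": 2.7}, "cam2": {"B": 2.5, "A": 0.3}},
    "virtual": {"cam1": {"B": 0.9}, "cam2": {"A": 1.1}},
}
result = select_real_images(candidates)
```

In this made-up example, selecting the "real" candidate excludes the track of person B on both cameras, so the "virtual" candidate is left with a single corresponding object and is not selected.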