JP2015064778A

JP2015064778A - Detection object identification device, conversion device, monitoring system, and computer program

Info

Publication number: JP2015064778A
Application number: JP2013198777A
Authority: JP
Inventors: 顕司渡辺; Kenji Watanabe; 康雄荻内; Yasuo Ogiuchi; 多喜夫栗田; Takio Kurita
Original assignee: Hiroshima University NUC; Sumitomo Electric Industries Ltd
Current assignee: Hiroshima University NUC; Sumitomo Electric Industries Ltd
Priority date: 2013-09-25
Filing date: 2013-09-25
Publication date: 2015-04-09

Abstract

PROBLEM TO BE SOLVED: To provide a useful and novel technique for detecting the same detection target (a person, a vehicle, or the like) in a plurality of videos without imposing the constraint that a plurality of cameras share part of their fields of view.

SOLUTION: A detection target identification device includes: an identification unit 65 that identifies whether a first detection target detected from a first video is the same target as a second detection target detected from a second video; and a conversion unit 64 that converts first data X' indicating the first detection target in the first video using a conversion parameter A to generate converted first data AX'. The identification unit 65 identifies whether the first detection target is the same target as the second detection target on the basis of the converted first data AX' and second data Y' indicating the second detection target in the second video. The conversion parameter is a parameter for correcting the first data in accordance with the difference between how the detection target is detected in the first video and how it is detected in the second video.

Description

The present invention relates to a detection target identification device, a conversion device, a monitoring system, and a computer program.

Patent Document 1 discloses an image monitoring apparatus that detects the position of a person from an image captured by a camera. The apparatus detects the position of a person in each of the images captured by a plurality of cameras and integrates the position information detected from the individual images.

Japanese Patent No. 5147761
Japanese Patent No. 4706535
Japanese Patent No. 4685390
Japanese Patent No. 5085621

In the technique described in Patent Document 1, the plurality of cameras used to detect a person must share part of their fields of view with one another.
However, depending on where the cameras can be installed, it may be difficult for them to share part of their fields of view.
Camera placement is therefore more flexible if no such constraint is imposed.

Accordingly, an object of the present invention is to provide a useful and novel technique that makes it possible to detect the same detection target (a person, a vehicle, or the like) in a plurality of videos without imposing the constraint that the plurality of cameras share part of their fields of view.

According to one aspect, the present invention relates to a detection target identification device including: an identification unit that identifies whether a first detection target detected from a first video and a second detection target detected from a second video are the same target; a conversion unit that either converts first data indicating the first detection target in the first video using a first conversion parameter to generate converted first data, or converts second data indicating the second detection target in the second video using a second conversion parameter to generate converted second data; and a learning unit that learns the first conversion parameter and the second conversion parameter by machine learning using training data. The identification unit identifies whether the first detection target is the same target as the second detection target on the basis of the converted first data and the second data indicating the second detection target in the second video, or identifies whether the second detection target is the same target as the first detection target on the basis of the converted second data and the first data indicating the first detection target in the first video. The training data is data indicating the same target in each of a first video for learning and a second video for learning in which that same target is present. The machine learning is performed by finding the first conversion parameter and the second conversion parameter that optimize an objective function. The first conversion parameter and the second conversion parameter are matrices, and the objective function includes a term that drives one of the first conversion parameter matrix and the second conversion parameter matrix toward the pseudo-inverse of the other.
According to the present invention, the data indicating a detection target is converted in accordance with the difference in how the detection target is detected in the first video and in the second video, which makes it easier to identify whether the first detection target and the second detection target are the same target.
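The objective function is described here only by the terms it contains, not by a formula. As a minimal sketch of one objective with that structure, the Python code below assumes paired training data X and Y (columns are samples), squared-error data terms, a Frobenius-norm penalty against overfitting, and a penalty on AB - I as the term that pulls one matrix toward the (pseudo-)inverse of the other. The weights `lam` and `mu`, the function names, and the gradient-descent solver are illustrative assumptions, not the method of this document.

```python
import numpy as np

def objective_and_grads(A, B, X, Y, lam=1e-2, mu=1e-1):
    """Assumed objective: data terms + overfitting penalty + inverse-consistency term.

    X, Y: (D, N) paired training data (same target seen in video 1 and video 2).
    A maps video-1 data toward video 2 (A.T @ X); B maps the other way (B.T @ Y).
    """
    I = np.eye(A.shape[0])
    R1 = A.T @ X - Y   # residual of converting X toward Y
    R2 = B.T @ Y - X   # residual of converting Y toward X
    E = A @ B - I      # zero when B is the inverse of A
    J = (np.sum(R1**2) + np.sum(R2**2)
         + lam * (np.sum(A**2) + np.sum(B**2))
         + mu * np.sum(E**2))
    gA = 2 * X @ R1.T + 2 * lam * A + 2 * mu * E @ B.T
    gB = 2 * Y @ R2.T + 2 * lam * B + 2 * mu * A.T @ E
    return J, gA, gB

def fit(X, Y, steps=200, lam=1e-2, mu=1e-1):
    """Gradient descent with step halving, so the objective never increases."""
    D = X.shape[0]
    A, B = np.eye(D), np.eye(D)
    J, gA, gB = objective_and_grads(A, B, X, Y, lam, mu)
    for _ in range(steps):
        step = 1.0
        while step > 1e-12:
            A_new, B_new = A - step * gA, B - step * gB
            J_new, gA_new, gB_new = objective_and_grads(A_new, B_new, X, Y, lam, mu)
            if J_new < J:
                A, B, J, gA, gB = A_new, B_new, J_new, gA_new, gB_new
                break
            step *= 0.5
        else:
            break  # no improving step found; stop early
    return A, B, J
```

With the transposed convention used in the sketch, B.T undoes A.T exactly when AB equals the identity, which is why the consistency term is written on AB.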

According to another aspect, the present invention relates to a conversion device including: a conversion unit that either converts first data indicating a first detection target detected from a first video using a first conversion parameter to generate converted first data, or converts second data indicating a second detection target detected from a second video using a second conversion parameter to generate converted second data; and a learning unit that learns the first conversion parameter and the second conversion parameter by machine learning using training data. The training data is data indicating the same target in each of a first video for learning and a second video for learning in which that same target is present. The machine learning is performed by finding the first conversion parameter and the second conversion parameter that optimize an objective function. The first conversion parameter and the second conversion parameter are matrices, and the objective function includes a term that drives one of the first conversion parameter matrix and the second conversion parameter matrix toward the pseudo-inverse of the other.
According to the present invention, detection target data can be converted in accordance with the difference in how the detection target is detected in the two videos.

According to another aspect, the present invention relates to a monitoring system including the detection target identification device, a first camera that captures the first video, and a second camera that captures the second video.

According to another aspect, the present invention is a computer program for causing a computer to function as the detection target identification device. According to yet another aspect, the present invention is a method (a detection target identification method or a conversion method) performed in the detection target identification device or the conversion device.

According to the present invention, a useful and novel technique is obtained that makes it possible to detect the same detection target in a plurality of videos without imposing the constraint that the plurality of cameras share part of their fields of view.

FIG. 1 is an overall configuration diagram of the monitoring system.
FIG. 2 is a diagram showing the camera arrangement.
FIG. 3 is a configuration diagram of the video processing system.
FIG. 4 is a configuration diagram of the identification processing unit.
FIG. 5 is a diagram showing how learning (linear regression) is performed.
FIG. 6 is a diagram showing the interpolation coefficient matrix table in the storage unit.
FIG. 7 is a table showing experimental results.

Hereinafter, preferred embodiments of the present invention will be described with reference to the drawings.

[1. Outline of Embodiment of Present Invention]
(1) The detection target identification device according to the embodiment includes: an identification unit that identifies whether a first detection target detected from a first video and a second detection target detected from a second video are the same target; a conversion unit that either converts first data indicating the first detection target in the first video using a first conversion parameter to generate converted first data, or converts second data indicating the second detection target in the second video using a second conversion parameter to generate converted second data; and a learning unit that learns the first conversion parameter and the second conversion parameter by machine learning using training data. The identification unit identifies whether the first detection target is the same target as the second detection target on the basis of the converted first data and the second data indicating the second detection target in the second video, or identifies whether the second detection target is the same target as the first detection target on the basis of the converted second data and the first data indicating the first detection target in the first video. The training data is data indicating the same target in each of a first video for learning and a second video for learning in which that same target is present. The machine learning is performed by finding the first conversion parameter and the second conversion parameter that optimize an objective function. The first conversion parameter and the second conversion parameter are matrices, and the objective function includes a term that drives one of the first conversion parameter matrix and the second conversion parameter matrix toward the pseudo-inverse of the other.
According to this detection target identification device, data can be corrected in accordance with the difference between how the detection target is detected in the first video and how it is detected in the second video, which makes it easier to identify whether the detection targets are the same target.

(2) Preferably, the detection target identification device further includes a learning unit that learns the conversion parameter by machine learning using training data, and the training data is data indicating the same target in each of a first video for learning and a second video for learning in which that same target is present. In this case, the conversion parameter can be obtained by learning.

(3) Preferably, the objective function includes a term for suppressing overfitting in the machine learning. In this case, overfitting is suppressed and an appropriate conversion parameter can be obtained.

(4) Preferably, the device further includes a space generation unit that generates an identification space by locality preserving projections (LPP) on the basis of a plurality of pieces of the second data indicating the second detection target at a plurality of times, detected from the second video, and the identification unit identifies whether the first detection target is the same target as the second detection target on the basis of the converted first data projected onto the identification space. In this case, LPP yields an identification space that reflects the locality of the second data, which improves discrimination.

(5) Preferably, the device further includes a selection unit that selects, from a plurality of parameter candidates, the first conversion parameter used to generate the converted first data. In this case, an appropriate conversion parameter can be selected from the plurality of parameter candidates.

(6) Preferably, the selection unit selects, from the plurality of parameter candidates, the first conversion parameter used to generate the converted first data on the basis of at least one of first information indicating the situation when the first video was captured and second information indicating the situation when the second video was captured. In this case, an appropriate conversion parameter can be selected on the basis of the situation at the time of capture.

(7) Preferably, the first information or the second information includes the camera parameters of the camera that captured the first video or the second video. In this case, an appropriate conversion parameter can be selected in accordance with the camera parameters at the time of capture.

(8) Preferably, the first information or the second information includes time information indicating the time at which the first video or the second video was captured. In this case, an appropriate conversion parameter can be selected in accordance with the capture time.

(9) Preferably, the conversion unit is configured to generate the converted first data using a plurality of the first conversion parameters, and generates the converted first data on the basis of first intermediate data, generated by converting the first data using one of the plurality of first conversion parameters, and second intermediate data, generated by converting the first data using another of the plurality of first conversion parameters. In this case, even when no single fully appropriate conversion parameter is available, relatively appropriate converted first data can be generated from the plurality of conversion parameters.

(10) The conversion device according to the present embodiment includes: a conversion unit that either converts first data indicating a first detection target detected from a first video using a first conversion parameter to generate converted first data, or converts second data indicating a second detection target detected from a second video using a second conversion parameter to generate converted second data; and a learning unit that learns the first conversion parameter and the second conversion parameter by machine learning using training data. The training data is data indicating the same target in each of a first video for learning and a second video for learning in which that same target is present. The machine learning is performed by finding the first conversion parameter and the second conversion parameter that optimize an objective function. The first conversion parameter and the second conversion parameter are matrices, and the objective function includes a term that drives one of the first conversion parameter matrix and the second conversion parameter matrix toward the pseudo-inverse of the other. In this case, detection target data can be converted in accordance with the difference in how the detection target is detected in the two videos.

(11) The monitoring system according to the present embodiment includes the detection target identification device, a first camera that captures the first video, and a second camera that captures the second video.

(12) The computer program according to the present embodiment is a computer program for causing a computer to function as the detection target identification device or the conversion device.
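Item (9) above says that converted first data is generated from two intermediate results but does not state here how they are combined. As one possible reading, the sketch below blends the two intermediate results with a scalar weight. The weighted average, the weight `w`, and the function name are assumptions made for illustration only.

```python
import numpy as np

def blended_conversion(x, A1, A2, w=0.5):
    """Combine conversions by two candidate matrices (assumed weighted average).

    x: (D,) or (D, N) first data; A1, A2: (D, D) candidate conversion matrices;
    w: weight of the second candidate, between 0 and 1.
    """
    z1 = A1.T @ x  # first intermediate data
    z2 = A2.T @ x  # second intermediate data
    return (1.0 - w) * z1 + w * z2
```

A weight could, for example, reflect how close the current capture conditions are to the conditions under which each candidate matrix was learned.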

[2. Details of Embodiment of Present Invention]
[2.1 Overall system configuration]
FIG. 1 shows a monitoring system 1 according to the embodiment. The monitoring system 1 includes a video processing system 2 and a plurality (P) of cameras 3-1, 3-2, ..., 3-P connected to the video processing system 2. The monitoring system 1 of this embodiment mainly treats people as detection targets, but the detection target may be something other than a person, for example, a vehicle.

The video processing system 2 processes the videos captured by the cameras 3-1, 3-2, ..., 3-P. For example, the monitoring system 1 can use the cameras 3-1, 3-2, ..., 3-P to track a suspicious person detected by the video processing in the video processing system 2.

The video processing system 2 includes a plurality (P) of processing devices 4-1, 4-2, ..., 4-P provided in correspondence with the respective cameras 3-1, 3-2, ..., 3-P, and a video integration device (detection target identification device) 5 to which the processing devices 4-1, 4-2, ..., 4-P are connected.
The video processing system 2 need not be made up of the plurality of devices 4-1, 4-2, ..., 4-P, 5 and may instead be a single device. Likewise, the processing devices 4-1, 4-2, ..., 4-P need not be provided one-to-one with the cameras; a single processing device may process the videos of two or more cameras.

FIG. 2 shows an example of how the first camera 3-1 and the second camera 3-2, among the cameras 3-1, 3-2, ..., 3-P, are installed. As shown in FIG. 2, the first camera 3-1 and the second camera 3-2 are attached to a building 100 and monitor (capture) different locations on the site 101 on which the building stands. That is, the field of view V1 of the first camera 3-1 and the field of view V2 of the second camera 3-2 do not overlap and are separated from each other.

In the monitoring system 1 of this embodiment, when, for example, a detection target H (such as a suspicious person) detected from the video captured by the second camera 3-2 moves into the area corresponding to the field of view V1 of the first camera 3-1, the system can identify that the detection target H has appeared in the video captured by the first camera 3-1 and track it.

Note that the cameras 3-1, 3-2, ..., 3-P may include cameras that partly share their fields of view. In the monitoring system 1 of this embodiment, the cameras 3-1, 3-2, ..., 3-P can track the same detection target M regardless of whether their fields of view partly overlap.

[2.2 Video processing system]
As shown in FIG. 3, each of the processing devices 4-1, ..., 4-P of the video processing system 2 includes a video processing unit 41 and a control unit 42. The video integration device (detection target identification device) 5 of the video processing system 2 includes an identification processing unit 51, a learning unit 52, and a storage unit 53.
The processing device 4 and the video integration device 5 each include a computer. The functions of the processing device 4 and the video integration device 5 described below are realized by having the respective computers execute a computer program. The computer program can be distributed stored on a storage medium such as a CD-ROM.

[2.2.1 Video processing unit (detection unit)]
The video processing unit (detection unit; person detection unit) 41 performs processing to detect a person, who is the detection target. The detection processing generates data (detection target data) by extracting the area in which a person (detection target) is present from the video captured by the camera 3-1 (3-2, ..., 3-P).

The video processing unit 41 according to the embodiment detects people using HOG (Histograms of Oriented Gradients) features. The detection processing is not limited to HOG features; various known detection methods can be adopted. Specifically, detection may be performed using other image features based on color information, texture information, or the like.
The data indicating the detection target detected by the video processing unit 41 (detection target data) is transmitted to the communication unit 54 of the video integration device 5 via the communication unit 43 of the processing device 4.
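To make the idea of HOG feature data concrete, the sketch below computes a simplified HOG-like descriptor for a grayscale patch with NumPy. It is an illustrative simplification under stated assumptions: unsigned gradient orientations, fixed non-overlapping cells, and a single global L2 normalization in place of the block normalization of full HOG. The function name, cell size, and bin count are hypothetical choices, not values taken from this document.

```python
import numpy as np

def hog_like_features(patch, cell=8, bins=9):
    """Simplified HOG-like descriptor for a 2-D grayscale patch.

    Returns a 1-D vector of per-cell orientation histograms, L2-normalized.
    """
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)                    # image gradients (rows, cols)
    mag = np.hypot(gx, gy)                         # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation in [0, 180)
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)

    h, w = patch.shape
    feats = []
    for r in range(0, h - cell + 1, cell):
        for c in range(0, w - cell + 1, cell):
            # Magnitude-weighted orientation histogram of one cell.
            hist = np.bincount(bin_idx[r:r + cell, c:c + cell].ravel(),
                               weights=mag[r:r + cell, c:c + cell].ravel(),
                               minlength=bins)
            feats.append(hist)
    v = np.concatenate(feats)
    return v / (np.linalg.norm(v) + 1e-12)
```

For a 64 x 32 patch with these defaults, the descriptor has 8 x 4 cells of 9 bins each, that is, 288 values.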

In this embodiment, the detection target data transmitted to the video integration device 5 is feature data (HOG feature data) for the range of the camera video in which a person, the detection target, was detected as being present. However, the detection target data may instead be partial video data (luminance value data) obtained simply by cutting out that range from the video captured by the camera.
Because detection target data expressed as HOG features is difficult to illustrate, the drawings show detection target data as partial video data of the range in which a person is present, for ease of understanding.

[2.2.2 Control unit]
The control unit 42 of the processing device 4 controls the camera 3-1 (3-2, ...) connected to the processing device 4-1 (4-2, ...). The camera has a pan function, a tilt function, and a zoom function. The control unit 42 generates camera parameters that control at least one of pan, tilt, and zoom, and controls the camera's field of view by controlling the camera in accordance with those parameters. The camera need not have all of the pan, tilt, and zoom functions, but preferably has at least one of them.

The camera parameters in effect when the camera captured the video from which the detection target data was derived are transmitted from the communication unit 43 of the processing device 4-1 (4-2, ...) to the communication unit 54 of the video integration device 5. The video integration device 5 can therefore acquire both the detection target data and the camera parameters of the video from which that data was derived.
If the camera 3-1 (3-2, ...) has a fixed field of view and the video integration device 5 already knows the camera parameters of each camera, transmission of the camera parameters to the video integration device 5 may be omitted.

In addition to or instead of the camera parameters, the control unit 42 may transmit time information indicating the time at which the video was captured to the video integration device 5.
The control unit 42 also transmits, to the video integration device 5, camera identification information that identifies the camera that captured the video from which the detection target data was derived.

The camera parameters, time information, and camera identification information described above are used as information for selecting an interpolation coefficient matrix (conversion parameter), which is described later.
The camera parameters and the time information are information indicating the situation at the time the video was captured (situation information) and are used to select a more appropriate interpolation coefficient matrix (conversion parameter). The situation information is not limited to camera parameters and time information and may be other information indicating the situation at the time of capture (for example, information indicating the weather).
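As a rough illustration of how camera identification information and situation information could drive the selection of an interpolation coefficient matrix, the sketch below uses a lookup table keyed by the source camera, the target camera, and a coarse time-of-day bucket. The `CaptureInfo` record, the day/night split at 06:00 and 18:00, and the table layout are hypothetical; this passage does not specify the actual table format or selection rule.

```python
from dataclasses import dataclass
from datetime import datetime

import numpy as np

@dataclass(frozen=True)
class CaptureInfo:
    """Hypothetical metadata sent along with detection target data."""
    camera_id: str
    captured_at: datetime

def time_bucket(ts: datetime) -> str:
    # Assumed coarse split; lighting differs strongly between day and night.
    return "day" if 6 <= ts.hour < 18 else "night"

def select_matrix(table, src: CaptureInfo, dst: CaptureInfo):
    """Return the matrix for converting src-camera data toward dst-camera data.

    table: dict mapping (src camera id, dst camera id, bucket) to a matrix.
    Returns None when no candidate is registered for the key.
    """
    key = (src.camera_id, dst.camera_id, time_bucket(src.captured_at))
    return table.get(key)
```

Camera parameters such as pan, tilt, and zoom could be added to the key in the same way, for example after quantizing them into a small number of ranges.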

[2.2.3 Identification processing unit]
The identification processing unit 51 of the video integration device 5 performs matching (identification of identity) between detection target data detected from the video of one camera and detection target data detected from the video of another camera.
More specifically, when, for example, the person M shown in FIG. 2 is detected as a suspicious person within the field of view V2 of the second camera 3-2, the identification processing unit 51 identifies whether each of the one or more persons detected from the video captured by the first camera 3-1, which is a different camera from the second camera 3-2, is the same target (the same person) as the person M (suspicious person) detected from the video captured by the second camera 3-2.

As shown in FIG. 4, the identification processing unit 51 identifies target identity by taking, from among the tracking candidate samples X' detected from the video of the first camera 3-1 (the detection target data of one or more persons detected from that video), those that closely approximate the initial detection sample Y' (the detection target data detected from the video of the second camera 3-2) and identifying them as the same target.

The identification processing unit 51 includes an identification space generation unit 61 that generates an identification space (projection space) 62 by locality preserving projections (LPP). LPP is a method of generating a projection space that takes into account the distribution of the initial detection sample Y', which is the input data 60. Using LPP, it is possible to generate an identification space 62 that preserves the locality of the initial detection sample Y', which consists of a plurality of (N_1) detection target data d2-1, d2-2, ..., d2-N_1.
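The generation of a locality-preserving projection can be illustrated with a minimal NumPy sketch. It assumes a k-nearest-neighbour graph with heat-kernel weights and a small ridge term `reg` to keep the generalized eigenproblem well posed when the feature dimension exceeds the number of samples. The function name `lpp` and the parameters `k`, `t`, and `reg` are illustrative; the source does not state how the neighbourhood graph is built.

```python
import numpy as np

def lpp(Y, n_components=2, k=5, t=None, reg=1e-6):
    """Locality preserving projection of Y (d x n, one sample per column).

    Returns a (d x n_components) matrix P; the embedded samples are P.T @ Y.
    """
    d, n = Y.shape
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(Y * Y, axis=0)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (Y.T @ Y), 0.0)
    if t is None:
        t = np.median(dist2[dist2 > 0])  # heat-kernel scale (assumed heuristic)
    # Symmetric k-NN adjacency with heat-kernel weights.
    W = np.zeros((n, n))
    neighbours = np.argsort(dist2, axis=1)[:, 1:k + 1]  # column 0 is the sample itself
    for i in range(n):
        W[i, neighbours[i]] = np.exp(-dist2[i, neighbours[i]] / t)
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(axis=1))
    L = D - W  # graph Laplacian
    left = Y @ L @ Y.T
    right = Y @ D @ Y.T + reg * np.eye(d)  # regularised to be positive definite
    # Solve left v = lam * right v by Cholesky whitening, keep the smallest eigenvalues.
    C = np.linalg.cholesky(right)
    C_inv = np.linalg.inv(C)
    S = C_inv @ left @ C_inv.T
    _, evecs = np.linalg.eigh((S + S.T) / 2.0)
    return C_inv.T @ evecs[:, :n_components]
```

In this sketch the initial detection sample Y' would play the role of `Y`, and both Y' and the converted candidates would be embedded with the same `P`.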

The plurality of (N_1) detection target data d2-1, d2-2, ..., d2-N_1 constituting the initial detection sample Y' ((M+1) × N_1 dimensions) are detection target data for a plurality of (N_1) frames of the same detection target (suspicious person M) detected from the video of the second camera 3-2, that is, a plurality of detection target data each extracted from one of a plurality of (N_1) images at different points in time.

Further, the identification processing unit 51 projects each tracking candidate sample X' ((M+1) × N_2 dimensions) constituting the tracking candidate sample group 63 onto the identification space 62. In practice, the tracking candidate sample X' is first subjected to the conversion (linear interpolation) described later, and the converted tracking candidate sample A^T X' (converted data) is what is projected onto the identification space 62. In FIG. 4, N_1 and N_2 may be equal or different.

The detection unit (identification unit) 65 of the identification processing unit 51 uses nearest neighbor search (1-NN) in the identification space 62, which is a metric space, to extract the tracking candidate sample X' (converted data A^T X') whose distance to the initial detection sample Y' is smallest. When the distance of the extracted tracking candidate sample X' (converted data A^T X') to the initial detection sample Y' is smaller than a reference distance, the detection unit 65 identifies the extracted tracking candidate sample X' as the same target (same person) as the initial detection sample Y'.
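The nearest-neighbor step with a reference distance can be sketched as follows. This is a minimal illustration under two assumptions that the source does not specify: the distance is Euclidean, and the distance from a candidate to the initial detection sample (a set of points) is the distance to its closest point. The name `nearest_candidate` and its arguments are hypothetical.

```python
import numpy as np

def nearest_candidate(Z_init, Z_cand, reference_distance):
    """1-NN matching in the identification space.

    Z_init: (k x N1) projected initial detection sample, one point per column.
    Z_cand: (k x n_c) projected converted candidates, one candidate per column.
    Returns (index, distance) of the closest candidate, with index None when
    even the closest candidate is not within the reference distance.
    """
    diff = Z_cand[:, :, None] - Z_init[:, None, :]           # (k, n_c, N1)
    dists = np.sqrt((diff ** 2).sum(axis=0)).min(axis=1)     # (n_c,)
    best = int(np.argmin(dists))
    if dists[best] < reference_distance:
        return best, float(dists[best])
    return None, float(dists[best])
```

Returning `None` when the threshold is not met corresponds to the case where no candidate in the tracking camera's view is identified as the same target.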

The above identification processing is performed collectively on the tracking candidate samples obtained from a plurality of frames of the video of the first camera 3-1.
The identification processing may also be performed for each frame of the video of the first camera 3-1. For example, for the first frame of the video of the first camera 3-1, as shown in FIG. 4, the four tracking candidate samples (detection target data for four persons) d1a-1, d1b-1, d1c-1, and d1d-1 detected from the first frame may each be converted by the conversion parameter A and then projected onto the identification space 62, after which the tracking candidate sample that matches the initial detection sample Y' is extracted.
Likewise, for the i-th frame of the video of the first camera, the four tracking candidate samples (detection target data for four persons) d1a-i, d1b-i, d1c-i, and d1d-i detected from the i-th frame may each be converted by the conversion parameter A and then projected onto the identification space 62, after which the tracking candidate sample that matches the initial detection sample Y' is extracted.
Furthermore, both the identification processing performed collectively on the tracking candidate samples obtained from a plurality of frames and the identification processing for each frame may be performed. When both are performed, an appropriate identification result may be obtained according to the combination of their results.

When there is a tracking candidate sample X' identified as the same target (same person) as the initial detection sample Y' (in FIG. 4, the tracking candidate samples d1c-1 and d1c-i), the suspicious person M to be tracked is present within the field of view of the first camera 3-1, which captured the tracking candidate sample group.
The information generation unit 66 of the identification processing unit 51 therefore generates suspicious person (tracking target) presence information indicating that the detection target represented by the tracking candidate sample X' (detection target data d1c-1, d1c-i) identified as the same target (same person) as the initial detection sample Y' is the suspicious person (tracking target). The suspicious person presence information is given to the control unit 42 of the processing device 42 via the communication unit 54. The control unit 42 generates camera parameters so that the field of view of the camera follows the movement of the identified suspicious person, and controls the field of view of the first camera 3-1. Through the above processing, the suspicious person (tracking target) who has appeared in the field of view of the first camera 3-1 can be tracked by the first camera 3-1.

Now, suppose that, in order to identify the identity of a detection target, the initial detection sample, which is the feature amount (HOG feature amount) of the detection target detected from the video of the second camera 3-2 (second video), is compared with the feature amount (HOG feature amount) of the detection target detected from the video of the first camera 3-1 (first video). The two feature amounts then differ because of the difference between how the detection target appears (is detected) in the second camera 3-2 and how it appears (is detected) in the first camera 3-1.
That is, since the first camera 3-1 and the second camera 3-2 are installed in different places, the detection target appears differently through each camera. This difference arises from, for example, the difference in field of view between the cameras or the way light (illumination light, sunlight) falls on the target.
Accordingly, the feature amount obtained when the same detection target (person) is imaged by the first camera 3-1 and the feature amount obtained when it is imaged by the second camera 3-2 reflect the difference in how the detection target appears (is detected).

Therefore, when the feature amount of the detection target detected from the video of the first camera 3-1 and the feature amount of the detection target detected from the video of the second camera 3-2 are simply compared, the two feature amounts contain a difference caused by the difference in appearance (the difference between cameras), which can make it difficult to identify them as the same target.

In contrast, in this embodiment, the difference between cameras in how the detection target appears (is detected) is compensated for by linear interpolation of the feature amount representing the detection target. That is, the feature amount of the detection target detected from the video of one camera is converted by an appropriate interpolation coefficient matrix (conversion parameter) A and thereby corrected (linearly interpolated) so as to approach how the detection target appears (is detected) in the other camera.

The interpolation coefficient matrix (conversion parameter) A corresponds to the difference in how the detection target appears among a plurality of cameras. For the same detection target, the interpolation coefficient matrix (conversion parameter) A is estimated by linear regression (machine learning) based on the detection target (its feature amount) detected from the video of one camera and the detection target (its feature amount) detected from the video of another camera.
By converting (linearly interpolating; correcting) the feature amount (first data) X' of the detection target detected from the video of one camera with the interpolation coefficient matrix (conversion parameter) A estimated by linear regression, it is possible to obtain a converted feature amount (converted data; converted first data) A^T X' in which the feature amount (first data) detected from the video of the one camera approximates the feature amount that would be obtained if the same detection target were detected from the video of the other camera.

By comparing the converted feature amount A^T X' with the feature amount (second data) Y' of the detection target detected by the other camera (for example, by the processing in the projection space described above), the influence of the difference in how the detection target appears in each camera is suppressed, and the same detection target can be identified easily.

[2.2.4 Learning unit]
FIG. 5 shows the method by which the learning unit 52 learns the interpolation coefficient matrix (conversion parameter), that is, the estimation method based on linear regression.
Learning by the learning unit 52 is performed by machine learning using training data (supervised machine learning). The interpolation coefficient matrix (conversion parameter) is learned before the identification processing by the identification processing unit 51.

The following describes an example of learning interpolation coefficient matrices (conversion parameters) A and B that correspond to the difference in how the detection target appears (is detected) in the first camera 3-1 and the second camera 3-2. Here, the interpolation coefficient matrix A is a coefficient (coefficient matrix) for interpolating the feature amount (first data) representing a detection target detected from the video of the first camera 3-1 (first video). The interpolation coefficient matrix B is a coefficient (coefficient matrix) for interpolating the feature amount (second data) representing a detection target detected from the video of the second camera 3-2 (second video).

The training data used to learn the interpolation coefficient matrices A and B are first training data X based on the video of the first camera 3-1 (first learning video) and second training data Y based on the video of the second camera 3-2 (second learning video; a video containing the same persons as the first learning video). The first training data X and the second training data Y each include a plurality of detection targets (persons). The detection targets (persons) included in the first training data X and in the second training data Y are the same.
The first training data X and the second training data Y each consist of the same number (N) of sample data. Each sample datum is feature amount (HOG feature amount) data of a detection target (person).

Here, each person to be detected is regarded as a “class”. Samples of the same class belong to the same detection target (same person), and samples of different classes belong to different detection targets (different persons).
The first training data X and the second training data Y shown in FIG. 5 each include a person of class 1 and a person of class 2.
As sample data, the first training data X and the second training data Y each contain a plurality of feature amounts (HOG feature amounts) of the class 1 person and a plurality of feature amounts (HOG feature amounts) of the class 2 person, detected from the videos of the cameras 3-1 and 3-2, respectively.

For example, the first training data X has about N/2 feature amounts of the class 1 person detected from the video of the first camera 3-1 and about N/2 feature amounts of the class 2 person detected from the video of the first camera 3-1.
The second training data Y has about N/2 feature amounts of the class 1 person detected from the video of the second camera 3-2, the same number as in the first training data X, and about N/2 feature amounts of the class 2 person detected from the video of the second camera 3-2, again the same number as in the first training data X.

The N/2 (plural) feature amounts (sample data) are obtained, for example, by computing a feature amount representing the class 1 person from each of N/2 (plural) frames of a video showing that person. In other words, the N/2 (plural) feature amounts are feature amounts of the same person (the class 1 person) at different times.

Such first training data X contains the various feature amounts obtained when the class 1 person is imaged by the first camera 3-1 and the various feature amounts obtained when the class 2 person is imaged by the first camera 3-1. These feature amounts (sample data) in the first training data X are therefore influenced by how the detection target (person) appears (is detected) when captured by the first camera 3-1.
Similarly, the second training data Y contains the various feature amounts obtained when the class 1 person is imaged by the second camera 3-2 and the various feature amounts obtained when the class 2 person is imaged by the second camera 3-2. These feature amounts (sample data) in the second training data Y are therefore influenced by how the detection target (person) appears (is detected) when captured by the second camera 3-2.

In this embodiment, given the first training data X and the second training data Y described above, the interpolation coefficient matrices A and B are estimated by finding the values of A and B that optimize (minimize; arg min D) the following objective function D.

Here, x_i (i = 1 to N) is a vector (M-dimensional vector) representing the i-th sample datum (feature amount) of the first training data X, and y_i (i = 1 to N) is a vector (M-dimensional vector) representing the i-th sample datum (feature amount) of the second training data Y. M is the dimension of the vectors x_i and y_i. R is the set of real numbers.

The first term T1 on the right side of the objective function D is a term for reducing the error between the first training data X converted by the interpolation coefficient matrix A and the second training data Y, and is calculated as the L2 norm of that error.
Likewise, the second term T2 on the right side is a term for reducing the error between the second training data Y converted by the interpolation coefficient matrix B and the first training data X, and is calculated as the L2 norm of that error.

The third term T3 and the fourth term T4 on the right side are terms for suppressing overfitting in the machine learning (linear regression). The third term T3 suppresses overfitting of the interpolation coefficient matrix A and is the L2 norm of A. The fourth term T4 suppresses overfitting of the interpolation coefficient matrix B and is the L2 norm of B.
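Taken on their own, the error term T1 and the regularization term T3 form a ridge regression in A, which has a closed-form solution. The sketch below illustrates only that simplified sub-problem. It assumes squared norms and ignores any term that couples A and B, so it is not the optimization procedure of the source. The name `fit_ridge_map` is illustrative.

```python
import numpy as np

def fit_ridge_map(X, Y, alpha):
    """Minimise ||A.T @ X - Y||^2 + alpha * ||A||^2 over A.

    X, Y: (M x N) training matrices whose i-th columns are x_i and y_i.
    Setting the gradient to zero gives (X X^T + alpha I) A = X Y^T.
    """
    M = X.shape[0]
    return np.linalg.solve(X @ X.T + alpha * np.eye(M), X @ Y.T)
```

The same function with the roles of X and Y swapped gives the corresponding estimate of B from T2 and T4. A larger `alpha` shrinks A toward zero, which is how the regularization limits overfitting when N is small relative to M.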

The fifth term T5 on the right side is a term for making one of the first interpolation coefficient matrix (first conversion parameter) A and the second interpolation coefficient matrix (second conversion parameter) B approach the pseudo-inverse of the other.
That is, the fifth term T5 serves to make the following f_1 and f_2 approximately equal.

Note that α, β, and γ in the third term T3 to the fourth term T4 are balancing parameters and may be set to any non-negative values.
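The expressions for D, f_1, and f_2 are not reproduced in this text. One form consistent with the verbal descriptions of T1 to T5 is sketched here purely as an illustration. The squared norms, the assignment of γ to T5, and in particular the shape of T5 are assumptions. The source defines T5 through f_1 and f_2, whereas the sketch uses the simpler condition that AB stays close to the identity, which follows from requiring B^T A^T x_i ≈ x_i.

```latex
D(A, B) =
    \underbrace{\sum_{i=1}^{N} \lVert A^{\mathsf T} x_i - y_i \rVert_2^2}_{T_1}
  + \underbrace{\sum_{i=1}^{N} \lVert B^{\mathsf T} y_i - x_i \rVert_2^2}_{T_2}
  + \underbrace{\alpha \lVert A \rVert_2^2}_{T_3}
  + \underbrace{\beta \lVert B \rVert_2^2}_{T_4}
  + \underbrace{\gamma \lVert A B - I \rVert_2^2}_{T_5\ \text{(assumed form)}},
\qquad x_i, y_i \in \mathbb{R}^{M}, \quad \alpha, \beta, \gamma \ge 0 .
```

With γ = 0 this sketch decouples into two independent ridge regressions, one for A and one for B.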

To find the estimates of the interpolation coefficient matrices A and B that minimize the objective function D, the optimization can be performed using, for example, incomplete Cholesky decomposition, based on the following expression derived from the expression for D above.

The interpolation coefficient matrices (conversion parameters) A and B estimated as described above are stored in the storage unit 53 and are used in the identification processing by the identification processing unit 51. When the detection target detected from the video of the second camera 3-2 is the initial detection sample and the detection target detected from the video of the first camera 3-1 is the tracking candidate sample, A is used as the interpolation coefficient matrix, as described above. Conversely, when the detection target detected from the video of the first camera 3-1 is the initial detection sample and the detection target detected from the video of the second camera 3-2 is the tracking candidate sample, B is used as the interpolation coefficient matrix.

[2.2.5 Storage unit and selection unit]
In practice, the interpolation coefficient matrices stored in the storage unit 53 are not limited to the two matrices A and B. The various interpolation coefficient matrices required between the respective cameras 3-1, 3-2, ..., 3-P are learned in advance by the learning unit 52 and stored in the storage unit 53.
To select the required interpolation coefficient matrix from among the many interpolation coefficient matrices (parameter candidates) stored in the storage unit 53, the identification processing unit 51 includes a selection unit 67 that selects the interpolation coefficient matrix to be used for the conversion (linear interpolation) of the tracking candidate sample X'.

The selection unit 67 selects an appropriate interpolation coefficient matrix from among the plurality of interpolation coefficient matrices (parameter candidates) stored in the storage unit 53, based on camera identification information indicating the camera that captured the video in which the initial detection sample Y' was detected (initial detection camera) and camera identification information indicating the camera that captured the video in which the tracking candidate sample X' was detected (tracking camera).
The plurality of interpolation coefficient matrices (parameter candidates) in the storage unit 53 are stored for each combination of initial detection camera and tracking camera, so an appropriate interpolation coefficient matrix can be selected using the camera identification information for the two cameras.

In addition, even when the combination of initial detection camera and tracking camera is the same, if the field of view (pan, tilt, zoom) of at least one of the cameras is variable, a change in that camera's field of view (for example, the camera orientation (pan angle)) changes the difference in how the target appears (is detected) in the two cameras, and therefore changes the appropriate interpolation coefficient matrix.

For this reason, as shown in FIG. 6, the storage unit 53 stores an interpolation coefficient matrix for each camera parameter (for example, camera orientation (pan angle)).
FIG. 6 shows, for example, a table that holds an interpolation coefficient matrix for each orientation (pan angle) of the tracking camera 3-1 in the case where the initial detection camera 3-2 is a fixed-field-of-view camera and the tracking camera 3-1 has a variable field of view (variable camera parameters). The table in FIG. 6 is a one-dimensional table covering only differences in the camera parameters of the tracking camera 3-1, but it may instead be a two-dimensional table that also covers differences in the camera parameters of the initial detection camera 3-2.
When interpolation coefficient matrices (parameter candidates) are stored in the storage unit 53 according to the camera parameters (situation information) of at least one of the initial detection camera 3-2 (second information; second situation information) and the tracking camera 3-1 (first information; first situation information) in this way, the selection unit 67 can select an appropriate interpolation coefficient matrix based on the camera parameters (situation information) of at least one of the initial detection camera 3-2 and the tracking camera 3-1. Naturally, the selection may also be based on the camera parameters (situation information) of both the initial detection camera 3-2 and the tracking camera 3-1.

The interpolation coefficient matrix table in FIG. 6 provides an interpolation coefficient matrix for every 10° of orientation of the camera 3-1. Because this spacing is relatively coarse, an orientation of 15°, for example, does not coincide with any orientation set in the table.
In this case, the selection unit 67 can select the interpolation coefficient matrix of the orientation in the table closest to 15° (for example, A_20 for 20°).
Alternatively, instead of selecting a single interpolation coefficient matrix, the selection unit 67 may select the interpolation coefficient matrices A_10 and A_20 of the two orientations (10° and 20°) on either side of the orientation (15°) indicated by the camera parameter of the camera 3-1.
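The two selection strategies can be sketched as lookups on a table keyed by pan angle. The function names and the table layout are hypothetical. The tie rule (an orientation exactly halfway between two entries resolves to the larger angle) is an assumption chosen to match the 20° example, and clamping at the ends of the table is likewise an assumption.

```python
def select_nearest(table, pan_deg):
    """Return the table angle closest to pan_deg (ties go to the larger angle)."""
    return min(table, key=lambda a: (abs(a - pan_deg), -a))

def select_bracketing(table, pan_deg):
    """Return the two table angles on either side of pan_deg (clamped at the ends)."""
    angles = sorted(table)
    lower = max((a for a in angles if a <= pan_deg), default=angles[0])
    upper = min((a for a in angles if a >= pan_deg), default=angles[-1])
    return lower, upper

# Hypothetical table: pan angle in degrees -> stored matrix (names stand in for arrays).
table = {0: "A_0", 10: "A_10", 20: "A_20", 30: "A_30"}
nearest = table[select_nearest(table, 15)]   # "A_20"
lo, hi = select_bracketing(table, 15)        # (10, 20)
```

When the orientation coincides with a table entry, `select_bracketing` returns the same angle twice, so a single matrix is effectively used.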

In this case, the conversion unit 64 may obtain the converted feature amount (converted data; converted first data) by a conversion using these two interpolation coefficient matrices A_10 and A_20, for example by computing (A_10^T X')·W_1 + (A_20^T X')·W_2. Here, W_1 and W_2 are balancing parameters (weights) determined according to the camera parameter (orientation) of the camera 3-1. For example, if the orientation of the camera 3-1 is 15°, W_1 = W_2 = 0.5 can be used.
In this way, the converted feature amount (converted data; converted first data) may be generated based on first intermediate data (A_10^T X') obtained using the interpolation coefficient matrix A_10 and second intermediate data (A_20^T X') obtained using the interpolation coefficient matrix A_20.
Using a plurality of interpolation coefficient matrices stored in the storage unit 53 thus makes it possible to interpolate an interpolation coefficient matrix that is not present in the storage unit 53.
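The weighted combination of the two intermediate data can be sketched as below. The source states only that W_1 and W_2 depend on the camera orientation and gives W_1 = W_2 = 0.5 at 15°. Weights that vary linearly with the angle are an assumption that reproduces that example, and the function name `blend_transform` is illustrative.

```python
import numpy as np

def blend_transform(X, A_lo, A_hi, angle_lo, angle_hi, pan_deg):
    """Compute (A_lo.T @ X) * W1 + (A_hi.T @ X) * W2 with angle-based weights.

    Returns the blended converted features and the weights (W1, W2).
    """
    if angle_hi == angle_lo:
        w_hi = 0.0  # orientation matches a table entry exactly
    else:
        w_hi = (pan_deg - angle_lo) / (angle_hi - angle_lo)
    w_lo = 1.0 - w_hi
    first = A_lo.T @ X    # first intermediate data
    second = A_hi.T @ X   # second intermediate data
    return w_lo * first + w_hi * second, (w_lo, w_hi)
```

Because the conversion is linear, blending the two intermediate data is equivalent to converting once with the blended matrix W_1·A_10 + W_2·A_20.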

Furthermore, as shown in FIG. 6, the interpolation coefficient matrices stored in the storage unit 53 may be divided by imaging time t_1, t_2, ..., t_f. The unit of division by the times t_1, t_2, ..., t_f may be coarse, for example dawn, daytime, evening, and night. Because the interpolation coefficient matrices in the storage unit 53 are divided by time, the selection unit 67 can select an appropriate interpolation coefficient matrix based on time information indicating the imaging time.
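Selection by imaging time can be sketched as a lookup on a coarse time slot. The slot boundaries below are illustrative assumptions, since the source names the slots but not their hours, and the function and table names are hypothetical.

```python
def time_slot(hour):
    """Map an hour of day (0-23) to a coarse slot; boundaries are assumed."""
    if 5 <= hour < 8:
        return "dawn"
    if 8 <= hour < 17:
        return "daytime"
    if 17 <= hour < 20:
        return "evening"
    return "night"

def select_by_time(table, hour):
    """Pick the stored matrix for the slot that contains the imaging time."""
    return table[time_slot(hour)]

# Hypothetical table: slot -> stored matrix (names stand in for arrays).
time_table = {"dawn": "A_t1", "daytime": "A_t2", "evening": "A_t3", "night": "A_t4"}
```

A table keyed by both time slot and pan angle would combine this lookup with the orientation-based selection.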

[2.3 Experimental results]
FIG. 7 shows the results of an experiment using the monitoring system 1.
The monitoring system 1 used in the experiment has two cameras (the first camera 3-1 and the second camera 3-2). Detection target data detected from videos of four persons (4 classes) captured by each of the first camera 3-1 and the second camera 3-2 were used as the training data X and Y. The number of sample data in each of the training data X and Y was 20.

As the initial detection sample Y' in the identification processing, detection target data (discretely sampled detection target data) detected from videos of the same person (suspicious person) captured for several seconds by each of the first camera 3-1 and the second camera 3-2 were used. The number N_1 of initial detection samples Y' based on the video of the first camera 3-1 was 51, and the number N_1 of initial detection samples Y' based on the video of the second camera 3-2 was 88.

HOG feature amounts were used as the feature amounts for person detection and linear interpolation.

The tracking candidate samples X' were obtained from a plurality of frames of a video (tracking video) in which four persons (4 classes) are present in the same frame. The video (tracking video) captured by the first camera 3-1 had 88 frames, and the video (tracking video) captured by the second camera had 51 frames.

For each frame of the tracking video, the sample matching the initial detection samples was identified from among the tracking candidate sample group, and the detection rate = (number of correctly identified frames / total number of frames) was calculated.
The detection rate was obtained as the average over multiple trials that used different training data, initial detection samples, and tracking candidate samples.
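The detection rate and its averaging over trials can be written out as a short sketch. The function names and the representation of each trial as a list of per-frame booleans (True when the frame was correctly identified) are assumptions made for illustration.

```python
def detection_rate(frame_results):
    """Return (correctly identified frames) / (total frames) for one trial."""
    if not frame_results:
        raise ValueError("at least one frame is required")
    correct = sum(1 for ok in frame_results if ok)
    return correct / len(frame_results)


def mean_detection_rate(trials):
    """Average the per-trial detection rates over several trials."""
    if not trials:
        raise ValueError("at least one trial is required")
    rates = [detection_rate(frames) for frames in trials]
    return sum(rates) / len(rates)
```

Each trial here would correspond to one choice of training data, initial detection samples, and tracking candidate samples.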

The experiment was carried out for two configurations: one with the first camera 3-1 as the initial detection camera and the second camera 3-2 as the tracking camera, and one with the second camera 3-2 as the initial detection camera and the first camera 3-1 as the tracking camera. In FIG. 7, the former is shown as "Train: Camera1" and the latter as "Train: Camera2".

The experiment was also carried out under three conditions, Case 1 to Case 3.
In Case 1, the interpolation coefficient matrices A and B were learned using an objective function D' in which the third term T3 to the fifth term T5 on the right side of the aforementioned objective function D are omitted ("CoR: Off" in FIG. 7), and LPP was not used to generate the identification space 62 ("LPP: Off" in FIG. 7).
In Case 2, the interpolation coefficient matrices A and B were learned using the aforementioned objective function D ("CoR: On" in FIG. 7), and LPP was not used to generate the identification space 62 ("LPP: Off" in FIG. 7).
In Case 3, the interpolation coefficient matrices A and B were learned using the aforementioned objective function D ("CoR: On" in FIG. 7), and LPP was used to generate the identification space 62 ("LPP: On" in FIG. 7).

According to the experimental results shown in FIG. 7, when the third term T3 to the fifth term T5 of the objective function D are omitted as in Case 1, the detection rate is below 50%. In contrast, when learning (linear regression) is performed using the objective function D that includes the third term T3 to the fifth term T5, as in Case 2, the detection rate is 55% or higher, so the detection rate is improved.
Note that the third term T3 and fourth term T4 on the one hand, and the fifth term T5 on the other, need not both be included in the objective function. It has been confirmed experimentally that even when only one of the two is included (in addition to the first term T1 and the second term), a better detection rate than in Case 1 is obtained.
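The objective function D itself is defined earlier in the description and is not reproduced in this section. Purely as a reading aid, one form that would be consistent with this section and with the claims (two regression terms, terms suppressing over-learning, and a term pulling one matrix toward the pseudo-inverse of the other) might look like the following. The assignment of roles to T3, T4, and T5, the Frobenius norms, and the weights \lambda_1 and \lambda_2 are all assumptions, not the formula of this document.

```latex
D(A, B) =
  \underbrace{\lVert Y - AX \rVert_F^2}_{T_1}
+ \underbrace{\lVert X - BY \rVert_F^2}_{T_2}
+ \underbrace{\lambda_1 \lVert A \rVert_F^2}_{T_3}
+ \underbrace{\lambda_1 \lVert B \rVert_F^2}_{T_4}
+ \underbrace{\lambda_2 \lVert A - B^{+} \rVert_F^2}_{T_5}
```

Under this assumed form, the reduced objective D' of Case 1 would keep only T_1 and T_2, and B^{+} denotes the pseudo-inverse of B.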

Furthermore, generating the identification space 62 using LPP, as in Case 3, yielded a detection rate of 69% or higher.

[3. Addendum]
The embodiments disclosed here should be considered illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims rather than by the above description, and is intended to include all modifications within the meaning and scope equivalent to the claims.
For example, the above embodiment assumes that the monitoring system 1 has multiple cameras, but it may have a single camera. That is, the monitoring system 1 may include an identification unit that identifies whether a first detection target detected from a first video of one camera is the same target as a second detection target detected from a second video of that camera (a video whose field of view differs from that of the first video).

DESCRIPTION OF SYMBOLS
1 Monitoring system
2 Video processing system
3-1, 3-2, 3-P Camera
4-1, 4-2, 4-P Processing device
5 Video integration device (detection target identification device)
41 Video processing unit (target detection unit)
42 Control unit
43 Communication unit
51 Identification unit
52 Learning unit
53 Storage unit
54 Communication unit
60 Initial detection sample
61 Identification space generation unit
62 Identification space
63 Tracking candidate sample group
64 Conversion unit (conversion device)
65 Detection unit (identification unit)
66 Information generation unit
67 Selection unit
100 Building
101 Site
M Person
A First interpolation coefficient matrix (first conversion parameter)
B Second interpolation coefficient matrix (second conversion parameter)

Claims

An identification unit for identifying whether the first detection target detected from the first video and the second detection target detected from the second video are the same target;
The first data indicating the first detection target in the first video is converted using a first conversion parameter to generate converted first data, or the second detection target in the second video A conversion unit that performs conversion using the second conversion parameter on the second data indicating the converted second data; and
A learning unit that performs machine learning using training data to learn the first conversion parameter and the second conversion parameter;
With
Whether the first detection target is the same target as the second detection target based on the converted first data and the second data indicating the second detection target in the second video. Or based on the converted second data and the first data indicating the first detection target in the first video, the second detection target is the same target as the first detection target. Identify and
The training data is data indicating the same target in each of the first video for learning and the second video for learning in which the same target exists,
The machine learning is performed by obtaining the first conversion parameter and the second conversion parameter so as to optimize an objective function,
The first conversion parameter and the second conversion parameter are matrices,
The objective function includes a term for causing one of the matrix of the first conversion parameter and the matrix of the second conversion parameter to approach a pseudo inverse matrix with respect to the other.

A learning unit that learns the conversion parameter by machine learning using training data;
The detection target identification device according to claim 1, wherein the training data is data indicating the same target in each of the first video for learning and the second video for learning in which the same target exists.

The detection target identification device according to claim 2, wherein the objective function includes a term for suppressing overlearning in the machine learning.

A space generation unit that generates an identification space by locality preserving projection (LPP) based on a plurality of the second data indicating the second detection target at a plurality of times detected from the second video. Further comprising
The identification unit identifies whether the first detection target is the same target as the second detection target based on the converted first data projected onto the identification space. The detection target identification device according to the item.

The detection object identification device according to any one of claims 1 to 4, further comprising a selection unit that selects the first conversion parameter used for generating the converted first data from a plurality of parameter candidates.

The selection unit is configured to select a plurality of parameter candidates based on at least one of first information indicating a situation when the first video is captured and second information indicating a situation when the second video is captured. The detection target identification device according to claim 5, wherein the first conversion parameter used for generating the converted first data is selected.

The detection target identification device according to claim 6, wherein the first information or the second information is information including a camera parameter of a camera that has captured the first video or the second video.

The detection target identification device according to claim 6, wherein the first information or the second information is information including time information indicating a time at which the first video or the second video is captured.

The conversion unit is configured to generate the converted first data using a plurality of the first conversion parameters,
The conversion unit generates the converted first data based on first intermediate data generated by converting the first data using one first conversion parameter among the plurality of first conversion parameters, and second intermediate data generated by converting the first data using another first conversion parameter among the plurality of first conversion parameters. The detection object identification device of any one of -8.

The first data indicating the first detection target detected from the first video is converted using the first conversion parameter to generate converted first data, or the second data detected from the second video is detected. A conversion unit that performs conversion using the second conversion parameter on the second data indicating the detection target, and generates converted second data;
A learning unit that performs machine learning using training data to learn the first conversion parameter and the second conversion parameter;
With
The training data is data indicating the same target in each of the first video for learning and the second video for learning in which the same target exists,
The machine learning is performed by obtaining the first conversion parameter and the second conversion parameter so as to optimize an objective function,
The first conversion parameter and the second conversion parameter are matrices,
The objective function includes a term for causing one of the first transformation parameter matrix and the second transformation parameter matrix to approach a pseudo inverse matrix with respect to the other.

A detection target identification device according to claim 1;
A first camera that captures the first video;
A second camera that captures the second video;
A monitoring system comprising:

A computer program for causing a computer to function as the detection target identification device according to claim 1.