JP2018106236A

JP2018106236A - Video analysis apparatus, video analysis method and program

Info

Publication number: JP2018106236A
Application number: JP2016249167A
Authority: JP
Inventors: 佑介河本; Yusuke Komoto
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2016-12-22
Filing date: 2016-12-22
Publication date: 2018-07-05

Abstract

PROBLEM TO BE SOLVED: To provide a technique capable of more accurately identifying an object from a video. SOLUTION: Extraction means extracts a first feature quantity related to a specified object from a video captured by imaging means. Determination means compares the first feature quantity with a second feature quantity related to an object detected from a video different from the video from which the first feature quantity was extracted, and determines whether the object detected from the different video is the specified object. Acquisition means acquires first status information indicating a status related to the video from which the first feature quantity was extracted and second status information indicating a status related to the video from which the second feature quantity was extracted. When the determination means determines that the object detected from the different video is the specified object, update means compares the first status information with the second status information acquired by the acquisition means to determine whether to update the first feature quantity based on the second feature quantity, and updates it when so determined. SELECTED DRAWING: Figure 5

Description

本発明は、映像から特定のオブジェクトを検出する映像解析技術に関する。 The present invention relates to a video analysis technique for detecting a specific object from a video.

近年、ネットワークカメラ等の映像監視システムは、従来のコンビニエンスストア、レンタルビデオ店といった一つの店舗にとどまらず、商店街、ショッピングモールや公園等、広い範囲を監視することも求められている。 In recent years, video surveillance systems using network cameras and the like have been required to monitor not only a single store, such as a convenience store or rental video store, but also wide areas such as shopping streets, shopping malls, and parks.

また、ネットワークカメラの映像から人物等を検出し、検出した人物等を追跡する技術が知られている。また、顔認証技術を用いて、検出した人物が特定の人物であるか同定する技術も知られている。 In addition, a technique for detecting a person or the like from an image of a network camera and tracking the detected person or the like is known. A technique for identifying whether a detected person is a specific person using a face authentication technique is also known.

特に、多数の人が通行するエリアを広く対象とする広域監視システムでは、映像から得られる人物に関する特徴に基づいて、人物を追跡及び同定するための映像解析技術が求められている。 In particular, in a wide-area monitoring system that targets a wide area where many people pass, there is a demand for video analysis technology for tracking and identifying a person based on characteristics related to the person obtained from the video.

特許文献１には、人物を追跡及び同定する映像解析技術が記載されている。特許文献１では、映像から人物の検出を行い、検出した人物から顔特徴量と外観特徴量を算定し、それらを監視対象者の特徴量と比較する。そして、その比較結果に基づいて、検出された人物が監視対象者かどうか判定している。また、顔特徴量の比較により監視対象者と判定された場合に、外観の特徴量を更新することで、服装などの変化による外観の特徴量の変化に追随している。 Patent Document 1 describes a video analysis technique for tracking and identifying a person. In Patent Document 1, a person is detected from an image, a face feature amount and an appearance feature amount are calculated from the detected person, and these are compared with the feature amount of a monitoring target person. Then, based on the comparison result, it is determined whether or not the detected person is a monitoring target person. In addition, when it is determined that the person is a monitoring target by comparing the facial feature amounts, the appearance feature amount is updated to follow the change in the appearance feature amount due to a change in clothes or the like.

特開２０１３−１９６４２３ JP2013-196423

しかしながら、前述した特許文献１の手法では、特徴量の比較および更新時に、各特徴量を取得した際の状況について考慮されていない。 However, in the method of Patent Document 1 described above, the situation when each feature amount is acquired is not considered at the time of comparison and update of the feature amount.

そこで、本発明の目的は、より精度良く、映像からオブジェクトを同定することができる技術を提供することである。 Accordingly, an object of the present invention is to provide a technique capable of identifying an object from a video with higher accuracy.

上記課題を解決するために、本発明の映像解析装置は以下の構成を備える。すなわち、撮像手段によって撮像される映像から、指定されたオブジェクトである指定オブジェクトに係る第１の特徴量を抽出する抽出手段と、前記第１の特徴量が抽出された映像とは異なる映像から検出されたオブジェクトに係る第２の特徴量と、前記第１の特徴量とを比較し、前記異なる映像から検出されたオブジェクトが、前記指定オブジェクトであるか判定する判定手段と、前記第１の特徴量を抽出した映像に係る状況を示す第１の状況情報と、前記第２の特徴量を抽出した映像に係る状況を示す第２の状況情報とを取得する取得手段と、前記異なる映像から検出されたオブジェクトが前記指定オブジェクトであると前記判定手段によって判定された場合に、前記取得手段によって取得された前記第１の状況情報と前記第２の状況情報とを比較することによって前記第１の特徴量を前記第２の特徴量に基づいて更新するか判定し、更新すると判定した場合に、前記第１の特徴量を前記第２の特徴量に基づいて更新する更新手段とを有する。 In order to solve the above problem, the video analysis apparatus of the present invention has the following configuration. That is, it includes: extraction means for extracting, from a video captured by imaging means, a first feature amount related to a designated object; determination means for comparing the first feature amount with a second feature amount related to an object detected from a video different from the video from which the first feature amount was extracted, and determining whether the object detected from the different video is the designated object; acquisition means for acquiring first situation information indicating a situation related to the video from which the first feature amount was extracted and second situation information indicating a situation related to the video from which the second feature amount was extracted; and update means for, when the determination means determines that the object detected from the different video is the designated object, determining whether to update the first feature amount based on the second feature amount by comparing the first situation information with the second situation information acquired by the acquisition means, and updating the first feature amount based on the second feature amount when it is determined to update.

本発明によれば、より精度良く、映像からオブジェクトを同定することができる技術を提供することができる。 According to the present invention, it is possible to provide a technique capable of identifying an object from a video with higher accuracy.

実施形態１における映像監視システムの全体構成を示すブロック図である。 FIG. 1 is a block diagram illustrating the overall configuration of the video monitoring system in Embodiment 1.
人物を追跡、同定する映像解析処理の一例を示す図である。 FIG. 2 is a diagram illustrating an example of video analysis processing for tracking and identifying a person.
実施形態１における映像解析処理の人物特徴量の更新方法の一例を示す図である。 FIG. 3 is a diagram illustrating an example of a method of updating the person feature amount in the video analysis processing of Embodiment 1.
ハードウェア構成を示すブロック図である。 FIG. 4 is a block diagram illustrating a hardware configuration.
実施形態１の構成を示す機能ブロック図である。 FIG. 5 is a functional block diagram illustrating the configuration of Embodiment 1.
状況情報の取得方法の一例を示す図である。 FIG. 6 is a diagram illustrating an example of a method of acquiring situation information.
各データテーブルの一例を示す図である。 FIG. 7 is a diagram illustrating an example of each data table.
実施形態１における映像解析処理の詳細を示すフローチャートである。 FIG. 8 is a flowchart illustrating details of the video analysis processing in Embodiment 1.
実施形態２における特徴量の追加方法の一例を示す図である。 FIG. 9 is a diagram illustrating an example of a method of adding a feature amount in Embodiment 2.
実施形態２における映像解析処理の詳細を示すフローチャートである。 FIG. 10 is a flowchart illustrating details of the video analysis processing in Embodiment 2.

以下、本発明の実施の形態を、添付の図面に基づいて詳細に説明する。なお、以下の実施形態において示す構成は一例であり、本発明は以下の実施形態で説明する構成に限定されるものではない。また、各実施形態において、同定するオブジェクトとして人物を例に説明するが、車等の他のオブジェクトにも応用可能である。 Embodiments of the present invention will be described below in detail with reference to the accompanying drawings. In addition, the structure shown in the following embodiment is an example, and this invention is not limited to the structure demonstrated in the following embodiment. In each embodiment, a person is described as an example of an object to be identified, but the present invention can also be applied to other objects such as a car.

＜実施形態１＞ <Embodiment 1>
以下、実施形態１を説明する。まず、図１を用いて本実施形態における映像監視システム１０１の構成を説明する。 Embodiment 1 will be described below. First, the configuration of the video monitoring system 101 in this embodiment will be described with reference to FIG. 1.

映像監視システム１０１は、例えば、スーパーマーケットやコンビニエンスストア、レンタルビデオ店、商店街、ショッピングモールや公園などの映像を撮像する。 The video monitoring system 101 captures images of, for example, supermarkets, convenience stores, rental video stores, shopping streets, shopping malls, and parks.

映像監視システム１０１は、クライアントＰＣ（パーソナルコンピュータ）１０２、映像解析サーバ１０３、録画サーバ１０４、及びネットワークカメラ（撮像装置）１０５を有している。 The video monitoring system 101 includes a client PC (personal computer) 102, a video analysis server 103, a recording server 104, and a network camera (imaging device) 105.

クライアントＰＣ１０２は、ユーザからの操作入力を受け付ける。そして、映像解析サーバ１０３、録画サーバ１０４、及びネットワークカメラ１０５の各装置に対して操作入力に応じた指示（コマンド）を出力し、その指示に応じた動作を実行させる。また、各装置から取得した情報や画像をディスプレイに表示させる。 The client PC 102 receives an operation input from the user. Then, an instruction (command) corresponding to the operation input is output to each of the video analysis server 103, the recording server 104, and the network camera 105, and an operation corresponding to the instruction is executed. In addition, information and images acquired from each device are displayed on the display.

映像解析サーバ１０３は、クライアントＰＣ１０２からの指示を受け付け、その指示に基づいて映像解析処理を行う。ここでの映像解析処理とは、録画サーバ１０４またはネットワークカメラ１０５から取得した映像に対して解析処理を行い、映像内に映っている人物等の対象を判別したり、対象の移動を追跡したり、対象の行動を分析したりすることをいう。人物を追跡、同定する映像解析処理については後述する。ここで、同定とは、ある映像で検出された人物と、その映像とは異なる他の映像から検出された人物とが同じ人物であるか判定することである。ここでいう他の映像とは同じネットワークカメラ１０５で撮像された映像でもよいし、他のネットワークカメラ１０５で撮像された映像でもよい。他のネットワークカメラ１０５で撮像された場合、撮像される被写体の状況や、明るさ等の撮像状況が異なってくる。また、同じネットワークカメラ１０５で撮像された場合であっても、撮像方向や周囲の明るさ等が変われば撮像状況は変化する。撮像状況が変化すると、撮像された画像から抽出される特徴量も変わってくる。 The video analysis server 103 receives an instruction from the client PC 102 and performs video analysis processing based on the instruction. The video analysis process here is an analysis process for the video acquired from the recording server 104 or the network camera 105 to determine a target such as a person shown in the video, or to track the movement of the target. , Or to analyze the behavior of the target. Video analysis processing for tracking and identifying a person will be described later. Here, the identification is to determine whether a person detected in a certain video and a person detected from another video different from the video are the same person. The other video referred to here may be a video captured by the same network camera 105 or a video captured by another network camera 105. When the image is taken by another network camera 105, the situation of the subject to be imaged and the imaging situation such as brightness are different. Even when images are captured by the same network camera 105, the imaging situation changes if the imaging direction, ambient brightness, or the like changes. When the imaging state changes, the feature amount extracted from the captured image also changes.

録画サーバ１０４は、クライアントＰＣ１０２または映像解析サーバ１０３からの指示を受け付け、その指示に基づいて映像録画処理を行う。ここで、映像録画処理とは、録画サーバ１０４と同じネットワークに接続されている複数のネットワークカメラ１０５に対して映像出力指示を行って、その指示に応じて出力される映像を取得し、取得した映像を保存する処理を指す。また、映像録画処理には、ネットワークカメラ１０５から得られるメタ情報を保存する処理も含む。メタ情報とは、例えば、時間情報、位置情報、カメラの設定情報、ネットワークカメラ１０５に対応するセンサの情報などがある。 The recording server 104 receives an instruction from the client PC 102 or the video analysis server 103 and performs video recording processing based on the instruction. Here, the video recording processing refers to processing that issues a video output instruction to a plurality of network cameras 105 connected to the same network as the recording server 104, acquires the video output in response to the instruction, and stores the acquired video. The video recording processing also includes processing for storing meta information obtained from the network cameras 105. The meta information includes, for example, time information, position information, camera setting information, and information from sensors associated with the network cameras 105.

ネットワークカメラ１０５は、例えば、建物の天井や壁、電柱などに設けられる。各ネットワークカメラ１０５は撮像を行い、周囲の映像を取得する。ネットワークカメラ１０５は、クライアントＰＣ１０２、映像解析サーバ１０３、又は、録画サーバ１０４からの指示を受け付け、周囲の撮像や、カメラの回転制御（パン、チルト）、音声の収録などを行う。ネットワークカメラ１０５には、先述のメタ情報取得のために、人感センサ、温度センサ、音声センサ、赤外線センサなど、種々のセンサが接続されていてもよい。各装置１０２〜１０５は、それぞれネットワークに接続され相互に通信が可能であるものとする。また、各機能を一台のＰＣあるいはサーバに集約し、複数の装置の機能を一台の装置で実現するようにしてもよい。 The network camera 105 is provided, for example, on the ceiling, wall, or utility pole of a building. Each network camera 105 captures an image and acquires surrounding video. The network camera 105 receives an instruction from the client PC 102, the video analysis server 103, or the recording server 104, and performs surrounding imaging, camera rotation control (pan, tilt), audio recording, and the like. Various sensors such as a human sensor, a temperature sensor, a voice sensor, and an infrared sensor may be connected to the network camera 105 in order to acquire the above-described meta information. Each of the devices 102 to 105 is connected to a network and can communicate with each other. Alternatively, the functions may be integrated into a single PC or server, and the functions of a plurality of devices may be realized by a single device.

図２及び図５を用いて、人物を追跡及び同定する映像解析処理の一例を説明する。なお、図５は、映像解析サーバ１０３の機能ブロック、及び、録画サーバ１０４の機能ブロックを含むブロック図である。また、ネットワークカメラ１０５ａは「位置Ａ」において撮像を行うものとする。また、ネットワークカメラ１０５ｂは「位置Ａ」とは異なる「位置Ｂ」において撮像を行うものとする。 An example of video analysis processing for tracking and identifying a person will be described with reference to FIGS. 2 and 5. FIG. 5 is a block diagram including the functional blocks of the video analysis server 103 and the functional blocks of the recording server 104. The network camera 105a captures images at "position A". The network camera 105b captures images at "position B", which is different from "position A".

録画サーバ１０４の映像取得部５０２は、ネットワークカメラ１０５に対して、映像を送信するよう指示（コマンド）を送信する。そして、その指示の応答として各ネットワークカメラ１０５で撮像された映像を取得する。取得した映像は録画サーバ１０４に保存してもよいし、保存せず後段の処理を行う機能ブロック（人物検出部５０３等）に出力してもよい。 The video acquisition unit 502 of the recording server 104 transmits an instruction (command) to the network camera 105 to transmit the video. Then, as a response to the instruction, an image captured by each network camera 105 is acquired. The acquired video may be stored in the recording server 104 or may be output to a functional block (person detection unit 503 or the like) that performs subsequent processing without being stored.

次に、映像解析サーバ１０３の人物検出部５０３は、取得された映像を解析して人物の検出を行う。 Next, the person detection unit 503 of the video analysis server 103 analyzes the acquired video and detects a person.

ここで、図２において、人物２０１および人物２０２は各位置にいる人物を表している。人物検出部５０３は、ネットワークカメラ１０５ａによって撮像された位置Ａにおける映像から人物２０１を検出する。また、人物検出部５０３は、ネットワークカメラ１０５ｂによって撮像された位置Ｂにおける映像から人物２０２を検出する。人物を検出する方法は、パターンマッチングを用いた方法等、公知の種々の方法を用いることができる。 Here, in FIG. 2, a person 201 and a person 202 represent persons at each position. The person detection unit 503 detects the person 201 from the video at the position A captured by the network camera 105a. In addition, the person detection unit 503 detects the person 202 from the video at the position B captured by the network camera 105b. As a method of detecting a person, various known methods such as a method using pattern matching can be used.

映像解析サーバ１０３の特徴抽出部５０４は、人物検出部５０３によって検出された人物の領域の画像から、人物の特徴量を抽出する。ここで、特徴量とは画像から目立つ特徴を抽出することを指す。人物の特徴量は、顔又は全身の特徴量であってもよい。また、人物の特徴量は画像を解析して検出される、歩幅や歩容、身長、体系、歩幅、服装等の情報であってもよい。また、特徴量とは、人物の画像の色ヒストグラムやエッジの位置等を示す情報であってもよい。人物の同定処理に用いることができるものならばどのようなものでもよい。 The feature extraction unit 504 of the video analysis server 103 extracts the feature amount of the person from the image of the person region detected by the person detection unit 503. Here, a feature amount is a prominent feature extracted from the image. The feature amount of the person may be a feature amount of the face or of the whole body. It may also be information detected by analyzing the image, such as stride, gait, height, build, or clothing. The feature amount may also be information indicating a color histogram of the person image, edge positions, or the like. Any feature amount may be used as long as it can be used for the person identification processing.

特徴抽出部５０４は、顔の特徴を抽出する場合は、画像から顔の各パーツ（目や鼻等）の位置や大きさ、形状などを特徴量として抽出し、それをデータ化する。データ化した特徴同士をマッチングさせることで類似度を算出し、類似度を用いて同じ顔であるか識別することができる。以上のように、特徴量の抽出方法については、公知の種々の技術を用いることができる。 When extracting a facial feature, the feature extraction unit 504 extracts the position, size, shape, and the like of each part (eyes, nose, etc.) of the face from the image as a feature amount, and converts it into data. Similarity can be calculated by matching data features, and the same face can be identified using the similarity. As described above, various known techniques can be used for the feature quantity extraction method.

図２においては、特徴抽出部５０４は、検出された人物に対応する領域の画像２０３及び２０４から、それぞれ人物の特徴量２０５及び２０６を抽出する。 In FIG. 2, the feature extraction unit 504 extracts person feature amounts 205 and 206 from the images 203 and 204 in the region corresponding to the detected person.

特徴照合部５０５は、例えば、位置Ａで得られた特徴量２０５と、位置Ｂで得られた特徴量２０６とを照合することで、人物２０１と人物２０２とが同一人物かどうか判定する。具体的には特徴量同士を比較し、所定の類似度以上の場合に、同一人物であると判定する。各ネットワークカメラ１０５から取得された映像すべてに対して、随時、特徴量を抽出し、抽出された特徴量同士を照合することで、映像監視システ１０１において検出された人物を同定することができる。さらに、同定した結果を用いて人物の移動経路などを推定することができる。 The feature matching unit 505 determines, for example, whether the person 201 and the person 202 are the same person by collating the feature quantity 205 obtained at the position A and the feature quantity 206 obtained at the position B. Specifically, the feature quantities are compared with each other, and when they are equal to or higher than a predetermined similarity, it is determined that they are the same person. It is possible to identify a person detected in the video surveillance system 101 by extracting feature amounts from time to time for all videos acquired from each network camera 105 and collating the extracted feature amounts. Furthermore, it is possible to estimate the movement path of the person using the identified result.
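The threshold-based matching described above can be sketched as follows. This is a minimal illustration, not the method of the specification: the text only requires "a predetermined similarity", so the use of cosine similarity, the function names `cosine_similarity` and `is_same_person`, and the threshold value 0.9 are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no information; treat as dissimilar
    return dot / (norm_a * norm_b)


def is_same_person(feature_a, feature_b, threshold=0.9):
    """Judge two detections to be the same person when the similarity
    is equal to or higher than the (hypothetical) threshold."""
    return cosine_similarity(feature_a, feature_b) >= threshold
```

In practice the threshold would be tuned to the feature extractor in use; a stricter value reduces false matches at the cost of missing some true ones.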

ここで、図３は、実施形態１における特徴更新部５０７によって実行される人物の特徴量の更新の一例を示す図である。 Here, FIG. 3 is a diagram illustrating an example of the update of the feature amount of the person executed by the feature update unit 507 according to the first embodiment.

状況情報２０７は、特徴量２０５を抽出した画像２０３を撮像した時の状況を示す情報である。また、状況情報２０８は、特徴量２０６を抽出した画像２０４を撮像した時の状況を示す情報である。状況情報は、映像解析サーバ１０３の状況情報取得部５０６によって取得される。 The situation information 207 is information indicating a situation when the image 203 from which the feature amount 205 is extracted is captured. The status information 208 is information indicating a status when the image 204 from which the feature amount 206 is extracted is captured. The situation information is acquired by the situation information acquisition unit 506 of the video analysis server 103.

状況情報は、カメラと被写体の距離や、明るさ、混雑度などの映像取得状況に関する情報であり、映像を解析することで取得してもよいし、センサ等から取得してもよい。 The situation information is information regarding the video acquisition status such as the distance between the camera and the subject, brightness, and the degree of congestion, and may be acquired by analyzing the video or may be acquired from a sensor or the like.

ここで、同定対象情報３０３は、同定対象の特徴量を示す特徴量情報と、その特徴量を取得した映像を撮像した際の状況情報とが含まれる。 Here, the identification target information 303 includes feature amount information indicating the feature amount of the identification target, and situation information when a video having acquired the feature amount is captured.

特徴照合部５０５は、同定対象情報３０３に含まれる特徴量を示す情報と、位置Ａで取得された特徴量２０５とを比較（照合）することで、位置Ａで検出された人物が同定対象の人物かどうか判定する。 The feature matching unit 505 compares (collates) the information indicating the feature amount included in the identification target information 303 with the feature amount 205 acquired at position A, thereby determining whether the person detected at position A is the person to be identified.

特徴更新部５０７は、特徴照合部５０５によって同一人物であると判定された場合、同定対象情報３０３に含まれる状況情報と、位置Ａで取得された状況情報２０７とを比較し、比較結果に応じて同定対象情報の特徴量及び状況情報を更新する。 When the feature matching unit 505 determines that the person is the same person, the feature update unit 507 compares the situation information included in the identification target information 303 with the situation information 207 acquired at position A, and updates the feature amount and situation information of the identification target information according to the comparison result.

特徴更新部５０７は、同定対象情報３０３の特徴量よりも位置Ａで取得された特徴量２０５の方が、信頼度が高い条件で取得されたと判定した場合、同定対象情報３０３を特徴量２０５と状況情報２０７で更新する。なお、ここで言う信頼度とは、精度良く同定できるかを表す信頼度である。 When the feature update unit 507 determines that the feature amount 205 acquired at position A was acquired under conditions of higher reliability than the feature amount of the identification target information 303, it updates the identification target information 303 with the feature amount 205 and the situation information 207. The reliability referred to here indicates how accurately identification can be performed.

つまり、次の位置Ｂでの人物同定処理に用いる同定対象情報として、同定対象情報３０３を特徴量２０５と状況情報２０７で更新した同定対象情報２０９を用いることとなる。 That is, as the identification target information used for the person identification process at the next position B, the identification target information 209 obtained by updating the identification target information 303 with the feature amount 205 and the situation information 207 is used.
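The update of the identification target information described above can be sketched as follows. This is an illustrative sketch under assumptions: the class name `IdentificationTarget`, the function `update_target`, and the representation of situation information as a dictionary are hypothetical, and the reliability comparison is passed in as a caller-supplied predicate because the specification allows several criteria.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class IdentificationTarget:
    """Hypothetical container for identification target information:
    a feature amount plus the situation information of its source video."""
    feature: Sequence[float]
    situation: dict


def update_target(
    target: IdentificationTarget,
    matched: bool,
    new_feature: Sequence[float],
    new_situation: dict,
    more_reliable: Callable[[dict, dict], bool],
) -> IdentificationTarget:
    """Return the target to use for the next identification.

    The target is replaced only when the detection was judged to be the
    same person AND the new situation is judged more reliable than the
    stored one; otherwise the stored target is kept unchanged.
    """
    if matched and more_reliable(new_situation, target.situation):
        return IdentificationTarget(new_feature, new_situation)
    return target
```

Keeping the reliability test as a parameter lets the same update step be reused with a brightness, distance, or congestion criterion, or with a combination of them.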

ここで、状況情報とその信頼度について、図６を用いて更に詳細に説明する。図６は、取得する状況情報とその取得方法を示している。また、図６は、状況情報が変動した場合に影響を受ける特徴量、その特徴量の信頼度が高くなる条件の一例も示している。 Here, the situation information and its reliability will be described in more detail with reference to FIG. FIG. 6 shows the status information to be acquired and the acquisition method. FIG. 6 also shows an example of the feature quantity that is affected when the situation information fluctuates, and the conditions under which the reliability of the feature quantity increases.

まず、明るさに関する状況情報について説明する。映像を解析して明るさ情報を取得する場合は、状況情報取得部５０６は画像の背景差分を求めることで照明光の変化を取得する。また、センサから明るさ情報を取得する場合は、状況情報取得部５０６は、照度センサから照度値の変化を示す情報を取得できる。明るさに関する状況情報は、輝度値であってもよい。例えば、画面全体の輝度の平均値であってもよい。 First, the situation information regarding brightness will be described. When the brightness information is acquired by analyzing the video, the situation information acquisition unit 506 acquires the change in the illumination light by obtaining the background difference of the image. Moreover, when acquiring brightness information from a sensor, the status information acquisition unit 506 can acquire information indicating a change in illuminance value from the illuminance sensor. The situation information regarding brightness may be a luminance value. For example, the average value of the luminance of the entire screen may be used.
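As one way to obtain "the average value of the luminance of the entire screen", the following sketch averages a per-pixel luma over a frame. The frame representation (rows of 8-bit RGB tuples), the function names, and the choice of the ITU-R BT.601 luma weights are assumptions for illustration; the specification does not prescribe how luminance is computed.

```python
def mean_luminance(frame):
    """Average luma of a frame given as rows of (R, G, B) tuples in 0-255.

    Uses the ITU-R BT.601 luma weights (an assumed choice).
    """
    total = 0.0
    count = 0
    for row in frame:
        for r, g, b in row:
            total += 0.299 * r + 0.587 * g + 0.114 * b
            count += 1
    if count == 0:
        raise ValueError("frame has no pixels")
    return total / count


def brighter_than_current(new_brightness, current_brightness):
    """Brightness criterion sketch: favor an update only when the new video
    is strictly brighter than the one behind the stored feature amount."""
    return new_brightness > current_brightness
```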

また、特徴量２０５を取得した映像の状況情報２０７が、現在の同定対象情報３０３の状況情報以下の暗さであることを示している場合、信頼度が低いため、特徴更新部５０７は特徴量を更新しない。一方、状況情報２０７が現在の同定対象情報３０３の状況情報より明るいことを示している場合、特徴更新部５０７は特徴量を更新する。例えば、暗い状況では陰影がはっきりしないためその状況で取得された特徴量には更新しない。一方、明るかったり、かつ、画面内の明るさの差が少ない状況で特徴量が取得されたりした場合、その特徴量に更新する。 When the situation information 207 of the video from which the feature amount 205 was acquired indicates that the scene is as dark as or darker than the situation information of the current identification target information 303, the reliability is low, so the feature update unit 507 does not update the feature amount. On the other hand, when the situation information 207 indicates that the scene is brighter than the situation information of the current identification target information 303, the feature update unit 507 updates the feature amount. For example, in a dark scene shading is not clearly captured, so the feature amount is not updated to one acquired in that scene. Conversely, when a feature amount is acquired in a scene that is bright and has little variation in brightness within the frame, the feature amount is updated to it.

次にネットワークカメラ１０５と人物との距離に関する状況情報について説明する。映像を解析してネットワークカメラ１０５と人物との距離情報を取得する場合、状況情報取得部５０６は、被写体の大きさおよび大きさの変化から距離を推定できる。センサから距離を求める場合には、状況情報取得部５０６は、測距センサを用いて対象との距離を取得することができる。 Next, status information regarding the distance between the network camera 105 and a person will be described. When the distance information between the network camera 105 and the person is acquired by analyzing the video, the situation information acquisition unit 506 can estimate the distance from the size of the subject and the change in the size. When obtaining the distance from the sensor, the situation information acquisition unit 506 can acquire the distance from the target using the distance measuring sensor.

特徴量２０５を取得した映像の状況情報２０７が、現在の同定対象情報３０３の状況情報が示す距離以上であることを示している場合、特徴更新部５０７は特徴量を更新しない。一方、特徴量２０５を取得した映像の状況情報２０７が、現在の同定対象情報３０３の状況情報が示す距離より近いことを示している場合、特徴更新部５０７は特徴量を更新する。これは、例えば、ネットワークカメラ１０５と被写体の距離が遠い場合、顔特徴量に顔の器官の詳細を示す情報が含まれないことがあり、信頼度が落ちてしまうためである。そのため、ネットワークカメラ１０５と被写体の距離がなるべく近く、顔の器官がはっきり映る距離で取得された特徴量のほうが高い信頼度である。 When the situation information 207 of the video from which the feature amount 205 was acquired indicates a distance equal to or greater than the distance indicated by the situation information of the current identification target information 303, the feature update unit 507 does not update the feature amount. On the other hand, when the situation information 207 indicates a distance shorter than the distance indicated by the situation information of the current identification target information 303, the feature update unit 507 updates the feature amount. This is because, for example, when the network camera 105 is far from the subject, the face feature amount may not contain information on the details of the facial organs, which lowers the reliability. A feature amount acquired when the subject is as close as possible to the network camera 105, at a distance where the facial organs are clearly visible, therefore has higher reliability.
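Estimating the camera-to-subject distance from the size of the subject can be sketched with the pinhole camera relation, distance = focal length x real height / image height. This is only one possible estimator and is not taken from the specification: the focal length in pixels and the assumed real-world height of a person are hypothetical inputs, as are the function names.

```python
def estimate_distance_m(focal_length_px, assumed_height_m, bbox_height_px):
    """Pinhole-model distance estimate (a sketch under stated assumptions).

    focal_length_px:  camera focal length expressed in pixels (assumed known)
    assumed_height_m: assumed real height of the subject in meters
    bbox_height_px:   height of the detected person region in pixels
    """
    if bbox_height_px <= 0:
        raise ValueError("bbox_height_px must be positive")
    return focal_length_px * assumed_height_m / bbox_height_px


def closer_than_current(new_distance_m, current_distance_m):
    """Distance criterion sketch: favor an update only when the new video
    was captured strictly closer to the subject than the stored one."""
    return new_distance_m < current_distance_m
```

For example, with a focal length of 1000 pixels and an assumed height of 1.7 m, a person region 340 pixels tall corresponds to roughly 5 m. A distance sensor, where available, avoids the dependence on the assumed height.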

次に、混雑度に関する状況情報について説明する。映像を解析して混雑度を取得する場合は、状況情報取得部５０６は、画像に含まれる人物を検出し、撮像範囲に対応する面積と人数から混雑度を取得することができる。混雑度に関する状況情報は、例えば、所定の面積あたりの人数である。センサから混雑度を求める場合には、状況情報取得部５０６は、赤外線センサの変化量を見ることで混雑度を計測することができる。 Next, situation information regarding the degree of congestion will be described. When analyzing the video and acquiring the congestion level, the situation information acquisition unit 506 can detect the person included in the image and acquire the congestion level from the area and the number of people corresponding to the imaging range. The status information regarding the degree of congestion is, for example, the number of people per predetermined area. When obtaining the degree of congestion from the sensor, the situation information acquisition unit 506 can measure the degree of congestion by looking at the amount of change in the infrared sensor.

特徴量２０５を取得した映像の状況情報２０７が、現在の同定対象情報３０３の状況情報が示す混雑度合い以上に混雑していることを示している場合、特徴更新部５０７は特徴量を更新しない。一方、特徴量２０５を取得した映像の状況情報２０７が、現在の同定対象情報３０３の状況情報が示す混雑度よりも混雑していないことを示している場合、特徴更新部５０７は特徴量を更新する。 When the situation information 207 of the video from which the feature amount 205 was acquired indicates congestion equal to or greater than the degree of congestion indicated by the situation information of the current identification target information 303, the feature update unit 507 does not update the feature amount. On the other hand, when the situation information 207 indicates less congestion than the degree of congestion indicated by the situation information of the current identification target information 303, the feature update unit 507 updates the feature amount.

これは、例えば、混雑度が高い場合、自然体で歩くことが難しくなるため、歩幅や歩容などの特徴量は信頼度が落ちてしまうためである。そのため、周囲の影響を受けずスムーズに歩ける程度の混雑度合の方が高い信頼度であるといえる。 This is because, for example, when the degree of congestion is high it becomes difficult to walk naturally, so feature amounts such as stride and gait become less reliable. A degree of congestion low enough that a person can walk smoothly without being affected by the surroundings can therefore be said to give higher reliability.
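The congestion criterion described above can be sketched as follows, taking the degree of congestion to be the number of people per unit area as stated in the text. The function names and the use of square meters are assumptions made for the example.

```python
def congestion_degree(person_count, area_m2):
    """Degree of congestion as people per square meter of the imaged area."""
    if area_m2 <= 0:
        raise ValueError("area_m2 must be positive")
    return person_count / area_m2


def less_crowded_than_current(new_congestion, current_congestion):
    """Congestion criterion sketch: favor an update only when the new video
    is strictly less crowded than the one behind the stored feature amount."""
    return new_congestion < current_congestion
```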

以上の状況情報を少なくとも１つ用いて、特徴量を更新するか判定すればよい。ただし、複数の状況情報を用いればより精度良く特徴量を更新するか判断することができる。例えば、３つの状況情報のうち、２つ以上の状況情報に基づいて更新すると判定された場合に、更新するようにしてもよい。また、３つの状況情報が全てにおいて更新すると判定された場合に、更新するようにしてもよい。 Whether to update the feature amount may be determined using at least one of the above types of situation information. Using a plurality of types of situation information, however, allows a more accurate decision on whether to update. For example, the update may be performed when two or more of the three types of situation information indicate that an update should be made. Alternatively, the update may be performed only when all three types of situation information indicate that an update should be made.
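The combination rule described above (update when two or more, or all, of the three criteria agree) can be sketched as a simple vote. The dictionary keys `brightness`, `distance`, and `congestion`, the function names, and the `min_votes` parameter are hypothetical names chosen for this illustration.

```python
def votes_for_update(new, current):
    """One boolean per criterion: True when the new situation is judged
    more reliable than the current one (brighter, closer, less crowded)."""
    return [
        new["brightness"] > current["brightness"],
        new["distance"] < current["distance"],
        new["congestion"] < current["congestion"],
    ]


def should_update(new, current, min_votes=2):
    """Update when at least min_votes criteria favor the new situation.

    min_votes=2 corresponds to the "two or more of three" variant and
    min_votes=3 to the "all three" variant described in the text.
    """
    return sum(votes_for_update(new, current)) >= min_votes
```

Setting `min_votes=1` would correspond to deciding on a single criterion, at the cost of more frequent and possibly less well-founded updates.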

また、状況情報の取得方法と信頼度が高くなる条件について述べたが、これは各ネットワークカメラ１０５が設置する状況によって異なる場合がある。例えば、ネットワークカメラ１０５が設置されている箇所がすべて薄暗い状況となっていれば、明るさの度合は高くない方が高い信頼度となる。また、条件を示す情報が予め記憶されている例を説明したが、条件を指定する入力を受け付けて条件を変更できるようにしてもよい。以上が、状況情報とその信頼度についての説明である。 The methods of acquiring situation information and the conditions under which reliability increases have been described above, but these may differ depending on the conditions in which each network camera 105 is installed. For example, if all the locations where the network cameras 105 are installed are dimly lit, a lower brightness level gives higher reliability. In addition, although an example in which the information indicating the conditions is stored in advance has been described, the conditions may be made changeable by accepting an input that specifies them. This concludes the description of the situation information and its reliability.

次に、図７を用いて映像監視システム１０１が保持する映像解析結果を示すデータテーブル７０１について説明する。データテーブル７０１は、対象ＩＤ、画像、特徴量、状況情報ＩＤ、及び、位置を情報として有している。 Next, a data table 701 indicating video analysis results held by the video monitoring system 101 will be described with reference to FIG. The data table 701 includes a target ID, an image, a feature amount, a situation information ID, and a position as information.

「対象ＩＤ」は取得された画像から検出された人物を一意に識別するためのＩＤである。「画像」は人物が検出された時の画像を示す情報である。これは静止画でもよいし、動画であってもよい。「特徴量」は検出された人物に対して特徴抽出処理を行って求められた特徴量である。 “Target ID” is an ID for uniquely identifying a person detected from the acquired image. “Image” is information indicating an image when a person is detected. This may be a still image or a moving image. The “feature amount” is a feature amount obtained by performing feature extraction processing on the detected person.

「状況情報ＩＤ」は対象の画像を取得した際の状況情報を一意に識別するためのＩＤである。「位置」は対象が検出された位置を示す情報である。 The “situation information ID” is an ID for uniquely identifying the situation information when the target image is acquired. “Position” is information indicating the position where the target is detected.

次に、データテーブル７０２について説明する。データテーブル７０２は、状況情報を保持するためのデータテーブルである。データテーブル７０２は、状況情報ＩＤ、明るさ、カメラとの距離、及び、混雑度を情報として有している。「明るさ」は映像を取得した際の明るさを示す状況情報である。明るさを示す状況情報は照度値として保持してもよいし、大中小などで表現してもよい。「カメラとの距離」は、カメラと対象（人物）との距離を示す状況情報を示す。「混雑度」は、画像を取得した際の混雑度合を示す状況情報である。データテーブル７０１及びデータテーブル７０２は、各装置１０２〜１０４の内のどの装置が保持してもよい。以上が各テーブルについての説明である。 Next, the data table 702 will be described. The data table 702 is a data table for holding status information. The data table 702 includes status information ID, brightness, distance from the camera, and congestion level as information. “Brightness” is status information indicating the brightness when an image is acquired. The situation information indicating the brightness may be held as an illuminance value, or may be expressed in large, medium, or small. “Distance to camera” indicates status information indicating the distance between the camera and the target (person). “Congestion degree” is status information indicating the degree of congestion when an image is acquired. The data table 701 and the data table 702 may be held by any of the devices 102 to 104. This completes the description of each table.
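The relationship between data table 701 and data table 702 can be sketched with two record types linked by the situation information ID. The class names, field names, and field types below are assumptions for illustration (for instance, the image is represented by a file path and brightness by a number, although the text also allows coarse levels such as large, medium, and small).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AnalysisRecord:
    """One row of data table 701 (video analysis result)."""
    target_id: int        # uniquely identifies the detected person
    image: str            # reference to the still image or video (assumed: a path)
    feature: List[float]  # feature amount extracted from the detection
    situation_id: int     # key into data table 702
    position: str         # position where the target was detected


@dataclass
class SituationRecord:
    """One row of data table 702 (situation information)."""
    situation_id: int
    brightness: float          # e.g. an illuminance value
    distance_to_camera: float  # distance between camera and target
    congestion: float          # degree of congestion at capture time


def situation_for(record: AnalysisRecord,
                  situations: Dict[int, SituationRecord]) -> SituationRecord:
    """Look up the situation information associated with an analysis result."""
    return situations[record.situation_id]
```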

次に、図８のフローチャートを用いて、映像監視システム１０１における処理の流れを説明する。ここでは位置Ａで見つかった人物を同定対象として指定し、別の位置Ｂで見つかった人物と照合し、特徴量の更新を行う処理について説明する。この同定対象として指定された指定オブジェクトが各映像に含まれているかを同定することとなる。必要に応じて、図３及び図７を参照して説明する。なお、Ｓ８０２以降において、位置Ａで取得された特徴量２０５及び状況情報２０７で同定処理を行う例について説明とする。 Next, the flow of processing in the video monitoring system 101 will be described using the flowchart of FIG. Here, a process will be described in which a person found at position A is designated as an identification target, collated with a person found at another position B, and the feature amount is updated. It is identified whether or not the designated object designated as the identification target is included in each video. This will be described with reference to FIGS. 3 and 7 as necessary. Note that an example in which the identification processing is performed with the feature amount 205 and the situation information 207 acquired at the position A in and after S802 will be described.

In step S801, the identification target designating unit 509 receives user input and sets the identification target. The details of S801 are described with reference to FIG. 8(b), which shows the flow for the case where the network camera 501a at position A detects the person 201 and the detected person is set as the identification target by user input.

Ｓ８１１において、映像取得部５０２は、位置Ａで撮像された映像を取得する。 In step S811, the video acquisition unit 502 acquires the video captured at the position A.

In step S812, the situation information acquisition unit 506 acquires the situation information 207 corresponding to the video captured at position A. If this situation information 207 is the situation information at the time the feature amount 205 was extracted, the decision on whether to update can be made more accurately.

Ｓ８１３において、人物検出部５０３はＳ８１１で取得した映像から人物の検出を行う。 In step S813, the person detection unit 503 detects a person from the video acquired in step S811.

Ｓ８１４において、特徴抽出部５０４はＳ８１３で検出された人物２０１に対して特徴抽出処理を行い、特徴量２０５を抽出する。抽出する特徴量は人物同定が行えるものであればどのようなものであってもよいが、この例では顔特徴量を用いるものとして説明する。Ｓ８１１〜Ｓ８１４の処理は必要に応じて繰り返し行われるものとする。 In step S814, the feature extraction unit 504 performs feature extraction processing on the person 201 detected in step S813, and extracts a feature amount 205. The feature amount to be extracted may be any as long as it can identify a person, but in this example, description will be made assuming that a facial feature amount is used. The processing of S811 to S814 is repeatedly performed as necessary.

Ｓ８１５において、同定対象指定部５０９は、ユーザの入力を受け付けて、同定対象を設定する。ユーザの入力受付は、例えば、Ｓ８１３で検出された人物の画像をディスプレイに一覧表示させ、ユーザから指定を受け付けることで行われる。以上がＳ８０１の詳細である。 In step S815, the identification target designating unit 509 accepts user input and sets an identification target. The user's input reception is performed, for example, by displaying a list of images of the persons detected in S813 and receiving a designation from the user. The above is the details of S801.

図８（ａ）に戻り、Ｓ８０２において、映像取得部５０２は、位置Ｂの映像を取得する。 Returning to FIG. 8A, in S802, the video acquisition unit 502 acquires the video of the position B.

Ｓ８０３において、状況情報取得部５０６は、位置Ｂで撮像された映像に対応する状況情報２０８を取得する。 In step S 803, the situation information acquisition unit 506 acquires the situation information 208 corresponding to the video imaged at the position B.

Ｓ８０４において、人物検出部５０３はＳ８０２で取得した映像から人物２０２の検出を行う。 In step S804, the person detection unit 503 detects the person 202 from the video acquired in step S802.

Ｓ８０５において、特徴抽出部５０４はＳ８０４で検出された人物２０２に対して特徴抽出処理を行い、人物２０２の特徴量２０６を抽出する。ここでは位置Ｂのカメラについて述べたが、Ｓ８０２〜Ｓ８０５の処理は各位置のカメラごとに繰り返し行われるものとする。 In step S 805, the feature extraction unit 504 performs feature extraction processing on the person 202 detected in step S 804 and extracts the feature amount 206 of the person 202. Although the camera at the position B has been described here, the processing of S802 to S805 is repeatedly performed for each camera at each position.

In step S806, the feature matching unit 505 matches (compares) the feature amount of the identification target information 209 of the person designated in S801 against the feature amount 206 obtained in S805.

Ｓ８０７において、特徴照合部５０５はＳ８０６での照合結果を判定して同定対象の人物２０１と、位置Ｂで検出された人物２０２が同一かどうか判定する。同一かどうかの判定は、特徴量の類似度によって行われ、例えば、特徴量の類似度が閾値より高ければ同一であると判定する。同一であると判定された場合、Ｓ８０８に進み、同一でないと判定されればＳ８１０に進む。 In step S807, the feature matching unit 505 determines the matching result in step S806 and determines whether the person 201 to be identified and the person 202 detected at the position B are the same. Whether or not they are the same is determined based on the similarity between the feature amounts. For example, if the similarity between the feature amounts is higher than a threshold, it is determined that they are the same. If it is determined that they are the same, the process proceeds to S808, and if it is determined that they are not the same, the process proceeds to S810.
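The threshold test of S806 and S807 can be sketched as follows. The specification does not name a similarity measure; cosine similarity, the function names, and the threshold value 0.8 are all assumptions chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def is_same_person(target_feature, candidate_feature, threshold=0.8):
    """S807-style decision: same person if similarity exceeds the threshold.

    The threshold of 0.8 is a placeholder, not a value from the source.
    """
    return cosine_similarity(target_feature, candidate_feature) > threshold
```

A return value of `True` corresponds to proceeding to S808, and `False` to proceeding to S810.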

In step S808, the feature matching unit 505 compares the situation information of the identification target information 209 with the situation information 208 obtained when the feature amount of the person 202, determined in S807 to be the same person, was acquired. In this example, the situation information 207 acquired at position A (001ENV in FIG. 7) is compared with the situation information 208 acquired at position B (002ENV in FIG. 7). If the comparison determines that the situation information 207 acquired at position A represents the better condition, the process proceeds to S810; otherwise, it proceeds to S809. The situation information is compared according to the conditions under which the reliability of the feature amount increases, as shown in the table of FIG. 6. According to data table 702, the situation information 207 (001ENV) has brightness “small”, distance from the camera “near”, and congestion degree “small”. The situation information 208 (002ENV) has brightness “large”, distance from the camera “near”, and congestion degree “small”. As described earlier, a facial feature amount acquired in a brighter situation has higher reliability, so the situation information 208 (002ENV) represents a better situation than the situation information 207 (001ENV). Therefore, in this example, the process proceeds to S809.

As another example, the situations may not be directly comparable, such as when position A has brightness “small” and distance from the camera “near” while position B has brightness “large” and distance from the camera “far”. In such a case, the imaging conditions may be weighted before being compared. If brightness is a more important condition than the distance from the camera, the imaging situation at position B can be said to be better in this example.
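A weighted comparison of this kind could look like the sketch below. The numeric scores, the weights, and the dictionary-based representation are hypothetical; the specification states only that conditions may be weighted and that brightness may be treated as more important than distance. Congestion is assumed to be “small” at both positions, since the example does not state it.

```python
# Per-condition scores normalized to [0, 1]; higher means more reliable.
# These mappings are assumptions for illustration.
BRIGHTNESS_SCORE = {"small": 0.0, "medium": 0.5, "large": 1.0}
DISTANCE_SCORE = {"far": 0.0, "near": 1.0}
CONGESTION_SCORE = {"large": 0.0, "medium": 0.5, "small": 1.0}

# Hypothetical weights: brightness counts twice as much as the others.
WEIGHTS = {"brightness": 2.0, "distance": 1.0, "congestion": 1.0}


def situation_score(info):
    """Weighted sum of the three condition scores."""
    return (
        WEIGHTS["brightness"] * BRIGHTNESS_SCORE[info["brightness"]]
        + WEIGHTS["distance"] * DISTANCE_SCORE[info["distance"]]
        + WEIGHTS["congestion"] * CONGESTION_SCORE[info["congestion"]]
    )


def should_update(held_info, new_info):
    """S808-style branch: skip the update only if the held info is better.

    Ties go to the update branch, following the flow in which S809 is
    reached whenever the held situation is not judged to be better.
    """
    return situation_score(new_info) >= situation_score(held_info)


position_a = {"brightness": "small", "distance": "near", "congestion": "small"}
position_b = {"brightness": "large", "distance": "far", "congestion": "small"}
print(should_update(position_a, position_b))
```

With equal weights the two positions would tie on brightness and distance; doubling the brightness weight is what makes position B come out ahead.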

次にＳ８０９において、特徴更新部５０７はＳ８０７及びＳ８０８の処理の結果を受けて、同定対象の特徴量の更新を行う。特徴量の更新処理は、同定対象として設定されている位置Ａで取得された特徴量２０５を、位置Ｂで取得された特徴量２０６に置き換えることで行われる。 In step S809, the feature update unit 507 receives the results of the processing in steps S807 and S808, and updates the feature quantity to be identified. The feature amount update processing is performed by replacing the feature amount 205 acquired at the position A set as the identification target with the feature amount 206 acquired at the position B.

In step S810, the display unit 508 displays the results of the feature amount matching and update (S806 to S809) on the display. In this example, the output reports that the matching determined the two to be the same person and that the feature amount was updated because the situation at position B was determined to be better than the situation at position A. In this way, the feature amount is updated only when the imaging situation is better, taking differences in the imaging situation into account.

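In the present embodiment, only the facial feature amount has been described, but the invention is not limited to this. Matching may be performed for each type of feature amount, such as a whole-body feature amount or a stride feature amount, with the situation information compared and each feature amount updated individually.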

また、Ｓ８０９において、特徴量更新は特徴量を置き換えるとしたが、位置Ａと位置Ｂの特徴量の平均を算出し、その平均特徴量で置き換えるようにしてもよい。 In S809, the feature amount update replaces the feature amount. However, an average of the feature amounts of the position A and the position B may be calculated and replaced with the average feature amount.
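The two update strategies for S809, replacement and averaging, can be sketched as below. The function name, the `mode` parameter, and the element-wise mean are assumptions; the specification says only that an average of the two feature amounts may be used.

```python
def update_feature(held, new, mode="replace"):
    """Return the updated identification-target feature vector.

    mode="replace": adopt the newly acquired feature amount as-is.
    mode="average": element-wise mean of the held and new feature amounts.
    """
    if mode == "replace":
        return list(new)
    if mode == "average":
        if len(held) != len(new):
            raise ValueError("feature vectors must have the same length")
        return [(h + n) / 2 for h, n in zip(held, new)]
    raise ValueError(f"unknown mode: {mode!r}")


print(update_feature([0.0, 2.0], [2.0, 4.0], mode="average"))
```

Averaging keeps some information from the earlier observation, whereas replacement fully trusts the observation made under the better conditions.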

以上のように実施形態１の映像監視システム１０１によれば、特徴量を取得した際の撮像状況を考慮した特徴量の更新を行うことができる。これにより、信頼度の低い特徴量に更新することを防ぐことができるため、人物同定や追跡の精度を向上することができる。 As described above, according to the video monitoring system 101 of the first embodiment, the feature amount can be updated in consideration of the imaging state when the feature amount is acquired. Thereby, since it can prevent updating to the feature-value with low reliability, the precision of person identification and tracking can be improved.

<Embodiment 2>
Next, Embodiment 2 will be described. Embodiment 1 described acquiring the situation information at the time a feature amount is obtained and updating the feature amount in consideration of that situation information. Embodiment 2 describes not only updating the feature amount of the identification target but also an example of holding additional feature amounts. Description of portions that are the same as in Embodiment 1 is omitted as appropriate.

以下、図９及び図１０のフローチャートを用いて、実施形態２について説明する。以下の説明では、同定対象情報２１０と、図７の対象ＩＤ「００１」の同定対象情報（図９における同定対象情報２０９）が同定対象の情報として保持されている場合について説明する。なお、新たに対象ＩＤ「００２」の特徴量（図９における特徴量２０６）と照合を行う場合について述べる。 Hereinafter, Embodiment 2 will be described with reference to the flowcharts of FIGS. 9 and 10. In the following description, a case will be described in which the identification target information 210 and the identification target information (identification target information 209 in FIG. 9) of the target ID “001” in FIG. 7 are held as identification target information. A case will be described in which matching is newly performed with the feature amount (feature amount 206 in FIG. 9) of the target ID “002”.

Ｓ１００１〜Ｓ１００５はＳ８０１〜Ｓ８０５と同様なため説明を省略する。 Since S1001 to S1005 are the same as S801 to S805, description thereof will be omitted.

In step S1006, the feature matching unit 505 matches the feature amount of each of the held pieces of identification target information 209 and 210 against the feature amount 206.

In step S1007, the feature matching unit 505 determines, based on the plurality of matching results from S1006, whether the person set as the identification target and the person 202 detected at position B are the same. To make this determination, the similarity between the feature amount of the identification target information 209 and the feature amount 206, and the similarity between the feature amount of the identification target information 210 and the feature amount 206, are each compared with a threshold; if either similarity is higher than the threshold, the persons are determined to be the same. If they are determined to be the same, the process proceeds to S1008; if not, it proceeds to S1010.
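The “either similarity exceeds the threshold” rule of S1007 amounts to an any-match test over the held feature amounts. The sketch below uses a plain dot product as a stand-in similarity; both the function names and that choice of measure are assumptions.

```python
def dot(a, b):
    """Stand-in similarity measure (assumes roughly unit-length vectors)."""
    return sum(x * y for x, y in zip(a, b))


def matches_any(held_features, candidate, threshold, similarity=dot):
    """True if the candidate is similar enough to at least one held feature."""
    return any(similarity(held, candidate) > threshold for held in held_features)


held = [[1.0, 0.0], [0.0, 1.0]]   # e.g. features of targets 209 and 210
print(matches_any(held, [0.0, 1.0], threshold=0.9))
```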

Ｓ１００８において、特徴照合部５０５はＳ１００７で同一と判定された人物に関して、特徴量２０６を取得した際の状況情報２０８と、現在保持されている同定対象情報の各状況情報とを比較する。 In step S 1008, the feature matching unit 505 compares the situation information 208 when the feature amount 206 is acquired with the situation information of the identification target information currently held for the person determined to be the same in step S 1007.

状況情報の比較は、図６の表に示した特徴量の信頼度が高くなる条件に従って行われる。同定対象情報２０９の状況情報と状況情報２０８との比較、同定対象情報２１０の状況情報と状況情報２０８との比較が実行される。 The comparison of the situation information is performed according to the condition that the reliability of the feature amount shown in the table of FIG. 6 is high. A comparison between the status information of the identification target information 209 and the status information 208 and a comparison between the status information of the identification target information 210 and the status information 208 are executed.

Ｓ１００９において、特徴更新部５０７は、Ｓ１００７及びＳ１００８の処理結果を受けて、特徴量を追加又は更新する処理を行う。 In step S1009, the feature update unit 507 performs processing for adding or updating feature amounts in response to the processing results in steps S1007 and S1008.

First, the feature update unit 507 checks the number of pieces of identification target information currently held as comparison targets (the held count) and determines whether the held count is less than a threshold. If it is less than the threshold, the feature amount 206 is added as identification target information for the next matching. The number to be held may be set in advance or may be specified by the user.

なお、既に保持されている同定対象情報２０９及び２１０よりも良い条件で取得された特徴量のみ同定対象情報として追加するようにしてもよい。また、既に保持されている同定対象情報２０９及び２１０と異なる状況情報の場合に追加で保持するようにしてもよい。 Note that only feature quantities acquired under conditions better than the identification target information 209 and 210 that are already held may be added as identification target information. Further, in the case of situation information different from the identification target information 209 and 210 that are already held, they may be additionally held.

On the other hand, if the held count is equal to or greater than the threshold, the held identification target with the worst condition (situation information) is replaced (updated). The choice of which identification target information to replace when the threshold has been reached is not limited to this, and various methods are possible. For example, the entry with the oldest acquisition time or the entry with the closest situation information may be replaced; any method that improves matching accuracy may be used. The appropriate time-based replacement method also depends on the nature of each feature amount: for a facial feature amount, a good imaging situation is preferable even if the acquisition time is old, whereas for a whole-body feature amount, a newer acquisition time is preferable even if the imaging situation is somewhat worse.
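The add-or-replace logic of S1009 can be sketched as follows, using the default policy of replacing the entry acquired under the worst conditions. The `Entry` class, the single `condition_score` field summarizing the situation information, and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One held identification target (illustrative structure)."""
    feature: list
    condition_score: float  # higher = better capture conditions (assumed)


def add_or_replace(gallery, new_entry, max_entries):
    """Add while below the limit; otherwise replace the worst-condition entry.

    Mutates `gallery` in place and returns "added" or "replaced".
    """
    if len(gallery) < max_entries:
        gallery.append(new_entry)
        return "added"
    worst_index = min(range(len(gallery)),
                      key=lambda i: gallery[i].condition_score)
    gallery[worst_index] = new_entry
    return "replaced"


gallery = [Entry([1.0, 0.0], 2.0), Entry([0.9, 0.1], 1.0)]
print(add_or_replace(gallery, Entry([0.8, 0.2], 3.0), max_entries=2))
```

Swapping the `key` function, for example to an acquisition timestamp, would give the oldest-first replacement policy mentioned as an alternative.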

Ｓ１０１０において、表示部５０８は、特徴量の照合、及び、特徴量の更新又は追加（Ｓ１００６〜Ｓ１００９）の結果をディスプレイに表示させる。この例では、照合の結果、同一人物と判定されたことと、特徴量を追加で保持したことが結果として出力される。以上によって、撮像状況の違いを考慮しつつ、同定対象の特徴量を追加又は更新することができる。 In step S 1010, the display unit 508 causes the display to display the result of feature amount collation and feature amount update or addition (S 1006 to S 1009). In this example, as a result of collation, it is determined that the person is the same person, and that the feature amount is additionally held is output as a result. As described above, it is possible to add or update the feature quantity to be identified while taking into account the difference in the imaging situation.

As described above, according to the video monitoring system of Embodiment 2, acquiring and holding the situation information at the time a feature amount is obtained makes it possible to add and hold feature amounts in consideration of the situation information. Holding a plurality of feature amounts for person identification and tracking in this way increases the probability of accurate identification. In addition, because not every feature amount is added, the number of feature amounts held as matching targets does not grow excessively, and the computational cost of feature matching can be kept down.

(Other Embodiments)
Next, a hardware configuration for realizing the functions of each embodiment will be described with reference to FIG. 4. Each of the devices 102 to 104 can also be realized by the hardware configuration shown in FIG. 4, but may instead be realized by an embedded system, a tablet terminal, or the like.

In FIG. 4, a CPU (Central Processing Unit) 401 cooperates with the other components based on a computer program and controls the operation of the entire information processing apparatus. A ROM (Read Only Memory) 402 is a read-only memory that stores computer programs and data used in the processing defined by those programs. A RAM (Random Access Memory) 403 is a writable memory that functions as a work area for the CPU 401, among other uses.

The external storage device 404 accesses a recording medium and can load computer programs and data stored on media (recording media) such as a USB memory into the video monitoring system 101. The storage 405 is a device that functions as a large-capacity memory, such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive). The storage 405 stores various computer programs and data.

操作部４０６はユーザからの指示やコマンドの入力を受け付ける装置であり、キーボードやポインティングデバイス、タッチパネル等で実現することができる。 The operation unit 406 is a device that receives an instruction and a command input from a user, and can be realized by a keyboard, a pointing device, a touch panel, or the like.

ディスプレイ４０７は、操作部４０６から入力されたコマンドや、それに対する情報処理装置の応答出力等を表示する表示装置である。インタフェース（Ｉ／Ｆ）４０８は外部装置とのデータのやり取りを中継する装置である。システムバス４０９は、情報処理装置内のデータのやり取りを行うデータバスである。 The display 407 is a display device that displays a command input from the operation unit 406 and a response output of the information processing apparatus with respect to the command. An interface (I / F) 408 is a device that relays data exchange with an external device. A system bus 409 is a data bus for exchanging data in the information processing apparatus.

なお、本発明は、上述の実施形態の１以上の機能を実現するプログラムを１つ以上のプロセッサが読出して実行する処理でも実現可能である。プログラムは、ネットワーク又は記憶媒体を介して、プロセッサを有するシステム又は装置に供給するようにしてもよい。また、本発明は、上述の実施形態の１以上の機能を実現する回路（例えば、ＡＳＩＣ）によっても実現可能である。 The present invention can also be realized by processing in which one or more processors read and execute a program that realizes one or more functions of the above-described embodiment. The program may be supplied to a system or apparatus having a processor via a network or a storage medium. The present invention can also be realized by a circuit (for example, ASIC) that realizes one or more functions of the above-described embodiments.

また、図５に示す各機能ブロックは、図４に示すハードウェアにより実現してもよいし、ソフトウェアにより実現することもできる。 Each functional block shown in FIG. 5 may be realized by the hardware shown in FIG. 4 or may be realized by software.

The present invention is not limited to the embodiments described above, and various modifications can be made without departing from the gist of the present invention. For example, combinations of the embodiments are also included in the disclosure of this specification.

502 video acquisition unit
503 person detection unit
504 feature extraction unit
505 feature matching unit
506 situation information acquisition unit
507 feature update unit

Claims

A video analysis apparatus comprising:
extraction means for extracting, from a video captured by imaging means, a first feature amount relating to a designated object;
determination means for comparing a second feature amount, relating to an object detected from a video different from the video from which the first feature amount was extracted, with the first feature amount, and determining whether the object detected from the different video is the designated object;
acquisition means for acquiring first situation information indicating a situation relating to the video from which the first feature amount was extracted and second situation information indicating a situation relating to the video from which the second feature amount was extracted; and
updating means for, when the determination means determines that the object detected from the different video is the designated object, comparing the first situation information and the second situation information acquired by the acquisition means to determine whether to update the first feature amount based on the second feature amount, and updating the first feature amount based on the second feature amount when it is determined to update.

The situation information includes the situation related to the distance between the imaged object and the imaging means that imaged the object, the situation related to the degree of congestion around the imaged object, and the brightness of the video obtained by imaging the object. The video analysis apparatus according to claim 1, wherein the video analysis apparatus is at least one of such situations.

The video analysis apparatus according to claim 1, wherein the different video is a video captured by an imaging unit different from the imaging unit.

The video analysis apparatus according to claim 1, wherein the updating means determines whether the second situation information is more suitable for the processing in the determination means than the first situation information and, when it determines that the second situation information is more suitable, performs the update.

The video analysis apparatus according to claim 4, wherein the updating means determines that the second situation information is more suitable for the processing in the determination means when the brightness indicated by the second situation information is brighter than the brightness indicated by the first situation information.

The video analysis apparatus according to claim 4 or 5, wherein the updating means determines that the second situation information is more suitable for the processing in the determination means when the distance, indicated by the second situation information, between the imaged object and the imaging means that imaged the object is shorter than the corresponding distance indicated by the first situation information.

The video analysis apparatus according to claim 4, wherein the updating means determines that the second situation information is more suitable for the processing in the determination means when the congestion degree indicated by the second situation information is lower than the congestion degree indicated by the first situation information.

The video analysis apparatus according to any one of claims 1 to 7, wherein the determination means compares each of a plurality of comparison target feature amounts, including the first feature amount, with the second feature amount, and the updating means adds the second feature amount to the plurality of comparison target feature amounts used by the determination means when the number of comparison target feature amounts is smaller than a predetermined number.

A video analysis method comprising:
an extraction step of extracting, from a video captured by imaging means, a first feature amount relating to a designated object;
a determination step of comparing a second feature amount, relating to an object detected from a video different from the video from which the first feature amount was extracted, with the first feature amount, and determining whether the object detected from the different video is the designated object;
an acquisition step of acquiring first situation information indicating a situation relating to the video from which the first feature amount was extracted and second situation information indicating a situation relating to the video from which the second feature amount was extracted; and
an updating step of, when the determination step determines that the object detected from the different video is the designated object, comparing the first situation information and the second situation information acquired in the acquisition step to determine whether to update the first feature amount based on the second feature amount, and updating the first feature amount based on the second feature amount when it is determined to update.

A program for causing a computer to function as each means of the video analysis apparatus according to any one of claims 1 to 8.