JP2013092955A - Video analysis device and system

Info

Publication number: JP2013092955A
Application number: JP2011235601A
Authority: JP
Inventors: Hiroki Watanabe; 裕樹渡邉; Atsushi Hiroike; 敦廣池
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2011-10-27
Filing date: 2011-10-27
Publication date: 2013-05-16
Anticipated expiration: 2031-10-27
Also published as: JP5743849B2

Abstract

PROBLEM TO BE SOLVED: To provide a video analysis device for quickly analyzing a scene in which various dynamic objects appear. SOLUTION: The video analysis device includes: a dynamic object area detection part 103 for detecting an area in which a dynamic object exists from a frame image 102 of an input video; and an object category determination part 104 for determining the category of the object detected by the dynamic object area detection part. The video analysis device generates an "existence probability map" showing probability that an object appears in each coordinate in a video from the object area and the category determined by the object category determination part, and stores the "existence probability map" as time series data to generate the existence probability map of each category in a fixed time span.

Description

The present invention relates to a video analysis device and system for analyzing dynamic objects in video.

In video surveillance systems, functions that detect dynamic objects in video and analyze or search the detected objects have been put into practical use. For example, some video surveillance systems detect the face region of a person in the video and compare it with previously stored face images, so that the user is notified when a suspicious or important person appears.

At present, the most common detection target is the human face, but it is desirable to extend detection and analysis to diverse dynamic objects such as vehicles and various articles. To keep the processing load down, the efficiency of the analysis processing must be improved.

Regarding improved analysis efficiency, Patent Document 1, for example, discloses a means of using the existence probability of an object to limit the region in which image processing for detecting an object region is performed. The method of Patent Document 1 determines the image processing region from static information about the imaging system, such as focal length and resolution, and is effective where the shooting environment and equipment are constrained, as with an in-vehicle camera.

As a scene analysis method other than image matching, Patent Document 2 describes storing the flow lines of persons in a database and searching for scenes in which a person matching condition data that represents a specific behavior appears. Flow-line search is effective for scene search that focuses on the motion of a single object.

Patent Document 1: JP 2010-003254 A
Patent Document 2: JP 2009-284167 A

However, the method of Patent Document 1 cannot be applied in an environment where the shooting conditions and the position of subjects in the video cannot be predicted in advance (an uncontrolled environment), as with a surveillance camera.

Patent Document 2 is effective for detecting specific actions such as fraudulent acts, but it does not provide high-speed analysis of scenes in which diverse dynamic objects appear.

The video analysis device according to the present invention includes a dynamic object region detection unit that detects, from a frame image of an input video, a region in which a dynamic object exists, and an object category determination unit that determines the category of the object detected by the dynamic object region detection unit. For each category determined by the object category determination unit, the device generates an "existence probability map" representing the probability that an object appears in the detected dynamic object region (coordinates or field), and stores it as time-series data, thereby generating an existence probability map of each category over a fixed time span.

With this configuration, an existence probability map of objects in the video space can be obtained for each object category. The existence probability map can then be used to limit the region in which image recognition is performed during object detection, which speeds up object detection.

FIG. 1 is a functional block diagram of the video analysis system 100 according to the present invention.
FIG. 2 is a diagram for explaining generation of an existence probability map in the video analysis system 100 according to the present invention.
FIG. 3 is a flowchart showing the existence probability map generation procedure in the video analysis system 100 according to the present invention.
FIG. 4 is a diagram for explaining an example of object detection for multiple categories.
FIG. 5 is a flowchart showing the processing procedure of an example of object detection for multiple categories.
FIG. 6 is a flowchart showing the processing procedure for calculating the reliability of object detection.
FIG. 7 is a diagram for explaining the speed-up of object detection using an existence probability map.
FIG. 8 is a flowchart showing the processing procedure for speeding up object detection using an existence probability map.
FIG. 9 is a diagram for explaining abnormal scene detection using an existence probability map.
FIG. 10 is a flowchart showing the processing procedure of abnormal scene detection using an existence probability map.
FIG. 11 is a diagram for explaining similar scene search using an existence probability map.
FIG. 12 is a diagram showing the configuration of, and example data in, a video database for similar scene search.
FIG. 13 is a diagram showing the configuration of a similar scene search system.
FIG. 14 is a diagram showing the sequence of a similar scene search.
FIG. 15 is a diagram explaining an embodiment of an application to marketing.
FIG. 16 is a diagram explaining an embodiment of wide-area existence probability map generation using a PTZ camera.
FIG. 17 is a diagram explaining an embodiment of PTZ control of a camera using a wide-area existence probability map for each time of day.

Preferred embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is an example of an overall configuration diagram of the video analysis system 100. The video analysis system 100 comprises a video input device 101, an operation information input device 107, a display device 110, and a computer 111. The computer 111 includes a video input unit 102, a dynamic object region detection unit 103, an object category determination unit 104, an existence probability calculation unit 105, an existence probability accumulation unit 106, an operation information input unit 108, and an existence probability output unit 109, and can be implemented on a general-purpose computer. The video input device 101 is an input interface, such as a video playback device or a camera, for bringing external video into the system.

The video input unit 102 receives video data from the video input device 101, converts it into frame images (still images), and outputs them to the dynamic object region detection unit 103.

The dynamic object region detection unit 103 receives a frame image from the video input unit 102 and uses image recognition to identify the coordinates of regions of the frame image in which a dynamic object appears. A dynamic object here means an object that can move by itself, such as an animal, a car, or a bicycle. In this embodiment, the dynamic object region detection unit 103 can detect objects of multiple types (categories). It outputs the coordinate information of each detected object region together with the reliability of the detection result. For example, the object region is derived as rectangle data, and the coordinate information is output in the form [horizontal coordinate of the upper-left corner, vertical coordinate of the upper-left corner, horizontal coordinate of the lower-right corner, vertical coordinate of the lower-right corner]. The reliability of the detection result is a value representing the "object-likeness" of the rectangular image.

The object category determination unit 104 identifies the semantic category of the object appearing in the region detected by the dynamic object region detection unit 103. Categories are, for example, "person" and "car". The granularity of the categories can be changed freely according to the intended use and the determination accuracy; for example, the "person" category may be subdivided so that subcategories such as "male" and "female" are output.

The existence probability calculation unit 105 generates an existence probability map for each category. It first derives single-frame existence probability maps from the rectangle data and reliability detected by the dynamic object region detection unit 103 and the category obtained by the object category determination unit 104. An existence probability map is a table-like data structure corresponding to the XY coordinates of the camera image, and the existence probability at each coordinate is calculated, for example, simply from the reliability of the detected region. The single-frame existence probability maps are stored in the existence probability accumulation unit 106 as time-series data. The unit then reads the existence probability maps for a fixed time span from the existence probability accumulation unit 106 and aggregates them to generate the existence probability map for that time span, which it outputs to the existence probability output unit 109.

The existence probability accumulation unit 106 stores the per-category existence probability maps calculated by the existence probability calculation unit 105 as time-series data.

The operation information input device 107 is an input interface, such as a mouse, keyboard, or touch device, for conveying user operations to the system.

The operation information input unit 108 passes the user's operation information entered through the operation information input device 107 to the existence probability calculation unit 105. For example, it conveys instructions to initialize the existence probabilities held in the existence probability accumulation unit 106 or to specify the time span over which existence probabilities are accumulated.

The existence probability output unit 109 outputs the existence probability map data calculated by the existence probability calculation unit 105 to the display device 110 as an analysis result.

The display device 110 is an output interface such as a liquid crystal display or CRT, and displays on screen the analysis result received from the existence probability output unit 109.

FIG. 2 shows how the existence probability calculation unit 105 generates existence probability maps, with processing proceeding in time order from left to right. Reference numerals 201, 202, and 203 denote frame images extracted by the video input unit 102. The thick-framed rectangles in the images represent object regions detected by the dynamic object region detection unit 103. In frame image 201, objects of two categories, "car" and "person", are detected. From this result, existence probability maps 204 and 207 are calculated, respectively. The value at each coordinate of an existence probability map is proportional to the reliability of the corresponding object region. Existence probability maps are obtained in the same way for the next frame image 202 and aggregated with those generated from frame image 201. One aggregation method is to take, at each coordinate, the maximum or the average of the reliability values in the time-series data. This yields existence probability maps over a two-frame time span, such as 205 and 208. Repeating this processing through frame image 203 finally produces existence probability maps such as 206 and 209. In this way, the data itself is managed by coordinates, that is, as "points", while at update and use time it is handled region by region as a "field".
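The map construction and aggregation described for FIG. 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the patent: the names `frame_maps` and `aggregate`, the tuple layout of a detection, the use of inclusive pixel coordinates, and a proportionality constant of 1 between reliability and map value are all chosen for the example.

```python
from typing import Dict, List, Tuple

Rect = Tuple[int, int, int, int]     # (left, top, right, bottom), inclusive (assumed)
Detection = Tuple[Rect, float, str]  # (rectangle, reliability, category) (assumed layout)
Grid = List[List[float]]             # grid[y][x] holds the value at pixel (x, y)


def frame_maps(detections: List[Detection], width: int, height: int,
               categories: List[str]) -> Dict[str, Grid]:
    """Build one single-frame existence probability map per category."""
    maps = {c: [[0.0] * width for _ in range(height)] for c in categories}
    for (x0, y0, x1, y1), reliability, category in detections:
        grid = maps[category]
        # Clip the rectangle to the image and write the reliability into it.
        for y in range(max(0, y0), min(height, y1 + 1)):
            for x in range(max(0, x0), min(width, x1 + 1)):
                # Where rectangles overlap, the larger value is kept.
                grid[y][x] = max(grid[y][x], reliability)
    return maps


def aggregate(grids: List[Grid], method: str = "max") -> Grid:
    """Combine per-frame maps of one category, coordinate by coordinate."""
    height, width = len(grids[0]), len(grids[0][0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            values = [g[y][x] for g in grids]
            out[y][x] = max(values) if method == "max" else sum(values) / len(values)
    return out
```

Calling `aggregate` on the maps of frames 201 and 202 would correspond to producing a two-frame map such as 205 or 208; passing `method="mean"` selects the average instead of the maximum.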

FIG. 3 is a flowchart explaining the processing by which the video analysis system 100 outputs existence probability maps. Each step of FIG. 3 is described below.

In step S301, the video input unit 102 acquires a frame image (still image) from the input video. Frame images are acquired repeatedly at fixed time intervals. The input video does not have to come from a fixed camera; a free-viewpoint camera may be used as long as the coordinates of the video can be mapped to the coordinates of the existence probability map. For simplicity, the case of a fixed camera is described below.

In step S302, the dynamic object region detection unit 103 performs image recognition on the frame image acquired in step S301 to obtain the coordinate data of the dynamic object regions. In the detection processing, each dynamic object region is given a reliability representing its "object-likeness". There are various ways to detect objects of multiple categories; as one example, a method based on matching against multiple templates is described later with reference to FIG. 4.

In step S303, the existence probability maps for the frame image being processed are initialized. As many existence probability maps are created as there are categories that the object category determination unit 104 can distinguish.

Steps S304 to S307 are repeated for every region detected by the dynamic object region detection unit 103.

In step S305, the object category determination unit 104 determines the category of the detected object. For example, if matching against multiple templates was used in step S302, the category of the template that matched the target object can be output.

In step S306, the existence probability map for the category determined in step S305 is updated. In this step, a value proportional to the reliability of the region is set at the coordinates corresponding to the object region. If a value has already been set, the larger value is kept.

In step S308, the single-frame existence probability map of each category is saved in the existence probability accumulation unit 106.

In step S309, the existence probability maps for a fixed time span are read from the existence probability accumulation unit 106 and aggregated for each category. As described for FIG. 2, the maximum or average value is used for aggregation.

In step S310, the existence probability maps for the fixed time span obtained in step S309 are output to the existence probability output unit 109.

In step S311, the processing flow ends if the video input unit 102 has finished extracting all frame images. If unprocessed frame images remain, the flow returns to step S301 and the next frame image is processed.
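The loop of FIG. 3 can be summarized in a short sketch. It is illustrative only and rests on several assumptions: `run_pipeline` is a hypothetical name, the `detect` callable stands in for both the dynamic object region detection unit 103 and the object category determination unit 104, the time span is counted in frames and held in a bounded `deque` that plays the role of the existence probability accumulation unit 106, and aggregation uses the maximum.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Tuple

Grid = List[List[float]]
Detection = Tuple[Tuple[int, int, int, int], float, str]  # (rect, reliability, category)


def run_pipeline(frames: Iterable, detect: Callable[[object], List[Detection]],
                 size: Tuple[int, int], categories: List[str],
                 span: int) -> List[Dict[str, Grid]]:
    width, height = size
    # One bounded history per category; the oldest frame drops out automatically.
    store = {c: deque(maxlen=span) for c in categories}
    outputs = []
    for frame in frames:                                    # S301, repeated until S311
        detections = detect(frame)                          # S302
        maps = {c: [[0.0] * width for _ in range(height)]   # S303
                for c in categories}
        for (x0, y0, x1, y1), reliability, category in detections:  # S304 to S307
            grid = maps[category]                           # S305: category comes with the detection
            for y in range(max(0, y0), min(height, y1 + 1)):        # S306
                for x in range(max(0, x0), min(width, x1 + 1)):
                    grid[y][x] = max(grid[y][x], reliability)
        for c in categories:                                # S308
            store[c].append(maps[c])
        span_maps = {                                       # S309: per-coordinate maximum
            c: [[max(g[y][x] for g in store[c]) for x in range(width)]
                for y in range(height)]
            for c in categories
        }
        outputs.append(span_maps)                           # S310
    return outputs
```

Using a bounded history means that a change of time span only changes `span`; a wall-clock time span would instead require timestamps on the stored maps.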

This concludes the description of existence probability map generation in the video analysis system of the present invention. Next, an example of a method for detecting dynamic object regions of multiple categories in step S302 of FIG. 3 is described with reference to FIG. 4. The method described below is an object detection method based on matching against multiple templates.

In the method shown in FIG. 4, image feature values are extracted in advance from typical images (templates) of the objects to be detected and stored in a template database. An image feature value is a numerical representation of features of the image itself, such as color and shape features, and is given, for example, as fixed-length vector data.

Given an input image 401, candidate object regions are first extracted by mechanically varying the position and size of a scanning window 402. Next, for every candidate region, the nearest template in the feature vector space is searched for among the templates prepared in advance. If the vector distance to the nearest template is less than or equal to a threshold, the candidate is judged to be an object and the candidate region is added to the detection results. The distance to the nearest template can then be used as the reliability of the detection result.

The processing procedure of an example of multiple-category object detection is described using the flowchart of FIG. 5.

In step S501, candidate regions are extracted from the input image. Candidate regions are extracted mechanically by moving and resizing the scanning window in fixed steps. The processing of steps S502 to S506 is performed for every candidate region.
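The mechanical extraction of candidate regions in step S501 amounts to a sliding-window enumeration. The sketch below is an assumption-laden illustration: the name `candidate_windows`, the explicit list of window sizes, a single stride shared by both axes, and inclusive corner coordinates are not specified in the text.

```python
from typing import Iterator, List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive (assumed)


def candidate_windows(width: int, height: int,
                      sizes: List[Tuple[int, int]], stride: int) -> Iterator[Rect]:
    """Yield every window of each (w, h) size that fits inside the image."""
    for w, h in sizes:
        for top in range(0, height - h + 1, stride):
            for left in range(0, width - w + 1, stride):
                yield (left, top, left + w - 1, top + h - 1)
```

For example, `list(candidate_windows(4, 4, [(2, 2)], 2))` enumerates the four non-overlapping 2×2 windows of a 4×4 image. The number of windows grows quickly with image size and with the number of window sizes, which is the cost that the later use of existence probability maps aims to reduce.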

In step S503, the reliability of the candidate region is calculated. One way to calculate the reliability, as described for FIG. 4, is feature-based matching against multiple templates. The procedure is described in detail with the flowchart of FIG. 6. Another known method uses machine-learning-based classifiers; in that case, as many classifiers must be prepared as there are categories handled by the system.

In step S504, if the reliability of the candidate region obtained in step S503 is less than or equal to the threshold, the flow proceeds to step S505; otherwise step S505 is skipped.

In step S505, the candidate region being processed is added to the detection result list.

When steps S502 to S506 have been completed for all candidate regions, step S507 outputs the detection result list and the processing flow ends. Each detection result is output as a pair of the region's coordinate information (for example, [horizontal coordinate of the upper-left corner, vertical coordinate of the upper-left corner, horizontal coordinate of the lower-right corner, vertical coordinate of the lower-right corner]) and its reliability.
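The flow of FIG. 5 can be sketched as a filter over candidate regions. The names `detect_objects` and `score` are hypothetical. The sketch follows the wording of step S504 and keeps a candidate when its score is at or below the threshold, which suits a distance-like score as in the description of FIG. 4; with a reliability such as 1/d, where a larger value means a better match, the comparison would be reversed.

```python
from typing import Callable, Iterable, List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom)


def detect_objects(candidates: Iterable[Rect], score: Callable[[Rect], float],
                   threshold: float) -> List[Tuple[Rect, float]]:
    """Return (rectangle, score) pairs for the candidates that pass the threshold."""
    results = []
    for rect in candidates:            # S502 to S506: loop over candidate regions
        value = score(rect)            # S503: score the candidate
        if value <= threshold:         # S504: distance-like score, smaller is better
            results.append((rect, value))  # S505: add to the detection result list
    return results                     # S507: output the list
```

Keeping the scoring function as a parameter lets the same loop serve template matching and classifier-based scoring alike.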

FIG. 6 is a flowchart showing the procedure for calculating the reliability of a candidate region in step S503 of FIG. 5. It is a reliability calculation method based on the matching against multiple templates described for FIG. 4.

In step S601, the state is initialized with nearest template T = null and distance from the nearest template d = 0.

In step S602, the image feature value of the candidate region is extracted, using the same method as was used for the template database described for FIG. 4.

Next, steps S603 to S607 are performed for every template in the template database.

In step S604, the feature vector distance d' between the input image and the template T' being processed is obtained.

In step S605, if the distance d' obtained in step S604 is smaller than d, the flow proceeds to step S606; otherwise step S606 is skipped.

In step S606, the nearest template T is replaced with the template T' being processed, and the distance d from the nearest template is replaced with the d' obtained in step S604.

When all templates in the database have been processed, the flow moves to step S601.

In step S601, the reliability of the candidate region is output. The reliability can be defined, for example, as the reciprocal 1/d (d ≠ 0) of the distance d from the nearest template.
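The nearest-template search of FIG. 6 can be sketched as a linear scan. Several points are assumptions made for the example: the names `nearest_template` and `reliability`, the representation of a template as a (category, feature vector) pair, Euclidean distance as the vector distance, and feature extraction (S602) having already been done. The text initializes d to 0; the sketch starts from infinity instead so that the first comparison in S605 can succeed. The text leaves d = 0 undefined for 1/d, and the sketch returns infinity in that case.

```python
import math
from typing import List, Optional, Sequence, Tuple

Template = Tuple[str, Sequence[float]]  # (category, feature vector) (assumed layout)


def nearest_template(feature: Sequence[float],
                     templates: List[Template]) -> Tuple[Optional[str], float]:
    """Return the category of the closest template and its distance."""
    best, best_d = None, math.inf            # S601, with d starting at infinity (assumption)
    for category, vector in templates:       # S603 to S607: loop over all templates
        d = math.dist(feature, vector)       # S604: Euclidean distance (assumption)
        if d < best_d:                       # S605
            best, best_d = category, d       # S606
    return best, best_d


def reliability(d: float) -> float:
    """Reciprocal of the distance; infinity for an exact match (assumption)."""
    return 1.0 / d if d != 0 else math.inf
```

Because the closest template also carries a category, the same scan can supply the category output described for step S305.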

This concludes the description of how the video analysis system 100 generates existence probability maps. In the video analysis system 100, the video input unit 102 extracts frame images from the input video, and the dynamic object region detection unit 103 and the object category determination unit 104 calculate the position, category, and detection reliability of the objects in each frame image. From this information, the existence probability calculation unit 105 generates an existence probability map for each category for the frame image being processed. By aggregating the time-series data of existence probability maps stored in the existence probability accumulation unit, it also obtains the existence probability map for a fixed time span. This gives the likelihood that an object of a specific category appears at a specific location in the video.

The following describes speeding up object detection using the existence probability map.

As described with reference to FIGS. 4 to 6, detecting objects of multiple categories in an image requires matching every candidate region against every template, which imposes a very heavy processing load. The same holds when determination is performed with per-category classifiers obtained by machine learning.

The present invention therefore uses the existence probability maps to skip recognition processing in regions where the target category is unlikely to appear, thereby speeding up object detection. The procedure is described below with reference to FIGS. 7 and 8.

FIG. 7 is a diagram for explaining the speed-up of object detection using existence probability maps.

In the example of FIG. 7, it is assumed that the existence probability map 702 for the "car" category and the existence probability map 704 for the "person" category have already been generated using the video analysis device 100.

When a frame image 701 is input, the existence probability maps for "car" and "person" are checked, and only the regions with high existence probability (inside the thick frames of 703 and 705) are extracted by the extraction means and subjected to detection processing. For regions where the existence probability of the "car" category is high, the reliability is obtained using only the "car" category templates 707. Likewise, for regions where the existence probability of the "person" category is high, the reliability is obtained using only the "person" category templates 708.

When machine-learning classifiers are used, the number of classifiers needed for determination can likewise be reduced, so the processing becomes more efficient in the same way.

The two-stage efficiency gain, fewer candidate regions and simpler reliability calculation, speeds up the detection processing as a whole. Even when there is only one category to detect, the first stage, the reduction of candidate regions, enables fast detection.

FIG. 8 is a flowchart showing a processing procedure for speeding up object detection using the existence probability map. The procedure of FIG. 8 speeds up the multi-category object detection processing of FIG. 5.

Steps S801 to S808 are repeated for the existence probability map of each category.

Step S802 extracts candidate regions in the same way as step S501 in FIG. 5, except that regions with a low existence probability are ignored. Whether a region has a high or a low existence probability is judged against a predetermined threshold or a value set by the user.

Steps S803 to S807 are repeated for every candidate region extracted in S802. The difference from FIG. 5 lies in the reliability calculation of S804.

In S804, the reliability of the candidate region is calculated for the target category. Whereas S503 in FIG. 5 obtains "object-likeness", S804 obtains "likeness to an object of a specific category". As described with reference to FIG. 7, limiting the category reduces, for example, the number of templates to be compared, so the discrimination processing can be made more efficient.

Steps S805, S806, and S809 are the same as the corresponding processing in FIG. 5.
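The flow of FIG. 8 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the existence probability map is assumed to have the same resolution as the frame, candidates are taken as non-overlapping windows, the mean probability inside a window is compared with the threshold, and the per-category scoring functions in `score_fns` are hypothetical stand-ins for template matching or a classifier. The final confidence test is likewise an assumed stand-in for the steps shared with FIG. 5.

```python
def extract_candidates(prob_map, window, threshold):
    """S802 (sketch): keep only windows whose mean existence probability
    is at least `threshold`; low-probability areas are ignored."""
    h, w = len(prob_map), len(prob_map[0])
    wh, ww = window
    kept = []
    for y in range(0, h - wh + 1, wh):
        for x in range(0, w - ww + 1, ww):
            cells = [prob_map[y + dy][x + dx]
                     for dy in range(wh) for dx in range(ww)]
            if sum(cells) / len(cells) >= threshold:
                kept.append((x, y))
    return kept


def detect_with_maps(frame, prob_maps, score_fns, window,
                     prob_threshold, conf_threshold):
    """Loop over categories (S801-S808) and over the surviving
    candidates (S803-S807), scoring each candidate only with the
    scorer of the current category (S804)."""
    detections = []
    for category, prob_map in prob_maps.items():
        for (x, y) in extract_candidates(prob_map, window, prob_threshold):
            confidence = score_fns[category](frame, x, y, window)
            # Assumed acceptance test; the source defers this to FIG. 5.
            if confidence >= conf_threshold:
                detections.append((category, x, y, confidence))
    return detections
```

Because each category is scored only inside its own high-probability windows, both the number of candidates and the number of comparisons per candidate shrink, which is the two-stage saving the text describes.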

The speeding up of object detection using the existence probability has been described above. By using existence probability maps accumulated in the past, the amount of image processing applied to a newly input frame image can be reduced, and objects of multiple categories can be detected efficiently.

As another use of the existence probability map, abnormal scene detection is described below with reference to FIGS. 9 and 10. Whereas the second embodiment uses regions with a high existence probability, this embodiment conversely exploits regions with a low existence probability.

FIG. 9 is a diagram for explaining abnormal scene detection using the existence probability map.

In the example of FIG. 9, it is assumed that the existence probability map 905 for "person" has been generated in advance.

When the frame image 901 is input, the dynamic object region detection unit 103 detects objects 903 and 904 of the "person" category, as shown in the object detection result 902. Here, the extraction means extracts in advance the regions of the existence probability map 905 whose existence probability is equal to or less than a threshold. Next, the regions 906 and 907 on the existence probability map 905 that correspond to the detected regions are obtained. In this example, region 906 lies in an area with a high existence probability, whereas region 907 is a detection in an area with a low existence probability. Whether an area has a high or a low existence probability is judged against a predetermined threshold or a value set by the user. A detection such as region 907 is an "event that rarely occurs" and can be regarded as an abnormal scene. In the third embodiment, when an abnormal scene is detected, a warning message is displayed on the display device 110, for example.
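The check illustrated in FIG. 9 might be sketched as below. The text does not say how the existence probability of a detected region is computed from the map, so averaging the map cells inside the bounding box is an assumption made here; the function names and the `(x, y, width, height)` box format are also hypothetical.

```python
def region_probability(prob_map, box):
    """Mean existence probability inside a box (x, y, width, height).
    Using the mean is an assumption; the source only says the
    probability of the detected region is obtained from the map."""
    x, y, w, h = box
    cells = [prob_map[r][c]
             for r in range(y, y + h) for c in range(x, x + w)]
    return sum(cells) / len(cells)


def find_abnormal(detections, prob_maps, threshold):
    """Return detections that fall in areas where objects of the same
    category rarely appear (existence probability <= threshold)."""
    abnormal = []
    for category, box in detections:
        if region_probability(prob_maps[category], box) <= threshold:
            abnormal.append((category, box))
    return abnormal
```

With a map whose left half is frequently occupied and whose right half is almost never occupied, a "person" box on the left is ignored and one on the right is reported, mirroring regions 906 and 907.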

FIG. 10 is a flowchart for explaining the processing procedure of abnormal scene detection using the existence probability map.

In step S1001, the video input unit 102 acquires a frame image from the input video.

In step S1002, the dynamic object region detection unit 103 identifies object regions in the frame image acquired in step S1001.

Steps S1003 to S1008 are repeated for every object region detected in step S1002.

In step S1004, the object category determination unit 104 determines the category of the detected object.

In step S1005, the existence probability map of the category determined in step S1004 is read from the existence probability accumulation unit 106, and the existence probability of the detected region is obtained.

In step S1006, if the existence probability of the object in the detected region, calculated in step S1005, is equal to or less than the threshold, the process proceeds to step S1007; otherwise, step S1007 is skipped. The determination in step S1006 may simply use only the existence probability of the region, as described above, or may additionally use the reliability of the detected region obtained in step S1002. For example, letting q be the existence probability of the region and p be the existence probability derived from the reliability, the measure of the difference between p and q, K = p × log(p/q) + (1 − p) × log((1 − p)/(1 − q)), may be used. K becomes larger the more p and q differ, so a larger value of K indicates a more abnormal scene.

In step S1007, a warning message is displayed on the display device 110 to inform the user that an abnormal scene has been detected.

In step S1009, if the video has a next frame, the process returns to step S1001 and continues; otherwise, the process ends.
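The measure K mentioned for step S1006 has the form of the Kullback-Leibler divergence between two Bernoulli distributions with parameters p and q. A minimal sketch of its computation follows; the clamping constant `eps`, which avoids division by zero and log(0) when a probability is exactly 0 or 1, and the use of the natural logarithm are assumptions added here and are not part of the source.

```python
import math


def divergence_k(p, q, eps=1e-9):
    """K = p*log(p/q) + (1-p)*log((1-p)/(1-q)).

    p: existence probability derived from the detection reliability.
    q: existence probability read from the map for the detected region.
    Both are clamped to [eps, 1 - eps] (an assumption) so that the
    logarithms stay finite.
    """
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
```

K is zero when p equals q and grows as they diverge, so a confident detection (large p) in a rarely occupied region (small q) yields a large score that could be compared with a threshold in place of the simple test on q.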

Abnormal scene detection using the existence probability map has been described above. By using the existence probability map, the system can automatically find abnormal scenes in the input video and warn the user.

The second and third embodiments are methods that use the existence probability map in real time. The following describes a method of realizing similar scene search by storing information obtained from the existence probability map in association with previously accumulated video.

FIG. 11 is a diagram for explaining similar scene search using the existence probability map.

To perform similar scene search, numerical data representing the characteristics of a video (a video feature amount) must be calculated and stored in the video database 1109.

In the example of FIG. 11, the existence probability maps 1103 and 1104 are first derived from the accumulated video 1101 using the video analysis system 100 of the first embodiment.

Next, the existence probability maps are converted into a feature amount that can be used for search. The feature amount is obtained, for example, by dividing the existence probability maps 1103 and 1104 into grids as shown by 1106 and 1107, respectively, calculating the average existence probability in each grid cell, and concatenating the averages into a vector.

In the example of FIG. 11, the feature amount is calculated from the existence probability maps of two categories, "car" and "person". When there is only one category, or when there are two or more, a feature vector can likewise be obtained by applying the same processing to each category.
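The grid-averaging step described above can be sketched as follows. The function names, the representation of a map as a list of rows, and the fixed category order are assumptions for illustration; the sketch also assumes the map has at least as many rows and columns as the grid, so that no cell is empty.

```python
def grid_feature(prob_map, rows, cols):
    """Divide one existence probability map into rows x cols cells and
    return the mean probability of each cell in row-major order."""
    h, w = len(prob_map), len(prob_map[0])
    feature = []
    for i in range(rows):
        y0, y1 = i * h // rows, (i + 1) * h // rows
        for j in range(cols):
            x0, x1 = j * w // cols, (j + 1) * w // cols
            cells = [prob_map[y][x]
                     for y in range(y0, y1) for x in range(x0, x1)]
            feature.append(sum(cells) / len(cells))
    return feature


def video_feature(prob_maps, categories, rows, cols):
    """Concatenate the per-category grid features in a fixed category
    order, giving a vector of length len(categories) * rows * cols."""
    feature = []
    for category in categories:
        feature.extend(grid_feature(prob_maps[category], rows, cols))
    return feature
```

Keeping the category order and grid size fixed across all videos is what makes the resulting vectors comparable with one another.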

The obtained feature amount is registered in the video database 1109 in association with the video data. The structure of the database is described later with reference to FIG. 12.

When a search is performed, the existence probability maps are derived from the query video 1110 and converted into a feature amount in the same way.

The resulting feature amount is compared with the feature amounts of the videos stored in the database, and the videos are output in ascending order of inter-vector distance.
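The ranking step could be sketched as below. The source speaks only of an "inter-vector distance" and does not name a metric, so the Euclidean distance used here is an assumption, as are the function names and the dictionary that stands in for the video database.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_similar(query_feature, database, top_k=None):
    """Return (video_id, distance) pairs sorted by ascending distance.

    `database` maps a video ID to its stored feature vector.
    """
    scored = [(video_id, euclidean(query_feature, feature))
              for video_id, feature in database.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored if top_k is None else scored[:top_k]
```

A linear scan like this is adequate for a small collection; a larger database would presumably use an index, which the source does not discuss.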

FIG. 12 is a diagram showing the structure of the video database 1109 and example data. A table-format structure is shown here, but any data format may be used.

The database has, as basic items, a video ID field 1201, a video data field 1202, and a video feature amount field 1203. Other bibliographic information (such as the shooting location and the date and time of the video) may be added as necessary.

The video ID field 1201 holds the identification number of each video. The video data field 1202 holds the moving image data that is played back when the user checks the search results. The video feature amount field 1203 holds the video feature amount calculated from the existence probability map. The video feature amount is obtained, for example, by dividing the existence probability map of each category into a grid as in FIG. 11 and taking the average existence probability of each cell, and is represented as fixed-length vector data.
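One possible in-memory representation of a row of this table is sketched below. The class name, the field names, and the concrete vector length (2 categories with a 2 × 2 grid) are hypothetical; only the three basic fields and the fixed-length property come from the description.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical fixed length: 2 categories x (2 x 2) grid cells.
FEATURE_LENGTH = 8


@dataclass
class VideoRecord:
    video_id: int          # corresponds to the video ID field 1201
    video_data: bytes      # corresponds to the video data field 1202
    feature: List[float]   # corresponds to the video feature amount field 1203

    def __post_init__(self):
        # Enforce the fixed-length property so that every stored
        # vector can be compared with every query vector.
        if len(self.feature) != FEATURE_LENGTH:
            raise ValueError(
                f"feature must have length {FEATURE_LENGTH}, "
                f"got {len(self.feature)}"
            )
```

Additional bibliographic fields such as shooting location or date could be added as further attributes.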

FIG. 13 is an overall configuration diagram of the video analysis apparatus 100 that enables similar scene search. It shows a system configuration in which the video database 1109 is connected to the configuration of FIG. 1, and a feature amount calculation unit 1301 and a search result output unit 1302 are added to the computer 111.

The feature amount calculation unit 1301 calculates the video feature amount from the existence probability map calculated by the existence probability calculation unit 105. As described with reference to FIG. 12, the video feature amount is output as fixed-length vector data.

At the time of video registration, the video database 1109 stores the video feature amount calculated by the feature amount calculation unit 1301 in association with the video data input through the video input unit 102. At the time of similar scene search, it finds data whose vector distance to the feature amount of the query video, extracted by the feature amount calculation unit 1301, is small, and outputs the data in ascending order of distance.

The search result output unit 1302 outputs the search results obtained from the video database 1109 to the display device 110.

FIG. 14 is a sequence diagram of similar scene search. In FIG. 14, 1401 is the data flow at the time of video registration, and 1402 is the data flow at the time of similar scene search.

At the time of video registration 1401, the video 1403 input from the video input device 101 is first sent to the computer 111.

In the computer 111, generation 1404 of the existence probability map is performed according to the processing flow of FIG. 3. Subsequently, the feature amount calculation unit 1301 performs conversion 1405 of the existence probability map into a feature amount. Finally, the pair 1406 of the feature amount and the video data is sent to the video database 1109.

The video database 1109 registers the feature amount and the video data of the pair 1406 sent from the computer 111 in association with each other (1407).

At the time of similar scene search 1402, the user issues a search request 1409 to the computer 111 from the operation information input unit 107. At this time, the query video 1408 is sent from the video input device 101 to the computer 111.

In the computer 111, as at registration, generation 1410 of the existence probability map and conversion 1411 into a feature amount are performed, and the result is sent to the database 1109 as the query feature amount.

The database 1109 searches for data whose vector distance to the query feature amount is small (1413), and returns a similar video list 1414 to the computer 111 as the search result.

The computer 111 formats the similar video list into display data and sends it to the display device 110.

The display device 110 presents the display data 1415 to the user as the search result (1416).

Similar scene search using the existence probability map has been described above.

By using a video feature amount that focuses on the existence probability of objects in the "field" of the video, search results that cannot be obtained with appearance-based feature amounts alone can be obtained, and a variety of applications can be realized.

For example, with road camera video, using video data of the scene of a traffic accident as the query makes it possible to search, with attention to the numbers of people and cars, for places and time periods where accidents are similarly likely to occur.

As an application to marketing, dividing the detection targets in in-store surveillance video into detailed categories according to gender, age, and the like makes it possible to find locations and layouts with similar flows of people.

For example, FIG. 15 shows an example of a display screen when the similar scene search of the present invention is applied to marketing. The display screen consists of a query video 1501, a search condition input form 1502, a search button 1503, and a search result display area 1504, and is displayed on the display device 110.

In this example, the video database stores, in addition to the video data and the feature amounts, the shooting date and time, the implementation status of sales promotion activities, the trend in the number of sales, and the like. Sales promotion activities include, for example, announcements about products and the display of product information and sales floor information on digital signage.

The user uses the input device 101 to specify the query video 1501 and additional search conditions (such as a date and time range), and clicks the search button 1503. The video analysis apparatus 100 generates an existence probability map from the query video and searches the database for similar scenes using the feature amount calculated from the map. At this time, the search results are also narrowed down according to the additional conditions. The obtained search results are sent to the display device 110 together with the information associated with the videos, so that information on similar stores is displayed in the search result display area 1504. Using this result, the user may carry out effective sales promotion activities, or the same sales promotion activities as those of a store with a favorable sales trend may be carried out automatically. For example, displaying advertisements on digital signage is considered relatively easy to automate.
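A search combined with an additional date condition might look like the sketch below. The record layout (dictionaries with `shot_at`, `feature`, and `store` keys), the function name, and the choice to filter before ranking are all assumptions; the source says only that the results are narrowed down by the additional conditions. `math.dist` computes the Euclidean distance, which is again an assumed metric.

```python
import math
from datetime import datetime


def search_with_conditions(query_feature, records, start, end):
    """Keep records shot between `start` and `end` (inclusive), then
    sort them by ascending distance to the query feature vector."""
    hits = [r for r in records if start <= r["shot_at"] <= end]
    return sorted(hits, key=lambda r: math.dist(query_feature, r["feature"]))
```

The returned records carry their associated information (for example the store name or sales trend), which is what would be shown in the search result display area.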

The above description has assumed a fixed camera. However, as noted in the description of FIG. 3, a wide-area existence probability map can be generated as long as the correspondence between the coordinates of the video and the coordinates of the existence probability map is established. The following describes a method of generating a wide-area existence probability map and an example of its use.

FIG. 16 is a diagram for explaining a method of generating a wide-area existence probability map from wide-area video shot by a camera capable of pan, tilt, and zoom (PTZ).

By controlling PTZ, the camera of the video input device 101 can shoot video over the range 1601. However, even when zoomed out to the maximum, it can capture only the range 1602 at one time.

Therefore, in the video analysis apparatus of the present invention, the PTZ of the camera is controlled so that the wide-area video is divided into a plurality of partial-region videos (1602, 1603, 1604, and so on), video is recorded for a fixed period for each partial region, and an existence probability map is generated for each. The partial-region existence probability maps obtained in this way are combined to obtain the wide-area existence probability map 1608.
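Combining the partial maps can be sketched as follows, assuming that the position of each partial region in wide-area map coordinates is known (for instance from the PTZ settings) and that all maps share one scale. Averaging where partial regions overlap and filling uncovered cells with 0.0 are assumptions; the source does not say how overlaps or gaps are handled.

```python
def merge_partial_maps(partials, wide_shape):
    """Combine partial existence probability maps into one wide map.

    partials: list of (top, left, partial_map) with offsets given in
              wide-area map coordinates.
    wide_shape: (height, width) of the wide-area map.
    """
    height, width = wide_shape
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for top, left, partial in partials:
        for dy, row in enumerate(partial):
            for dx, value in enumerate(row):
                total[top + dy][left + dx] += value
                count[top + dy][left + dx] += 1
    # Average overlapping cells; cells never covered stay at 0.0.
    return [[total[y][x] / count[y][x] if count[y][x] else 0.0
             for x in range(width)] for y in range(height)]
```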

FIG. 17 is a diagram showing automatic PTZ control of the camera as an example of using the wide-area existence probability map.

By aggregating the wide-area existence probability map described above for each time period, an existence probability limited to that time period is obtained. Maps 1701, 1702, and 1703 are existence probability maps obtained from video shot in the 6:00, 12:00, and 18:00 time periods, respectively. When a wide area is monitored, the regions in which dynamic objects are likely to appear change with the time period, so performing PTZ control suited to each time period allows useful video to be recorded preferentially. In the example of the figure, many dynamic objects are present in regions A, B, C, and D at 6:00, so PTZ control is scheduled to shoot A, B, C, and D as shown in transition diagram 1704. Similarly, B, C, D, F, G, and H are shot preferentially at 12:00 as shown in transition diagram 1705, and F, G, and H at 18:00 as shown in transition diagram 1706.
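The scheduling idea can be sketched as selecting, for each time period, the regions whose aggregated existence probability is at least a threshold. The per-region probabilities, the threshold, and the function name below are hypothetical; the source states only which regions are visited in each time period, not how they are chosen or in what order the camera cycles through them.

```python
def build_ptz_schedule(region_probs_by_hour, threshold):
    """Map each time period to the list of regions worth shooting.

    region_probs_by_hour: {hour: {region_name: existence probability}}
    Regions are returned in name order; the actual visiting order of
    the transition diagrams is not specified in the source.
    """
    schedule = {}
    for hour, region_probs in region_probs_by_hour.items():
        schedule[hour] = [name for name, prob in sorted(region_probs.items())
                          if prob >= threshold]
    return schedule
```

A PTZ controller could then cycle the camera through `schedule[hour]` during the corresponding time period.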

100: video analysis apparatus, 101: video input device, 102: video input unit, 103: dynamic object region detection unit, 104: object category determination unit, 105: existence probability calculation unit, 106: existence probability accumulation unit, 107: operation information input device, 108: operation information input unit, 109: existence probability output unit, 110: display device, 111: computer

Claims

A dynamic object region detection unit for detecting a region where a dynamic object exists from a frame image of the input video;
An object category discriminating unit for discriminating a category of an object detected by the dynamic object region detecting unit;
For each category determined by the object category determination unit, an existence probability calculation unit that generates an existence probability map representing a probability that an object appears in the detected dynamic object region;
An existence probability accumulation unit for storing the existence probability map as time series data;
The existence probability calculation unit generates an existence probability map for each category in a certain time span from the existence probability map of the time-series data stored in the existence probability accumulation unit,
And an existence probability output unit that outputs the existence probability map for each category in the fixed time span.

The video analysis apparatus according to claim 1, further comprising extraction means for extracting, from the existence probability map for each category in the fixed time span, an area having an existence probability equal to or higher than a threshold value,
wherein the dynamic object region detection unit detects the dynamic object in the area where the existence probability is equal to or higher than the threshold value.

The video analysis apparatus according to claim 1, further comprising extraction means for extracting, from the existence probability map for each category in the fixed time span, an area having an existence probability equal to or less than a threshold value,
wherein an abnormal scene is detected when the dynamic object region detection unit detects the dynamic object in the area where the existence probability is equal to or less than the threshold value.

The video analysis apparatus according to claim 1, wherein the input video is a wide-area video shot by a camera capable of panning, tilting, and zooming.

The video analysis apparatus according to claim 4, wherein the existence probability map is generated for each time period,
the apparatus further comprising control means for controlling pan, tilt, and zoom of the camera so that an area having a high existence probability is photographed in each time period.

A dynamic object region detection unit for detecting a region where a dynamic object exists from a frame image of the input video;
An object category discriminating unit for discriminating a category of an object detected by the dynamic object region detecting unit;
For each category determined by the object category determination unit, an existence probability calculation unit that generates an existence probability map representing a probability that an object appears in the detected dynamic object region;
A feature amount calculation unit for calculating a video feature amount from the existence probability map;
A video database that stores the video feature quantity and the input video in association with each other;
wherein, when a search request for a scene similar to a query video is received, a video having a feature amount similar to the video feature amount of the query video calculated by the feature amount calculation unit is searched for in the video database and output as a similar video.

A video input device;
A video input unit for extracting a frame image from the video input from the video input device;
A dynamic object region detection unit for detecting a region where a dynamic object exists from the frame image;
An object category discriminating unit for discriminating a category of an object detected by the dynamic object region detecting unit;
For each category determined by the object category determination unit, an existence probability calculation unit that generates an existence probability map representing a probability that an object appears in the detected dynamic object region;
An existence probability accumulation unit for storing the existence probability map as time series data;
The existence probability calculation unit generates an existence probability map for each category in a certain time span from the existence probability map of the time series data stored in the existence probability accumulation unit,
An existence probability output unit that outputs an existence probability map for each category in the fixed time span;
A video analysis system comprising: a display device that displays the output existence probability map.