JP2022133547A

JP2022133547A - Video image analysis system and video image analysis method

Info

Publication number: JP2022133547A
Application number: JP2021032281A
Authority: JP
Inventors: Kenichi Morita; Yoshiki Ito; Atsushi Hiroike
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2021-03-02
Filing date: 2021-03-02
Publication date: 2022-09-14
Also published as: WO2022185569A1

Abstract

To adjust the frame rate of each process by focusing on the continuity of the feature amounts extracted in that process.
SOLUTION: A video analysis system comprises: a biological attribute estimation unit that extracts attributes of living beings in a video; an object recognition unit that extracts attributes of objects in the video; a motion recognition unit that recognizes motions of living beings in the video; and a frame rate adjustment unit that controls the execution timing of each unit. The frame rate adjustment unit sets the frame rate of the video processed by the motion recognition unit higher than the frame rate of the video processed by at least one of the biological attribute estimation unit and the object recognition unit.
SELECTED DRAWING: Figure 1

Description

The present invention relates to video analysis technology.

With the spread of security cameras, there is a growing need to detect specific people or vehicles in video captured at multiple locations. With conventional security camera systems, however, it is difficult to retrieve a desired scene from the vast amount of data stored on the recording device.

Video analysis functions fall broadly into a real-time detection function and a history search function. The real-time detection function detects the appearance of a specific target (a person, object, or animal) or a specific motion in surveillance video and notifies the user. The history search function uses the appearance or motion of a specific target (a person, object, animal, etc.) as a query to search a feature database built from past video data, and extracts the video in which the search target appears.

Background art in this technical field includes Japanese Unexamined Patent Application Publication No. 2001-167095 (Patent Document 1) and International Publication No. WO 2017/017808 (Patent Document 2). Patent Document 1 describes an image search system comprising: a feature descriptor generation unit that extracts image feature amounts from input image data and generates feature descriptors; an image information storage unit that stores the generated feature descriptors in association with the input image data; an attribute list generation unit that generates an attribute list based on attribute information input together with the input image data; and an image search unit that, when a search condition on the attribute information is input, searches the attribute list and outputs the attribute information matching the condition, and, when a search condition on the feature descriptors is input, searches the image information storage unit and outputs the image data matching the condition (see claim 1).

Patent Document 2 describes an image processing system including a processor and a storage device that stores a program executed by the processor, in which the processor: creates a plurality of frames from video data; detects moving objects in the frames; extracts, from the frames, a feature amount of the trajectory of each detected moving object and records it in a database; determines, according to predetermined conditions, the content of a feature registration process that includes extracting feature amounts from the image of a moving object in each frame and recording them in the database; and executes the determined feature registration process in each frame (see claim 1).

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2001-167095
Patent Document 2: International Publication No. WO 2017/017808

Among video analysis functions, the action recognition function must capture a person's movements at shorter time intervals than the attribute recognition function, whose results change little over short periods. The action recognition function therefore consumes large computing resources and has a high computational cost. Where computing resources are insufficient, processing backlogs make real-time action recognition difficult, so real-time action detection and incremental updating of the feature database become impossible. As a result, an unspecified target cannot be identified in near real time by history search, and real-time detection that uses information on a target identified by history search is also impossible. In addition, the number of video streams (that is, cameras) each server computer can handle is limited. It is therefore desirable to handle the action recognition function differently from the attribute recognition function.

It is also important to discover suspicious persons by checking eyewitness information against the video database through history search, and to apprehend them and prevent secondary damage by tracking them through closer integration of the history search function and the real-time detection function. To find targets based on eyewitness information, high-precision real-time detection that uses information on targets identified by history search is desired.

Accordingly, an object of the present invention is to provide a video analysis technique that adjusts the frame rate for each process, focusing on the continuity of the feature amounts extracted in each process.

A representative example of the invention disclosed in the present application is as follows. The video analysis system is configured by a computer having an arithmetic device that executes predetermined processing and a storage device connected to the arithmetic device. The arithmetic device can access a feature database that stores feature data of objects in video. The video analysis system includes: a biological attribute estimation unit, by which the arithmetic device extracts attributes of living beings in the video; an object recognition unit, by which the arithmetic device extracts attributes of objects in the video; a motion recognition unit, by which the arithmetic device recognizes motions of living beings in the video; and a frame rate adjustment unit, by which the arithmetic device controls the execution timing of each of these units. The frame rate adjustment unit sets the frame rate of the video processed by the motion recognition unit higher than the frame rate of the video processed by at least one of the biological attribute estimation unit and the object recognition unit.

According to one aspect of the present invention, the computing resources required for video analysis, and hence the computational cost, can be reduced. Problems, configurations, and effects other than those described above will become clear from the following description of the embodiments.

The accompanying drawings are: a block diagram showing the logical configuration of the video analysis system according to an embodiment of the present invention (FIG. 1); a block diagram showing the physical configuration of the video analysis system of this embodiment; a diagram showing an example configuration of the feature database of this embodiment; a flowchart of the feature DB construction process of this embodiment; two flowcharts of the image search process of this embodiment; a diagram showing processing by the video analysis system of this embodiment; and a diagram showing an example of the search screen of this embodiment.

FIG. 1 is a block diagram showing the configuration of a video analysis system 100 according to an embodiment of the invention.

The video analysis system 100 of this embodiment includes a video acquisition unit 11, a skeleton estimation unit 12, a person rectangle extraction unit 13, a person feature extraction unit 14, a person tracking unit 15, a time-series action recognition unit 16, frame rate adjustment units 17 to 19, a person attribute estimation unit 20, an object recognition unit 21, an FDB registration unit 22, a time-of-interest control unit 23, a query setting unit 24, a search result output unit 25, a real-time detection unit 26, and a detection rule memory 27.

The video analysis system 100 is connected to the FDB server 200. It registers feature amounts extracted from video in the FDB server 200, and searches the FDB server 200 to obtain search results.

The video analysis system 100 is connected to a plurality of cameras 300 and acquires video from them.

The video acquisition unit 11 is an interface that acquires video from one or more cameras 300. In accordance with the frame rate control value output from the time-of-interest control unit 23, the video acquisition unit 11 sends a request to the camera 300 and acquires video at a predetermined frame rate. Alternatively, the video acquisition unit 11 may receive video that the camera 300 captures and distributes at the frame rate configured on the camera 300, and thin out the frames of the received video to generate video at the predetermined frame rate. In either case, the video acquisition unit 11 follows the frame rate control value from the time-of-interest control unit 23 to set a frame rate that is necessary and sufficient for action recognition by the time-series action recognition unit 16, and outputs video at that frame rate. For example, when the camera 300 delivers video at 30 fps, the frames are thinned to, for example, 5 fps. The video acquisition unit 11 may acquire video via a video management system (VMS) or a recorder rather than directly from the camera 300. It may also acquire previously recorded video for batch processing rather than video captured in real time. Note that "video" in this specification may be a sequence of consecutive frame images. Each frame acquired by the video acquisition unit 11 is assigned a frame ID.
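As a minimal sketch of the thinning step, assuming a constant, known source frame rate and frames held in a Python list, uniform decimation that also attaches frame IDs could look like the following. The function name `thin_frames` is hypothetical, and using the frame's index in the original stream as its frame ID is an assumed scheme, not one stated in this publication.

```python
def thin_frames(frames, source_fps, target_fps):
    """Uniformly decimate a frame sequence from source_fps to target_fps.

    Hypothetical helper: returns (frame_id, frame) pairs, where frame_id is
    the index of the frame in the original stream (an assumed ID scheme).
    """
    if not 0 < target_fps <= source_fps:
        raise ValueError("target_fps must be in (0, source_fps]")
    step = source_fps / target_fps  # e.g. 30 / 5 = 6.0 -> keep every 6th frame
    kept = []
    next_keep = 0.0
    for frame_id, frame in enumerate(frames):
        if frame_id >= next_keep:
            kept.append((frame_id, frame))
            next_keep += step
    return kept


# One second of 30 fps video (frames stood in by integers), thinned to 5 fps.
print(thin_frames(list(range(30)), source_fps=30, target_fps=5))
# [(0, 0), (6, 6), (12, 12), (18, 18), (24, 24)]
```

Accumulating a fractional step rather than using an integer stride keeps the output close to the target rate even when the ratio is not a whole number (for example, 30 fps to 7 fps).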

The skeleton estimation unit 12 estimates the skeleton of each person appearing in the acquired video and generates skeleton data. The skeleton estimation unit 12 may use a bottom-up skeleton estimation method based on a deep network model such as OpenPose or OpenPifPaf; a top-down method that first detects persons with a detector such as YOLO and then applies a deep network model such as HRNet to each person; or a method that detects markers worn by the person. If the downstream time-series action recognition unit 16 does not perform action recognition, person detection may be performed without skeleton estimation. The skeleton estimation unit 12 may also perform both skeleton estimation and person detection. An object whose skeleton has been estimated is presumed to be a person, and each such person is assigned a person ID.

The person rectangle extraction unit 13 generates a bounding box, that is, a rectangle indicating the outline of the skeleton model obtained by skeleton estimation, and outputs the image cropped to that rectangle to the person feature extraction unit 14 and the frame rate adjustment unit 18. The skeleton estimation unit 12 and the person rectangle extraction unit 13 may be integrated; for example, the bounding box (person rectangle) may be generated at the same time as the skeleton is estimated. Alternatively, the person rectangle extraction unit 13 may only extract the rectangle and send the frame image together with the person rectangle.
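The person rectangle described above can be sketched as the tightest box around a skeleton's joints. This is a minimal sketch under several assumptions: keypoints arrive as `(x, y, confidence)` tuples, low-confidence joints are ignored, an optional fractional `margin` pads the box, and a frame is a plain list of pixel rows. The names `skeleton_bbox` and `crop` are hypothetical.

```python
def skeleton_bbox(keypoints, min_conf=0.1, margin=0.0, frame_w=None, frame_h=None):
    """Bounding box (x0, y0, x1, y1) enclosing the confident keypoints of one skeleton.

    keypoints: iterable of (x, y, confidence) tuples (assumed format).
    margin: fraction of the box width/height added on each side (assumption).
    """
    pts = [(x, y) for x, y, conf in keypoints if conf >= min_conf]
    if not pts:
        return None  # no reliable joints, so no person rectangle
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    pad_x, pad_y = (x1 - x0) * margin, (y1 - y0) * margin
    # Pad the box and clamp it to the frame borders.
    x0, y0 = max(0, x0 - pad_x), max(0, y0 - pad_y)
    x1, y1 = x1 + pad_x, y1 + pad_y
    if frame_w is not None:
        x1 = min(frame_w, x1)
    if frame_h is not None:
        y1 = min(frame_h, y1)
    return (x0, y0, x1, y1)


def crop(frame, box):
    """Crop a frame stored as a list of pixel rows to an integer-truncated box."""
    x0, y0, x1, y1 = (int(v) for v in box)
    return [row[x0:x1] for row in frame[y0:y1]]
```

A real system would crop a NumPy or GPU image array instead of nested lists; the box arithmetic is the same.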

The person feature extraction unit 14 extracts an image feature amount for each person obtained from the video by skeleton estimation. For example, it may take a person rectangle image as input, run inference with a known deep network model for general object recognition, and use the data of an intermediate layer as the image feature amount. The person rectangle extraction unit 13 and the person feature extraction unit 14 may also be implemented together by a known deep network model for general object recognition; in that case, a process is performed to associate each person whose skeleton was estimated by the skeleton estimation unit 12 with the corresponding person from whom features were extracted.

The person tracking unit 15 uses the image features obtained by person feature extraction to link time-series position information of the same person (that is, detections with similar feature amounts) and assigns a track ID to the person's trajectory. Tracking may be implemented, for example, by a ReID technique based on a deep network model such as DeepSORT; by associating, in each frame, the person at the nearest position; or by associating persons whose feature distance, computed from both position information and image features, is small. The person tracking unit 15 may further extract a trajectory feature, that is, a feature amount of the trajectory. A trajectory feature is represented, for example, by one or more fixed-length vectors and can be extracted by any known method; specifically, it can be calculated from the time-series change of the in-frame coordinates of the moving-object image associated with the same trajectory ID. Trajectory data assigned a track ID is held temporarily in memory as an internal variable until the trajectory ends, and is stored in the trajectory table 214 once the trajectory is interrupted.
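Of the tracking options listed above, the nearest-position association is the simplest to illustrate, together with one possible fixed-length trajectory feature. This is a minimal sketch, not DeepSORT or any ReID method: the class `NearestTracker`, the function `trajectory_feature`, the pixel threshold `max_dist`, and the choice of resampled, start-relative coordinates as the trajectory feature are all hypothetical.

```python
import math
from itertools import count


class NearestTracker:
    """Greedy nearest-position association.

    Each detection center joins the closest unclaimed track whose last position
    lies within max_dist pixels; otherwise it starts a new track ID.
    """

    def __init__(self, max_dist=50.0):
        self.max_dist = max_dist
        self.tracks = {}          # track_id -> [(frame_id, x, y), ...]
        self._next_id = count(1)

    def update(self, frame_id, centers):
        assigned, claimed = [], set()
        for x, y in centers:
            best_id, best_d = None, self.max_dist
            for track_id, points in self.tracks.items():
                if track_id in claimed:
                    continue  # one detection per track per frame
                _, px, py = points[-1]
                d = math.hypot(x - px, y - py)
                if d <= best_d:
                    best_id, best_d = track_id, d
            if best_id is None:
                best_id = next(self._next_id)
                self.tracks[best_id] = []
            self.tracks[best_id].append((frame_id, x, y))
            claimed.add(best_id)
            assigned.append(best_id)
        return assigned


def trajectory_feature(points, length=4):
    """Fixed-length trajectory feature: (x, y) positions resampled to `length`
    samples by linear interpolation, expressed relative to the start point."""
    if length < 2:
        raise ValueError("length must be at least 2")
    x0, y0 = points[0]
    if len(points) == 1:
        return [0.0] * (2 * length)
    feature = []
    for i in range(length):
        t = i * (len(points) - 1) / (length - 1)
        j = min(int(t), len(points) - 2)
        a = t - j  # interpolation weight between points j and j + 1
        x = points[j][0] + a * (points[j + 1][0] - points[j][0])
        y = points[j][1] + a * (points[j + 1][1] - points[j][1])
        feature.extend([x - x0, y - y0])
    return feature
```

Resampling makes trajectories of different durations comparable as vectors of the same size; a production tracker would also retire tracks that have not been updated for some time.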

The time-series action recognition unit 16 identifies the type of a person's action from the skeleton estimation results of the same person over the past several frames, linked through person feature extraction. It is preferable to also calculate the confidence of the identification result. For example, a deep network model serving as a GCN-based action classifier may identify actions in time series using a sliding window containing the number of frames for identification output by the time-of-interest control unit 23. Learned actions may instead be identified by another action classifier or on a rule basis. Actions may also be identified from the skeleton estimation result of a single frame, without time-series action recognition. The actions identified by the time-series action recognition unit 16 include standing, walking, running, crouching, falling, waving, pointing, looking around, talking, handing over an object, climbing over a fence, picking up an object, and swinging a knife, and each action is assigned a unique action ID. The action ID may be a numerical value indicating the identification class obtained by action identification, or a label corresponding to that class.
When the time-series action recognition unit 16 recognizes multiple actions at the same time, the action ID may be a numerical sequence or a list of identification labels indicating the identification results for those actions. Besides the everyday actions described above, the time-series action recognition unit 16 may also identify operations on equipment such as a factory control panel and work performed on a workpiece.
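The sliding-window mechanism can be sketched independently of the classifier. In this minimal sketch, `window_frames` plays the role of the number of frames for identification supplied by the time-of-interest control unit, and `classify` is any callable standing in for the GCN-based classifier. The class name `SlidingWindowRecognizer`, the per-track buffering, and `toy_classifier` (a hypothetical rule that reduces each skeleton to a single hip height) are illustrative assumptions, not the classifier described here.

```python
from collections import defaultdict, deque


class SlidingWindowRecognizer:
    """Per-track sliding window over skeleton results (sketch).

    classify maps a window (list of per-frame skeletons) to (action_id, confidence).
    """

    def __init__(self, window_frames, classify):
        self.classify = classify
        self.buffers = defaultdict(lambda: deque(maxlen=window_frames))

    def push(self, track_id, skeleton):
        buf = self.buffers[track_id]
        buf.append(skeleton)  # the oldest frame drops out once the window is full
        if len(buf) < buf.maxlen:
            return None  # not enough history yet for this person
        return self.classify(list(buf))


def toy_classifier(hip_heights):
    """Hypothetical rule: a large downward hip move (image y grows downward) is a fall."""
    if hip_heights[-1] - hip_heights[0] > 50:
        return ("fall", 0.9)
    return ("stand", 0.6)


recognizer = SlidingWindowRecognizer(window_frames=3, classify=toy_classifier)
for hip_y in (100, 120, 170):
    result = recognizer.push(track_id=7, skeleton=hip_y)
print(result)  # ('fall', 0.9)
```

Keeping one buffer per track ID means each person is classified only on their own history, which is why tracking precedes action recognition.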

The time-series action recognition unit 16 also identifies simultaneous multiple events in people's behavior. That is, when several people take the same action at the same time and are assigned the same action ID, the unit determines that an abnormality has occurred in the monitored environment and raises a simultaneous-occurrence alert. For example, if several people are detected falling at the same time, an abnormality such as an earthquake, fire, or toxic gas may have occurred; and if many people are running in the same direction at the same time, an event requiring evacuation may have occurred. Simultaneous events are not necessarily identified at exactly the same time (the same frame); they may be identified at nearby times (preceding and following frames) or in frames within a certain time range, such as several seconds or minutes. For this reason, when identifying simultaneous multiple events, the time-series action recognition unit 16 determines whether the same action is identified within a predetermined time window.
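The time-window check for simultaneous events can be sketched as a per-action sliding window over timestamps that counts distinct people. This is a minimal sketch: the function name `simultaneous_alerts`, the `(timestamp_s, person_id, action_id)` event format, the `min_people` threshold, and emitting only the first qualifying window per action are all assumptions.

```python
from collections import defaultdict


def simultaneous_alerts(events, window_s, min_people):
    """Flag action IDs performed by >= min_people distinct people within window_s seconds.

    events: iterable of (timestamp_s, person_id, action_id) tuples.
    Returns one (action_id, first_ts, last_ts) alert per action, for the first window found.
    """
    by_action = defaultdict(list)
    for ts, person_id, action_id in events:
        by_action[action_id].append((ts, person_id))
    alerts = []
    for action_id, items in by_action.items():
        items.sort()
        left = 0
        for right in range(len(items)):
            # Shrink the window from the left until it spans at most window_s seconds.
            while items[right][0] - items[left][0] > window_s:
                left += 1
            people = {pid for _, pid in items[left:right + 1]}
            if len(people) >= min_people:
                alerts.append((action_id, items[left][0], items[right][0]))
                break
    return alerts


events = [(0.0, 1, "fall"), (0.5, 2, "fall"), (1.0, 3, "fall"), (0.2, 4, "walk")]
print(simultaneous_alerts(events, window_s=2.0, min_people=3))  # [('fall', 0.0, 1.0)]
```

Counting distinct person IDs, rather than events, keeps one person who is recognized as falling in several consecutive frames from triggering the alert alone.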

The frame rate adjustment unit 17 reduces the frame rate of the action types identified by the time-series action recognition unit 16 from 5 fps to 1 fps. The frame rates used by the frame rate adjustment unit 17 (5 fps, 1 fps, etc.) may differ from those given in this embodiment; they are set according to the frame rate control value output from the time-of-interest control unit 23, depending on the required detection accuracy and the application. Among processes such as skeleton estimation, feature extraction, action recognition, and attribute estimation, action recognition requires the highest frame rate; the other processes can run at a lower frame rate because their results do not change abruptly. Setting each process to a frame rate that is necessary and sufficient for it optimizes the frame rate of every process, so that real-time or near-real-time processing can be achieved with fewer hardware resources.

The frame rate adjustment unit 18 reduces the frame rate of the person image feature amounts extracted by the person feature extraction unit 14 from 5 fps to 1 fps. The frame rates used by the frame rate adjustment unit 18 (5 fps, 1 fps) may differ from those given in this embodiment; they are set according to the frame rate control value output from the time-of-interest control unit 23, depending on the detection accuracy and application. The downstream person attribute estimation unit 20 does not need a high frame rate, because the person attributes it estimates do not change greatly or abruptly; lowering the frame rate to the level that is necessary and sufficient for person attribute estimation therefore reduces the hardware resources used.

The person attribute estimation unit 20 estimates attributes that can be inferred from a person's appearance, such as age, gender, hairstyle, hair color, accessories worn, belongings such as a backpack, bag, or cane, clothing color, and clothing type, and outputs the person attribute ID uniquely assigned to each person attribute. The confidence of the estimation result may also be calculated. The person attribute ID may be a numerical value indicating the attribute estimated by person attribute estimation, or a label indicating the attribute. When the person attribute estimation unit 20 estimates several attributes at the same time, the person attribute ID may represent the estimation results for those attributes.

The frame rate adjustment unit 19 thins out frames so that the frame rate of the video acquired by the video acquisition unit 11 is reduced from 5 fps to 1 fps. The frame rates used by the frame rate adjustment unit 19 (5 fps, 1 fps) may differ from those given in this embodiment; they are set according to the frame rate control value output from the time-of-interest control unit 23, depending on the detection accuracy and application. The downstream object recognition unit 21 does not need a high frame rate, because the object types it identifies do not change greatly or abruptly; lowering the frame rate to the level that is necessary and sufficient for object recognition therefore reduces the hardware resources used.

The frame rate adjustment units 17 and 18 may be implemented as separate subprograms or as the same subprogram. The frame rates set by the video acquisition unit 11 and the frame rate adjustment units 17 and 18 need not be strictly evenly spaced in time, and may fluctuate in the time direction. For example, when adjusting to 5 fps, the unit may select five frames per second at uneven intervals rather than strictly selecting frames 200 ms apart.
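One way to tolerate timing fluctuation is to divide time into slots of 1/target_fps seconds and keep the first frame that arrives in each slot, so the kept frames follow the actual arrival times instead of an exact 200 ms grid. This is a minimal sketch under the assumption that each frame carries a non-decreasing millisecond timestamp; the function name `select_by_slot` is hypothetical.

```python
def select_by_slot(timestamps_ms, target_fps):
    """Keep the first frame that arrives in each 1/target_fps time slot.

    Timestamps are assumed non-decreasing. At most one frame is kept per slot,
    so the output never exceeds target_fps, but the spacing between kept
    frames follows the actual arrival times rather than exactly 1000 / target_fps ms.
    """
    slot_ms = 1000.0 / target_fps
    kept, last_slot = [], None
    for index, ts in enumerate(timestamps_ms):
        slot = int(ts // slot_ms)
        if slot != last_slot:
            kept.append(index)
            last_slot = slot
    return kept


# Jittery arrival times; at 5 fps each slot is 200 ms wide.
print(select_by_slot([0, 90, 210, 390, 420, 610, 805, 1010], target_fps=5))
# [0, 2, 4, 5, 6, 7]
```

Here the kept frames are 210, 210, 190, 195, and 205 ms apart: roughly 5 fps, but not strictly evenly spaced.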

The object recognition unit 21 identifies objects appearing in the video acquired by the video acquisition unit 11, determines the type of each object (for example, a bag or umbrella held by a person, a self-propelled robot, a bicycle, a skateboard, or equipment operated by a person), and outputs the object attribute ID uniquely assigned to each object attribute. For example, the object type can be identified with an AI engine trained on object images and their types. The confidence of the estimation result may also be calculated. The object recognition unit 21 further estimates the relationship between an identified object and a person obtained from the video (for example, the relationship between the object and its owner), and may likewise calculate the confidence of that estimate.
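This publication does not say how the object-person relationship is estimated. As one simple stand-in, assumed here purely for illustration, each object can be assigned to the person whose bounding box covers the largest fraction of the object's box, with that fraction serving as a crude confidence. The names `box_area`, `intersection_area`, and `assign_owners` and the `min_overlap` threshold are hypothetical.

```python
def box_area(box):
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def intersection_area(a, b):
    return box_area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))


def assign_owners(objects, persons, min_overlap=0.3):
    """Map each object to the person box that covers most of it (overlap heuristic).

    objects, persons: dicts of id -> (x0, y0, x1, y1).
    Returns {object_id: (person_id, score)}, where score is the covered fraction
    of the object box, used here as a crude confidence.
    """
    owners = {}
    for oid, obox in objects.items():
        area = box_area(obox)
        best_pid, best_score = None, 0.0
        for pid, pbox in persons.items():
            score = intersection_area(obox, pbox) / area if area > 0 else 0.0
            if score > best_score:
                best_pid, best_score = pid, score
        if best_pid is not None and best_score >= min_overlap:
            owners[oid] = (best_pid, best_score)
    return owners


persons = {1: (0, 0, 10, 20), 2: (50, 0, 60, 20)}
objects = {"bag": (8, 10, 12, 14), "umbrella": (30, 0, 34, 4)}
print(assign_owners(objects, persons))  # {'bag': (1, 0.5)}
```

Overlap alone cannot distinguish an owner from a passer-by; a practical system would likely combine it with tracking over time.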

In this embodiment, both the person attribute estimation unit 20 and the object recognition unit 21 are provided, but only one of them may be provided, depending on the purpose of the video analysis.

The video analysis system 100 may identify not only persons but also other living beings (for example, wild animals such as bears and wild boars), as well as non-living robots and machines. For example, a humanoid robot may be identified as a person, and an animal-shaped robot or a transport vehicle may be identified as an object.

The FDB registration unit 22 executes the FDB registration process, which registers data in the FDB server 200. Specifically, it associates the person's trajectory (track ID) output from the person tracking unit 15, the person's action type (action ID) output from the time-series action recognition unit 16 with its frame rate adjusted by the frame rate adjustment unit 17, the person's attributes (person attribute ID) output from the person attribute estimation unit 20, and the object type (object attribute ID) output from the object recognition unit 21, using the object ID, person ID, and track ID, and registers them in the FDB server 200.
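The association step can be sketched as a record keyed by track ID that accumulates the outputs of the different units as they arrive. This is a minimal in-memory sketch; the schema (`TrackRecord`), the class `InMemoryFDB`, and the `register` signature are hypothetical stand-ins for the FDB server 200, whose actual table layout is not given here.

```python
from dataclasses import dataclass, field


@dataclass
class TrackRecord:
    """One feature-DB entry keyed by track ID (hypothetical schema)."""
    track_id: int
    person_id: int
    actions: list = field(default_factory=list)       # (frame_id, action_id)
    person_attrs: list = field(default_factory=list)  # (frame_id, person_attribute_id)
    objects: list = field(default_factory=list)       # (frame_id, object_id, object_attribute_id)


class InMemoryFDB:
    """Stand-in for the FDB server 200; a real system would use a database."""

    def __init__(self):
        self.records = {}

    def register(self, track_id, person_id, frame_id,
                 action_id=None, person_attr_id=None, obj=None):
        # Outputs arrive at different rates (e.g. 5 fps actions, 1 fps
        # attributes), so any of the optional fields may be missing.
        if track_id not in self.records:
            self.records[track_id] = TrackRecord(track_id, person_id)
        rec = self.records[track_id]
        if action_id is not None:
            rec.actions.append((frame_id, action_id))
        if person_attr_id is not None:
            rec.person_attrs.append((frame_id, person_attr_id))
        if obj is not None:  # obj is (object_id, object_attribute_id)
            rec.objects.append((frame_id, *obj))
        return rec


db = InMemoryFDB()
db.register(1, 10, 0, action_id="walk", person_attr_id="red_coat", obj=(5, "bag"))
db.register(1, 10, 5, action_id="run")
print(db.records[1].actions)  # [(0, 'walk'), (5, 'run')]
```

Keeping the frame ID next to each value preserves when it was observed, which matters because the different streams are sampled at different rates.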

The time-of-interest control unit 23 controls the frame rates set by the frame rate adjustment units 17 to 19 and the video acquisition unit 11. The frame rate control values output by the time-of-interest control unit 23 are preferably determined according to the types of action recognized by the time-series action recognition unit 16. The time-of-interest control unit 23 outputs the frame rate control value for each process to the frame rate adjustment units 17 to 19, thereby controlling the execution timing (execution interval) of each process. It also outputs to the time-series action recognition unit 16 a pair consisting of the image interval (frame rate) and the number of frames used to identify an action, thereby controlling the execution timing (execution interval) of the time-series action recognition process. The time-of-interest control unit 23 may be implemented as a table storing the frame rate control values, a parameter file, or an internal variable of a program.
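Since the control values may live in a table keyed by action type, the unit can be sketched as a lookup that chooses settings satisfying every action of interest. All numbers in `ACTION_TIMING` are hypothetical, as are the names `FrameRateControl` and `control_for`; taking the maximum rate and window length across the actions is one reasonable policy, not one stated in this publication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameRateControl:
    acquisition_fps: float   # video acquisition unit 11 (input to action recognition)
    window_frames: int       # frames per identification window for unit 16
    action_out_fps: float    # frame rate adjustment unit 17
    person_attr_fps: float   # frame rate adjustment unit 18
    object_fps: float        # frame rate adjustment unit 19


# Hypothetical table: action type -> (frame rate needed, window length in seconds).
ACTION_TIMING = {
    "fall": (5.0, 2.0),
    "wave": (5.0, 1.0),
    "walk": (2.0, 2.0),
}


def control_for(actions, slow_fps=1.0):
    """Pick control values that satisfy every action type of interest."""
    fps = max(ACTION_TIMING[a][0] for a in actions)
    window_s = max(ACTION_TIMING[a][1] for a in actions)
    slow = min(slow_fps, fps)  # slower processes never exceed the action rate
    return FrameRateControl(fps, round(fps * window_s), slow, slow, slow)


print(control_for(["fall", "walk"]))
```

With "fall" and "walk" of interest, action recognition runs at 5 fps over a 10-frame window, while units 17 to 19 run at 1 fps.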

In the history search function, the query setting unit 24 generates a search query for searching the feature database 210 that includes one or more of a person attribute, a person action, and an object attribute. The search query is set, for example, by operating the search screen 700 shown in FIG. 7 to specify the attributes to search for. The generated search query is sent to the FDB server 200, and the results of searching the feature database 210 are returned to the video analysis system 100.
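A query that combines person attributes, actions, and object attributes can be sketched as a subset test over each stored record. In this minimal sketch, the record layout (dicts holding sets of IDs), the function name `search`, and the AND semantics (every requested ID must be present) are assumptions; the actual query language of the FDB server 200 is not described here.

```python
def search(records, person_attrs=(), actions=(), object_attrs=()):
    """Return track IDs of records containing every requested ID (AND semantics assumed)."""
    wanted = (set(person_attrs), set(actions), set(object_attrs))
    hits = []
    for rec in records:
        have = (rec["person_attrs"], rec["actions"], rec["object_attrs"])
        # An empty requirement is a subset of anything, so it matches every record.
        if all(w <= h for w, h in zip(wanted, have)):
            hits.append(rec["track_id"])
    return hits


records = [
    {"track_id": 1, "person_attrs": {"male", "backpack"}, "actions": {"walk"}, "object_attrs": set()},
    {"track_id": 2, "person_attrs": {"female"}, "actions": {"run", "fall"}, "object_attrs": {"umbrella"}},
    {"track_id": 3, "person_attrs": {"male"}, "actions": {"run"}, "object_attrs": {"bicycle"}},
]
print(search(records, person_attrs=["male"], actions=["run"]))  # [3]
```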

The search result output unit 25 generates the search screen 700 (see FIG. 7). In the search result display area 750, this screen displays the frames or video clips showing a person or object that matches the query (for example, the video within the person rectangle data generated by the person feature extraction unit 14).

The real-time detection unit 26 receives video recognition results, or detection targets estimated from the video, and executes real-time detection processing with reference to the detection rule memory 27 and the detection target database 220. For example, the real-time detection unit 26 generates a real-time detection condition. A hit on that condition can come from any of the following:

- the trajectory of a person output from the person tracking unit 15;
- the type of the person's action output from the time-series action recognition unit 16;
- the person's attributes output from the person attribute estimation unit 20;
- the type of object output from the object recognition unit 21.

When these hit the detection condition, the unit identifies the trajectory by its track ID and tracks the person. It stores the data for that track ID in a temporary storage area within the real-time detection unit 26 and displays the detection result on screen in real time. The real-time detection unit 26 may also send the real-time detection result to the FDB registration unit 22, which associates the query ID with the person ID and registers them in the feature amount database 210. The real-time detection processing may be executed in parallel with the FDB registration processing performed by the FDB registration unit 22.

The detection rule memory 27 is a storage area in which certain search queries are registered as detection rules: those that were run against the feature amount database 210 but did not find the target (see step 174 in FIG. 5B).

The FDB server 200 consists of two databases. The first is the feature amount database 210, in which the results of video analysis by the video analysis system 100 are registered. The second is the detection target database 220. The configuration of the feature amount database 210 is described later with reference to FIG. 3. The detection target database 220 holds the feature amounts of targets found by searching the feature amount database 210. These are used in later real-time detection to compare feature amounts with each other, that is, a query given as feature amounts against the feature amounts input to the FDB registration unit 22. The feature amount database 210 and the detection target database 220 may run as independent FDB servers. Each database may also be divided into a plurality of databases, and the divided databases may each run as an independent FDB server.

A plurality of cameras 300 are installed in the surveillance area so that a person moving within it can be tracked. Each camera 300 has an image sensor, a control circuit, and a communication interface, and outputs captured video to the video analysis system 100 through the communication interface. The camera 300 is not limited to a network camera that connects directly to an IP network; it may be a so-called video camera or a still camera. The camera 300 may also be an AI camera with an edge video analysis function that includes skeleton estimation. In that case, the skeleton estimation unit 12 may only issue person IDs without performing skeleton estimation. Alternatively, the skeleton estimation unit 12 may be omitted and person IDs may be issued by the video acquisition unit 11 or the person feature extraction unit 14.

FIG. 2 is a block diagram showing the physical configuration of the video analysis system 100 of this embodiment.

The video analysis system 100 of this embodiment is configured by a computer having a processor (CPU) 1, a memory 2, an auxiliary storage device 3, and a communication interface 4.

The processor 1 executes programs stored in the memory 2. The memory 2 includes ROM, a non-volatile storage element, and RAM, a volatile storage element. The ROM stores invariable programs (for example, the BIOS) and the like. The RAM is a high-speed, volatile storage element such as DRAM (Dynamic Random Access Memory), and temporarily stores the programs executed by the processor 1 and the data used while they run.

The auxiliary storage device 3 is a large-capacity, non-volatile storage device such as a magnetic storage device (HDD) or flash memory (SSD). It stores the programs executed by the processor 1 and the data used while they run. That is, a program is read from the auxiliary storage device 3, loaded into the memory 2, and executed by the processor 1.

The communication interface 4 is a network interface device that controls communication with other devices (the FDB server 200, the cameras 300, and so on) according to a predetermined protocol.

The video analysis system 100 may have an input interface 5 and an output interface 8. The input interface 5 is an interface to which a keyboard 6, a mouse 7, and the like are connected, and which receives input from an operator. The output interface 8 is an interface to which a display device 9, a printer, and the like are connected, and which outputs program execution results in a form the operator can view. The video analysis system 100 may provide its input/output screens through a server, for example as a web application or web pages. In that case, the input interface 5 and the output interface 8 are provided on a separate terminal used to access those screens, not on the video analysis system 100 itself. The input interface 5 and the output interface 8 may then be combined in a single device, such as a tablet.

The programs executed by the processor 1 are provided to the video analysis system 100 via removable media (CD-ROM, flash memory, etc.) or a network. They are stored in the non-volatile auxiliary storage device 3, which is a non-transitory storage medium. The video analysis system 100 therefore preferably has an interface for reading data from removable media.

The video analysis system 100 is a computer system configured either on a single physical computer or on a plurality of logically or physically configured computers. It may run as separate threads on the same computer, or on a virtual computer built on a plurality of physical computer resources. The functional units of the video analysis system 100 may also be implemented on different computers.

FIG. 3 is a diagram showing a configuration example of the feature amount database 210.

The feature amount database 210 is composed of a frame table 211, a person table 212, an object table 213, and a trajectory table 214. The feature amount database 210 may use a different table configuration, or a non-table format such as lists or dictionaries.

The frame table 211 records data about video frames and includes a frame ID, a camera ID, and a date and time. The frame ID identifies the frame. The camera ID uniquely identifies the camera 300 that captured the frame. Instead of providing a separate camera ID, the frame ID may be defined so that specific digits represent the camera 300. The date and time is one of three values: the time the frame was captured, the time the camera 300 assigned when distributing it, or the time the video acquisition unit 11 assigned when acquiring the video. The person table 212 and the object table 213 may hold the information of the frame table 211, in which case the frame table 211 may be omitted.

The person table 212 records information about persons recognized in video frames. It includes a person ID, frame ID, track ID, action ID, person attribute ID, person image feature, and person coordinates.

- The person ID is unique identification information assigned to an object that is a person (for example, an object for which a skeleton could be estimated can be recognized as a person). The same person may be given the same person ID across multiple frames, or a different person ID in each frame.
- The frame ID uses the same identification information as the frame ID of the frame table 211.
- The track ID uniquely identifies the movement trajectory of a person; a single track ID is assigned to the trajectory of the same person. The track ID may be omitted.
- The action ID indicates the type of the person's action, as identified by the time-series action recognition unit 16. It may be the identification value output by that unit or a label corresponding to that value, and may include the confidence of the identification.
- The person attribute ID indicates the person's attributes, as estimated by the person attribute estimation unit 20. It may be the identification value output by that unit or a label corresponding to that value, and may include the confidence of the estimation.
- The person image feature is the image feature amount of the person output by the person feature extraction unit 14.
- The person coordinates indicate the region of the frame in which the person was recognized. They are the skeleton position information from the skeleton estimation unit 12, the rectangle information indicating the person's extent from the person rectangle extraction unit 13, or both. The person coordinates may be expressed in so-called image coordinates, or as position information, such as absolute coordinates, indicating where the photographed person is in three-dimensional space.

The object table 213 includes an object ID, frame ID, track ID, person ID, object attribute ID, and object coordinates.

- The object ID is unique identification information assigned to each recognized object.
- The frame ID uses the same identification information as the frame ID of the frame table 211.
- The track ID uniquely identifies the trajectory obtained by tracking an object. Even when different object IDs are assigned, a single track ID is given to the movement of the same object.
- The person ID identifies the person presumed to be moving together with the object. The track ID and the person ID may be omitted.
- The object attribute ID indicates the attribute of the object (the type of object identified by the object recognition unit 21). It may be the identification value output by the object recognition unit 21 or a label corresponding to that value, and may include the confidence of that identification.
- The object coordinates indicate the position or region (a rectangle, polygon, etc.) in the image where the object recognition unit 21 recognized the object. They may be so-called image coordinates, world coordinates, or position information in the three-dimensional space where the object is located.

The trajectory table 214 includes a track ID, person ID, object ID, and trajectory feature. The track ID uniquely identifies the trajectory of a person or object obtained by the person tracking unit 15. It uses the same identification information as the track IDs in the person table 212 and the object table 213. The trajectory table 214 thus makes it possible to associate the same person or object across multiple frames on the basis of trajectory information. The person ID identifies the person moving along the trajectory. The object ID identifies the object moving along the trajectory. The trajectory feature is the feature amount of the trajectory.

In this way, the tables constituting the feature amount database 210 are linked by frame IDs, track IDs, person IDs, and object IDs. In response to a search request from the query setting unit 24, data in the other tables can therefore be retrieved through these identifiers.
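The ID links can be made concrete by rendering the four tables of FIG. 3 as SQLite tables, using Python's standard `sqlite3` module. This is a minimal sketch only. The table and column names, the sample rows, and the choice of SQLite are assumptions for illustration; the text allows other table layouts or non-table formats.

```python
import sqlite3

# Hypothetical SQLite rendering of the four tables in FIG. 3; names are illustrative.
SCHEMA = """
CREATE TABLE frames (frame_id INTEGER PRIMARY KEY, camera_id INTEGER, captured_at TEXT);
CREATE TABLE persons (person_id INTEGER PRIMARY KEY, frame_id INTEGER, track_id INTEGER,
                      action_id TEXT, attribute_id TEXT, image_feature BLOB, coords TEXT);
CREATE TABLE objects (object_id INTEGER PRIMARY KEY, frame_id INTEGER, track_id INTEGER,
                      person_id INTEGER, attribute_id TEXT, coords TEXT);
CREATE TABLE trajectories (track_id INTEGER, person_id INTEGER, object_id INTEGER,
                           trajectory_feature BLOB);
"""


def build_demo_db() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO frames VALUES (?, ?, ?)",
                    [(1, 7, "2021-03-02T10:00:00"), (2, 7, "2021-03-02T10:00:01")])
    # The same person gets a different person ID in each frame but shares track 100.
    con.executemany("INSERT INTO persons VALUES (?, ?, ?, ?, ?, NULL, NULL)",
                    [(10, 1, 100, "walk", "red_coat"), (11, 2, 100, "run", "red_coat")])
    # A suitcase seen in frame 2, moving with person 11 on the same track.
    con.execute("INSERT INTO objects VALUES (50, 2, 100, 11, 'suitcase', NULL)")
    con.executemany("INSERT INTO trajectories VALUES (?, ?, ?, NULL)",
                    [(100, 10, None), (100, 11, 50)])
    return con


def actions_on_track(con: sqlite3.Connection, track_id: int) -> list:
    """Actions along one trajectory in capture order (persons joined to frames)."""
    rows = con.execute(
        "SELECT p.action_id FROM persons p JOIN frames f ON f.frame_id = p.frame_id "
        "WHERE p.track_id = ? ORDER BY f.captured_at", (track_id,))
    return [row[0] for row in rows]


def objects_on_track(con: sqlite3.Connection, track_id: int) -> list:
    """Objects linked to a trajectory, with the camera that saw them."""
    rows = con.execute(
        "SELECT DISTINCT o.attribute_id, f.camera_id FROM trajectories t "
        "JOIN objects o ON o.object_id = t.object_id "
        "JOIN frames f ON f.frame_id = o.frame_id WHERE t.track_id = ?", (track_id,))
    return rows.fetchall()
```

The same person gets a new person ID in each frame (persons 10 and 11 here). The shared track ID is therefore what lets `actions_on_track` return both actions in capture order.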

FIG. 4 is a flowchart of the feature amount DB construction processing.

The processing proceeds as follows:

1. The video acquisition unit 11 executes video acquisition processing. It acquires video from the plurality of cameras 300 and thins out frames to lower the frame rate, for example from 30 fps to 5 fps (101).
2. The skeleton estimation unit 12 executes skeleton estimation processing. It estimates the skeletons of persons appearing in the acquired video and generates skeleton data (102).
3. The person rectangle extraction unit 13 executes person rectangle extraction processing. It generates a rectangle indicating the outline of the skeleton model obtained by the skeleton estimation processing (102) (103).
4. The person feature extraction unit 14 executes person feature extraction processing. It extracts the image feature amount of the person within the rectangle obtained by the person rectangle extraction processing (103) (104).
5. The person tracking unit 15 executes person tracking processing using the person image features from the person feature extraction processing (104). It associates time-series position information of the same person, assigns a track ID to the person's trajectory, and extracts the features of that trajectory (105).
6. The FDB registration unit 22 executes FDB registration processing. It registers in the FDB server 200 the person image features output from the person feature extraction unit 14 and the person trajectories output from the person tracking unit 15 (106).

Steps 102 and 103, steps 103 and 104, or steps 102, 103, and 104 may each be computed simultaneously by a program that includes a deep network model.
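The frame thinning in step 101 can be sketched as index selection: keep the first source frame at or after each output timestamp. This helper is hypothetical, since the text does not say how frames are dropped. It also tolerates non-integer ratios such as 29.97 fps to 5 fps.

```python
def thin_frames(num_frames: int, src_fps: float, dst_fps: float) -> list[int]:
    """Return indices of source frames kept when lowering src_fps to dst_fps.

    Output frame k is the first source frame whose timestamp is at or after k / dst_fps.
    """
    if dst_fps >= src_fps:
        return list(range(num_frames))  # nothing to drop
    kept: list[int] = []
    for i in range(num_frames):
        t = i / src_fps
        # Compare against the next output timestamp; the epsilon absorbs float rounding.
        if t + 1e-9 >= len(kept) / dst_fps:
            kept.append(i)
    return kept
```

For one second of 30 fps video, `thin_frames(30, 30, 5)` keeps frames 0, 6, 12, 18, and 24. Applying the helper again to the 5 fps stream with a 1 fps target keeps every fifth frame. That matches the 1 fps example rate used later in steps 108, 111, and 121.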

Next, the time-series action recognition unit 16 executes time-series action recognition processing. It identifies the type of a person's action using the skeleton estimation results of the same person over the past several frames, obtained through the person feature extraction processing (104) (107). Next, the frame rate adjustment unit 17 executes frame rate adjustment processing. It lowers the frame rate of the action types identified by the time-series action recognition unit 16, for example from 5 fps to 1 fps (108). Next, the FDB registration unit 22 executes FDB registration processing that registers the frame-rate-adjusted action types (action IDs) in the FDB server 200 (109).

In addition, the frame rate adjustment unit 19 executes frame rate adjustment processing. It lowers the frame rate of the video acquired by the video acquisition unit 11, for example from 5 fps to 1 fps (111). Next, the object recognition unit 21 executes object recognition processing. It recognizes objects appearing in the video acquired by the video acquisition unit 11 and identifies their types (112). Next, the FDB registration unit 22 executes FDB registration processing. It registers the object types (object attribute IDs) identified in the object recognition processing (112) in the FDB server 200 (113).

In addition, the frame rate adjustment unit 18 executes frame rate adjustment processing. It lowers the frame rate of the person image feature amounts extracted by the person feature extraction unit 14, for example from 5 fps to 1 fps (121). Next, the person attribute estimation unit 20 executes person attribute estimation processing. It estimates attributes that can be inferred from the video, such as the person's age, sex, hairstyle, hair color, clothing color, and type of clothing (122). Next, the FDB registration unit 22 executes FDB registration processing. It registers the person attributes (person attribute IDs) estimated in the person attribute estimation processing (122) in the FDB server 200 (123).
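The three branches of FIG. 4 can be pictured as one 5 fps stream fanned out to stages that each take every n-th frame. The `run_branches` helper and the stage names are hypothetical. The rates follow the example values in the text: action recognition on every 5 fps frame, and object recognition and attribute estimation at 1 fps. Thinning of the action results before registration (step 108) is not modeled.

```python
from typing import Any, Callable


def run_branches(frames: list[Any], base_fps: float,
                 branches: dict[str, tuple[float, Callable[[Any], Any]]]) -> dict[str, list[Any]]:
    """Feed each frame of a base_fps stream to every branch whose rate selects it.

    branches maps a stage name to (target_fps, handler). A branch at target_fps runs on
    every (base_fps / target_fps)-th frame; the ratio is assumed to be a whole number.
    """
    steps = {name: max(1, round(base_fps / fps)) for name, (fps, _) in branches.items()}
    outputs: dict[str, list[Any]] = {name: [] for name in branches}
    for i, frame in enumerate(frames):
        for name, (_, handler) in branches.items():
            if i % steps[name] == 0:
                outputs[name].append(handler(frame))
    return outputs


# Hypothetical handlers that simply record which frame each stage saw.
BRANCHES = {
    "action_recognition": (5.0, lambda f: f),    # step 107, every frame
    "object_recognition": (1.0, lambda f: f),    # steps 111-112
    "attribute_estimation": (1.0, lambda f: f),  # steps 121-122
}
```

Over two seconds of 5 fps input, `run_branches(list(range(10)), 5.0, BRANCHES)` calls the action stage ten times. The object and attribute stages run only on frames 0 and 5.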

FIG. 5A is a flowchart of the real-time detection processing.

In the real-time detection processing, targets are detected by issuing a query against the feature amounts registered in real time in the feature amount database 210. The query is either eyewitness information (for example, attributes such as a person's appearance, a person's actions, or an object's appearance) or person image feature amounts. The processing of steps 152 to 157 and the processing of steps 151 and 161 to 163 are executed in parallel.

First, the real-time detection unit 26 determines whether a detection target is registered in the detection target database 220 (151). In parallel with this, or before or after it, the unit determines whether a detection rule has been set in the detection rule memory 27 (152).

If it is determined in step 151 that a detection target is registered in the detection target database 220, the process proceeds to step 161. If a detection rule is set in the detection rule memory 27 in step 152, the process proceeds to step 153. If both steps 151 and 152 return No, no detection target is registered in the detection target database 220 and no detection rule is set in the detection rule memory 27, so the real-time detection processing ends.

When step 151 finds a registered detection target, the real-time detection unit 26 uses the person image feature acquired from the person feature extraction unit 14 as a query. It searches the person image features of the persons registered in the detection target DB 220 and checks whether a highly similar person is registered (161). If no highly similar person is registered (No in 162), the real-time detection processing ends. If one is registered (Yes in 162), a target-found screen is generated and the user is notified (163). If there is a trajectory associated with the highly similar person, that person and any object accompanying the person can be tracked from the trajectory's track ID via the person ID and the object ID. The person's features at multiple points in time can then be presented.
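Step 161 can be sketched as a nearest-neighbour search over the feature vectors in the detection target DB 220. Several details here are assumptions: cosine similarity as the measure, the 0.8 threshold, and the plain dictionary standing in for the database. The text only says that a person with a high degree of similarity is looked for.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_registered_target(query: Sequence[float],
                           target_db: dict[str, Sequence[float]],
                           threshold: float = 0.8) -> Optional[tuple[str, float]]:
    """Step 161: return (target_id, similarity) of the best match at or above threshold, else None."""
    best: Optional[tuple[str, float]] = None
    for target_id, feature in target_db.items():
        score = cosine_similarity(query, feature)
        if score >= threshold and (best is None or score > best[1]):
            best = (target_id, score)
    return best
```

A `None` result corresponds to No in step 162, and a hit to Yes, after which the notification of step 163 follows. With many registered targets, an approximate nearest-neighbour index would likely replace this linear scan.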

If step 152 finds a detection rule in the detection rule memory 27, the unit checks whether the results of attribute estimation and action identification satisfy that rule (153). If neither the attribute estimation results nor the action identification results satisfy the detection rule (No in 154), the real-time detection processing ends. If they do (Yes in 154), a target-found screen is generated and the user is notified (155). If there is a trajectory associated with the found person, that person and any object accompanying the person can be tracked from the trajectory's track ID via the person ID and the object ID. The person's features at multiple points in time can then be presented.
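The check in steps 153 and 154 could look like the following sketch. It assumes a detection rule is a set of attribute conditions that must all hold; the text does not state whether conditions are combined with AND or OR. The key names are hypothetical.

```python
from typing import Any


def matches_rule(observation: dict[str, Any], rule: dict[str, Any]) -> bool:
    """Steps 153-154: True only if every condition in the rule is satisfied.

    Scalar conditions must equal the observed value; for collection-valued
    observations (e.g. objects carried), the required value must be present.
    """
    for key, required in rule.items():
        observed = observation.get(key)
        if isinstance(observed, (set, frozenset, list, tuple)):
            if required not in observed:
                return False
        elif observed != required:
            return False
    return True
```

A missing attribute counts as a mismatch in this sketch. A rule asking for a hat therefore never fires on a person whose headwear was not estimated.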

The user then views the notified video and determines whether the target person has been found (156). The real-time detection unit 26 may make the determination in step 156 instead. If the user has found the target person (Yes in 156), the features of the found person are registered in the detection target database 220 (157), and the real-time detection processing ends. At this time, the corresponding detection rule may be deleted from the detection rule memory 27. Because step 157 registers the target person's image in the detection target database 220, the next real-time detection process detects the same person through step 151 and steps 161 to 163.
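Steps 151, 152, and 157 together act as a small state machine: a confirmed hit moves a target from rule-based detection to feature-based detection. The `DetectionState` class below is a hypothetical in-memory stand-in for the detection target database 220 and the detection rule memory 27.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class DetectionState:
    """Hypothetical in-memory stand-ins for detection target DB 220 and rule memory 27."""
    targets: dict[str, list[float]] = field(default_factory=dict)  # target id -> image feature
    rules: list[dict[str, Any]] = field(default_factory=list)      # attribute-condition rules

    def active_paths(self) -> list[str]:
        """Steps 151-152: which detection paths run; an empty list ends real-time detection."""
        paths = []
        if self.targets:
            paths.append("feature_search")  # steps 161-163
        if self.rules:
            paths.append("rule_match")      # steps 153-157
        return paths

    def confirm_found(self, target_id: str, feature: list[float],
                      rule: Optional[dict[str, Any]] = None) -> None:
        """Step 157: register the confirmed person's feature and optionally drop the rule that found them."""
        self.targets[target_id] = feature
        if rule is not None and rule in self.rules:
            self.rules.remove(rule)
```

Starting from a single rule, `active_paths()` returns `["rule_match"]`. After `confirm_found(...)` with that rule, it returns `["feature_search"]`, mirroring the switch to steps 151 and 161 to 163 described above.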

FIG. 5B is a flowchart of the video search processing performed by the user.

In the video search processing shown in FIG. 5B, the user performs a video search at any time using the search screen 700 shown in FIG. 7 (171). If the user finds the target person in the video search results (Yes in 172), the person image feature of the found person is registered in the detection target database 220 (173), and the video search processing ends. If the user does not find the target person, the search query is set in the detection rule memory 27 as a detection rule (174). The detection rule is preferably generated by removing from the search query the conditions that change with location or time (that is, action-related attributes). The conditions that do not change are kept, for example the person's age, sex, hairstyle, hair color, accessories worn, clothing color, and type of clothing. The video search processing then ends.
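The rule generation in step 174 can be sketched as filtering the query's conditions. Which keys count as location- or time-dependent is an assumption here. Dropping `action` follows the text; dropping the camera and date range as search scope is an illustrative addition.

```python
# Hypothetical set of query keys whose values depend on where or when the person was seen.
VARYING_KEYS = frozenset({"action", "camera_id", "date_range"})


def query_to_rule(query: dict) -> dict:
    """Build a detection rule that keeps only location- and time-invariant conditions (step 174)."""
    return {key: value for key, value in query.items() if key not in VARYING_KEYS}
```

Take a query for a black-haired woman in her thirties seen running on camera 3. It becomes a rule that matches her appearance on any camera, whatever she is doing.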

In the video search processing shown in FIG. 5B, the user looks at the search results displayed on the screen and inputs whether a person matching the eyewitness information is present. The video analysis system 100 generates detection rules based on this input. When a person matching the eyewitness information is found, that person's image feature is registered in the detection target database 220, which is separate from the feature amount database 210. This makes the feature amount available for later real-time detection. When no matching person is found, a detection rule is generated from the search query and set in the detection rule memory 27. Therefore, the feature amount database 210 can be searched using the feature amount of the found person, and a person matching the eyewitness information can be found with high accuracy.

FIG. 6 is a diagram showing the processing performed by the video analysis system 100 of this embodiment.

In the video analysis system 100 of this embodiment, the person tracking unit 15 tracks persons whose feature amounts are similar across frames and assigns a unique track ID to each trajectory.

The skeleton estimation unit 12 estimates persons from the skeleton estimation result of each frame (5 fps) and assigns a unique person ID for each frame and each person. The time-series action recognition unit 16 identifies the type of a person's action using a plurality of per-frame (5 fps) skeleton estimation results. The time-series action recognition unit 16 may run on every frame (5 fps), like the skeleton estimation unit 12. For convenience of illustration, however, FIG. 6 shows it at a sparser interval than skeleton estimation.
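A bounded deque per track ID is one way to build the per-person window that the time-series action recognition unit 16 consumes. The window length, the `hip_y` keypoint, and `toy_classify` are hypothetical. The toy rule only stands in for the learned model the text implies and is not a real action recognizer.

```python
from collections import defaultdict, deque
from typing import Any, Optional


class SkeletonWindow:
    """Buffer the most recent `length` skeletons per track ID for time-series action recognition."""

    def __init__(self, length: int) -> None:
        self.length = length
        self.buffers = defaultdict(lambda: deque(maxlen=length))

    def push(self, track_id: int, skeleton: dict[str, Any]) -> Optional[list[dict[str, Any]]]:
        """Add one skeleton (e.g. from a 5 fps frame); return the window once it is full."""
        buffer = self.buffers[track_id]
        buffer.append(skeleton)
        return list(buffer) if len(buffer) == self.length else None


def toy_classify(window: list[dict[str, Any]]) -> str:
    """Placeholder for a learned model: a large downward hip movement is labeled 'falling'."""
    drop = window[-1]["hip_y"] - window[0]["hip_y"]  # image y grows downward
    return "falling" if drop > 50 else "other"
```

The buffer is keyed on track ID rather than person ID because the skeleton estimation unit 12 issues a new person ID in every frame.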

The person attribute estimation unit 20 estimates person attributes from frames thinned to, for example, 1 fps, and determines the person attribute ID. Likewise, the object recognition unit 21 identifies object types from frames thinned to, for example, 1 fps, and determines the object attribute ID.

Thus, when a person's attributes and actions are recognized from video, action recognition requires the highest frame rate. A person's attributes, by contrast, do not change abruptly. Attributes not used for action recognition are therefore recognized at a lower frame rate.
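A back-of-the-envelope calculation shows why the lower rates pay off. All per-call costs and the number of people below are made-up illustrative figures, not measurements. Object recognition is assumed to run once per frame, and the other two stages once per person per frame.

```python
def compute_ms_per_second(rates: dict[str, float],
                          costs: dict[str, tuple[float, bool]],
                          people: int) -> float:
    """Processing time (ms) spent per second of video from one camera.

    rates: stage -> frames per second processed by that stage
    costs: stage -> (ms per call, True if the stage runs once per person in the frame)
    """
    total = 0.0
    for stage, fps in rates.items():
        ms_per_call, per_person = costs[stage]
        total += fps * ms_per_call * (people if per_person else 1)
    return total


# Hypothetical per-call costs in milliseconds.
COSTS = {"action": (2.0, True), "attribute": (10.0, True), "object": (30.0, False)}
UNIFORM = {"action": 5.0, "attribute": 5.0, "object": 5.0}  # every stage at 5 fps
MIXED = {"action": 5.0, "attribute": 1.0, "object": 1.0}   # example rates from FIG. 6
```

Assume four people in view. The uniform schedule then costs 40 + 200 + 150 = 390 ms per second of video. The mixed schedule costs 40 + 40 + 30 = 110 ms, roughly 28% of the uniform figure, while the action stage keeps its full 5 fps.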

FIG. 7 is a diagram showing an example of the search screen 700.

The search screen 700 shown in FIG. 7 includes a search instruction button 710, a camera designation field 720, a date and time designation field 730, an attribute designation field 740, and a search result display area 750.

The search instruction button 710 is operated to send a search query to the FDB server 200. The camera designation field 720 is a pull-down field for selecting the camera 300 that captured the video to be searched, that is, the location to be searched. The date and time designation field 730 is used to enter the date and time range of the video to be searched. The attribute designation field 740 is used to set one or more items to include in the search query: person attributes, person actions, and object attributes. The search result display area 750 displays the frames returned as search results. When the user selects a frame displayed as a search result, the video around that frame is preferably played back. During playback, a button that lets the user indicate whether the target person has been found is preferably displayed.
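A query assembled from the screen fields might be evaluated as a simple conjunctive filter. The record layout (`camera`, `time`, `attributes`) and the `SearchQuery`/`run_search` names are hypothetical. An actual FDB server 200 would query its frame, person, object, and trajectory tables rather than scan a list.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SearchQuery:
    # Mirrors the fields of search screen 700; all names are hypothetical.
    cameras: set      # camera designation field 720
    start: datetime   # date and time designation field 730 (inclusive range)
    end: datetime
    attributes: dict = field(default_factory=dict)  # attribute designation field 740


def run_search(query, records):
    """Return records whose camera, time, and attributes all satisfy the query."""
    hits = []
    for rec in records:
        if rec["camera"] not in query.cameras:
            continue
        if not (query.start <= rec["time"] <= query.end):
            continue
        if all(rec["attributes"].get(k) == v for k, v in query.attributes.items()):
            hits.append(rec)
    return hits
```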

The video analysis system for analyzing surveillance camera video has been described above as an embodiment of the present invention. The system can also be applied to a factory manufacturing process, where it recognizes workers' operation of equipment and the work they perform on workpieces in order to identify specific actions or motions.

As described above, the video analysis system of this embodiment includes:

- a biological attribute estimation unit (person attribute estimation unit 20) that extracts attributes of living things in the video;
- an object recognition unit 21 that extracts attributes of objects in the video;
- a motion recognition unit (time-series action recognition unit 16) that recognizes the motion of living things in the video; and
- frame rate adjustment units 17 to 19 that control the execution timing of processing by each of these units.

The frame rate adjustment units 17 to 19 set the frame rate of the video processed by the time-series action recognition unit 16 higher (that is, its processing execution interval shorter) than the frame rate of the video processed by at least one of the person attribute estimation unit 20 and the object recognition unit 21. In other words, by exploiting the continuity of the feature amounts, attribute estimation and action recognition are processed at different frame rates. This reduces the computational cost both of building the feature amount database 210 and of real-time detection.
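To make the cost argument concrete, the following worked count rests on two illustrative assumptions. First, every unit invocation costs the same amount $c$. Second, four per-frame operations are involved: skeleton estimation and action recognition at the 5 fps of FIG. 6, and person attribute estimation and object recognition at 1 fps. Real per-model costs differ, so the result indicates the direction and rough size of the saving only under these assumptions.

$$
\begin{aligned}
C_{\text{uniform}} &= 4 \times 5\,c = 20\,c \ \text{per second} \\
C_{\text{split}} &= 2 \times 5\,c + 2 \times 1\,c = 12\,c \ \text{per second} \\
1 - \frac{C_{\text{split}}}{C_{\text{uniform}}} &= 1 - \frac{12}{20} = 0.4
\end{aligned}
$$

Under these assumptions, running the two attribute-oriented units at 1 fps instead of 5 fps removes 40% of the invocations per video stream.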

The system further includes a time-of-interest control unit 23 that outputs the execution timing of the processing adjusted by the frame rate adjustment unit 19. The frame rate adjustment unit 19 controls the execution interval of processing in the time-series action recognition unit 16 according to the output of the time-of-interest control unit 23. As a result, the frame rate can be adjusted appropriately to suit the user's requirements and the application.
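The patent does not specify how the time-of-interest control unit 23 chooses its timing. As one hypothetical policy, it could shorten the action recognition interval during configured time-of-day windows. `action_interval` and its default rates are assumptions made for this sketch.

```python
from datetime import time


def action_interval(now, windows, focus_fps=5.0, idle_fps=1.0):
    """Return the processing interval in seconds for the action recognition unit.

    Hypothetical policy: run at focus_fps inside any (start, end) time-of-day window
    of interest and at idle_fps otherwise. Windows that cross midnight are not handled.
    """
    in_focus = any(start <= now < end for start, end in windows)
    return 1.0 / (focus_fps if in_focus else idle_fps)
```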

The system further includes the following units:

- a search unit (query setting unit 24) that searches the feature amount database 210 using at least one of attributes and motions as a query;
- a detection rule storage unit (detection rule memory 27) in which a search query including feature amounts related to attributes of a person or object is set; and
- a real-time detection unit 26 that determines whether the search query set in the detection rule memory 27 matches the outputs of the person attribute estimation unit 20, the object recognition unit 21, and the time-series action recognition unit 16.

This makes it possible to provide both a real-time detection function and a history search function over the feature amount database 210.
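Matching a detection rule against the per-person outputs can be pictured as a conjunction over attributes, actions, and objects. The rule format and the function names below are hypothetical.

```python
def matches_rule(rule, person_attrs, actions, objects):
    """True if one person's current outputs satisfy a detection rule.

    Hypothetical rule format: {"person": {attr: value}, "action": str, "objects": set}.
    Missing keys impose no constraint.
    """
    for key, value in rule.get("person", {}).items():
        if person_attrs.get(key) != value:
            return False
    if rule.get("action") and rule["action"] not in actions:
        return False
    return set(rule.get("objects", ())) <= set(objects)


def detect(rules, observations):
    """Yield (rule index, track ID) for every rule satisfied by an observation.

    observations maps track ID -> (person attributes, actions, objects).
    """
    for track_id, (attrs, actions, objects) in observations.items():
        for i, rule in enumerate(rules):
            if matches_rule(rule, attrs, actions, objects):
                yield i, track_id
```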

In addition, the real-time detection unit 26 determines whether the feature amounts of a living thing or object match the outputs of the person attribute estimation unit 20, the object recognition unit 21, and the time-series action recognition unit 16. These feature amounts belong to an item that the user selected from the search results obtained with the attributes entered as a detection query. Because the target's video features, obtained by a history search that uses the eyewitness information as a query, serve as the real-time detection rule, the target can be found quickly and accurately. This helps prevent the target from escaping and forestalls further trouble caused by the target.
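When the rule is a registered feature vector rather than an attribute query, the check becomes a similarity comparison. This sketch assumes cosine similarity and an arbitrary threshold, because the source does not state which metric the real-time detection unit 26 uses.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_target(target_features, observed, threshold=0.85):
    """Return (index of best-matching registered target, similarity), or None.

    The 0.85 threshold is an arbitrary illustrative value, not from the source.
    """
    best = None
    for i, target in enumerate(target_features):
        sim = cosine_similarity(target, observed)
        if sim >= threshold and (best is None or sim > best[1]):
            best = (i, sim)
    return best
```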

In addition, when the user selects none of the search results obtained with the attributes entered as a detection query, the real-time detection unit 26 generates a detection rule from the attribute-based search query entered by the user. Real-time detection can therefore continue even when the search does not find the target.

The system also includes the person tracking unit 15, which tracks living things in the video and generates trajectories. The feature amount database 210 registers the outputs of the person attribute estimation unit 20, the object recognition unit 21, and the time-series action recognition unit 16 in association with one another through a biological identifier (person ID). The query setting unit 24 then searches the feature amount database 210 by linking the attributes or motions of a specific living thing through the person ID. Search and detection results can therefore be obtained while intermittently sampled data are complemented via the person ID.

Likewise, the feature amount database 210 registers the outputs of the person attribute estimation unit 20, the object recognition unit 21, and the time-series action recognition unit 16 in association with one another through a trajectory ID. The query setting unit 24 searches the feature amount database 210 by linking the attributes or motions of a specific living thing through the trajectory ID. Search and detection results can therefore be obtained while intermittent data are complemented via the trajectory ID.
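Complementing intermittent data can be illustrated as a join on the track ID: sparse attribute samples are attached to dense action samples from the same trajectory. The tuple layout and the nearest-in-time rule are assumptions made for this sketch.

```python
from collections import defaultdict


def complement_by_track(attr_rows, action_rows):
    """Attach person attributes (sampled sparsely, e.g. 1 fps) to every action row
    (sampled densely, e.g. 5 fps) that shares the same track ID.

    Rows are hypothetical (track_id, timestamp_s, payload) tuples. The attribute
    sample closest in time on the same trajectory is used; None if the track has none.
    """
    by_track = defaultdict(list)
    for track_id, t, attrs in attr_rows:
        by_track[track_id].append((t, attrs))
    merged = []
    for track_id, t, action in action_rows:
        candidates = by_track.get(track_id)
        attrs = min(candidates, key=lambda c: abs(c[0] - t))[1] if candidates else None
        merged.append((track_id, t, action, attrs))
    return merged
```

After such a join, a query such as "carrying a bag and running" becomes answerable even though the bag and the running were extracted from different frames.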

In addition, because the time-series action recognition unit 16 detects simultaneous multiple events in people's actions, abnormalities in the environment can be detected promptly.
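The source does not define how simultaneous multiple events are detected. One hypothetical reading flags an action when enough distinct tracks perform it within the same fixed time bucket. `window_s` and `min_people` are illustrative parameters.

```python
from collections import defaultdict


def simultaneous_events(action_rows, window_s=2.0, min_people=3):
    """Flag actions performed by at least min_people distinct tracks in one time bucket.

    action_rows: iterable of (track_id, timestamp_s, action). Returns sorted
    (bucket_start_s, action, head_count) tuples. Bucketing scheme, window, and
    threshold are illustrative assumptions.
    """
    buckets = defaultdict(set)
    for track_id, t, action in action_rows:
        buckets[(int(t // window_s), action)].add(track_id)
    return sorted(
        (b * window_s, action, len(ids))
        for (b, action), ids in buckets.items()
        if len(ids) >= min_people
    )
```

Fixed buckets can split an event that straddles a bucket boundary. A sliding window would avoid that at some extra cost.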

The present invention is not limited to the embodiments described above and includes various modifications and equivalent configurations within the spirit of the appended claims. For example, the embodiments above are described in detail to explain the present invention clearly, and the invention is not necessarily limited to configurations having all of the described elements. Part of the configuration of one embodiment may be replaced with the configuration of another embodiment. The configuration of another embodiment may also be added to that of one embodiment. For part of the configuration of each embodiment, other configurations may be added, deleted, or substituted.

Each configuration, function, processing unit, processing means, and the like described above may be implemented partly or entirely in hardware, for example by designing them as integrated circuits. Alternatively, they may be implemented in software, by having a processor interpret and execute programs that implement the respective functions.

Information such as programs, tables, and files that implement each function can be stored in a storage device such as a memory, a hard disk, or an SSD (Solid State Drive), or on a recording medium such as an IC card, an SD card, or a DVD.

The control lines and information lines shown are those considered necessary for the explanation; not all control and information lines required for an implementation are necessarily shown. In practice, almost all components may be considered to be interconnected.

1 Processor
2 Memory
3 Auxiliary storage device
4 Communication interface
5 Input interface
6 Keyboard
7 Mouse
8 Output interface
9 Display device
11 Video acquisition unit
12 Skeleton estimation unit
13 Person rectangle extraction unit
14 Person feature extraction unit
15 Person tracking unit
16 Time-series action recognition unit
17 Frame rate adjustment unit
18 Frame rate adjustment unit
19 Frame rate adjustment unit
20 Person attribute estimation unit
21 Object recognition unit
22 FDB registration unit
23 Time-of-interest control unit
24 Query setting unit
25 Search result output unit
26 Real-time detection unit
27 Detection rule memory
100 Video analysis system
200 FDB server
210 Feature amount database
211 Frame table
212 Person table
213 Object table
214 Trajectory table
220 Detection target database
300 Camera
700 Search screen
710 Search instruction button
720 Camera designation field
730 Date and time designation field
740 Attribute designation field
750 Search result display area

Claims

A video analysis system comprising a computer that has an arithmetic unit for executing predetermined processing and a storage device connected to the arithmetic unit, wherein
the arithmetic unit is capable of accessing a feature amount database that stores feature amount data of objects in video, and
the video analysis system includes:
a biological attribute estimation unit in which the arithmetic unit extracts attributes of living things in the video;
an object recognition unit in which the arithmetic unit extracts attributes of objects in the video;
a motion recognition unit in which the arithmetic unit recognizes motions of living things in the video; and
a frame rate adjustment unit in which the arithmetic unit controls the execution timing of processing by each of the units,
wherein the frame rate adjustment unit sets the frame rate of the video processed by the motion recognition unit higher than the frame rate of the video processed by at least one of the biological attribute estimation unit and the object recognition unit.

The video analysis system according to claim 1,
wherein the frame rate adjustment unit sets the processing execution interval of the motion recognition unit shorter (that is, executes its processing more frequently) than the processing execution interval of at least one of the biological attribute estimation unit and the object recognition unit.

The video analysis system according to claim 1,
further comprising a time-of-interest control unit that outputs the execution timing of the processing adjusted by the frame rate adjustment unit,
wherein the frame rate adjustment unit controls the execution interval of processing in the motion recognition unit according to the output of the time-of-interest control unit.

The video analysis system according to claim 1,
further comprising: a search unit in which the arithmetic unit searches the feature amount database using at least one of the attributes and the motions as a query;
a detection rule storage unit in which a search query including feature amounts related to attributes of a person or an object is set; and
a real-time detection unit that determines whether the search query set in the detection rule storage unit matches the outputs of the biological attribute estimation unit, the object recognition unit, and the motion recognition unit.

The video analysis system according to claim 4,
wherein the real-time detection unit determines whether the feature amounts of a living thing or an object, selected by the user from the search results obtained with attributes entered by the user as a detection query, match the outputs of the biological attribute estimation unit, the object recognition unit, and the motion recognition unit.

The video analysis system according to claim 4,
wherein, when the user has selected none of the search results obtained with the attributes entered by the user as a detection query, the real-time detection unit generates a detection rule based on the search query of the attributes entered by the user.

The video analysis system according to claim 4,
wherein the arithmetic unit includes a tracking unit that tracks living things in the video and generates trajectories,
the feature amount database registers the outputs of the tracking unit, the biological attribute estimation unit, and the motion recognition unit in association with one another using biological identifiers, and
the search unit searches the feature amount database by linking the attributes or motions of a specific living thing through the biological identifier.

The video analysis system according to claim 7,
wherein the feature amount database registers the outputs of the tracking unit, the biological attribute estimation unit, the object recognition unit, and the motion recognition unit in association with one another using a trajectory identifier, and
the search unit searches the feature amount database by linking the attributes or motions of a specific living thing through the trajectory identifier.

The video analysis system according to claim 1,
wherein the motion recognition unit detects simultaneous multiple events in human behavior.

A video analysis method executed by a computer that has an arithmetic unit for executing predetermined processing and a storage device connected to the arithmetic unit, wherein
the arithmetic unit is capable of accessing a feature amount database that stores feature amount data of objects in video, and
the video analysis method includes:
a biological attribute estimation procedure in which the arithmetic unit extracts attributes of living things in the video;
an object recognition procedure in which the arithmetic unit extracts attributes of objects in the video;
a motion recognition procedure in which the arithmetic unit recognizes motions of living things in the video; and
a frame rate adjustment procedure in which the arithmetic unit controls the execution timing of each procedure,
wherein, in the frame rate adjustment procedure, the frame rate of the video processed in the motion recognition procedure is set higher than the frame rate of the video processed in at least one of the biological attribute estimation procedure and the object recognition procedure.