JP7478630B2

JP7478630B2 - Video analysis system and video analysis method

Info

Publication number: JP7478630B2
Application number: JP2020154309A
Authority: JP
Inventors: 良起伊藤; 健一森田
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2020-09-15
Filing date: 2020-09-15
Publication date: 2024-05-07
Anticipated expiration: 2040-09-15
Also published as: WO2022059223A1; JP2022048475A

Description

The present invention relates to a video analysis system and a video analysis method that detect the states of persons and objects in video captured of a monitored area and detect monitoring targets based on the detection results.

In recent years, the need for video surveillance has grown at event venues such as concert halls and amusement facilities, and at public facilities such as stations and airports. For example, to prevent terrorist acts using dangerous materials such as explosives or harmful liquids, security operations are required to respond to acts such as handing over baggage or leaving items behind, inside or outside a security area, by monitoring persons, detecting the acts, or approaching persons who have committed such an act or show signs of doing so. In addition, early detection of scuffles between persons, or of a captured person falling or crouching, allows facility managers to promptly protect persons in need of assistance within the facility, which contributes to ensuring safety.

For example, the image monitoring device described in Patent Document 1 detects handover acts in a monitored area by calculating information about human body posture from images captured of the monitored area. It further calculates a monitoring importance from the location of the act and the type of item handed over, and outputs the detection result of the handover act according to that monitoring importance. As output methods, Patent Document 1 describes notifying the "monitoring staff at the monitoring center" by screen display, alarm lamp, alarm sound, and the like.

Patent Document 2, with the aim of reducing the burden of monitoring video and making it easier to capture the moment something happens, discloses an image processing device having: an acquisition means for acquiring multiple videos distributed from multiple video distribution devices; an object position detection means for detecting the positions of multiple objects contained in each of the videos; an orientation detection means for detecting the orientation of each object at those positions; a setting means for setting a priority for each of the videos based on the orientation of each object; and a display means for displaying the videos based on the priority.

Patent Document 1: JP 2017-028561 A
Patent Document 2: JP 2017-017441 A

With the conventional techniques described above, alerts may also be issued excessively for persons whose monitoring importance is not actually high. Staff engaged in monitoring must respond to every alerted event, so their workload increases. Furthermore, if many actors have a high monitoring importance when results are output after an event is detected, the video analysis system must perform person tracking, behavior recognition, and similar processing for all of them, so the processing load grows with the number of persons. The object of the present invention is therefore to provide a video analysis system that sets a monitoring importance for each of the multiple persons who took part in an interaction within the monitored area and accurately narrows down the persons with high monitoring importance, thereby reducing both the workload of responders and the processing load of the system.

One aspect of the present invention provides a video analysis system that detects events in a monitored area using video captured of the monitored area, the system comprising: an interaction detection unit that detects, based on the video, an interaction, which is an event arising from the involvement of multiple persons, and outputs the type of the interaction and the direction of the interaction, which indicates how each of the multiple persons was involved with the other persons in the interaction; a monitoring importance determination unit that compares the type and direction of the interaction with preset monitoring criteria information to determine a monitoring importance for each of the persons involved in the interaction; and an output control unit that outputs the detection result of the event based on the monitoring importance.
The present invention also provides a video analysis method for detecting events in a monitored area using video captured of the monitored area, the method comprising: an interaction detection step of detecting, based on the video, an interaction, which is an event arising from the involvement of multiple persons, and outputting the type of the interaction and the direction of the interaction, which indicates how each of the multiple persons was involved with the other persons in the interaction; a monitoring importance determination step of comparing the type and direction of the interaction with preset monitoring criteria information to determine a monitoring importance for each of the persons involved in the interaction; and an output control step of outputting the detection result of the event based on the monitoring importance.

According to the present invention, a monitoring importance can be set for each of the multiple persons who took part in an interaction within the monitored area, which reduces both the workload of responders in monitoring and the processing load of the system.

FIG. 1 is an explanatory diagram of the video monitoring system according to the present embodiment.
FIG. 2 is a diagram showing the overall configuration of the video monitoring system according to the present embodiment.
FIG. 3 is a block diagram of the video analysis system according to the present embodiment.
FIG. 4 is a flowchart of the video analysis system according to the present embodiment.
FIG. 5 is a diagram showing the data structure of the monitoring criteria information according to the present embodiment.
FIG. 6 is a diagram showing an example of a setting screen for the monitoring criteria information according to the present embodiment.
FIG. 7 is a diagram showing an example of a setting screen for the monitoring criteria information according to the present embodiment.
FIG. 8 is a diagram showing a display example of the video display unit according to the present embodiment.
FIG. 9 is a diagram showing a display example of the video display unit according to the present embodiment.
FIG. 10 is a diagram showing a display example of the search unit according to the present embodiment.
FIG. 11 is a diagram showing a display example of the video display unit according to the present embodiment.
FIG. 12 is a diagram showing the hardware configuration of the video monitoring system according to the present embodiment.

An embodiment of the video monitoring system according to the present invention is described below. For the purpose of early detection of terrorist or dangerous acts at event venues and at public facilities such as stations and airports, this embodiment detects coordinated actions by multiple persons within the monitored area, such as handing over an item or scuffling; that is, it detects interactions. According to the present invention, a monitoring importance is set for each person who took part in an interaction, so the events to be handled can be prioritized, which helps monitoring staff and on-site staff respond to events efficiently and quickly. When the staff are fewer than the persons requiring a response, the risk of missing a person of high importance can be reduced. Furthermore, post-event processing such as person tracking and behavior recognition can be performed starting from the persons with the highest monitoring importance, so limited computing resources can be allocated appropriately.

In this embodiment, an "event" is a situation that has been preset as a detection target in a given monitored area. In particular, this embodiment targets interactions, which are events arising from the involvement of multiple persons. Interactions include, for example, handshakes, handovers of baggage, scuffles, and assaults. Examples are described below with reference to the drawings.

FIG. 1 is an explanatory diagram of the video monitoring system according to the present embodiment. As shown in FIG. 1, the video monitoring system 1 is broadly divided into an imaging system 2, a video analysis system 3, and a monitoring center system 4. The imaging system 2 consists of camera units installed in the area to be monitored. The video analysis system 3 analyzes the input video from the imaging devices to determine the interactions between persons that are to be detected and the attributes of those persons, and then determines the monitoring importance of each person by checking the location of occurrence and the direction of the interaction against preset monitoring criteria information. The monitoring center system 4 receives the analysis results from the video analysis system 3, presents them effectively to monitoring staff and on-site staff, and supports after-the-fact searches on interactions and persons.

Here, the direction of an interaction indicates from which person to which person the action underlying the interaction was performed. For a handover, for example, the direction is set from the person who handed over the item (the performer of the handover) to the person who received it (the recipient of the handover). Similarly, for an assault, the direction is set from the assailant (the performer of the assault) to the victim (the recipient of the assault). The direction of an interaction is thus defined per interaction type. Some interactions, such as handshakes and scuffles, are bidirectional.
The attributes of a person include whether the person is a member of the public or a security guard, as well as age and gender.
By using the direction of the interaction and the attributes of the persons involved, the video monitoring system 1 sets a monitoring importance for each person, which reduces both the workload of responders in monitoring and the processing load of the system.
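The notion of a directed interaction described above can be illustrated with a small data structure. This is only a sketch under assumptions: the class and field names (`Interaction`, `Direction`, `actor_id`, `recipient_id`, `role_of`) are hypothetical and do not come from the patent, which does not prescribe a concrete representation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Direction(Enum):
    ONE_WAY = "one_way"  # e.g. handover, assault
    TWO_WAY = "two_way"  # e.g. handshake, scuffle


@dataclass(frozen=True)
class Interaction:
    kind: str             # interaction type, e.g. "handover"
    direction: Direction
    actor_id: int         # person who performed the action
    recipient_id: int     # person on the receiving side

    def role_of(self, person_id: int) -> Optional[str]:
        """Return the role a person played, or None if not involved."""
        if person_id not in (self.actor_id, self.recipient_id):
            return None
        if self.direction is Direction.TWO_WAY:
            # Bidirectional interactions have no distinct performer.
            return "mutual"
        return "actor" if person_id == self.actor_id else "recipient"
```

Keeping the role separate from the interaction type is what lets a later step assign different monitoring importance to the two participants of the same handover.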

This point is explained with a concrete example. A method that calculates a person's monitoring importance from the location of occurrence and the type of item handed over does not distinguish between the importance of the person who handed over the item and that of the person who received it. However, if the item is one that requires caution in the monitored area, the monitoring importance of the person who received the item should preferably be set higher than that of the person who handed it over. Likewise, if the monitored area contains areas with different security levels, a handover from a low-security area to a high-security area and a handover in the opposite direction should preferably be assigned different monitoring importance.

Also, if the person who performed a handover is a security guard or other person engaged in maintaining security within the monitored area (security personnel), the monitoring importance should not be set high. If such attributes are not taken into account, however, a person who performed a handover that truly warrants monitoring and a member of the security personnel are assigned the same monitoring importance. In that case, staff must respond to multiple persons among whom the system indicates no difference in importance, which increases the burden on monitoring staff. Moreover, if the on-site staff, security guards, and others available to respond are fewer than the persons requiring a response, important targets may be missed.
For these reasons, the video monitoring system 1 achieves accurate determination of monitoring importance by using information on the direction of the action and on person attributes, in addition to the type of interaction and the location where it occurred.

The imaging system 2, the video analysis system 3, and the monitoring center system 4 are described in detail below.
FIG. 2 is a diagram showing the overall configuration of the video monitoring system according to the present embodiment. The imaging system 2 consists of one or more camera units 21 installed in the area to be monitored, and the captured video is input sequentially to the video input unit 31 of the video analysis system 3. Each camera unit 21 is a surveillance camera arranged so that the entire area to be monitored can be captured. If no area setting is needed for interaction detection, the camera need not be fixed and may be a mobile camera; any type is acceptable as long as it captures the area to be monitored. If area setting is needed, it is desirable to use a surveillance camera fixed to a wall, pillar, or the like, with calibration performed in advance. In that case a fixed camera without pan-tilt-zoom (PTZ) operation is the expected choice, but a PTZ-capable camera may also be used if the combinations of PTZ settings and calibration settings have been adjusted in advance, and a single such camera may then monitor several areas.

The camera unit 21 and the video input unit 31 are connected by wired or wireless communication means, and the camera unit 21 continuously transmits frame images to the video input unit 31. When interaction recognition uses a time-series analysis model that assumes multiple frame images as input, the frame rate of this continuous transmission should preferably be at least the value required by interaction recognition. If the loss of recognition accuracy caused by a lower frame rate is acceptable, the frame rate may fall below the required value. In that case, interaction recognition may apply processing that suppresses the loss of accuracy, such as filling in the time-series data by interpolation or extrapolation. The camera unit 21 and the video analysis system 3 need not correspond one-to-one; multiple camera units may be used with a single video analysis system. Even when multiple processes run in this way, the frame rate required by each process is subject to the constraint described above. The camera unit 21 may also incorporate some or all of the functions of the video analysis system described later.
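As one illustration of filling in time-series data when frames arrive below the required rate, the sketch below linearly interpolates a single keypoint track onto a fixed-rate time grid. Linear interpolation is just one possible choice, and the names (`resample_linear`, `Sample`) and the `(timestamp, x, y)` layout are assumptions made for this example, not something the patent specifies.

```python
from typing import List, Tuple

# (timestamp in seconds, x, y) for one tracked keypoint; layout is assumed.
Sample = Tuple[float, float, float]


def resample_linear(samples: List[Sample], rate_hz: float) -> List[Sample]:
    """Resample a time-sorted track to a fixed rate by linear interpolation."""
    if len(samples) < 2 or rate_hz <= 0:
        return list(samples)
    step = 1.0 / rate_hz
    t_start, t_end = samples[0][0], samples[-1][0]
    out: List[Sample] = []
    seg = 0  # index of the left end of the current source segment
    n = 0
    while t_start + n * step <= t_end + 1e-9:
        t = t_start + n * step
        # Advance to the source segment that contains time t.
        while seg + 2 < len(samples) and samples[seg + 1][0] < t:
            seg += 1
        (t0, x0, y0), (t1, x1, y1) = samples[seg], samples[seg + 1]
        w = 0.0 if t1 == t0 else (t - t0) / (t1 - t0)
        out.append((t, x0 + w * (x1 - x0), y0 + w * (y1 - y0)))
        n += 1
    return out
```

For example, a track sampled at 0 s and 1 s can be resampled to 4 Hz so that a model expecting four frames per second still receives evenly spaced input, at the cost of assuming straight-line motion between the two observations.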

The video analysis system 3 consists of a video input unit 31, a video processing unit 32, and a storage unit 33. The video input unit 31 accepts video from the camera unit 21 and sends the video data to the video processing unit 32. The video to be analyzed need not be input directly from the camera unit 21; it may be video saved separately in a recorder, and where it is stored does not matter. The video processing unit 32 reads the monitoring criteria information stored in the storage unit 33, described later, and analyzes the video received from the video input unit 31 to determine the monitoring importance of each individual who took part in an interaction. The storage unit 33 stores the monitoring criteria information set through the management control unit 43, described later. The monitoring criteria information is used to determine the monitoring importance that the video processing unit 32 outputs. In this embodiment, the video analysis system 3 is not limited to an on-premises system built on a server within the operating facility; it may also be built on a server outside the facility, for example by using a cloud service.

The monitoring center system 4 consists of a recording unit 41, a video display unit 42, a management control unit 43, and a search unit 44. The recording unit 41 holds, as a database, the information obtained through analysis by the video analysis system 3, such as the interaction that occurred, its direction, person attributes, the area of occurrence, and the time of occurrence. The video display unit 42 displays, according to the monitoring importance, the current behavior of a person who took part in an interaction and information on part or all of the frame in which the interaction was detected. The management control unit 43 lets monitoring staff, on-site staff, and others enter setting information into the storage unit 33 so that the monitoring criteria information used by the video processing unit 32 is saved. The search unit 44 searches the information saved in the recording unit 41 for matching persons, using person attributes and interaction types as queries, and can look up information such as a matching person's current position and movement trajectory within the facility up to that time.

FIG. 12 is a hardware configuration diagram of the video monitoring system according to the present embodiment. In FIG. 12, a camera unit 1102 is connected to a computer 1103 via a network. The computer 1103 can in turn communicate with a computer 1104 via the network.

One or more camera units 1102 are installed in the monitored area and transmit video data to the computer 1103 as appropriate. The computer 1103 includes a CPU (central processing unit) as an arithmetic and control device, RAM (random access memory) as main storage, and an HDD (hard disk drive) as auxiliary storage. The computer 1103 implements the functions of the video analysis system 3 by reading various programs from the HDD, loading them into RAM, and executing them on the CPU. The computer 1103 communicates with the camera unit 1102 and the computer 1104 via a predetermined communication interface (IF). Although not shown, input/output devices such as a keyboard and a display are also connected to the computer 1103 via a predetermined IF.

The computer 1104 includes a CPU as an arithmetic and control device, RAM as main storage, and an HDD as auxiliary storage, and implements the functions of the monitoring center system 4 by reading various programs from the HDD, loading them into RAM, and executing them on the CPU. The computer 1104 is connected to the computer 1103 and to input/output devices such as a keyboard and a display via a predetermined interface (IF).

Next, the video analysis system 3 is described in detail with reference to FIG. 3, which is a block diagram of the video analysis system according to the present embodiment. The video input unit 31, the video processing unit 32, and the storage unit 33 that make up the video analysis system 3 are described below.

The video input unit 31 sequentially receives video from one or more camera units 21 and outputs it to the downstream video processing unit 32. If the video processing unit 32 does not handle time-series information, the input may be still images.

The video processing unit 32 consists of a calculation unit 321, an interaction detection unit 322, a monitoring importance determination unit 323, and an output control unit 324.
The calculation unit 321 in turn consists of a person detection unit 3211 and an attribute determination unit 3212.

The person detection unit 3211 detects persons in the still image of the current frame, using the images or video received from the video input unit. Possible detection methods include methods based on Haar-like features or R-CNN (Regions with CNN), and methods that derive a person region from the set of skeleton coordinates estimated for each person by a skeleton estimation means; this embodiment does not depend on a particular method. After detection, the person detection unit 3211 performs person tracking. For tracking, it suffices that the rectangular image of a person and the person ID assigned to that person are associated across consecutive frames, and a general tracking method such as template matching or optical flow may be used.
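The tracking requirement above, keeping each person rectangle associated with the same person ID across consecutive frames, can be sketched as follows. Note that this example deliberately swaps in greedy matching by bounding-box overlap (IoU) as a simple stand-in; it is not the template matching or optical flow named in the text. The function names and the `min_iou` threshold are assumptions for illustration.

```python
from itertools import count
from typing import Dict, Iterator, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image coordinates


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def associate(prev: Dict[int, Box], boxes: List[Box],
              next_id: Iterator[int], min_iou: float = 0.3) -> Dict[int, Box]:
    """Carry person IDs from the previous frame over to the current detections."""
    pairs = sorted(
        ((iou(pb, b), pid, j) for pid, pb in prev.items()
         for j, b in enumerate(boxes)),
        reverse=True,
    )
    assigned: Dict[int, int] = {}  # detection index -> person ID
    used_ids = set()
    for score, pid, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same person
        if pid in used_ids or j in assigned:
            continue
        assigned[j] = pid
        used_ids.add(pid)
    result: Dict[int, Box] = {}
    for j, b in enumerate(boxes):
        pid = assigned[j] if j in assigned else next(next_id)  # new person
        result[pid] = b
    return result
```

A caller would keep one ID generator, for example `itertools.count(1)`, for the whole video and pass the result of each frame as `prev` for the next.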

Next, the rectangular image of each person obtained by the person detection unit is input to the attribute determination unit 3212, which determines the person's attributes. Person attributes serve as information that contributes to determining the monitoring importance of each individual, and they can also be used as queries in the search unit 44 described above. Examples of attributes include security guard or staff member of the facility, general facility user, age, and gender. If guards and staff can be distinguished from general users, interactions initiated by guards or staff can be assumed to be actions within the scope of their duties, so no monitoring importance is set for them, which can be expected to suppress unnecessary alerts. Estimating the age and gender of general users also helps: if, for example, prior statistics indicate that persons in a particular age group are more likely to engage in behavior requiring caution, setting a higher monitoring importance for that age group enables effective video monitoring. In amusement facilities, event facilities, and the like, it is also effective to check facility users against pre-registered persons requiring caution or persons banned from entry.
Attribute estimation methods include converting the rectangular image of a person into image features, such as HOG (Histograms of Oriented Gradients), SIFT (Scale-Invariant Feature Transform), or a vector output from an intermediate layer of a trained deep learning network, and training a classifier such as an SVM (Support Vector Machine) or a decision tree on them; another option is end-to-end determination with a CNN (Convolutional Neural Network) based method. These determination means may be built as a two-stage cascade, with a first-stage classifier that separates guards and on-site staff from general visitors and a second-stage classifier that determines the age and gender of general visitors, or they may be trained as a single classifier. Furthermore, by separately using a means of determining the items a person possesses, a person judged to be carrying prohibited or dangerous items can be given an attribute expressing that fact, which also contributes to an appropriate attribute representation.
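The two-stage arrangement of attribute classifiers can be sketched as a small cascade. Everything here is illustrative: the classifiers are passed in as plain callables standing in for trained models (SVM, decision tree, CNN, or other), and the labels `"visitor"`, `"guard"`, and the attribute keys are hypothetical.

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]  # image feature vector of one person rectangle


def classify_attributes(
    features: Features,
    role_clf: Callable[[Features], str],
    demographic_clf: Callable[[Features], Dict[str, str]],
) -> Dict[str, str]:
    """Stage 1 decides the role; stage 2 runs only for general visitors."""
    role = role_clf(features)
    attrs = {"role": role}
    if role == "visitor":
        # Age and gender are only estimated for general visitors.
        attrs.update(demographic_clf(features))
    return attrs
```

Skipping the second stage for guards and staff mirrors the text's point that their interactions are not given a monitoring importance, and it also saves computation.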

The interaction detection unit 322 uses the information obtained from the person detection unit 3211 to determine whether an interaction has occurred, its type, and its direction. For any pair of persons, the determination may use image features as described above or skeletal information. When skeletal information is used, the features may be a feature representing a person's posture calculated from the skeleton estimated by the person skeleton detection means, a feature calculated from the relative distance between arbitrary skeleton points of a person pair, or a time-series feature representing the movement of the skeleton per unit time or the change in relative distance across preceding and following image frames. Attribute information obtained from the attribute determination unit 3212, such as a feature representing age or gender, may also be used as a feature. The features need not be limited to the person: a feature representing an item held by a person may also be used. For example, when an item determined to be owned by one person is, after a certain time, determined to be owned by another person, a handover can be interpreted as having occurred at the moment the ownership determination switched. These features may be used alone or in combination. For example, if interactions are determined from posture features alone, an interaction may be falsely detected between people who are far apart; also using relative-distance features reduces the number of false positives. Using features alone or in combination in this way enables effective interaction detection.
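The ownership-switch interpretation of a handover described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the per-frame owner list, the function name `detect_handovers`, and the treatment of frames with no owner are all assumptions introduced here.

```python
from typing import List, Optional, Tuple


def detect_handovers(owner_by_frame: List[Optional[str]]) -> List[Tuple[int, str, str]]:
    """Return (frame_index, giver_id, receiver_id) for each ownership switch.

    owner_by_frame is a hypothetical input: the person ID judged to own one
    tracked item in each frame, or None when no owner could be determined.
    """
    events = []
    last_owner: Optional[str] = None
    for frame, owner in enumerate(owner_by_frame):
        if owner is None:
            continue  # assumption: keep the last known owner across gaps
        if last_owner is not None and owner != last_owner:
            # The giver is the performer, the receiver is the other party.
            events.append((frame, last_owner, owner))
        last_owner = owner
    return events
```

The returned tuple carries the direction of the interaction (giver to receiver), which is what the later direction-dependent weighting needs.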

When image information is used, a CNN-based method can be used. When skeletal information is used, a classifier such as an SVM, a decision tree, or an LSTM (Long Short-Term Memory) can be trained on features representing posture, relative distance, attributes, and the like.

The monitoring importance determination unit 323 takes as input the attributes of the person obtained from the attribute determination unit 3212, the type and direction of the interaction obtained from the interaction detection unit 322, and, in this embodiment, the information on the area in which the person is located obtained from the person detection unit 3211. It compares these inputs with the information set in the monitoring criteria information 331 and sets a monitoring importance for each individual involved in the interaction. The area information can be determined by comparing the preset area information with the person rectangle. For example, if areas are defined in image coordinates for a camera with a given PTZ setting, the area in which a person is located can be determined from which area contains the estimated position of the person's feet.
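As a rough sketch of the area determination described above, the example below assumes that each area is a polygon in image coordinates and that the foot position is approximated by the bottom-centre of the person rectangle. Both choices, and all function names, are assumptions made for illustration; the text does not prescribe them.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def foot_point(box: Box) -> Point:
    """Assumed foot estimate: bottom-centre of the person rectangle."""
    x_min, _, x_max, y_max = box
    return ((x_min + x_max) / 2.0, y_max)


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count edge crossings of a ray going in +x from p."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def locate_area(box: Box, areas: Dict[str, List[Point]]) -> Optional[str]:
    """Return the name of the first preset area containing the foot point."""
    p = foot_point(box)
    for name, polygon in areas.items():
        if point_in_polygon(p, polygon):
            return name
    return None
```

Because the areas are defined in image coordinates, they are only valid for the PTZ setting under which they were drawn.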

The output control unit 324 transmits the monitoring importance determined for each individual by the monitoring importance determination unit 323 to the monitoring center system 4. All events for which a monitoring importance has been calculated may be transmitted, or a threshold may be preset so that only events with a high monitoring importance are transmitted.

The storage unit 33 stores the monitoring criteria information 331 used by the monitoring importance determination unit 323. The monitoring criteria information 331 contains three kinds of security level setting information: an interaction security level set for each interaction type, an attribute security level set for each attribute type, and an area security level set for each area type. It also contains a weight for each of these security level settings and, for each interaction type, a weight for the performer or the recipient of the interaction. The monitoring criteria information 331 can be set from the management control unit 43.

Next, the processing flow of the video analysis system according to this embodiment will be described with reference to the flowchart shown in FIG. 4.
When video is input from the imaging system to the video analysis system in step S1, person detection is performed in step S2.
Next, in step S3, the number of people is counted. If two or more people are detected on the screen, the process proceeds to step S4. If one person or fewer is detected, the processing from step S4 onward is skipped, and the process waits for the next frame and returns to step S1. If the screen contains a mixture of areas where interaction detection is desired or permitted and areas where it is not, the monitored area may be partially masked immediately before step S2 to reduce the amount of calculation.

In step S4, interaction determination is performed. The determination is made for arbitrary person pairs on the screen, but to reduce the amount of calculation it is preferable, for each interaction type, to skip the determination for person pairs that are more than a certain distance apart. For example, when detecting a handover, there is no need to determine whether a handover has occurred between two people who are clearly out of each other's reach. When the relative distance between people is evaluated before the behavior determination, the relative distance must be calculated in the world coordinate system. To do so, the position of each person in the world coordinate system is estimated either by using preset area information or, without any presetting, by using a depth estimation technique based on a stereo camera or a monocular camera, and the relative distance between the people is then calculated. In addition, a separate setting table is prepared in which the determination threshold is set collectively. For example, if the threshold is set to 3 m, the behavior determination is not performed for person pairs whose relative distance in the world coordinate system exceeds 3 m. If the classifier is not a multi-class classifier that recognizes many types of interaction but a two-class classifier trained for each interaction type, the threshold can also be set for each interaction type. Furthermore, when detecting interactions that cross an area boundary, it is preferable, for reducing both calculation and false positives, to skip the determination between people located in the same area and to determine interactions only between people located in different areas.
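The distance-based pruning of person pairs can be illustrated with the sketch below. It assumes that each person already has an estimated ground-plane position in metres in the world coordinate system; the threshold table `MAX_DISTANCE_M` and its values are hypothetical apart from the 3 m example given in the text.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical per-interaction-type thresholds in metres.
# Only the "3 m" figure appears in the text as an example.
MAX_DISTANCE_M = {"handover": 3.0, "assault": 3.0}


def candidate_pairs(
    positions: Dict[str, Tuple[float, float]], interaction: str
) -> List[Tuple[str, str]]:
    """Return person pairs close enough to be passed to the classifier."""
    limit = MAX_DISTANCE_M[interaction]
    pairs = []
    for a, b in combinations(sorted(positions), 2):
        # Pairs whose distance exceeds the threshold are skipped entirely.
        if math.dist(positions[a], positions[b]) <= limit:
            pairs.append((a, b))
    return pairs
```

With N detected people this still enumerates all N(N-1)/2 pairs, but the comparatively expensive interaction classifier only runs on the pairs that survive the check.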

If the interaction determination finds that an interaction has taken place, that is, that an event has occurred, the branch at step S5 leads to the processing from step S6 onward. If it finds that no event has occurred, the processing from step S6 onward is skipped, and the process waits for the next frame and returns to step S1.

In steps S6 to S8, attributes are calculated for each person who caused the event. Next, in steps S9 to S11, the monitoring importance is determined for each detected event. After output control is performed in step S12 for the determined monitoring importance, the process waits for the next frame and returns to step S1.

Note that the flow shown in this figure does not have to be executed in a single process; it may be executed asynchronously by multiple processes to improve computational efficiency.

Next, an example of the monitoring criteria information settings in this embodiment is described with reference to FIG. 5. The monitoring criteria information in this embodiment consists of the security level setting information in Table 51, the weights for the security level setting targets in Table 52, and the performer weights for each interaction type in Table 53. From Tables 51 and 52, a weighted score is calculated for each individual, taking into account the interaction type, the person's attributes, and the area of occurrence; the monitoring importance is then calculated from this weighted score and the performer weight in Table 53. This information is set from the management control unit 43, saved in the monitoring criteria information 331, and read by the monitoring importance determination unit 323. The settings in each table and their effects are explained in detail below.

Table 51 consists of three security level setting tables: Table 511, which sets a security level for each interaction type; Table 512, which sets a security level for each attribute type; and Table 513, which sets a security level for each area type. In this embodiment, security levels are set in four steps from 3 points to 0 points, named in descending order of points "high", "medium", "low", and "none". "High" indicates a target requiring the greatest attention, and "none" indicates a target requiring no attention.
In the interaction security level settings shown in Table 511, an importance is set for each interaction type; for example, "handover" is 1 point and "assault" is 3 points. Similarly, in the attribute security level settings shown in Table 512, "staff" is 0 points and "prohibited person" is 3 points, and in the area security level settings shown in Table 513, "inside the entrance gate" is 3 points and "concession stand" is 2 points.

For every security level, items that are not subject to monitoring are explicitly set to level 0; for example, in the interaction security level settings, "handshake" and "hug" are set to level 0. In this embodiment, each item in Tables 511, 512, and 513 is scored in four steps from 3 points to 0 points, but the number of steps is not limited to this embodiment, and it is desirable that the person configuring the system can set it freely.

Table 52 is a setting table that stores a weight for each of the three security level setting targets. In the example of Table 52, the three weights sum to 100%, with interaction set to 30%, attribute to 20%, and area to 50%. In this embodiment, these weights are applied to the scores shown in parentheses in Table 51 to calculate a weighted score, which is then used to calculate the monitoring importance for each individual. For example, if a "handover" is performed by a "general/young person" "inside the entrance gate", the weighted score given to the "general/young person" is calculated using Table 52 as 1 × 0.3 + 2 × 0.2 + 3 × 0.5 = 2.2 points. Making the weight of each of the three setting targets configurable allows the system to respond flexibly to the differing needs of each facility. For example, if a facility wants to emphasize only the interaction type and the person's attributes regardless of area, the weights can be set to 70% for interaction, 30% for attribute, and 0% for area. In addition, if one or more of the setting targets scores 0 points, the weighted score of that individual may be set to 0 points. This is because, for example, if a person whose attribute is determined to be "security guard" performs some interaction, the interaction was probably performed in the course of duty, and it is generally considered inappropriate to treat a "security guard" as a monitoring target. The above formula is one example of how to calculate the weighted score, and a different formula may be used.
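The weighted-score arithmetic can be reproduced with a short sketch. The point values and weights are the ones quoted in the text; the 2 points for "general/young person" is inferred from the worked example, and the dictionary layout, the function name, and the `zero_if_any_zero` switch are assumptions made for illustration.

```python
# Security levels quoted in the text (Tables 511 to 513); layout is assumed.
LEVELS = {
    "interaction": {"handover": 1, "assault": 3, "handshake": 0, "hug": 0},
    "attribute": {"general/young person": 2, "staff": 0, "prohibited person": 3},
    "area": {"inside the entrance gate": 3, "concession stand": 2},
}
# Weights from the Table 52 example (30% / 20% / 50%).
WEIGHTS = {"interaction": 0.3, "attribute": 0.2, "area": 0.5}


def weighted_score(interaction: str, attribute: str, area: str,
                   zero_if_any_zero: bool = False) -> float:
    """Weighted sum of the three security levels for one individual."""
    points = {
        "interaction": LEVELS["interaction"][interaction],
        "attribute": LEVELS["attribute"][attribute],
        "area": LEVELS["area"][area],
    }
    # Optional rule from the text: any 0-point target forces the score to 0.
    if zero_if_any_zero and 0 in points.values():
        return 0.0
    return sum(WEIGHTS[key] * value for key, value in points.items())
```

Calling `weighted_score("handover", "general/young person", "inside the entrance gate")` gives approximately 2.2, matching the worked example, while any interaction by "staff" evaluates to 0 when `zero_if_any_zero=True`.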

Table 53 is a setting table for calculating the monitoring importance from the weighted score obtained from Tables 51 and 52, taking the direction of the interaction into account. Referring to Table 53, for a "handover", for example, the weight for the performer, that is, the person who handed over the item, is set to 20%, while the weight for the recipient, that is, the person to whom the item was handed, is set to 80%. A setting like this, in which the performer has the smaller weight, means that the recipient is considered more important than the performer. In a "handover", the person who received the item is assumed to be more important than the person who let go of it. For a "scuffle", which is a two-way interaction, the weight is set to 50% in Table 53 so that the performer and the recipient are weighted equally. For an "assault", where the direction from the performer (the assailant) to the recipient (the victim) is clear, the performer weight is set to 90%. As an example of calculating the monitoring importance from the weighted score, if a "general/young person" performs a "handover" "inside the entrance gate", the weighted score given to the "general/young person" is 2.2 points from Tables 51 and 52 as described above; with the performer weight for "handover" set to 20%, the monitoring importance is 2.2 × 0.2 = 0.44.
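The direction-dependent step can be sketched as below. The performer weights are those quoted for Table 53; taking the recipient weight as the complement of the performer weight is an assumption that agrees with the 20%/80% handover example, and the function and role names are illustrative.

```python
# Performer weights quoted in the text for Table 53.
PERFORMER_WEIGHT = {"handover": 0.2, "scuffle": 0.5, "assault": 0.9}


def monitoring_importance(weighted_score: float, interaction: str, role: str) -> float:
    """Scale one person's weighted score by the weight for their role."""
    performer = PERFORMER_WEIGHT[interaction]
    if role == "performer":
        factor = performer
    elif role == "recipient":
        factor = 1.0 - performer  # assumption: the two weights sum to 100%
    else:
        raise ValueError(f"unknown role: {role}")
    return weighted_score * factor
```

For the worked example, `monitoring_importance(2.2, "handover", "performer")` gives approximately 0.44. A recipient with the same weighted score would receive approximately 1.76, which reflects the stated intent that the person receiving the item matters more.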

When a person who has caused an event with a high monitoring importance subsequently causes an event with a low monitoring importance, the monitoring importance must not be overwritten and thereby underestimated. Therefore, for a person who has caused multiple interactions, it is desirable either to add up the monitoring importance values calculated for the individual events or to keep using the value of the largest event, until the monitoring importance for that person is reset.
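Both retention policies mentioned above (summing, or keeping the largest value until a reset) can be sketched with a small tracker. The class name, the `mode` parameter, and the reset interface are hypothetical.

```python
from collections import defaultdict


class ImportanceTracker:
    """Keeps a per-person monitoring importance until it is reset."""

    def __init__(self, mode: str = "max") -> None:
        if mode not in ("max", "sum"):
            raise ValueError("mode must be 'max' or 'sum'")
        self.mode = mode
        self._values = defaultdict(float)

    def update(self, person_id: str, importance: float) -> float:
        """Fold a newly calculated importance into the stored value."""
        if self.mode == "sum":
            self._values[person_id] += importance
        else:
            # A later, smaller event never lowers the stored value.
            self._values[person_id] = max(self._values[person_id], importance)
        return self._values[person_id]

    def reset(self, person_id: str) -> None:
        """Forget the stored value, e.g. once staff have handled the event."""
        self._values.pop(person_id, None)
```

In `max` mode, a person who scored 2.7 for an assault keeps 2.7 after a later 0.44 handover; in `sum` mode the two values accumulate.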

It is also possible to use the types and directions of multiple interactions involving the same person together, either to determine that person's monitoring importance or to vary the response. For example, consider a case in which pedestrians bumping into each other is defined as an interaction of type "pedestrian collision", with the person who bumped into the other as the performer and the person who was bumped into as the recipient. If the record of interactions involving a certain person shows that the person is frequently involved in "pedestrian collisions" and is always the performer, the person is likely to be colliding with others deliberately. In this case, the monitoring importance can be set high, and security personnel can be dispatched immediately when the person is involved in another "pedestrian collision". On the other hand, if the person is frequently involved in "pedestrian collisions" but the direction is not consistent (the person is the performer and the recipient about equally often), the person may be unwell. In this case, it is desirable to set the monitoring importance high and to dispatch first-aid personnel when behavior such as "crouching" is observed.
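The two "pedestrian collision" cases can be separated with a simple rule over a person's role history, sketched below. The text does not quantify "frequently" or "about equally often", so `min_events` and `balance_tolerance` are hypothetical parameters, as are the returned labels.

```python
from typing import List


def assess_collisions(roles: List[str], min_events: int = 4,
                      balance_tolerance: float = 0.2) -> str:
    """Classify a person's history of 'pedestrian collision' roles.

    roles holds "performer" or "recipient" for each past collision.
    """
    if len(roles) < min_events:
        return "no_action"  # not involved often enough to draw a conclusion
    performer_share = roles.count("performer") / len(roles)
    if performer_share == 1.0:
        return "possible_intent"  # always the one who bumps into others
    if abs(performer_share - 0.5) <= balance_tolerance:
        return "possible_ill_health"  # direction roughly balanced
    return "no_action"
```

A "possible_intent" result would correspond to dispatching security personnel on the next collision, and "possible_ill_health" to dispatching first-aid personnel when crouching or similar behavior is observed.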

The monitoring importance calculated as described above is sent to the output control unit 324. The output control unit can apply a threshold to the monitoring importance in order to limit the number of events sent to the monitoring center system 4. For example, if the threshold is set to 2.0, a monitored person with a calculated monitoring importance of 1.5 is not sent, while an individual with a calculated monitoring importance of 3.0 is sent. Although not shown in this embodiment, the recipients of an alert, such as on-site staff or security guards, may be designated according to the monitoring importance score.
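The threshold filter in the output control unit amounts to the following sketch. Whether a value exactly equal to the threshold is transmitted is not stated in the text; the `>=` comparison here is an assumption, as is the function name.

```python
from typing import Dict


def select_for_transmission(importance_by_person: Dict[str, float],
                            threshold: float = 2.0) -> Dict[str, float]:
    """Keep only the individuals whose importance reaches the threshold."""
    return {
        person: value
        for person, value in importance_by_person.items()
        if value >= threshold  # assumption: equality counts as "reached"
    }
```

With the default threshold of 2.0, an input of `{"a": 1.5, "b": 3.0}` leaves only person `b` to be sent to the monitoring center system.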

Next, a GUI (Graphical User Interface) for setting the monitoring criteria information of FIG. 5 is described with reference to FIG. 6 and FIG. 7. FIG. 6 and FIG. 7 show examples of setting screens for the monitoring criteria information in this embodiment. FIG. 6 is the setting screen for creating Table 51, and FIG. 7 is the setting screen for creating Tables 52 and 53.

FIG. 6 shows a GUI for setting the security levels for interaction types, person attributes, and areas. The following describes in particular the setting of the interaction security level shown in area 611. The interaction security level in area 611 has three steps, from 1 point to 3 points, but as described above, the number of steps is not limited to this embodiment, and it is desirable that the person configuring the system can set it freely. Likewise, it is desirable that the magnitude of each security level can be set freely. Furthermore, for interactions to which no security level is assigned, the importance may be shown explicitly as 0 points. The person configuring the system presses the pull-down field shown in area 6111 and, for each security level column, selects a registered interaction from the list. Pressing the "Add" button in area 6112 adds the selected interaction type to the "Registered interactions" list below. To delete a registered interaction after it has been added, the user checks the check box in area 6113 corresponding to the interaction to be deleted and presses area 6114. Pressing the "Save settings" button in area 6115 applies the information set in this area to the monitoring criteria information 331. To review the information set in this area, the user presses area 6116. The attribute security level shown in area 612 and the area security level shown in area 613 can be set in the same way. However, unlike interaction types and attribute types, whose options are fixed by the system, area types themselves can additionally be registered so that areas not anticipated by the system can be handled.

Area 62 in FIG. 7 is a GUI for setting the weights of the security level setting targets; the settings in this area define Table 52. In area 621, the weights for interaction, attribute, and area are set as percentages. Pressing the "Save settings" button in area 622 applies the information set in this area to the monitoring criteria information 331. To review the information set in this area, the user presses area 623.

Area 63 is a GUI for setting the performer weight for each interaction type; the settings in this area define Table 53. In area 631, the performer weight is set as a percentage for each interaction type. The weight for the recipient may be calculated and displayed automatically when the performer weight is entered. Alternatively, the recipient weight may be the value that is entered. The interactions for which a setting is requested in area 631 are all the actions registered in area 611 with a score greater than 0 points, and the corresponding row is added automatically when they are registered by pressing area 6115. Pressing the "Save settings" button in area 632 applies the information set in this area to the monitoring criteria information 331. To review the information set in this area, the user presses area 633. However, to avoid the inconsistency of an interaction being registered in area 611 without a performer weight being registered in area 63, it is desirable, for example, to display a warning about incomplete entries when the setting screen as a whole is closed.
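The consistency check recommended above, together with the automatic calculation of the recipient weight, could look like the sketch below. The function names and the dictionary-based representation of the two settings are assumptions; weights are given in percent, as in the GUI.

```python
from typing import Dict, List


def recipient_weight(performer_percent: float) -> float:
    """Recipient weight derived from the performer weight (both in percent)."""
    if not 0.0 <= performer_percent <= 100.0:
        raise ValueError("performer weight must be between 0 and 100")
    return 100.0 - performer_percent


def missing_performer_weights(interaction_levels: Dict[str, int],
                              performer_weights: Dict[str, float]) -> List[str]:
    """List interactions scored above 0 that lack a performer weight."""
    return sorted(
        name
        for name, points in interaction_levels.items()
        if points > 0 and name not in performer_weights
    )
```

A non-empty result from `missing_performer_weights` would be the trigger for the warning shown when the setting screen is closed.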

Next, an example of the screen that notifies a monitor of detected events is described with reference to FIG. 8. FIG. 8 shows a display example of the video display unit 42 in this embodiment, in which area 7 is the output screen. Area 7 may occupy the entire notification screen or only part of it.

In area 7, the people displayed on screen A (area 71), screen B (area 72), screen C (area 73), and screen D (area 74) are people for whom a monitoring importance has been set and sent to the monitoring center system 4. This embodiment shows an example in which video at the current time is displayed for all of these people. The video used in this display is real-time video obtained by tracking the person who caused the event. Person tracking is performed by the person detection unit 3211 as described above, and the video is sent to the video display unit 42 together with the monitoring importance information. Area 75 lists the people displayed on each screen in order of monitoring importance, together with the detected event, the place of occurrence, the time of occurrence, and the current position. The column showing the monitoring importance order may instead display the actual values output by the monitoring importance determination unit 323.

In this embodiment, the sizes of screens A to D change dynamically according to the determined monitoring importance. For example, when the monitoring importance of every person has been reset and no one is displayed on the screen, person a, who performs a "handover" at time 09:20:00, is displayed on the largest screen. Next, if person b, who performs an "assault" at time 09:30:50, is determined to have a higher monitoring importance than person a, the display area of person b becomes larger than that of person a. To make each displayed person easy to distinguish from other people captured in the same picture, it is desirable to apply image processing such as superimposing the person's detection frame or cropping the image. In addition, because the screen size is limited, the screen may be made scrollable when multiple events are occurring.

In addition to the real-time tracking video described above, it is desirable that selecting a display screen or a row in area 75 switches the display to the scene at the time the event occurred. FIG. 9 shows a display example of the video display unit in this embodiment, in which one event in area 75 of FIG. 8 has been selected. In FIG. 9, screen B (area 76) is selected, and the frame at which the interaction caused by the tracked person was detected is displayed. Because an interaction takes place over a certain span of time, it is desirable to display the frame with the highest determination confidence between the start and the end of the interaction. Alternatively, a short clip from the start to the end of the interaction may be made playable. This allows the monitor to understand the circumstances in which the interaction took place when the event occurred. Furthermore, when staff have finished responding to an event confirmed in this way, when it is judged that no response is needed, or when the detection is clearly false, the display screen or the row in area 75 can be selected and deleted.

This screen is not limited to alert recipients who are expected to use a large display, such as monitors at a monitoring center. Staff and security guards responding on site can also view part or all of area 7 on site by using a smartphone, a tablet, AR goggles, or a similar device.

Next, a search function that uses interaction types, attributes, and the like is described with reference to FIG. 10. Events that monitors or on-site staff have finished handling on the video display unit 42, and events that have been judged not to require a response, are deleted from the display output. To check such events afterwards, a means of searching for events in the recording unit 41, which is a database of information on the events that have occurred, is therefore required.

FIG. 10 shows a display example of the search unit in this embodiment, in which area 9 is the entire search screen. Within area 9, area 91 is the search input screen and area 92 is the output screen.
In area 91, area 911 is used to narrow down events using the interaction type, attribute, and time of occurrence as the query. Specifically, the user presses the pull-down field for each search item column in area 911 and selects a registered item from the list. Pressing the "Add" button adds the selected item to the "Registered items" list below. To delete a registered item after it has been added, the user checks the check box of the item to be deleted and presses area 912. This embodiment shows an example of searching by interaction type, attribute, and time of occurrence, but the direction of the interaction, the area information, and the like may also be used as the query. Input is not required in every column, and columns may be left blank. For example, if a search is run with all items left blank, all the information stored in the recording unit 41 is output as the search result.
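The query semantics described above (each column optional, and an empty query returning everything) can be sketched as a simple filter. The `Event` record and the `search_events` signature are hypothetical, and a real recording unit would more likely be queried through a database than through an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Set


@dataclass
class Event:
    interaction: str
    attribute: str
    occurred_at: datetime


def search_events(events: List[Event],
                  interactions: Optional[Set[str]] = None,
                  attributes: Optional[Set[str]] = None,
                  start: Optional[datetime] = None,
                  end: Optional[datetime] = None) -> List[Event]:
    """Return the events matching every criterion that was filled in."""
    results = []
    for event in events:
        # A blank (None or empty) criterion places no restriction.
        if interactions and event.interaction not in interactions:
            continue
        if attributes and event.attribute not in attributes:
            continue
        if start is not None and event.occurred_at < start:
            continue
        if end is not None and event.occurred_at > end:
            continue
        results.append(event)
    return results
```

Calling `search_events(events)` with no criteria returns every stored event, which mirrors the all-blank search described in the text.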

In area 92, area 921 displays information on each search result, such as the current location and the time of occurrence. As shown in area 921, if the person is confirmed to be still within the monitored area at the current time and can be tracked, information on the current position may be displayed. Area 922 displays the frame image corresponding to the information in area 921, or a short video clip from the start to the end of the interaction. Areas 921 and 922 together form one search result, and in this embodiment all the search results can be viewed by scrolling. The search results may also be switchable to a grid view consisting only of the images or videos shown in area 922.

By using the search unit 44, events can be efficiently retrieved from the recording unit 41 even after they have been removed from the display output of the video display unit 42. Because similar cases can be searched for and their number of occurrences checked, the search unit also helps in planning countermeasures and preventive measures for events that are expected to occur in the future.

As described above, the video surveillance system 1 detects interactions from surveillance video and determines the monitoring importance of each individual using the type and direction of the interaction, the attributes of the persons, and area information. According to the present invention, rather than assigning the same monitoring importance to everyone who took part in an interaction within the monitoring area, a monitoring importance is set for each individual. This makes it easier for monitoring personnel to prioritize their responses and enables an efficient response to the persons involved.

Another embodiment of the video surveillance system 1 of the present invention is described below. Description of matters common to the above-described embodiment is omitted, and only the processing specific to this embodiment is described.

In the display example of the video display unit 42 in the embodiment described above, all events for which a monitoring importance has been set at the current time are displayed in order of monitoring importance. A display method that focuses on a single event is also conceivable.

FIG. 11 is a diagram showing an example of display by the video display unit 42 in this embodiment. It shows detailed information about each person for an interaction between two people for which a monitoring importance has been set. Area 81 shows the frame image in which the interaction was detected. The frame image shows a handover taking place between person 811 and person 812. Area 84 shows the attributes, occurrence location, and current position of the two people, as well as the direction of the handover. As indicated in area 84, the information on the two people is displayed separately on screen X (area 82) and screen Y (area 83). Within areas 82 and 83, areas 821 and 831 display the image of each person captured in area 81, with the magnification adjusted so that the monitoring staff can view it easily. Areas 822 and 832 use a floor map to show the location where the event occurred, the person's current position, and the person's movement trajectory. In this figure, a circle indicates the location where the event occurred, a pentagon indicates the person's current position and direction of travel, and a dotted line indicates the movement trajectory from the event location to the current position.
Areas 823 and 833 show the state of each person at the current time. The screens shown in areas 82 and 83 make it possible to grasp the movement trajectories of the performer and the recipient from the time the event occurred up to the current time.

The following processing is required to display a movement trajectory on the floor map. First, after the images of person 811 and person 812 have been acquired, person identification is performed on the frame images obtained from the cameras at fixed time intervals or every fixed number of frames. Next, the position of each person on the floor map is determined either by using area information set in advance or, without any advance setting, by estimating the person's position in the world coordinate system using, for example, depth estimation with a stereo camera or a monocular camera. Connecting the acquired position information and time information in chronological order allows the person's movement trajectory to be displayed on the floor map.
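The last step of this procedure, joining timestamped positions into a trajectory per person, might look like the sketch below. It assumes that person identification and position estimation have already produced observations in floor-map coordinates; the function name `build_trajectories`, the tuple layout, and the sample values are hypothetical.

```python
from collections import defaultdict


def build_trajectories(observations):
    """Group (person_id, timestamp, (x, y)) observations into per-person tracks.

    Each returned track is a list of (timestamp, (x, y)) pairs sorted by time,
    which is the order in which the points would be connected on the floor map.
    """
    tracks = defaultdict(list)
    for person_id, timestamp, position in observations:
        tracks[person_id].append((timestamp, position))
    return {pid: sorted(points, key=lambda p: p[0]) for pid, points in tracks.items()}


# Observations may arrive out of order, e.g. from several cameras (sample values only).
observations = [
    (811, 2.0, (3.0, 2.0)),
    (812, 0.0, (5.0, 5.0)),
    (811, 0.0, (1.0, 1.0)),
    (811, 1.0, (2.0, 1.5)),
    (812, 1.0, (5.5, 4.0)),
]
trajectories = build_trajectories(observations)
```

Sorting by timestamp rather than relying on arrival order is a design choice that tolerates frames being processed out of sequence.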

In this figure, the movement trajectories of both the performer and the recipient are shown from the occurrence of the event onward. However, if persons who have not caused an event are already being tracked, or if a person can be traced back after the event by using video saved on a separately prepared storage medium, the movement trajectory of each person up to the location of the event may also be displayed. For example, in the case of a handover, displaying the trajectory of the performer who handed over the item up to the occurrence of the event, together with the trajectory of the recipient who received the item from the occurrence of the event onward, enables video surveillance that focuses on the movement of the item rather than of a person.
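As a rough illustration of the item-focused view in the handover example, the sketch below joins the giver's track up to the event with the receiver's track after the event. The function name `item_trajectory`, the track format of (timestamp, (x, y)) pairs, and the sample values are assumptions made for this example.

```python
def item_trajectory(giver_track, receiver_track, event_time):
    """Approximate the path of a handed-over item.

    Before and at the event the item is assumed to move with the person who
    hands it over; after the event it is assumed to move with the person who
    receives it. Both tracks are lists of (timestamp, (x, y)) sorted by time.
    """
    before = [(t, pos) for t, pos in giver_track if t <= event_time]
    after = [(t, pos) for t, pos in receiver_track if t > event_time]
    return before + after


# Sample tracks (invented values); the handover happens at t = 2.0.
giver_track = [(0.0, (0.0, 0.0)), (1.0, (1.0, 0.0)), (2.0, (2.0, 0.0)), (3.0, (2.0, 1.0))]
receiver_track = [(1.0, (3.0, 0.0)), (2.0, (2.2, 0.0)), (3.0, (3.0, -1.0)), (4.0, (4.0, -2.0))]
item_path = item_trajectory(giver_track, receiver_track, event_time=2.0)
```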

As described above, according to this embodiment, the video display unit can provide a display that focuses on a single event. Specifically, for each person who took part in the interaction, it can display the person's state at the time of the event, the person's state at the current time, and the person's movement trajectory, and it can also display the movement trajectories of the persons involved up to the occurrence of the event. This allows monitoring staff to check at a glance not only whether an event occurred but also how the persons involved moved before and after it, so that detailed information about the event can be grasped easily and accurately.

As shown in the above embodiments, the disclosed video analysis system 3 detects events in a monitored area using video captured of the monitored area. It includes an interaction detection unit 322 that detects, based on the video, an interaction, which is an event arising from the involvement of a plurality of persons, and that outputs the type of the interaction and the direction of the interaction, which indicates how each of the persons was involved with the other persons in the interaction. It also includes a monitoring importance determination unit 323 that compares the type and direction of the interaction with preset monitoring criteria information to determine a monitoring importance for each of the persons involved in the interaction, and an output control unit 324 that outputs the detection result of the event based on the monitoring importance. With this configuration and operation, a monitoring importance can be set for each of the persons who took part in an interaction within the monitoring area, which reduces both the workload of the staff responding to monitoring and the processing load of the system.
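One way to picture the comparison against preset monitoring criteria information is a lookup table keyed by interaction type and by each person's side of the interaction direction. The sketch below is only an illustration under that assumption: the table contents, the importance values, and the names `MONITORING_CRITERIA`, `judge_importance`, and `order_for_output` are hypothetical and are not taken from the patent.

```python
# Hypothetical monitoring criteria: (interaction type, role in the direction) -> importance.
MONITORING_CRITERIA = {
    ("handover", "performer"): 3,
    ("handover", "recipient"): 2,
    ("scuffle", "performer"): 3,
    ("scuffle", "recipient"): 1,
}
DEFAULT_IMPORTANCE = 0  # assumed value for combinations not listed in the criteria


def judge_importance(interaction_type, roles):
    """Return a monitoring importance for each person involved in one interaction.

    `roles` maps a person id to that person's role ("performer" or "recipient"),
    which stands in for the interaction direction output by the detection step.
    """
    return {
        person_id: MONITORING_CRITERIA.get((interaction_type, role), DEFAULT_IMPORTANCE)
        for person_id, role in roles.items()
    }


def order_for_output(importance_by_person):
    """List (person id, importance) pairs with the highest importance first."""
    return sorted(importance_by_person.items(), key=lambda item: item[1], reverse=True)


importance = judge_importance("handover", {812: "recipient", 811: "performer"})
ranked = order_for_output(importance)
```

Because the importance is looked up per person rather than per event, the two participants of the same handover can receive different values, which is the point the paragraph above makes.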

The system further includes a calculation unit 321 that detects the images of persons included in the video and calculates attribute feature amounts representing the attributes of the detected persons. Because the monitoring importance determination unit 323 additionally uses the attribute feature amounts to determine the monitoring importance, a highly accurate determination that takes the attributes of the persons into account is possible.

The interaction detection unit 322 detects the skeleton of each person based on the video and calculates at least one of the following: a posture feature amount representing the posture of the person, calculated from the estimation result of the detected skeleton; a distance feature amount calculated from one or more distances between arbitrary body parts of different persons; a movement feature amount representing the amount of movement of the skeleton per unit time, calculated from the difference between preceding and succeeding image frames of the video; and an item feature amount expressing the ownership relationship between an item and a person. It then detects the type and direction of the interaction based on the calculated feature amounts. This configuration enables a highly accurate determination that takes into account the posture of the persons, the distance between persons, the type of item, and the like.
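Two of these feature amounts can be illustrated with simple geometry on 2D keypoints. The sketch below assumes that skeleton estimation has already produced a dictionary of named keypoints per person; the keypoint names, the use of a mean over shared keypoints for the movement feature, and the function names are assumptions for illustration, not the patent's definitions.

```python
import math


def distance_feature(keypoints_a, keypoints_b, part_a, part_b):
    """Euclidean distance between one body part of person A and one of person B."""
    xa, ya = keypoints_a[part_a]
    xb, yb = keypoints_b[part_b]
    return math.hypot(xa - xb, ya - yb)


def movement_feature(keypoints_prev, keypoints_next, dt):
    """Mean keypoint displacement per unit time between two frames of one person.

    Only keypoints present in both frames are used; `dt` is the time between
    the two frames in seconds.
    """
    shared = keypoints_prev.keys() & keypoints_next.keys()
    if not shared or dt <= 0:
        raise ValueError("need at least one shared keypoint and a positive dt")
    total = sum(
        math.hypot(keypoints_next[k][0] - keypoints_prev[k][0],
                   keypoints_next[k][1] - keypoints_prev[k][1])
        for k in shared
    )
    return total / len(shared) / dt


# Invented sample keypoints in image coordinates.
person_a = {"right_wrist": (0.0, 0.0)}
person_b = {"left_wrist": (3.0, 4.0)}
wrist_gap = distance_feature(person_a, person_b, "right_wrist", "left_wrist")

frame_prev = {"right_wrist": (0.0, 0.0), "right_elbow": (0.0, 0.0)}
frame_next = {"right_wrist": (3.0, 4.0), "right_elbow": (0.0, 0.0)}
speed = movement_feature(frame_prev, frame_next, dt=0.5)
```

A small wrist-to-wrist distance combined with a low movement feature could, for instance, be one cue for a handover, although how the features are combined is left open here.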

Furthermore, because the monitoring importance determination unit 323 determines the monitoring importance using information about the positions of the persons at the time the interaction occurred, events that are implausible given the positional relationship can be excluded and the determination accuracy can be improved.

The video analysis system 3 also has a storage unit 33 that holds, as monitoring criteria information, information on the type of interaction, the direction of the interaction, and the security level of each area of occurrence, which is used to determine the monitoring importance of each person who caused an interaction to be detected. Holding this information in advance and using it as appropriate enables a simple and highly accurate determination.

In addition, providing a search unit 44 that can search for a person who caused an interaction, by searching the record of detected interactions using the type of interaction and/or information about the person as a search query, allows the detected and accumulated interactions to be used effectively.

Furthermore, the output control unit 324 changes the size of the display on the display terminal in accordance with the monitoring importance of each person who caused an interaction. This makes the importance easy to recognize and allows the amount of information to be controlled according to the importance.
In addition, for each person who caused an interaction, displaying on the screen the movement trajectory before and after the interaction makes it possible to check the person's behavior before and after the interaction.
Whether to generate the movement trajectories of the persons may be decided based on the importance. In this case, movement trajectories can be output selectively for important persons.
Furthermore, the direction of the interaction may be made explicit by displaying on the screen, for each of the persons in a detected interaction, information indicating whether the person is the performer or the recipient of a predetermined action related to the interaction.
In addition, by displaying on the screen the frame image in which an interaction was detected together with the current position and/or current video of a person involved in the interaction, the content of the interaction can be displayed in association with the person's current situation.
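The importance-dependent display size and the selective generation of trajectories could be combined in a small planning step such as the sketch below. The scaling rule, the pixel sizes, the threshold, and the name `display_plan` are all assumed values chosen for illustration; the patent does not specify them.

```python
def display_plan(importance_by_person, base_size=160, trajectory_threshold=2):
    """Decide, per person, how large to draw the tile and whether to build a trajectory.

    Tile size grows linearly with the monitoring importance, and a movement
    trajectory is generated only for persons at or above the threshold.
    """
    plan = {}
    for person_id, level in importance_by_person.items():
        plan[person_id] = {
            "tile_px": base_size * (1 + level),
            "show_trajectory": level >= trajectory_threshold,
        }
    return plan


plan = display_plan({811: 3, 812: 1})
```

Skipping trajectory generation for low-importance persons is one way the processing load mentioned earlier could be reduced, since tracking across cameras is typically the more expensive step.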

The present invention is not limited to the embodiments described above and includes various modifications. For example, the above embodiments have been described in detail to explain the present invention in an easy-to-understand manner, and the invention is not necessarily limited to configurations that include all of the described elements. Part of the configuration of one embodiment may be replaced with the configuration of another embodiment, and the configuration of another embodiment may be added to the configuration of one embodiment. For part of the configuration of each embodiment, other configurations may be added, deleted, or substituted. Some or all of the above configurations, functions, processing units, processing means, and the like may be realized in hardware, for example by designing them as integrated circuits. The above configurations, functions, and the like may also be realized in software, with a processor interpreting and executing programs that realize the respective functions. Information such as the programs, tables, and files that realize each function can be stored in a memory, in a recording device such as a hard disk or SSD (Solid State Drive), or on a recording medium such as an IC card, SD card, or DVD.

1: video surveillance system
2: imaging system, 21: camera unit
3: video analysis system, 31: video input unit, 32: video processing unit, 321: calculation unit, 3211: person detection unit, 3212: attribute determination unit, 322: interaction detection unit, 323: monitoring importance determination unit, 324: output control unit, 33: storage unit, 331: monitoring criteria information
4: monitoring center system, 41: recording unit, 42: video display unit, 43: management control unit, 44: search unit

Claims

1. A video analysis system for detecting an event in a monitored area using video of the monitored area, the system comprising:
an interaction detection unit that detects, based on the video, an interaction, which is an event occurring with the involvement of a plurality of persons, and outputs a type of the interaction and an interaction direction indicating how each of the plurality of persons is involved with the other persons in the interaction;
a monitoring importance determination unit that compares the type and direction of the interaction with preset monitoring criteria information to determine a monitoring importance for each of the plurality of persons involved in the interaction; and
an output control unit that outputs a detection result of the event based on the monitoring importance.

The video analysis system according to claim 1, further comprising a calculation unit that detects an image of a person included in the video and calculates an attribute feature amount representing an attribute of the detected person,
wherein the monitoring importance determination unit determines the monitoring importance by further using the attribute feature amount.

The interaction detection unit detects a skeleton of the person based on the video, and calculates at least one of:
a posture feature amount representing a posture of the person, calculated from an estimation result of the detected skeleton;
a distance feature amount calculated from one or more distances between arbitrary body parts of different persons;
a movement feature amount representing an amount of movement of the skeleton per unit time, calculated from a difference between preceding and succeeding image frames of the video; and
an item feature amount expressing an ownership relationship between an item and the person,
and detects the type of the interaction and the direction of the interaction based on the calculated feature amount.

The video analysis system according to claim 1, characterized in that the monitoring importance determination unit determines the monitoring importance using information about the positions of the multiple people at the time the interaction occurred.

The video analysis system according to claim 1, further comprising a storage unit that stores, as the monitoring criteria information, information on the type of interaction, the direction of the interaction, and the security level for each area in which the interaction occurs, in order to determine the importance of monitoring for each person who has caused the interaction to be detected.

The video analysis system according to claim 1, further comprising a search unit capable of searching for the person who caused the interaction by searching the interaction detection record using the type of the interaction and/or information about the person as a search query.

The video analysis system according to claim 1, characterized in that the output control unit changes the size of the display on the display terminal according to the monitoring importance of each person who caused the interaction.

The video analysis system according to claim 1, characterized in that the output control unit displays on a screen the movement trajectory of each person who caused the interaction, both before and after the interaction.

The video analysis system according to claim 1, characterized in that the output control unit determines whether or not it is necessary to generate movement trajectories of the multiple people based on the monitoring importance.

The video analysis system according to claim 1, characterized in that the output control unit displays on a screen information indicating that each of the multiple people in the detected interaction is a performer or a recipient of a predetermined action related to the interaction.

The video analysis system of claim 1, wherein the output control unit displays on a screen a frame image in which the interaction is detected, and further displays on the screen the current position and/or current video of the person involved in the interaction.

The video analysis system according to claim 3, characterized in that the interaction detection unit calculates the posture feature amount, the distance feature amount, the movement feature amount, and the item feature amount, and detects the type of interaction and the direction of the interaction between the detected people based on the posture feature amount, the distance feature amount, the movement feature amount, and the item feature amount.

A computer-implemented video surveillance method for detecting an event in a surveillance area using video of the surveillance area, the method comprising:
an interaction detection step of detecting, based on the video, an interaction, which is an event occurring with the involvement of a plurality of persons, and outputting a type of the interaction and an interaction direction indicating how each of the plurality of persons is involved with the other persons in the interaction;
a monitoring importance determination step of comparing the type and direction of the interaction with preset monitoring criteria information to determine a monitoring importance for each of the plurality of persons involved in the interaction; and
an output control step of outputting a detection result of the event based on the monitoring importance.