JP6876379B2

JP6876379B2 - Behavior analysis device and program

Info

Publication number: JP6876379B2
Application number: JP2016099019A
Authority: JP
Inventors: 田中 康成; 倉又 和大; 鈴木 優; 河村 聡典; 浅野 純太
Original assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2016-05-17
Filing date: 2016-05-17
Publication date: 2021-05-26
Anticipated expiration: 2036-05-17
Also published as: JP2017207877A

Description

Embodiments of the present invention relate to a behavior analysis device and a program.

To prevent or resolve interpersonal trouble in facilities, it is considered effective to detect specific events in human behavior, or signs of such events, at an early stage.
One prior-art system aims to prevent domestic violence, abuse, and bullying by making it easy to file a report through a communication device. It judges the degree of danger and the degree of abnormality by analyzing the situation from voice. Another technique performs speech recognition to convert speech into text for people who have difficulty writing, such as infants and the elderly. However, voice analysis alone may not be enough to grasp the situation.
Another prior-art technique installs a friendship-status program on a personal terminal such as a smartphone. The program detects changes in a person's friendships from increases or decreases in the number of friends on an SNS site or the like. However, changes in an SNS friend count may not capture a person's detailed day-to-day behavior.

Japanese Unexamined Patent Publication No. 2003-346256
Japanese Unexamined Patent Publication No. 2015-225540

An object of the present invention is to provide a behavior analysis device and a program that can comprehensively grasp information on events related to problems such as bullying and on signs of such events.

The behavior analysis device of the embodiment includes a video acquisition unit, a video analysis unit, an audio acquisition unit, an audio analysis unit, a comprehensive analysis unit, and an information output unit. The video acquisition unit acquires video. The video analysis unit analyzes the people appearing in the video acquired by the video acquisition unit and detects video patterns. The audio acquisition unit acquires audio. The audio analysis unit analyzes the human voices contained in the audio acquired by the audio acquisition unit and detects audio patterns. The comprehensive analysis unit detects specific events related to human behavior from the video patterns detected by the video analysis unit and the audio patterns detected by the audio analysis unit. The information output unit outputs information about the events detected by the comprehensive analysis unit.

FIG. 1 is a diagram showing an example of a system configuration to which the behavior analysis device of the first embodiment is applied.
FIG. 2 is a block diagram showing the schematic functional configuration of the behavior analysis device of the first embodiment.
FIG. 3 is a schematic diagram showing an example of a result of analysis by the video analysis unit of the first embodiment.
FIG. 4 is a schematic diagram showing an example of a result of analysis by the video analysis unit of the first embodiment.
FIG. 5 is a schematic diagram showing an example of a result of analysis by the video analysis unit of the first embodiment.
FIG. 6 is a block diagram showing the internal functional configuration of the comprehensive analysis unit of the first embodiment.
FIG. 7 is a schematic diagram showing an example of rules stored in the sign detection rule storage unit of the first embodiment.
FIG. 8 is a schematic diagram showing the structure of an authority setting table for setting user authority in the behavior analysis device of the first embodiment.
FIG. 9 is a schematic diagram showing an example of information output by the information output unit of the first embodiment.

The behavior analysis device and program of the embodiment are described below with reference to the drawings.

FIG. 1 is a diagram showing an example of a system configuration to which the behavior analysis device of the present embodiment is applied. In this example the device is used in a school facility, where it is installed. The device is not limited to schools. It can also be installed in kindergartens, nursery schools, nursing care facilities, hospitals, public institutions such as government offices, stores, companies, and other facilities used by groups of people. The behavior analysis device analyzes video acquired by cameras 82 in the school's classrooms and audio acquired by microphones 83. Communication within the facility (the school) uses, for example, the school intranet. As a result of the behavior analysis, the device estimates a behavior evaluation. In the illustrated example, it estimates the level of bullying. For instance, it may estimate the level in one classroom as "low" and in another as "high". The behavior analysis device also outputs information on the analysis results.
For example, the behavior analysis device sends e-mails describing the analysis results to teachers, students' guardians (parents), and others via the Internet. The behavior analysis device 1 also exchanges information with the external device 81 over the Internet. The external device 81 is, for example, a behavior analysis device installed at another school, or the server of a website known as an "unofficial school site".

FIG. 2 is a block diagram showing the schematic functional configuration of the behavior analysis device of the present embodiment. As illustrated, the behavior analysis device 1 includes the following units:
a video acquisition unit 11, a video storage unit 12, and a video analysis unit 13;
an audio acquisition unit 21, an audio storage unit 22, and an audio analysis unit 23;
a personal feature storage unit 31 and a comprehensive analysis unit 32;
an information exchange unit 41 and an information output unit 42.
The behavior analysis device 1 acquires video from a plurality of external cameras 82 and audio from a plurality of external microphones 83. It exchanges information with the external device 81 via a communication network or the like.
The figure shows only two cameras 82, but any number can be installed: one, or three or more. The same applies to the microphones 83, of which only two are shown. Each camera 82 is installed in a pair with a microphone 83 at the same location. The figure also shows only one external device 81, but the behavior analysis device 1 may exchange information with several external devices.

The behavior analysis device 1 is implemented as an information processing device built from electronic circuits. Alternatively, it may be implemented using a computer and a program. Any functional unit of the behavior analysis device 1 that stores information uses storage means such as semiconductor memory or a magnetic disk device.

The present embodiment is described using the example of the behavior analysis device 1 installed in a facility such as a school, as shown in FIG. 1. Cameras 82 and microphones 83 are installed in pairs at predetermined locations in the facility, such as classrooms, the gymnasium, the dining hall, and other spaces. The camera 82 captures video of the space it is installed in, and the microphone 83 picks up sound in that space. The behavior analysis device 1 aims to prevent bullying by detecting problem behavior such as bullying, and its signs, and reporting what it finds.

The functions of each unit of the behavior analysis device 1 are described below.

The video acquisition unit 11 takes in the video signals captured by the plurality of external cameras 82. It writes them to the video storage unit 12 together with the shooting location and shooting time.
The video storage unit 12 stores the video acquired by the video acquisition unit 11 on a recording medium such as a magnetic disk device. The shooting location and shooting time are attached to each stored video. When the video analysis unit 13 analyzes a video, the results are attached to that video and stored with it.
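The metadata handling described above can be pictured as a small in-memory store. This is an illustration only. The names `VideoRecord`, `VideoStore`, and `frames_path` are hypothetical, and a real device would persist records on disk rather than in a Python list.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class VideoRecord:
    """One stored clip plus its attached metadata (hypothetical layout)."""
    location: str                 # shooting location, e.g. a classroom identifier
    captured_at: datetime         # shooting time
    frames_path: str              # hypothetical pointer to where the video data lives
    analysis: list = field(default_factory=list)  # analysis results attached later


class VideoStore:
    """In-memory stand-in for the video storage unit 12 (illustration only)."""

    def __init__(self) -> None:
        self._records: list[VideoRecord] = []

    def write(self, location: str, captured_at: datetime, frames_path: str) -> VideoRecord:
        # Store a new clip together with its location and time.
        record = VideoRecord(location, captured_at, frames_path)
        self._records.append(record)
        return record

    def annotate(self, record: VideoRecord, result: dict) -> None:
        # Attach an analysis result to the clip it was derived from.
        record.analysis.append(result)

    def find(self, location: str, start: datetime, end: datetime) -> list[VideoRecord]:
        # Return clips from one location whose time falls in [start, end).
        return [r for r in self._records
                if r.location == location and start <= r.captured_at < end]
```

The same structure would serve the audio storage unit 22, with the sound collection location and time in place of the shooting location and time.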

The video analysis unit 13 analyzes the video stored in the video storage unit 12. It passes the results to the comprehensive analysis unit 32 and also records them in the video storage unit 12, attached to the analyzed video. Its main task is to analyze the people appearing in the video acquired by the video acquisition unit 11 and to detect video patterns. The video analysis unit 13 performs various kinds of analysis, for example the following.
It recognizes and extracts objects, people, and other things appearing in the video.
It identifies individuals from each person's visual features stored in the personal feature storage unit 31, using facial and clothing features. Existing techniques can perform this identification with sufficient accuracy. In other words, the video analysis unit 13 identifies the people in the video using the video-related personal features it reads from the personal feature storage unit 31.
It extracts the relative positions of the people in the video and detects specific patterns in how they are arranged. Examples of arrangement patterns are two people being close to or far from each other, several people surrounding another person, and a person belonging or not belonging to a group of people (a cluster).
It analyzes the behavior of the people in the video: whether a person is stationary or moving, moving in a straight line or along a curve, and moving fast or slowly. It also analyzes the trajectories of a person's hands, feet, and head, and determines the type of action from the combination of these movements. Action types determined this way include, for example, one person hitting, kicking, or otherwise attacking another.
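The proximity and cluster-membership patterns can be sketched as follows, assuming each person's floor position (in metres) has already been estimated. The thresholds `CLOSE_M` and `CLUSTER_LINK_M` are placeholder values. Single-linkage grouping is one possible clustering choice, not necessarily the one used in the embodiment.

```python
import math
from itertools import combinations

# Placeholder thresholds in metres; the source gives no values.
CLOSE_M = 1.0
CLUSTER_LINK_M = 1.5

Positions = dict[str, tuple[float, float]]  # person label -> (x, y) on the floor


def pair_relations(positions: Positions) -> dict[tuple[str, str], str]:
    """Label every pair of people as 'close' or 'far'."""
    relations = {}
    for a, b in combinations(sorted(positions), 2):
        d = math.dist(positions[a], positions[b])
        relations[(a, b)] = "close" if d <= CLOSE_M else "far"
    return relations


def clusters(positions: Positions) -> list[set[str]]:
    """Single-linkage grouping: anyone within CLUSTER_LINK_M of a member joins the group."""
    parent = {p: p for p in positions}

    def root(p: str) -> str:
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving keeps the trees shallow
            p = parent[p]
        return p

    for a, b in combinations(positions, 2):
        if math.dist(positions[a], positions[b]) <= CLUSTER_LINK_M:
            parent[root(a)] = root(b)  # merge the two groups

    groups: dict[str, set[str]] = {}
    for p in positions:
        groups.setdefault(root(p), set()).add(p)
    return list(groups.values())


def isolated(positions: Positions) -> set[str]:
    """People who do not belong to any group of two or more."""
    return {p for g in clusters(positions) if len(g) == 1 for p in g}
```

Consider A at (0, 0), B at (0.8, 0), C at (1.6, 0), and D at (5, 5). A and C are labeled "far" but still end up in one cluster through B, and D is reported as isolated.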

The audio acquisition unit 21 takes in the audio signals picked up by the plurality of external microphones 83. It writes them to the audio storage unit 22 together with the sound collection location and time. The location information identifies where the microphone 83 is installed, for example by classroom number. The microphone 83 may capture multi-channel audio (two or more channels).
The audio storage unit 22 stores the audio acquired by the audio acquisition unit 21 on a recording medium such as a magnetic disk device. The sound collection location and time are attached to each stored recording. When the audio analysis unit 23 analyzes a recording, the results are attached to that recording and stored with it.

The audio analysis unit 23 analyzes the audio stored in the audio storage unit 22. It passes the results to the comprehensive analysis unit 32 and also records them in the audio storage unit 22, attached to the analyzed audio. Its main task is to analyze the human voices in the audio acquired by the audio acquisition unit 21 and to detect audio patterns. The audio analysis unit 23 performs various kinds of analysis, for example the following.
It recognizes and extracts the parts of the audio that are human utterances. Existing techniques can extract utterances from recorded audio.
If the microphone 83 captures multi-channel audio (two or more channels), the unit determines the location or direction each extracted utterance came from. It does this by analyzing the utterance's volume across the channels and its delay between channels.
It measures the volume of each utterance. If an utterance's volume exceeds a predetermined threshold, the unit attaches a mark to it indicating that it was spoken especially loudly.
It identifies the speaker from features of the extracted utterance such as its frequency distribution, using each individual's voice characteristics read from the personal feature storage unit 31. Existing techniques can perform speaker identification. Even when voice alone cannot fully distinguish a large number of people, the unit can still output a likelihood for who spoke a given utterance. In other words, the audio analysis unit 23 identifies the people whose voices appear in the audio using the audio-related personal features it reads from the personal feature storage unit 31.
It performs speech recognition on the extracted utterances using a speaker-independent acoustic model, which existing techniques provide. The result is a word sequence for each utterance. This makes it possible, for example, to detect when a specific keyword is spoken.
It estimates emotions from the voice. Acoustic models are prepared in advance for each emotion, such as happy laughter, jeering, crying, and angry voices, and the acquired audio is compared against them. Existing techniques can perform this estimation. The unit outputs the likelihood of each emotion for each utterance, for example "the likelihood that this is crying is 90%".
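Two of the audio analyses above, the loud-utterance mark and the direction estimate from inter-channel delay, can be sketched in plain Python. The sketch rests on these assumptions:
- samples are floats in [-1.0, 1.0];
- `LOUD_DBFS` is a made-up threshold;
- the direction formula assumes two microphones with known spacing and a distant source;
- brute-force cross-correlation stands in for whatever delay estimator a real system would use.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate, in air at room temperature
LOUD_DBFS = -20.0           # placeholder threshold; the source gives no value


def rms_dbfs(samples: list[float]) -> float:
    """RMS level in dB relative to full scale, for samples in [-1.0, 1.0]."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def mark_loud(utterances: list[list[float]]) -> list[bool]:
    """True for each utterance whose level exceeds the threshold."""
    return [rms_dbfs(u) > LOUD_DBFS for u in utterances]


def delay_samples(left: list[float], right: list[float], max_lag: int) -> int:
    """Lag that maximizes the cross-correlation sum(left[i] * right[i + lag]).
    A positive result means the sound reached the left microphone first."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        lo, hi = max(0, -lag), min(len(left), len(right) - lag)  # overlapping indices
        score = sum(left[i] * right[i + lag] for i in range(lo, hi))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_deg(lag: int, sample_rate: int, mic_spacing_m: float) -> float:
    """Far-field direction of arrival: 0 is broadside, positive is toward the left mic."""
    x = SPEED_OF_SOUND_M_S * (lag / sample_rate) / mic_spacing_m
    return math.degrees(math.asin(max(-1.0, min(1.0, x))))  # clamp against noise
```

The first two functions cover the threshold-based loudness mark. The last two use the inter-channel delay, one of the two cues the text names. The volume-distribution cue is not modeled here.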

The personal feature storage unit 31 stores individual features for use in video and audio analysis. These features fall broadly into visual and acoustic information. The video analysis unit 13 and the audio analysis unit 23 use them to identify individuals. Visual features include facial, physical, and clothing features. Acoustic features include speech characteristics such as frequency distribution.
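Identification can produce likelihoods rather than a single hard decision. The sketch below illustrates this by scoring a voice feature vector against each registered person. It assumes features are fixed-length embedding vectors and applies cosine similarity followed by a softmax, a technique chosen here for illustration. `PersonFeatures`, `speaker_likelihoods`, and the `temperature` parameter are hypothetical and do not come from the source.

```python
import math
from dataclasses import dataclass


@dataclass
class PersonFeatures:
    person_id: str
    visual: list[float]    # hypothetical face/body/clothing embedding
    acoustic: list[float]  # hypothetical voice embedding


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def speaker_likelihoods(query: list[float], store: list[PersonFeatures],
                        temperature: float = 0.1) -> dict[str, float]:
    """Softmax over cosine similarities: a probability for each registered person.
    A smaller temperature makes the distribution sharper."""
    scores = {p.person_id: cosine(query, p.acoustic) / temperature for p in store}
    top = max(scores.values())  # subtract the max for numerical stability
    exp = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exp.values())
    return {k: v / total for k, v in exp.items()}
```

The same scoring could be applied to the `visual` vectors for the video side.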

The comprehensive analysis unit 32 analyzes human behavior comprehensively. It combines the video analysis results from the video analysis unit 13 with the audio analysis results from the audio analysis unit 23 and outputs the outcome. In other words, it detects specific events related to human behavior from the video patterns detected by the video analysis unit 13 and the audio patterns detected by the audio analysis unit 23. Its functions are described in detail later with reference to another figure.
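Combining the two kinds of patterns can be sketched as a rule lookup over time-aligned detections. The pattern labels, the rule table `RULES`, and the two-second slack are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    source: str         # "video" or "audio"
    kind: str           # hypothetical labels, e.g. "surround", "crying"
    location: str
    start: float        # seconds
    end: float
    score: float = 1.0  # confidence or likelihood from the analysis unit


# Hypothetical rules: (video kind, audio kind, minimum audio score, event name).
RULES = [
    ("surround", "crying", 0.8, "possible bullying"),
    ("hit", "loud", 0.5, "possible violence"),
]


def overlaps(a: Pattern, b: Pattern, slack: float = 2.0) -> bool:
    """Same location, and the time intervals overlap within `slack` seconds."""
    return (a.location == b.location
            and a.start <= b.end + slack and b.start <= a.end + slack)


def detect_events(patterns: list[Pattern]) -> list[tuple[str, str, float]]:
    """Return (event name, location, start time) for each rule that fires."""
    video = [p for p in patterns if p.source == "video"]
    audio = [p for p in patterns if p.source == "audio"]
    events = []
    for v_kind, a_kind, min_score, name in RULES:
        for v in (p for p in video if p.kind == v_kind):
            for a in (p for p in audio if p.kind == a_kind and p.score >= min_score):
                if overlaps(v, a):
                    events.append((name, v.location, min(v.start, a.start)))
    return events
```

Requiring both modalities to agree in place and time is what lets the combination say more than either analysis alone.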

The information exchange unit 41 exchanges information with the external device 81 over a communication line such as the Internet. The external device 81 may be any of the following:
a behavior analysis device installed at another facility;
a computer server that provides parameters and other data used by the behavior analysis device 1 during analysis;
a computer server that collects the analysis results of the behavior analysis device 1, such as information on events and signs related to human behavior.
Analysis accuracy improves when the information exchange unit 41 obtains data such as parameters from outside and the comprehensive analysis unit 32, video analysis unit 13, and audio analysis unit 23 use it. Likewise, when the information exchange unit 41 provides similar data to the external device 81, accuracy improves in other behavior analysis devices.

The information output unit 42 outputs the events or signs detected by the comprehensive analysis unit 32 as reports. For example, it produces paper reports, e-mails, or SNS (social networking service) messages for users of the behavior analysis device 1 to read. In other words, the information output unit 42 outputs information about the events detected by the comprehensive analysis unit 32.
Events and signs may be collectively referred to simply as "events".
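For e-mail output, a report can be composed with Python's standard `email.message.EmailMessage`. This is a sketch only. The event tuple layout `(name, location, start_seconds)`, the addresses, and the wording are assumptions.

```python
from email.message import EmailMessage


def build_report(events: list[tuple[str, str, float]], sender: str,
                 recipients: list[str]) -> EmailMessage:
    """Compose a plain-text e-mail summarizing detected events."""
    msg = EmailMessage()
    msg["Subject"] = f"Behavior analysis report: {len(events)} event(s)"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    lines = [f"- {name} at {location}, t={start:.0f}s" for name, location, start in events]
    msg.set_content("\n".join(lines) if lines else "No events detected.")
    return msg
```

To send the message, pass it to `smtplib.SMTP(host).send_message(msg)` for a configured mail server. Paper reports and SNS messages would need their own output channels.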

The external device 81 is a device with which the behavior analysis device 1 exchanges information. As described above, it is either another behavior analysis device installed at another facility, or a computer server that stores or provides the information the behavior analysis device 1 inputs and outputs. It may also be a web server that publishes information on the web, such as the server of a so-called unofficial school site.

The camera 82 is a video camera that captures video. Alternatively, it may be a still camera that captures still images.
The microphone 83 is an audio signal acquisition means. It picks up sound where it is installed and outputs it as an electrical signal.
As already described, cameras 82 and microphones 83 are normally installed in pairs at predetermined locations.

Examples of analysis by the video analysis unit 13 are described next.
FIG. 3 is a schematic diagram showing an example of the video analysis unit 13 analyzing video from one camera 82. The analysis shown is person recognition and individual identification, applied to classroom video captured by a camera 82 installed in that classroom. The video analysis unit 13 recognizes the people in the video and assigns them labels such as A, B, and C, placing each label where that person appears in the video. Using the personal feature information read from the personal feature storage unit 31, it then identifies each individual and associates each label with identifying information such as a name or personal ID. In the example shown, labels appear where people were recognized in the classroom. Outside the video frame, each label (A, B, C, and so on) is listed with its personal identification information, here a name. The video analysis unit 13 can store the personal identification information linked to positions in the video.
Cameras are not limited to classrooms. They can be installed to cover other spaces as needed, and people in that video can be recognized and identified in the same way.

図４は、１台のカメラ８２が撮影した映像を、映像解析部１３が解析した結果の例を示す概略図である。同図に示す解析処理は、映像内の人と人との間の距離の把握である。本例は、教室に設置されたカメラ８２が撮影した教室の映像の解析結果である。映像解析部１３は、映像に含まれる人を認識するとともに、各個人を識別する。その結果、映像解析部１３は、映像内に含まれる人に、Ａ，Ｂなどといったラベルを付与するとともに、ラベルと個人識別情報（例えば、氏名）との対応付けを行う。そして、映像解析部１３は、ＡおよびＢとラベル付けされた人物間の距離を計測する。なお、映像解析部１３は、算出された距離の情報を、この映像に付加して保存することができる。
映像内における個人間の距離の算出は、予め与えられたパラメーターに基づいて測量の手法を使って行われる。ここで、パラメーターとは、撮影対象の教室の平面図や、カメラ８２の設置場所（平面視したときの位置と、床面からの高さ）や、カメラの向き（平面における向きと、俯角（または仰角））や、カメラの撮影画角である。測量の手法による距離の算出自体は、既存の技術を用いて行うことができる。また、距離算出の際、映像に含まれる特定の被写体（教室内に固定的に設置された物や、画像マーカーなど）を用いて、被写体の位置の補正を適宜行う様にしてもよい。
なお、図示する例では、画面内に２人の人物だけが映っている状況を示しているが、映像内の人数に関する制約はない。映像内の任意の２人の間の距離を計測することができる。
また、カメラの設置場所は、教室に限られない。教室以外の必要なスペースを映すようにカメラを設置し、映像内に映る人と人の間の距離を算出するようにしてもよい。 FIG. 4 is a schematic view showing an example of the result of analysis by the image analysis unit 13 of the image taken by one camera 82. The analysis process shown in the figure is to grasp the distance between people in the image. This example is an analysis result of a classroom image taken by a camera 82 installed in the classroom. The image analysis unit 13 recognizes the person included in the image and identifies each individual. As a result, the image analysis unit 13 assigns labels such as A and B to the people included in the image, and associates the labels with personal identification information (for example, name). Then, the image analysis unit 13 measures the distance between the persons labeled A and B. The image analysis unit 13 can add the calculated distance information to the image and save it.
The calculation of the distance between individuals in the video is performed using a surveying method based on parameters given in advance. Here, the parameters are the plan view of the classroom to be photographed, the installation location of the camera 82 (the position when viewed in a plan view and the height from the floor surface), and the orientation of the camera (the orientation in the plane and the depression angle (direction in the plane and the depression angle). Or elevation angle)) or the shooting angle of view of the camera. The distance calculation itself by the surveying method can be performed by using the existing technique. Further, when calculating the distance, the position of the subject may be appropriately corrected by using a specific subject (object fixedly installed in the classroom, an image marker, etc.) included in the image.
In the illustrated example, only two people are shown on the screen, but there is no restriction on the number of people in the video. The distance between any two people in the video can be measured.
The camera location is also not limited to classrooms. A camera may be installed to cover any other space of interest, and the distance between people appearing in that video may be calculated in the same way.

FIG. 5 is a schematic view showing an example of the result obtained when the video analysis unit 13 analyzes video captured by one camera 82. The analysis shown in the figure detects behavior patterns. In this example, the video shows a classroom and was captured by a camera 82 installed in that classroom. The video analysis unit 13 recognizes the people in the video and identifies each individual. It then assigns labels such as A, B, C, and D to those people and associates each label with personal identification information (for example, a name).

Next, the video analysis unit 13 refers to behavior patterns stored in advance and determines whether the video being analyzed contains a matching pattern. The stored behavior patterns include the movement patterns of a specific person, patterns of the relative positions of multiple people, and combinations of these. A movement pattern represents the trajectory of a person moving from place to place, or the trajectory of body parts such as the hands and feet. In the illustrated example, the unit detects a pattern in which A, B, and C surround D. The video analysis unit 13 can attach information on the detected behavior pattern to the video and store it.
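The source does not say how a relative-position pattern such as "A, B, and C surround D" is matched. One possible check, sketched below in Python, works on people's floor-plan positions, however those were obtained. It assumes that "surrounding" means two things:
- at least three people stand within a given radius of the target;
- no angular gap between them, as seen from the target, exceeds a limit.

The function name `surrounding_labels` and the default values (1.5 m, three people, 150°) are hypothetical.

```python
import math


def surrounding_labels(target, others, radius=1.5, min_people=3, max_gap_deg=150.0):
    """Return the labels of people who surround `target`, or [] if the pattern is absent.

    target: (x, y) floor position of the person who may be surrounded.
    others: mapping from label to (x, y) floor position.
    """
    tx, ty = target
    near = []
    for label, (x, y) in others.items():
        if math.hypot(x - tx, y - ty) <= radius:
            bearing = math.degrees(math.atan2(y - ty, x - tx)) % 360.0
            near.append((bearing, label))
    if len(near) < min_people:
        return []
    near.sort()
    bearings = [b for b, _ in near]
    # Angular gaps between neighbouring people, including the wrap-around gap.
    gaps = [b2 - b1 for b1, b2 in zip(bearings, bearings[1:])]
    gaps.append(360.0 - bearings[-1] + bearings[0])
    if max(gaps) > max_gap_deg:
        return []  # everyone is on one side: nearby, but not surrounding
    return sorted(label for _, label in near)
```

The angular-gap test separates a ring of people from a group that is merely standing close together on one side of the target.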

FIG. 6 is a block diagram showing the internal functional configuration of the comprehensive analysis unit 32. As shown, the comprehensive analysis unit 32 includes a sign detection rule storage unit 321, a temporary pattern detection unit 322, a long-term pattern detection unit 323, and a detection result storage unit 324.

The sign detection rule storage unit 321 stores rules for detecting events or signs from video analysis results, audio analysis results, or both. That is, it stores rules that detect an event from two kinds of condition: conditions on the video patterns detected by the video analysis unit 13, and conditions on the audio patterns detected by the audio analysis unit 23.
An example of the data configuration of the sign detection rule storage unit 321 will be described later.

The temporary pattern detection unit 322 detects temporary patterns from the video and audio analysis results passed to it by the video analysis unit 13 and the audio analysis unit 23. It writes its detection results to the detection result storage unit 324. Detection is based on the rules stored in the sign detection rule storage unit 321. When a pattern received from outside matches a condition stored there, the unit detects that event as a temporary pattern. In other words, the temporary pattern detection unit 322 detects an event from the video pattern and audio pattern detected at a single point in time.
The temporary pattern detection unit 322 also provides its detection results outside the comprehensive analysis unit 32.

The long-term pattern detection unit 323 detects long-term patterns from the time series of patterns accumulated in the detection result storage unit 324. It writes its detection results to the same storage unit 324. It may in turn detect still longer-term patterns from the results it has itself written there. When detecting a long-term pattern, it may also receive and use the current video analysis result from the video analysis unit 13 and the current audio analysis result from the audio analysis unit 23.
The long-term pattern detection unit 323 detects long-term patterns based on the rules stored in the sign detection rule storage unit 321. It may compare two kinds of input against those rules: a pattern received from outside, and past detection results accumulated in the detection result storage unit 324. When these match a condition stored in the sign detection rule storage unit 321, it detects that event as a long-term pattern. In other words, the long-term pattern detection unit 323 detects an event spanning multiple time points, based at least on the events detected by the temporary pattern detection unit 322 at those time points.
The long-term pattern detection unit 323 also provides its detection results outside the comprehensive analysis unit 32.
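The following Python sketch illustrates the long-term detection flow described above. It assumes one simple kind of long-term condition: the same temporary event is detected for the same person on at least a given number of distinct days within a recent window. A plain list stands in for the detection result storage unit 324, and new long-term results are appended back to it. `Detection`, `detect_long_term`, and the window and count defaults are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Detection:
    day: date
    event: str
    person: str
    kind: str = "temporary"  # "temporary" or "long-term"


def detect_long_term(store, event, today, window_days=10, min_days=3):
    """Scan the stored time series and append long-term detections to `store`.

    A long-term event is reported for a person when the temporary `event` was
    detected on at least `min_days` distinct days within the last `window_days`.
    """
    start = today - timedelta(days=window_days - 1)
    days_by_person = {}
    for d in store:
        if d.kind == "temporary" and d.event == event and start <= d.day <= today:
            days_by_person.setdefault(d.person, set()).add(d.day)
    found = []
    for person, days in sorted(days_by_person.items()):
        if len(days) >= min_days:
            hit = Detection(today, f"long-term: {event}", person, kind="long-term")
            store.append(hit)  # results are written back to the same storage
            found.append(hit)
    return found
```

Because results are written back to the same storage, a further rule could take these long-term records as its own input. That matches the text's note that still longer-term patterns may be detected from earlier long-term results.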

The detection result storage unit 324 stores and accumulates the patterns (events, signs, etc.) detected by the temporary pattern detection unit 322 and the long-term pattern detection unit 323.

FIG. 7 is a schematic view showing an example of the rules stored in the sign detection rule storage unit 321. As shown, the unit stores data in tabular form. The table has four columns:
- Number: a serial number assigned to each rule.
- Event: the name of the sign to be detected.
- Video analysis method: the pattern to be detected by video analysis.
- Audio analysis method: the pattern to be detected by audio analysis.

When the pattern in the video analysis method column or the audio analysis method column matches, the event (or sign) in the event column is detected as having occurred. If a rule specifies both an audio pattern and a video pattern, the event (or sign) is detected only when both patterns match.
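The following Python sketch shows one way the table and its matching semantics could be represented. A rule fires when every column it fills in matches, and empty columns are ignored. The pattern labels are hypothetical, and each table cell is reduced to a single label. For example, rule 2's audio condition ("loud voice, specific keyword") becomes just `"loud_voice"`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    number: int
    event: str
    video: Optional[str]  # pattern required from video analysis, or None
    audio: Optional[str]  # pattern required from audio analysis, or None


def matches(rule, video_patterns, audio_patterns):
    """A rule fires when every column it fills in matches; empty columns are ignored."""
    if rule.video is None and rule.audio is None:
        return False
    video_ok = rule.video is None or rule.video in video_patterns
    audio_ok = rule.audio is None or rule.audio in audio_patterns
    return video_ok and audio_ok


# Hypothetical labels loosely following the three rules described in the text.
RULES = [
    Rule(1, "exclusion / ignoring", "isolated_from_group", None),
    Rule(2, "attack on the body", "violent_movement", "loud_voice"),
    Rule(3, "harassment", "crying_or_troubled_face", "mocking_laughter"),
]


def detect(video_patterns, audio_patterns, rules=RULES):
    """Return the events whose rules match the patterns seen at one time point."""
    return [r.event for r in rules if matches(r, video_patterns, audio_patterns)]
```
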

The temporary pattern detection unit 322 and the long-term pattern detection unit 323 need not stop at a binary decision, based on the rules in the sign detection rule storage unit 321, about whether a given event (or sign) has occurred. They may instead output a likelihood as the result.
That is, the result need not be a binary judgment of whether the precondition of an event (or sign) holds. It may instead be a likelihood that reflects the degree to which the precondition is satisfied.
Likewise, a rule may specify several preconditions. The unit then need not require all of them to hold before deciding that the event (or sign) has occurred. It may instead calculate and output the likelihood of the event (or sign) according to how many of the preconditions hold. In this case, a weight may be assigned to each precondition in advance and used in the likelihood calculation.
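The text does not give a formula for this likelihood. One simple choice is a weighted average of per-precondition degrees of satisfaction, sketched below in Python. A binary precondition contributes 0 or 1, and a graded one contributes a value in between. The function name `event_likelihood` and the equal-weight default are assumptions.

```python
def event_likelihood(preconditions, weights=None):
    """Combine precondition results into a likelihood in [0, 1].

    preconditions: degrees of satisfaction in [0, 1] (a binary check is 0 or 1).
    weights: optional non-negative weight per precondition; equal weights if None.
    """
    if not preconditions:
        raise ValueError("at least one precondition is required")
    if weights is None:
        weights = [1.0] * len(preconditions)
    if len(weights) != len(preconditions):
        raise ValueError("one weight per precondition is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * p for w, p in zip(weights, preconditions)) / total
```

With equal weights, two of three satisfied preconditions give a likelihood of 2/3. With weights 3 and 1, satisfying only the first precondition gives 0.75.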

The first rule in the illustrated example concerns "exclusion and ignoring". Its video analysis condition is the pattern "a specific student is frequently isolated from the group during breaks or lunch". This rule has no audio analysis condition.
The comprehensive analysis unit 32 decides whether the video or audio being analyzed is, for example, "during breaks or lunch". To do so, it refers to the time information associated with the video and audio and to the facility's schedule information (for example, the school timetable).
The "high frequency" specified in this rule is judged against a separately defined frequency threshold.
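The following Python sketch shows how rule 1's two qualifiers might be evaluated. A timetable lookup keeps only the samples taken during breaks or lunch. A frequency threshold is then applied to the fraction of those samples in which the student was isolated. The timetable entries, the 0.5 threshold, and the function names are all hypothetical.

```python
from datetime import datetime, time

# Hypothetical timetable: (start, end, label) for one school day.
TIMETABLE = [
    (time(8, 40), time(9, 30), "lesson"),
    (time(9, 30), time(9, 40), "break"),
    (time(9, 40), time(10, 30), "lesson"),
    (time(12, 20), time(13, 0), "lunch"),
]


def period_of(ts, timetable=TIMETABLE):
    """Return the timetable label covering the timestamp, or None if none does."""
    for start, end, label in timetable:
        if start <= ts.time() < end:
            return label
    return None


def frequently_isolated(observations, threshold=0.5, timetable=TIMETABLE):
    """observations: (timestamp, isolated: bool) pairs for one student.

    Only samples taken during breaks or lunch count; the student is flagged when
    the isolated fraction of those samples reaches `threshold`.
    """
    relevant = [iso for ts, iso in observations
                if period_of(ts, timetable) in ("break", "lunch")]
    if not relevant:
        return False
    return sum(relevant) / len(relevant) >= threshold
```
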

The second rule in the illustrated example concerns "attack on the body". Its video analysis condition is the pattern "violent movement; attack on another person with the hands, arms, feet, or legs". Its audio analysis condition is the pattern "a voice louder than a predetermined level; detection of a specific keyword".
The comprehensive analysis unit 32 detects "violent movement" by checking whether the amount of movement per unit time of a person in the video exceeds a predetermined threshold, even momentarily.
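The following Python sketch shows one way to measure "amount of movement per unit time, even momentarily". It sums the path length of a tracked point over a short sliding window and flags the track if any single window exceeds a speed threshold. It assumes the tracked positions are already in metres. The function name, the 0.5 s window, and the 3 m/s threshold are hypothetical.

```python
import math


def violent_movement(track, fps, threshold_m_per_s=3.0, window_s=0.5):
    """Return True if movement per unit time exceeds the threshold at any moment.

    track: per-frame (x, y) positions of one tracked point (e.g. a wrist or the
    body centre) in metres; fps: frames per second of the video.
    """
    win = max(1, round(window_s * fps))  # frames per measurement window
    for i in range(len(track) - win):
        path = sum(math.dist(track[j], track[j + 1]) for j in range(i, i + win))
        if path / (win / fps) > threshold_m_per_s:
            return True  # a single window over the threshold is enough
    return False
```
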

The third rule in the illustrated example concerns "harassment". Its video analysis condition is the pattern "a facial expression of crying or distress". Its audio analysis condition is the pattern "detection of mocking laughter, crying, or a specific keyword".
The video analysis unit 13 holds facial-expression features for each emotion type (crying, laughing, anger, confusion, etc.) in advance. For each emotion type, it outputs a likelihood according to how well the facial features of a person in the video match that type.
The comprehensive analysis unit 32 then calculates, from these emotion-type likelihoods, the degree to which this rule is matched.
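The source does not specify the feature space or the likelihood model. Purely as an illustration, the following Python sketch makes two assumptions:
- each emotion type has a single reference feature vector;
- likelihoods come from a softmax over negative distances to those vectors.

Rule 3's video condition, "crying or distress", is then scored as the sum of the "crying" and "troubled" likelihoods. The toy 3-D vectors, the `sharpness` parameter, and the function names are hypothetical.

```python
import math

# Hypothetical reference expression features per emotion type (toy 3-D vectors).
EMOTION_FEATURES = {
    "crying":   (0.9, 0.1, 0.0),
    "laughing": (0.0, 0.9, 0.1),
    "anger":    (0.1, 0.0, 0.9),
    "troubled": (0.6, 0.0, 0.4),
}


def emotion_likelihoods(face, refs=EMOTION_FEATURES, sharpness=10.0):
    """Softmax over negative distances: the closer the reference, the higher the likelihood."""
    scores = {k: -sharpness * math.dist(face, v) for k, v in refs.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exp = {k: math.exp(s - top) for k, s in scores.items()}
    z = sum(exp.values())
    return {k: e / z for k, e in exp.items()}


def harassment_video_degree(face):
    """Degree to which rule 3's video condition ('crying or distress') holds."""
    lik = emotion_likelihoods(face)
    return lik["crying"] + lik["troubled"]
```
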

This paragraph explains further why the comprehensive analysis unit 32 includes the long-term pattern detection unit 323. When a person's behavior is analyzed from video or audio, the surface features at a single point in time do not necessarily correspond accurately to that person's emotions. For example, rule 1 above ("a specific student is frequently isolated from the group during breaks or lunch") contains the condition "frequently". This shows that momentary behavior alone is not enough. What also matters is the meaning that can be read from long-term, continuing tendencies in behavior patterns.
Therefore, the sign detection rule storage unit 321 of this embodiment can store two kinds of rule conditions: conditions that match a pattern at a single point in time, and conditions that match a pattern over a long period. In addition, the long-term pattern detection unit 323 allows this embodiment to detect signs while referring to past detection results.
In this way, the embodiment makes it possible to analyze behavior through long-term tendencies based on video and audio.

FIG. 8 is a schematic view showing the configuration of an authority setting table, which sets user permissions in the behavior analysis device 1. The table is held in a predetermined area of the storage means in the behavior analysis device 1. As shown, it contains the items user ID, name, job title, class in charge, and access authority. The access authority has two parts: access to school-wide data, and access to the data of the class in charge. To protect the privacy of the people involved, the behavior analysis device 1 restricts access to data according to the user. Class-in-charge data is the data of the class for which the user is responsible. School-wide data is the data of the entire school, including the data of every class. In the figure, "○" in an access authority column means that the user has access to that data, and "−" means that the user does not.

In the illustrated example, users whose job title is "Principal", "Deputy Principal", or "Vice Principal" have access to school-wide data. A user whose job title is "Teacher (homeroom)" has no access to school-wide data and can access only the data of the class he or she is in charge of.

Users can view detection result data only within the scope of their granted access authority. Likewise, users can play back video and audio files only within that scope. The behavior analysis device 1 sends to a signed-in user's terminal device only the information that the user's authority permits.
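The following Python sketch shows how the FIG. 8 table and the filtering described above might be represented. Each user row carries the two "○/−" flags as booleans. Results are filtered by class before anything is sent to the user's terminal. The field names, the class identifiers such as "2-1", and the shape of a result record (a dict with a `class_id` key) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    user_id: str
    name: str
    title: str
    class_in_charge: Optional[str]  # None if the user has no class
    school_wide: bool               # "○" in the school-wide data column
    class_data: bool                # "○" in the class-in-charge data column


def can_access(user: User, class_id: str) -> bool:
    """True if the user may view results or play media recorded in class_id."""
    if user.school_wide:
        return True
    return user.class_data and user.class_in_charge == class_id


def visible_results(user: User, results: list) -> list:
    """Keep only the results this user may receive (each has a 'class_id' key)."""
    return [r for r in results if can_access(user, r["class_id"])]
```

Applying the filter on the device side, before transmission, matches the text's statement that only permitted information is sent to the user's terminal.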

FIG. 9 is a schematic view showing an example of the information output by the information output unit 42. The illustrated output is one of the analysis results of the comprehensive analysis unit 32. It contains the items destination, type, date and time of occurrence, urgency, event summary, and video link. This output relates to an event detected by analyzing the video and audio of one class's classroom in a school.

The destination is the addressee of this output. For example, when the information is sent by e-mail, the e-mail address of the appropriate user is given as the destination. In the illustrated example, the destination is "Teacher E木", the homeroom teacher of the class in which the event was detected.

The type indicates the type of the detected event (or sign). The type names are those stored in the sign detection rule storage unit 321. In the illustrated example, the type is "attack on the body".
The date and time of occurrence is the time associated with the video or audio of the detected event (or sign). When a long-term event (or sign) is detected, this field shows a time span.

The urgency classifies how urgent this output is. For example, urgency has three levels. In descending order they are "urgent", "presumed", and "caution":
- "Urgent" means that behavior (such as bullying) requiring an immediate response is presumed.
- "Presumed" means that behavior (such as bullying) not requiring an immediate response is presumed.
- "Caution" means that the system has detected one of two things. The first is a matter that requires attention but does not yet amount to a presumption. The second is an event (or sign) that becomes "presumed" if the same type of caution continues for two or more consecutive days.
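The following Python sketch gives one possible reading of the three urgency levels and the escalation clause. A caution becomes "presumed" when the same type of caution was also output the previous day. It assumes that "consecutive days" means calendar days; a real system might count only school days. `Urgency` and `escalate` are hypothetical names.

```python
from datetime import date, timedelta
from enum import IntEnum


class Urgency(IntEnum):
    CAUTION = 1
    PRESUMED = 2
    URGENT = 3


def escalate(cautions, event_type, today):
    """Return PRESUMED if the same type of caution was raised on two or more
    consecutive days ending today, otherwise CAUTION.

    cautions: set of (date, event_type) pairs for which a caution was output.
    """
    yesterday = today - timedelta(days=1)
    if (today, event_type) in cautions and (yesterday, event_type) in cautions:
        return Urgency.PRESUMED
    return Urgency.CAUTION
```

Using an `IntEnum` keeps the stated ordering "urgent" > "presumed" > "caution" directly comparable.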

The event summary contains a concrete description of the detected event, together with identification information, such as names, of the parties (students) involved. This identification information comes from two sources: personal identification on the video by the video analysis unit 13, and personal identification on the audio by the audio analysis unit 23. It is stored in association with the detected event (or sign). That is, the information output unit 42 outputs event summary information that includes the personal identification information identified for the event by at least one of the video analysis unit 13 and the audio analysis unit 23.
The video link is link information for playing back the relevant portion of the video or audio on which the detection of the event was based.

According to this embodiment, events can be grasped broadly from both video and audio, rather than from audio alone. This leads to the prevention or early detection of problem behavior (bullying, etc.).

Further, this embodiment has two detection units:
- a temporary pattern detection unit that detects an event from the video pattern and audio pattern detected at a single point in time;
- a long-term pattern detection unit that detects an event spanning multiple time points, based at least on the events detected by the temporary pattern detection unit at those time points.

The embodiment can therefore detect events that span multiple time points.

In the above embodiment, the personal feature storage unit 31 stores each individual's visual features, including facial, body, and clothing features. In this modification, these visual features are updated at a predetermined timing, for example at the start of the first lesson of each school day. The update proceeds as follows:
1. At a predetermined time each day (excluding days the facility is closed), as set by the facility's schedule information (for example, the school timetable), the video analysis unit 13 calculates image features from the video that the video acquisition unit 11 acquired and stored in the video storage unit 12.
2. While doing so, the video analysis unit 13 refers to seating chart information stored in advance to estimate which individual occupies which position in the video.
3. The video analysis unit 13 then writes the calculated features to the personal feature storage unit 31.

Different features change at different rates. A person's facial features change little from day to day, but clothing (color and shape) and hairstyle may change considerably from one day to the next. Physical features such as height and build change little day to day but may change significantly over, say, six months. Because this modification updates the personal feature data at predetermined timings, it can further improve the accuracy of personal identification.
The seating chart information represents the range of locations where each identifiable individual may be placed. Alternatively, it represents the relative positional relationships among multiple identifiable individuals.

That is, the video analysis unit 13 reads two items stored in a storage medium: the facility's schedule information and the seating chart information of individuals at a given location. It takes the video at a predetermined timing determined by the schedule information. From that video, it calculates the visual features of each individual corresponding to the personal identification information determined or estimated from the seating chart information. It then uses these calculated features to update the video-related personal features stored in the personal feature storage unit 31. The seating information is not limited to improving identification accuracy. It can also be used to understand trends such as changes in relationships after seats are rearranged.
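The following Python sketch illustrates the update step described above. When the schedule indicates the update slot, each student's seat region is cropped from the frame and a feature is recomputed. That feature overwrites only the corresponding entry in a per-student feature store. Several simplifications are assumed:
- the seating chart maps each student ID directly to a rectangular image region;
- the frame is a nested list of RGB tuples;
- the mean colour of the region stands in for a real clothing feature;
- a plain dict stands in for the personal feature storage unit 31.

All names are hypothetical. Facial features are left untouched, in line with the observation that they change little from day to day.

```python
def mean_color(frame, box):
    """Average RGB colour inside box = (x0, y0, x1, y1), end-exclusive."""
    x0, y0, x1, y1 = box
    pixels = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def update_features(frame, seating_chart, feature_store, now_label, update_slot):
    """Refresh per-student clothing features when the schedule says it is time.

    seating_chart: student_id -> seat region in the image (from the seating chart).
    feature_store: student_id -> dict of features; other entries are kept.
    Returns True if an update was performed.
    """
    if now_label != update_slot:
        return False
    for student_id, box in seating_chart.items():
        feats = feature_store.setdefault(student_id, {})
        feats["clothing_color"] = mean_color(frame, box)
    return True
```
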

At least one embodiment described above provides a comprehensive analysis unit that detects specific events related to human behavior. It does so from the video patterns detected by the video analysis unit and the audio patterns detected by the audio analysis unit. This makes it possible to prevent behavioral problems and to detect their signs early.
The above embodiment mainly described a single space (a classroom). In it, specific events related to human behavior were detected from video and audio acquired by cameras and microphones in a school facility. Comparing analysis results across spaces (classrooms) also makes it possible to grasp trends within each class and across the whole school. The results can then be used, for example, when reorganizing classes or assigning homeroom teachers.
The above embodiment applies the behavior analysis device 1 to a school, but the device can also be applied to other facilities where people act as a group, such as kindergartens, nursery schools, and care facilities. There too, analyzing people's behavior within the facility can help detect and prevent the signs of problems in advance.

At least some of the functions of the behavior analysis device of the above embodiment may be implemented by a computer. In that case, a program implementing those functions may be recorded on a computer-readable recording medium, and a computer system may read and execute the program from that medium. The "computer system" here includes an OS and hardware such as peripheral devices. A "computer-readable recording medium" means a portable medium such as a flexible disk, magneto-optical disk, ROM, or CD-ROM, or a storage device such as a hard disk built into a computer system. The term may further include two other kinds of media:
- media that hold a program dynamically for a short time, such as a communication line used to transmit the program over a network like the Internet or over a telephone line;
- media that hold a program for a certain period, such as volatile memory inside the computer system acting as server or client in that case.

The program may implement only some of the functions described above. It may also implement them in combination with a program already recorded in the computer system.

Several embodiments of the present invention have been described. They are presented as examples and are not intended to limit the scope of the invention. These embodiments can be implemented in various other forms, and various omissions, substitutions, and changes can be made without departing from the gist of the invention. These embodiments and their modifications fall within the scope and gist of the invention, and likewise within the invention described in the claims and its equivalents.

1 ... Behavior analysis device, 11 ... Video acquisition unit, 12 ... Video storage unit, 13 ... Video analysis unit, 21 ... Audio acquisition unit, 22 ... Audio storage unit, 23 ... Audio analysis unit, 31 ... Personal feature storage unit, 32 ... Comprehensive analysis unit, 41 ... Information exchange unit, 42 ... Information output unit, 81 ... External device, 82 ... Camera, 83 ... Microphone, 321 ... Sign detection rule storage unit (detection rule storage unit), 322 ... Temporary pattern detection unit, 323 ... Long-term pattern detection unit, 324 ... Detection result storage unit

Claims

A behavior analysis device comprising:
a video acquisition unit that acquires video;
a video analysis unit that analyzes a person appearing in the video acquired by the video acquisition unit and detects a video pattern;
a voice acquisition unit that acquires audio;
a voice analysis unit that analyzes a human voice included in the audio acquired by the voice acquisition unit and detects a voice pattern;
a comprehensive analysis unit that detects a specific event related to human behavior based on the video pattern detected by the video analysis unit and the voice pattern detected by the voice analysis unit; and
an information output unit that outputs information about the event detected by the comprehensive analysis unit,
wherein the comprehensive analysis unit includes a detection rule storage unit that stores rules for detecting an event based on conditions on the video pattern detected by the video analysis unit and conditions on the voice pattern detected by the voice analysis unit, and detects the event based on the video pattern and the voice pattern by referring to a rule read from the detection rule storage unit, and
wherein the comprehensive analysis unit further includes:
a temporary pattern detection unit that detects the event based on the video pattern and the voice pattern detected at a single point in time; and
a long-term pattern detection unit that detects an event spanning a plurality of points in time based at least on the events detected by the temporary pattern detection unit at the plurality of points in time.
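The claim above leaves the rule format open. A minimal sketch of how a detection rule storage unit, a temporary pattern detection unit, and a long-term pattern detection unit could fit together is shown below, assuming that a rule is satisfied when all of its listed video patterns and all of its listed voice patterns are present at one time point, and that a long-term event fires when a temporary event recurs at least `min_count` times within the last `window` time points. The class and function names (`Rule`, `LongTermRule`, `detect_temporary`, `detect_long_term`) and the pattern labels such as `"surrounded"` and `"shouting"` are hypothetical illustrations, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Set


@dataclass(frozen=True)
class Observation:
    """Patterns detected at one time point (pattern labels are hypothetical)."""
    time: int
    video_patterns: FrozenSet[str]
    voice_patterns: FrozenSet[str]


@dataclass(frozen=True)
class Rule:
    """Hypothetical stored rule: conditions on video patterns and voice patterns."""
    event: str
    video_condition: FrozenSet[str]  # every listed video pattern must be present
    voice_condition: FrozenSet[str]  # every listed voice pattern must be present

    def matches(self, obs: Observation) -> bool:
        # Subset tests: the rule's conditions must be contained in what was observed.
        return (self.video_condition <= obs.video_patterns
                and self.voice_condition <= obs.voice_patterns)


@dataclass(frozen=True)
class LongTermRule:
    """Hypothetical rule that fires when a temporary event recurs often enough."""
    event: str
    base_event: str
    min_count: int  # minimum number of time points at which base_event occurred
    window: int     # number of most recent time points considered


def detect_temporary(obs: Observation, rules: Iterable[Rule]) -> Set[str]:
    """Temporary pattern detection: uses only the patterns from one time point."""
    return {rule.event for rule in rules if rule.matches(obs)}


def detect_long_term(history: Dict[int, Set[str]],
                     rules: Iterable[LongTermRule], now: int) -> Set[str]:
    """Long-term pattern detection: aggregates temporary events over time points."""
    detected: Set[str] = set()
    for rule in rules:
        count = sum(1 for t, events in history.items()
                    if now - rule.window < t <= now and rule.base_event in events)
        if count >= rule.min_count:
            detected.add(rule.event)
    return detected


if __name__ == "__main__":
    rules = [Rule("harassment_sign", frozenset({"surrounded"}), frozenset({"shouting"}))]
    long_rules = [LongTermRule("repeated_harassment", "harassment_sign", min_count=3, window=5)]
    history: Dict[int, Set[str]] = {}
    for t in range(5):
        active = t % 2 == 0  # the sign appears at time points 0, 2 and 4
        obs = Observation(t,
                          frozenset({"surrounded"}) if active else frozenset(),
                          frozenset({"shouting"}) if active else frozenset())
        history[t] = detect_temporary(obs, rules)
    print(detect_long_term(history, long_rules, now=4))  # {'repeated_harassment'}
```

Keeping the per-time-point results in `history` mirrors the split in the claim: the long-term stage never re-examines raw video or audio patterns, only the events the temporary stage already produced.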

A behavior analysis device comprising:
a video acquisition unit that acquires video;
a video analysis unit that analyzes a person appearing in the video acquired by the video acquisition unit and detects a video pattern;
a voice acquisition unit that acquires audio;
a voice analysis unit that analyzes a human voice included in the audio acquired by the voice acquisition unit and detects a voice pattern;
a comprehensive analysis unit that detects a specific event related to human behavior based on the video pattern detected by the video analysis unit and the voice pattern detected by the voice analysis unit;
an information output unit that outputs information about the event detected by the comprehensive analysis unit; and
a personal feature storage unit that stores personal features related to video and voice,
wherein the video analysis unit performs personal identification of a person appearing in the video based on the video-related personal features read from the personal feature storage unit,
the voice analysis unit performs personal identification of the human voice included in the audio based on the voice-related personal features read from the personal feature storage unit,
the information output unit outputs the information about the event, including personal identification information identified for the event by at least one of the video analysis unit and the voice analysis unit, and
the video analysis unit reads schedule information for the facility and seating chart information for individuals at a predetermined location, both stored in a storage medium; calculates, from the video captured at a predetermined timing determined by the schedule information, the visual features of each individual corresponding to the personal identification information determined or estimated from the seating chart information; and updates the video-related personal features stored in the personal feature storage unit with the calculated visual features.
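The claim above does not specify how visual features are represented, compared, or merged. A minimal sketch of the video side follows, assuming features are plain numeric vectors compared by cosine similarity, that an update blends the stored and newly observed feature with an exponential moving average, and that the schedule information reduces to a set of time stamps at which people are expected to be in their seats. `PersonalFeatureStore`, `update_from_seating_chart`, `extract_seat_features`, the `alpha` weight, and the `0.8` threshold are all hypothetical; the seat-level feature extractor stands in for whatever video processing a real system would use.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Set

Feature = Sequence[float]


def cosine_similarity(a: Feature, b: Feature) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class PersonalFeatureStore:
    """Hypothetical stand-in for the video part of the personal feature storage unit."""

    def __init__(self, alpha: float = 0.5):
        self.alpha = alpha  # weight given to a newly observed feature
        self.features: Dict[str, List[float]] = {}

    def update(self, person_id: str, feature: Feature) -> None:
        old = self.features.get(person_id)
        if old is None:
            self.features[person_id] = list(feature)
        else:
            # Exponential moving average of the stored and the new feature.
            self.features[person_id] = [
                (1 - self.alpha) * o + self.alpha * f for o, f in zip(old, feature)]

    def identify(self, feature: Feature, threshold: float = 0.8) -> Optional[str]:
        """Return the most similar person ID, or None if nothing reaches the threshold."""
        best_id: Optional[str] = None
        best_sim = threshold
        for person_id, stored in self.features.items():
            sim = cosine_similarity(feature, stored)
            if sim >= best_sim:
                best_id, best_sim = person_id, sim
        return best_id


def update_from_seating_chart(store: PersonalFeatureStore,
                              scheduled_times: Set[int],
                              seating_chart: Dict[str, str],
                              now: int,
                              extract_seat_features: Callable[[int], Dict[str, Feature]]) -> int:
    """At a scheduled time, attribute each seat's visual feature to the seated person."""
    if now not in scheduled_times:
        return 0
    seat_features = extract_seat_features(now)  # seat ID -> feature from the video
    updated = 0
    for seat_id, person_id in seating_chart.items():
        feature = seat_features.get(seat_id)
        if feature is not None:
            store.update(person_id, feature)
            updated += 1
    return updated


if __name__ == "__main__":
    store = PersonalFeatureStore()
    seating = {"A1": "student_01", "A2": "student_02"}
    fake_extractor = lambda t: {"A1": (1.0, 0.0), "A2": (0.0, 1.0)}
    update_from_seating_chart(store, {900}, seating, 900, fake_extractor)
    print(store.identify((0.9, 0.1)))  # student_01
```

The seating chart supplies the label that a face or appearance model would otherwise need: at a scheduled time the system can assume who occupies each seat, so the feature extracted at that seat can be attributed to that person without manual enrollment. The voice-side identification in the claim would follow the same store-and-match pattern with audio features.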

The behavior analysis device according to claim 1 or 2, wherein the behavior analysis device is installed in a facility used by a group.

A program for causing a computer to function as the behavior analysis device according to any one of claims 1 to 3.