JP2020135747A

JP2020135747A - Action analysis device and action analysis method

Info

Publication number: JP2020135747A
Application number: JP2019031913A
Authority: JP
Inventors: 光晴大峡; Mitsuharu Ohazama
Original assignee: Hitachi Solutions Ltd
Current assignee: Hitachi Solutions Ltd
Priority date: 2019-02-25
Filing date: 2019-02-25
Publication date: 2020-08-31
Anticipated expiration: 2039-02-25
Also published as: JP7149202B2

Abstract

To provide an action analysis device that can recognize the action of a person included in an image more accurately than conventional techniques. SOLUTION: An action analysis device 1, which analyzes the action of a person included in an image, comprises: an original image acquisition section 11 that acquires a plurality of pieces of original image data 131 captured at different times; a skeleton image generation section 12 that generates skeleton image data 132 of the person in each piece of original image data; a behavior image generation section 13 that generates, based on each piece of skeleton image data, behavior image data 133 indicating the change of the person's skeleton over time; and a model generation section 15 that generates, based on each piece of original image data, skeleton image data, and behavior image data, a predetermined model capable of learning and inferring the person's action patterns. SELECTED DRAWING: Figure 1

Description

The present invention relates to a behavior analysis device and a behavior analysis method.

Techniques for recognizing what action a person captured in an image has performed are known (Patent Document 1). Patent Document 1 proposes recognizing a person's behavior by extracting, from the time-series images of a moving image, those images in which a change in behavior is detected, and learning from them. A technique is also known that determines, from an image of a driver, the likelihood that the driver will perform a specific action such as using a mobile phone, and outputs an alarm (Patent Document 2).

A technique is also known for detecting the posture of a person in a two-dimensional image by estimating the person's skeleton (Non-Patent Document 1).

Patent Document 1: International Publication No. 2017/150211
Patent Document 2: Japanese Unexamined Patent Publication No. 2009-37534

Non-Patent Document 1: Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields, CVPR 2017

In Patent Document 1, the behavior of the target person is recognized from the captured image (the original image), so any object in the image other than the target person (for example, the person's belongings or the background) becomes noise. For example, when a person riding in a moving vehicle such as a train or car is filmed, the scenery outside the vehicle keeps changing, which makes it difficult to recognize the target person's behavior accurately. Likewise, when the brightness around the target person changes drastically, the background fluctuates greatly, which again makes accurate recognition difficult.

Furthermore, because Patent Document 1 uses only the original images, it is difficult to accurately recognize the features and movements of the target person, and difficult to accurately capture how the target person's behavior changes over time. For these reasons, the technique of Patent Document 1 has difficulty recognizing the target person's behavior accurately.

The present invention has been made in view of the above problems, and its object is to provide a behavior analysis device and a behavior analysis method capable of recognizing the behavior of a person included in an image with higher accuracy than before.

To solve the above problem, a behavior analysis device according to one aspect of the present invention analyzes the behavior of a person included in an image and comprises: an original image acquisition unit that acquires a plurality of pieces of original image data captured at different times; a skeleton image generation unit that generates skeleton image data of the person in each piece of original image data; a behavior image generation unit that generates, based on each piece of skeleton image data, behavior image data indicating the change of the person's skeleton over time; and a model generation unit that generates, based on each piece of original image data, skeleton image data, and behavior image data, a predetermined model capable of learning and inferring the person's behavior patterns.

According to the present invention, the predetermined model for learning and inferring behavior patterns is generated using not only the original image data but also the skeleton image data and behavior image data derived from it, so the behavior of the person included in the original images can be analyzed more accurately.

FIG. 1 is an explanatory diagram showing the overall configuration of the behavior recognition device.
FIG. 2 is a block diagram of the hardware and software of the behavior recognition device.
FIG. 3 is an explanatory diagram of original image data.
FIG. 4 is an explanatory diagram of skeleton image data.
FIG. 5 is an explanatory diagram of behavior image data.
FIG. 6 is a configuration example of correct answer data.
FIG. 7 is a configuration example of learning sequence data.
FIG. 8 is a configuration example of model data.
FIG. 9 is an example of inference sequence data.
FIG. 10 is a flowchart of the process of generating skeleton image data.
FIG. 11 is a flowchart of the process of generating behavior image data.
FIG. 12 is a flowchart of the process of generating sequence data.
FIG. 13 is a flowchart of the process of generating model data.
FIG. 14 shows a configuration example of a neural network.
FIG. 15 is a flowchart of the inference process.
FIG. 16 is an overall configuration diagram of the behavior recognition device according to the second embodiment.
FIG. 17 is a flowchart of the behavior monitoring process.
FIG. 18 is an overall configuration diagram of the behavior recognition device according to the third embodiment.
FIG. 19 is a flowchart of the behavior monitoring process.

Embodiments of the present invention are described below with reference to the drawings. The behavior analysis device according to this embodiment analyzes the behavior of a person in the original images by using not only the original image data containing the person but also other data derived from the person's behavior in that data (skeleton image data and behavior image data).

Because this embodiment also uses data derived from the person's behavior, the behavior can be analyzed accurately even when parts of the image other than the person (background, scenery) change. The behavior analysis device according to this embodiment can be applied, for example, to systems that monitor the behavior of drivers, passengers, pedestrians, shoppers, and others.

In other words, this embodiment provides a device that recognizes human behavior in a group of time-series images. The behavior of a person appearing in a moving image is recognized using machine learning such as a neural network.

The behavior analysis device according to this embodiment reads the group of images obtained by splitting a moving image into frames, applies a skeleton estimation technique to extract the person's main skeleton from each image, and renders it as a skeleton image. It then extracts the change in motion between skeleton images that are consecutive on the time axis, for example as optical flow, and renders that change as a behavior image.

These image groups (original images, skeleton images, behavior images) are combined into time-series sequence data, and a machine learning technique such as a neural network learns the relationship between the input data and the behavior.

Including skeleton image data and behavior image data in the machine learning input suppresses the effect of noise caused by objects other than the person and by changes in the background. Furthermore, using posture information based on the human skeleton together with information on changes in posture allows behavior to be recognized more accurately than with conventional techniques that use only the original image data.

The behavior analysis device according to this embodiment may be implemented on a computer having a processor and a storage device. The processor, for example, processes the original image data and performs machine-learning training and inference. The storage device stores, for example, each kind of image data, intermediate data, the machine learning model, and the inference results.

For each piece of original image data extracted from the moving image data, the processor computes, for example, skeleton image data and behavior image data. Next, the processor combines the original image data, skeleton image data, and behavior image data into time-series data (sequence data). Using machine learning such as a neural network, the processor learns the relationship between the time-series data and the human behavior appearing in the original image data, and computes model data. At inference time, the processor inputs time-series data into the model defined by the model data to compute a recognition result for the human behavior appearing in the original image data.

The first embodiment is described with reference to FIGS. 1 to 15. Note that this embodiment is merely one example of realizing the present invention and does not limit its technical scope.

In the following description, a "computer program" may be used as the grammatical subject. A computer program, when executed by a processor, performs predetermined processing using memory and a communication port (communication control device). The description could therefore equally use the processor, or a computer having the processor, as the subject.

Part or all of a computer program may be implemented by dedicated hardware. Computer programs may be modular. A computer program may be distributed fixed on a recording medium or delivered from a program distribution server over a communication network. Functions 11 to 16, described later, are realized when the processor reads and executes the computer programs.

FIG. 1 shows the overall configuration of the behavior recognition device 1, an example of the "behavior analysis device". The behavior recognition device 1 recognizes what kind of action a person included in the original image data is performing.

The behavior recognition device 1 may include, for example, an original image acquisition unit 11, a skeleton image generation unit 12, a behavior image generation unit 13, a sequence data generation unit 14, a model generation unit 15, and an inference unit 16.

The original image acquisition unit 11 acquires a plurality of pieces of original image data 131 captured at different times. For example, it acquires image data 131 of the same subject captured at different times from a moving image file or from a series of continuously shot still image files. These images are called original image data 131 because the skeleton image data 132 is created from them. The original image data may be stored in the behavior recognition device 1 or in an external storage device accessible to it.

Based on the original image data 131 acquired by the original image acquisition unit 11, the skeleton image generation unit 12 estimates the skeleton of the person 1311 appearing in the original image data 131 and generates image data 132 of the estimated skeleton.

Based on the skeleton image data 132 generated by the skeleton image generation unit 12, the behavior image generation unit 13 generates behavior image data 133 showing the change of the skeleton over time (movement direction, behavior).

The sequence data generation unit 14 generates sequence data containing the original image data 131, skeleton image data 132, and behavior image data 133.

Based on the learning sequence data (described later with FIG. 7) generated by the sequence data generation unit 14, the model generation unit 15 generates data for a predetermined model capable of learning and inferring a person's behavior patterns.

Based on the inference sequence data (described later with FIG. 9) created by the sequence data generation unit 14 and the model data generated by the model generation unit 15, the inference unit 16 recognizes the behavior of the person included in the inference sequence data and outputs the recognition result.

<Configuration of behavior recognition device>

FIG. 2 is a configuration example (functional block diagram) of the hardware and software of the behavior recognition device 1.

The behavior recognition device 1 includes, for example, a central processing unit 110, an input/output device 120, and a storage device 130.

The central processing unit 110 has a microprocessor and a program memory (neither shown) and performs the arithmetic and control processing needed to function as the behavior recognition device 1. The central processing unit 110 executes predetermined computer programs 111 to 116, which correspond to the functions 11 to 16 described with FIG. 1.

The original image acquisition program 111 is a computer program that acquires original image data. For example, it reads the original image data 131 stored in the storage device 130 as the data to be recognized (analyzed); it may instead read original image data captured by the camera 140. The original image acquisition program 111 may be implemented as a function of the operating system or of a device driver, or it may be provided as part of the skeleton image generation program 112.

The skeleton image generation program 112 is a computer program that recognizes the main parts of the person (for example, the face and limbs) appearing in the original image data 131, extracts them as a skeleton, and generates skeleton image data 132.

The behavior image generation program 113 is a computer program that extracts behavior representing the skeleton's movement from temporally consecutive skeleton image data 132 and generates behavior image data 133.

The sequence data generation program 114 is a computer program that aggregates the time-series image data 131 to 133 representing a series of movements to generate sequence data 135 and 137.

The model generation program 115 is a computer program (learning program) that machine-learns the movement of the human subject from the sequence data and generates model data 136 as the "model".

The inference program 116 is a computer program that recognizes the human movement in each sequence by inputting inference sequence data into the model.
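The per-sequence output of the inference program can be sketched as follows. This is a minimal illustration, not the patent's implementation: `model` is a hypothetical callable that maps one sequence to a probability vector over behavior classes, and the recognized behavior is taken as the class with the highest probability.

```python
import numpy as np


def infer_sequences(model, sequences, class_names):
    """Return {sid: (class ID, class name, probability)} for each sequence.

    `model` is a hypothetical callable mapping one inference sequence to a
    probability vector over behavior classes; `sequences` maps sid -> sequence.
    """
    results = {}
    for sid, seq in sequences.items():
        probs = np.asarray(model(seq))
        cls = int(np.argmax(probs))  # most probable behavior class
        name = class_names.get(cls, f"class {cls}")
        results[sid] = (cls, name, float(probs[cls]))
    return results
```

Keeping the model behind a plain callable lets the same loop work with any trained network that returns class probabilities.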

The input/output device 120 exchanges information with the user and includes an information output device 121 and an information input device 122. Examples of the information output device 121 include a display and a printer (neither shown). Examples of the information input device 122 include a keyboard, mouse, touch panel, camera, and scanner (none shown). A single device may serve as both an information output device and an information input device. A moving image file captured by the external camera 140 can also be input to the storage device 130, the central processing unit 110, or both.

The storage device 130 stores, for example, data to be processed by the central processing unit 110 and the processed data.

The storage device 130 stores, for example, original image data 131, skeleton image data 132, behavior image data 133, correct answer data 134, learning sequence data 135, model data 136, and inference sequence data 137.

As described above, the original image data 131 is a group of original images obtained by dividing a moving image into, for example, one image per frame. The skeleton image data 132 is image data in which the person's main skeleton has been extracted from the original image data 131. The behavior image data 133 is image data capturing, from the skeleton image data 132, how the person's movement changed across temporally consecutive images.

The correct answer data 134 indicates the correct behavior pattern of the person included in the original image data 131. The learning sequence data 135 is data processed into a single set of time-series data from the original image data 131, skeleton image data 132, behavior image data 133, and correct answer data 134. The model data 136 is the data of the trained model obtained by machine learning on the learning sequence data 135. The inference sequence data 137 is data processed into a single set of time-series data from the original image data 131, skeleton image data 132, and behavior image data 133.

Part or all of the above computer programs and data can be stored on a recording medium MM, such as a flash memory device, hard disk, magnetic tape, or optical disc, and distributed in that form. At least part of the computer programs and data can also be distributed over a communication network.

<Original image data>

FIG. 3 shows an example of the original image data 131. The original image data 131 is, for example, moving image data (a moving image file) divided into frames, each stored as a file.

FIGS. 3(1) and 3(2) show original image data 131(1) and 131(2) captured at different times. Each contains the target person 1311(1), 1311(2) and a background 1312. FIG. 3 takes as an example images of a person 1311 running past a street light 1312. Because the original image data 131(1) and 131(2) capture a moving person, the images of the person 1311 differ between them, whereas a fixed background 1312 such as the street light does not change between the two.

As shown in FIG. 3(1), a file name F131 is automatically assigned to each piece of original image data 131. In the example of FIG. 3, the file name is generated by appending a number indicating temporal order to "raw", which marks an original image.

<Skeleton image data>

FIG. 4 shows an example of the skeleton image data 132. The skeleton image data 132(1) in FIG. 4(1) is generated from the skeleton information obtained from the original image data 131(1) in FIG. 3(1), and the skeleton image data 132(2) in FIG. 4(2) is generated from the skeleton information obtained from the original image data 131(2) in FIG. 3(2).

A skeleton-only figure 1322(1) is generated from the person 1311(1) in the original image data 131(1) of FIG. 3(1). Likewise, a skeleton-only figure 1322(2) is generated from the person 1311(2) in the original image data 131(2) of FIG. 3(2).

A skeleton image plots the main joints of major body parts such as the head and limbs as points and connects some of those points with lines. Because the skeleton image data 132 does not include the background of the original image data 131, information that would be noise when recognizing human behavior is excluded. In addition, extracting skeleton information makes the person's posture easier to recognize.
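The rendering step (joints as points, some joints connected by lines, no background) can be sketched with NumPy alone. This assumes the joint coordinates have already been produced by a pose estimator such as the one in Non-Patent Document 1; the joint list `JOINTS`, the connections `BONES`, the single-channel canvas, and the 3x3 dot size are all hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical joint layout (the text only says "main joints of the head and limbs").
JOINTS = ["head", "neck", "r_shoulder", "r_hand", "l_shoulder", "l_hand",
          "hip", "r_foot", "l_foot"]
# Hypothetical pairs of joint indices connected by a line.
BONES = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5), (1, 6), (6, 7), (6, 8)]


def draw_line(img, p0, p1, value=255):
    """Rasterise a segment by sampling one point per pixel along its longer axis."""
    (x0, y0), (x1, y1) = p0, p1
    n = int(max(abs(x1 - x0), abs(y1 - y0))) + 1
    xs = np.rint(np.linspace(x0, x1, n)).astype(int)
    ys = np.rint(np.linspace(y0, y1, n)).astype(int)
    h, w = img.shape
    ok = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)  # clip to the canvas
    img[ys[ok], xs[ok]] = value


def render_skeleton(keypoints, height=256, width=256):
    """Draw bones as lines and joints as dots on a black canvas.

    keypoints: list of (x, y) pixel coordinates, or None for an undetected joint.
    The background of the original frame is deliberately not copied.
    """
    img = np.zeros((height, width), dtype=np.uint8)
    for a, b in BONES:
        if keypoints[a] is not None and keypoints[b] is not None:
            draw_line(img, keypoints[a], keypoints[b])
    for kp in keypoints:
        if kp is not None:
            x, y = kp
            img[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2] = 255  # 3x3 dot
    return img
```

Skipping bones whose end joints were not detected keeps a partially occluded person from producing spurious lines.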

As shown in FIG. 4(3), a file name F132 is likewise automatically assigned to the skeleton image data 132. In the example of FIG. 4, the file name is generated by appending a number indicating temporal order to "pose", which marks a skeleton image.

<Behavior image data>

FIG. 5 shows an example of the behavior image data 133. The behavior image data 133 is obtained by extracting, as behavior information, the change in the movement of the person's skeleton between temporally adjacent images of the skeleton image data 132, and storing that information as an image.

The behavior image data 133(1) in FIG. 5(1) is obtained by extracting optical flow from the skeleton image data 132(1) of FIG. 4(1) and the skeleton image data 132(2) of FIG. 4(2), with the extracted per-pixel flow represented by arrows. Similarly, the behavior image data 133(2) in FIG. 5(2) is generated by extracting optical flow from the skeleton image data 132(2) of FIG. 4(2) and the temporally next skeleton image data (not shown).
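A sketch of turning two consecutive skeleton images into a one-channel behavior image follows. It uses block matching, a simple motion-estimation technique, as a stand-in for the dense optical flow the text describes (in practice a library routine such as OpenCV's Farnebäck method would be a typical choice). Encoding the flow as its magnitude in an 8-bit grayscale image is an assumption; the text states only that the behavior image is a one-channel gray image. The `block` and `search` parameters are hypothetical.

```python
import numpy as np


def block_matching_flow(prev, curr, block=8, search=4):
    """Estimate a coarse motion field between two grayscale frames.

    For each block of `prev`, find the displacement (dx, dy) within +/- `search`
    pixels whose block in `curr` has the smallest sum of absolute differences.
    Returns integer arrays dx, dy of shape (H // block, W // block).
    """
    prev = prev.astype(np.int32)
    curr = curr.astype(np.int32)
    h, w = prev.shape
    nby, nbx = h // block, w // block
    dx = np.zeros((nby, nbx), dtype=np.int32)
    dy = np.zeros((nby, nbx), dtype=np.int32)
    for by in range(nby):
        for bx in range(nbx):
            y0, x0 = by * block, bx * block
            ref = prev[y0:y0 + block, x0:x0 + block]
            if not ref.any():
                continue  # empty block: no skeleton here, leave motion at zero
            best = None
            for sy in range(-search, search + 1):
                for sx in range(-search, search + 1):
                    y1, x1 = y0 + sy, x0 + sx
                    if y1 < 0 or x1 < 0 or y1 + block > h or x1 + block > w:
                        continue  # candidate block falls outside the frame
                    cost = np.abs(curr[y1:y1 + block, x1:x1 + block] - ref).sum()
                    if best is None or cost < best:
                        best, dx[by, bx], dy[by, bx] = cost, sx, sy
    return dx, dy


def behavior_image(prev, curr, block=8, search=4):
    """Encode flow magnitude as a 1-channel uint8 image.

    The output is the same size as the input when H and W are multiples of `block`.
    """
    dx, dy = block_matching_flow(prev, curr, block, search)
    mag = np.sqrt(dx.astype(float) ** 2 + dy.astype(float) ** 2)
    gray = np.rint(255 * mag / (np.sqrt(2) * search)).astype(np.uint8)
    return np.kron(gray, np.ones((block, block), dtype=np.uint8))  # back to pixels
```

Because the skeleton images have a black background, blocks without any skeleton pixels are skipped, which is also what keeps background changes out of the behavior image.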

As shown in FIG. 5(3), a file name F133 is likewise automatically assigned to the behavior image data 133. In the example of FIG. 5, the file name is generated by appending a number indicating temporal order to "flow", which marks a behavior image.
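The naming convention for the three image kinds ("raw", "pose", "flow" followed by a temporal-order number) can be sketched as a pair of helpers. The zero-padding width, the `.png` extension, and the function names are assumptions; the text does not give the exact format.

```python
import re

# Prefixes from the text: original, skeleton, and behavior images.
PREFIXES = {"original": "raw", "skeleton": "pose", "behavior": "flow"}
# Hypothetical pattern: prefix + digits + extension.
NAME_RE = re.compile(r"^(raw|pose|flow)(\d+)\.\w+$")


def image_file_name(kind: str, index: int, digits: int = 4, ext: str = "png") -> str:
    """Build a name such as 'raw0001.png' (padding and extension are assumptions)."""
    return f"{PREFIXES[kind]}{index:0{digits}d}.{ext}"


def parse_image_file_name(name: str) -> tuple[str, int]:
    """Recover (prefix, temporal index) from a generated file name."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name}")
    return m.group(1), int(m.group(2))
```

Zero-padding makes the lexicographic order of file names match their temporal order, so a plain directory sort yields the frames in sequence.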

<Correct answer data>

FIG. 6 shows an example of the correct answer data 134, which consists of the file name 1341 of the original image data 131 and the correct answer 1342.

The correct answer 1342 is an identifier (ID) that classifies the behavior of the person in the image. Any classification can be defined, for example "0" for a walking person, "1" for a running person, and "2" for a sitting person. In the example of FIG. 6, every file 1341 is given the ID "1", indicating a running person.

Other behavior patterns, such as jumping, crouching down, standing up, sitting down, lifting something, or putting something down, may also be defined and assigned identifiers.
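Loading correct answer data as a file-name-to-ID table can be sketched as below. The CSV layout and its column names `file_name` and `correct_answer` are hypothetical; the ID mapping covers only the three examples named in the text and would be extended as further behavior patterns are defined.

```python
import csv
import io

# IDs 0-2 are the examples given in the text; further patterns
# (jumping, crouching down, ...) would be appended with new IDs.
BEHAVIOR_CLASSES = {0: "walking", 1: "running", 2: "sitting"}


def load_correct_answers(text: str) -> dict[str, int]:
    """Parse hypothetical 'file_name,correct_answer' CSV rows into {file name: ID}."""
    answers = {}
    for row in csv.DictReader(io.StringIO(text)):
        label = int(row["correct_answer"])
        if label not in BEHAVIOR_CLASSES:
            raise ValueError(f"unknown behavior ID {label} for {row['file_name']}")
        answers[row["file_name"]] = label
    return answers
```

Rejecting unknown IDs at load time catches labeling mistakes before they reach training.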

<Learning sequence data>

FIG. 7 shows an example of the learning sequence data 135. The learning sequence data 135 is generated from a series of time-series data (original image data 131, skeleton image data 132, behavior image data 133). Model data for recognizing human behavior is generated by machine learning on the learning sequence data 135.

The learning sequence data 135 includes, for example, a sequence identifier 1351 (sid in the figure), a temporal order identifier 1352 (tid in the figure), the file name 1353 of the original image data, the file name 1354 of the skeleton image data, the file name 1355 of the behavior image data, and a classification class (classification result) 1356. The identifiers 1351 and 1352 need only be unique within the behavior recognition device 1.

In the example of FIG. 7, each sequence has three temporal order identifiers 1352, that is, three time steps; a number other than three may also be used. In the example of FIG. 7, the skeleton image data and behavior image data on each row are generated from the original image data of the same time.
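Assembling rows like those of FIG. 7 from per-frame file lists can be sketched as follows. The use of non-overlapping windows, dropping a trailing partial window, and starting `sid` at 0 are assumptions; the text fixes only the columns and the example length of three. Inference sequence data would be built the same way without the `label` column.

```python
from dataclasses import dataclass


@dataclass
class SequenceRow:
    sid: int    # sequence identifier
    tid: int    # temporal order within the sequence
    raw: str    # original image file name
    pose: str   # skeleton image file name
    flow: str   # behavior image file name
    label: int  # classification class taken from the correct answer data


def build_learning_sequences(raw, pose, flow, answers, length=3):
    """Group per-frame files into sequences of `length` frames.

    Index i of raw/pose/flow is assumed to come from the same original frame.
    Windows do not overlap (an assumption); a trailing partial window is dropped.
    """
    n = min(len(raw), len(pose), len(flow))
    rows = []
    for sid, start in enumerate(range(0, n - length + 1, length)):
        for tid in range(length):
            i = start + tid
            rows.append(SequenceRow(sid, tid, raw[i], pose[i], flow[i], answers[raw[i]]))
    return rows
```

Taking the minimum length accounts for the behavior images, which need a following frame and may therefore number one fewer than the original images.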

＜モデルデータ＞ <Model data>

図８は、モデルデータ１３６の例を示す。モデルデータ１３６は、例えば、データ種類１３６１と、データ項目１３６２と、値１３６３とを備える。 FIG. 8 shows an example of model data 136. The model data 136 includes, for example, a data type 1361, a data item 1362, and a value 1363.

データ種類１３６１は、機械学習により得られたモデルの設定データ１３６１１と学習済モデル１３６１２とを備える。設定データ１３６１１および学習済モデル１３６１２は、それぞれデータ項目１３６２とその値１３６３とを含む。 The data type 1361 comprises setting data 13611 of the model obtained by machine learning and a trained model 13612. The setting data 13611 and the trained model 13612 each include data items 1362 and their values 1363.

設定データ１３６１１のデータ項目１３６２は、例えば、元画像ｓｈａｐｅ１３６２Ａと、骨格画像ｓｈａｐｅ１３６２Ｂと、挙動画像ｓｈａｐｅ１３６２Ｃと、出力ｓｈａｐｅ１３６２Ｄと、各層の処理内容１３６２Ｅおよび１３６２Ｆとを含む。 The data item 1362 of the setting data 13611 includes, for example, the original image shape 1362A, the skeleton image shape 1362B, the behavior image shape 1362C, the output shape 1362D, and the processing contents 1362E and 1362F of each layer.

元画像ｓｈａｐｅ１３６２Ａは、元画像データ１３１の構造を表す。元画像ｓｈａｐｅ１３６２Ａの値１３６３には、例えば（２５６，２５６，３）が設定される。これは、高さ２５６画素、幅２５６画素、３チャンネル（通常ＲＧＢ）であることを表す。 The original image shape 1362A represents the structure of the original image data 131. For example, (256,256,3) is set in the value 1363 of the original image shape 1362A. This means that it has 256 pixels in height, 256 pixels in width, and 3 channels (usually RGB).

骨格画像ｓｈａｐｅ１３６２Ｂと挙動画像ｓｈａｐｅ１３６２Ｃも同様である。すなわち、骨格画像ｓｈａｐｅ１３６２Ｂは、骨格画像データ１３２の構造を示す。挙動画像ｓｈａｐｅ１３６２Ｃは、挙動画像データ１３３の構造を示す。挙動画像データはグレー画像であるため、１チャンネルである。 The same applies to the skeleton image shape 1362B and the behavior image shape 1362C. That is, the skeleton image shape 1362B shows the structure of the skeleton image data 132. The behavior image shape 1362C shows the structure of the behavior image data 133. Since the behavior image data is a gray image, it has one channel.

出力ｓｈａｐｅ１３６２Ｄの値１３６３には、（１０）が設定されている。これは、１０種類の行動パターンがあることを表す。 (10) is set in the value 1363 of the output shape 1362D. This means that there are 10 types of behavior patterns.
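A minimal sketch of how the shape items of the setting data 13611 might be represented and checked. The dictionary keys are hypothetical, and the skeleton image shape (256, 256, 3) is an assumption, since the text only says it is described "in the same way" as the original image shape. The helper computes the channel count obtained when the three images of one frame are stacked, as in the multi-channel input of FIG. 14.

```python
# Hypothetical representation of the setting data 13611 (not the patent's storage format).
setting_data = {
    "raw_shape": (256, 256, 3),       # original image: height, width, RGB
    "skeleton_shape": (256, 256, 3),  # assumed to match the original image
    "behavior_shape": (256, 256, 1),  # grayscale behavior image
    "output_shape": (10,),            # 10 behavior patterns
}


def combined_input_shape(cfg):
    """Shape of one multi-channel frame after stacking the three images on the channel axis."""
    shapes = [cfg["raw_shape"], cfg["skeleton_shape"], cfg["behavior_shape"]]
    h, w = shapes[0][:2]
    if any(s[:2] != (h, w) for s in shapes):
        raise ValueError("all images must share height and width to be stacked")
    return (h, w, sum(s[2] for s in shapes))


print(combined_input_shape(setting_data))  # (256, 256, 7)
```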

機械学習のアルゴリズムがニューラルネットワークの場合、各層の処理内容が、１層目処理１３６２Ｅ、２層目処理１３６２Ｆのように表される。その他、処理内容に関する様々な設定が設定データ１３６１１に記載される。 When the machine learning algorithm is a neural network, the processing content of each layer is represented as the first layer processing 1362E and the second layer processing 1362F. In addition, various settings related to the processing content are described in the setting data 13611.

学習済モデル１３６１２には、シーケンスデータから人間の行動の認識結果（行動の識別子）を得るためのモデルのパラメータが格納される。機械学習による学習処理が行われるまで、学習済モデル１３６１２の値１３６３には値が格納されない。学習処理が行われた後で、自動的に計算された値が値１３６３へ格納される。なお、モデルのパラメータはユーザが把握している必要はなく、機械学習モデルを呼び出した際にコンピュータプログラム内で自動的に使用される。 The trained model 13612 stores the model parameters used to obtain a human behavior recognition result (behavior identifier) from sequence data. Nothing is stored in the value 1363 of the trained model 13612 until learning by machine learning has been performed; after learning, the automatically calculated values are stored in the value 1363. The user does not need to know the model parameters; they are used automatically within the computer program when the machine learning model is called.

＜推論用シーケンスデータ＞ <Sequence data for inference>

図９は、推論用シーケンスデータ１３７の例を示す。推論用シーケンスデータ１３７は、画像に含まれる人間の行動を判別する際に使用されるデータであり、図８で述べたモデルに入力される。 FIG. 9 shows an example of the inference sequence data 137. The inference sequence data 137 is used to identify the behavior of a person included in an image, and is input to the model described with reference to FIG. 8.

推論用シーケンスデータ１３７は、図７で述べた学習用シーケンスデータ１３５と同様に、例えば、シーケンス識別子１３７１と、時間的順序識別子１３７２と、元画像データのファイル名１３７３と、骨格画像データのファイル名１３７４と、挙動画像データのファイル名１３７５と、分類クラス１３７６とを備える。分類クラス１３７６には、推論処理後に、行動パターンの判定結果である識別子が格納される。 Like the learning sequence data 135 described with reference to FIG. 7, the inference sequence data 137 includes, for example, a sequence identifier 1371, a temporal order identifier 1372, an original image data file name 1373, a skeleton image data file name 1374, a behavior image data file name 1375, and a classification class 1376. After the inference processing, an identifier representing the determined behavior pattern is stored in the classification class 1376.

＜行動認識装置における処理概要＞ <Outline of processing in the behavior recognition device>

行動認識装置１の処理概要を説明する。中央演算処理装置１１０は、骨格画像生成プログラム１１２により呼び出される元画像取得プログラム１１１を用いて、記憶装置１３０から元画像データ１３１を読み込む。続いて中央演算処理装置１１０は、骨格画像生成プログラム１１２を用いて、元画像データ１３１から骨格画像データ１３２を生成し、生成した骨格画像データ１３２を記憶装置１３０へ格納する。次に、中央演算処理装置１１０は、挙動画像生成プログラム１１３を実行し、記憶装置１３０から骨格画像データ１３２を読み込み、骨格画像データ１３２から挙動画像データ１３３を生成する。中央演算処理装置１１０は、生成した挙動画像データ１３３を記憶装置１３０へ格納する。 The processing outline of the action recognition device 1 will be described. The central processing unit 110 reads the original image data 131 from the storage device 130 by using the original image acquisition program 111 called by the skeleton image generation program 112. Subsequently, the central processing unit 110 generates the skeleton image data 132 from the original image data 131 by using the skeleton image generation program 112, and stores the generated skeleton image data 132 in the storage device 130. Next, the central processing unit 110 executes the behavior image generation program 113, reads the skeleton image data 132 from the storage device 130, and generates the behavior image data 133 from the skeleton image data 132. The central processing unit 110 stores the generated behavior image data 133 in the storage device 130.

中央演算処理装置１１０は、シーケンスデータ生成プログラム１１４を実行する。中央演算処理装置１１０は、記憶装置１３０から、元画像データ１３１と骨格画像データ１３２と挙動画像データ１３３と正解データ１３４とを読み込み、学習用シーケンスデータ１３５を生成する。中央演算処理装置１１０は、生成した学習用シーケンスデータ１３５を記憶装置１３０へ格納する。 The central processing unit 110 executes the sequence data generation program 114. The central processing unit 110 reads the original image data 131, the skeleton image data 132, the behavior image data 133, and the correct answer data 134 from the storage device 130, and generates the learning sequence data 135. The central processing unit 110 stores the generated learning sequence data 135 in the storage device 130.

中央演算処理装置１１０は、モデル生成プログラム１１５を実行する。中央演算処理装置１１０は、記憶装置１３０から学習用シーケンスデータ１３５とモデルデータ１３６とを読み込んで機械学習を行い、モデルデータ１３６を得る。中央演算処理装置１１０は、新たに生成されたモデルデータ１３６を記憶装置１３０に上書き保存する。 The central processing unit 110 executes the model generation program 115. The central processing unit 110 reads the learning sequence data 135 and the model data 136 from the storage device 130, performs machine learning, and obtains the model data 136. The central processing unit 110 overwrites and stores the newly generated model data 136 in the storage device 130.

中央演算処理装置１１０は、推論プログラム１１６を実行する。中央演算処理装置１１０は、記憶装置１３０からモデルデータ１３６と推論用シーケンスデータ１３７とを読み込み、各シーケンスにおける認識結果（行動分類クラス）を求める。中央演算処理装置１１０は、生成した推論用シーケンスデータ１３７を記憶装置１３０に上書き保存する。それぞれの処理について、以下詳細に説明する。 The central processing unit 110 executes the inference program 116. The central processing unit 110 reads the model data 136 and the inference sequence data 137 from the storage device 130, and obtains the recognition result (behavior classification class) in each sequence. The central processing unit 110 overwrites and saves the generated inference sequence data 137 in the storage device 130. Each process will be described in detail below.

＜骨格推定処理＞ <Skeletal estimation processing>

図１０は、骨格画像生成プログラム１１２が実行する骨格画像データ生成処理を示すフローチャートである。ここでの動作主体は、中央演算処理装置１１０により実行される骨格画像生成プログラム１１２である。骨格画像データ生成処理では、図３のような元画像データ群から、各元画像データに写っている人間の骨格の座標を推定し、骨格画像として描画する。 FIG. 10 is a flowchart showing the skeleton image data generation process executed by the skeleton image generation program 112. The operating subject here is the skeleton image generation program 112 executed by the central processing unit 110. In the skeleton image data generation process, the coordinates of the skeleton of each person appearing in each piece of original image data are estimated from an original image data group such as that shown in FIG. 3, and are drawn as a skeleton image.

骨格画像生成プログラム１１２は、記憶装置１３０から元画像データ１３１を読み込む（Ｓ２１）。以下では、図３で述べた元画像データ１３１が中央演算処理装置１１０に読み込まれたと仮定して説明する。 The skeleton image generation program 112 reads the original image data 131 from the storage device 130 (S21). Hereinafter, it is assumed that the original image data 131 described in FIG. 3 is read into the central processing unit 110.

骨格画像生成プログラム１１２は、各元画像データ１３１から被写体である人間の骨格座標を算出し、骨格画像データ１３２を生成する（Ｓ２２）。 The skeleton image generation program 112 calculates the skeleton coordinates of the human subject from each original image data 131, and generates the skeleton image data 132 (S22).

人間の骨格座標を求める手法には種々あるが、例えば非特許文献１に記載された方法を用いてもよい。この方法は、画像内に写っている人間の、人体の各部位の位置と各部位間の関係性の特徴とを抽出し、人体毎の骨格座標を求める。 There are various methods for obtaining human skeleton coordinates; for example, the method described in Non-Patent Document 1 may be used. This method extracts, for the people appearing in the image, the positions of the parts of the human body and features of the relationships between those parts, and obtains skeleton coordinates for each human body.

算出対象の骨格の部位は、用途に応じて変更可能である。本実施例では、例えば、鼻、首、右肩、左肩、右肘、左肘、右手、左手、右腰、左腰、右膝、左膝、右足、左足、の合計１４点を取得する場合を説明する。算出対象の各部位の座標を抽出した後、抽出された骨格座標群をプロットし、プロットされた骨格座標群のうち一部の座標間を直線で結合することにより、骨格画像データ１３２を生成する。 The skeleton parts to be calculated can be changed according to the application. This embodiment describes a case where a total of 14 points are acquired: nose, neck, right shoulder, left shoulder, right elbow, left elbow, right hand, left hand, right hip, left hip, right knee, left knee, right foot, and left foot. After the coordinates of each target part are extracted, the extracted skeleton coordinates are plotted, and some pairs of the plotted coordinates are connected with straight lines to generate the skeleton image data 132.
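The drawing step can be sketched as follows, assuming keypoints are already available as pixel coordinates. The `EDGES` list is an assumption (the text says only that "some" coordinate pairs are joined), the image is single-channel on a black background (the text allows white or black), and the naive `linspace` rasterization stands in for a drawing routine such as OpenCV's `cv2.line`.

```python
import numpy as np

# The 14 parts listed in the text, in that order (short names are hypothetical).
PARTS = ["nose", "neck", "r_shoulder", "l_shoulder", "r_elbow", "l_elbow",
         "r_hand", "l_hand", "r_hip", "l_hip", "r_knee", "l_knee", "r_foot", "l_foot"]

# Which pairs are connected is an assumption for illustration.
EDGES = [("nose", "neck"), ("neck", "r_shoulder"), ("neck", "l_shoulder"),
         ("r_shoulder", "r_elbow"), ("r_elbow", "r_hand"),
         ("l_shoulder", "l_elbow"), ("l_elbow", "l_hand"),
         ("neck", "r_hip"), ("neck", "l_hip"),
         ("r_hip", "r_knee"), ("r_knee", "r_foot"),
         ("l_hip", "l_knee"), ("l_knee", "l_foot")]


def draw_skeleton(coords, height=256, width=256):
    """Render keypoints {part: (x, y)} as white line segments on a black background."""
    img = np.zeros((height, width), dtype=np.uint8)
    for a, b in EDGES:
        if a not in coords or b not in coords:
            continue  # skip limbs whose endpoints were not detected
        (x0, y0), (x1, y1) = coords[a], coords[b]
        steps = int(max(abs(x1 - x0), abs(y1 - y0))) + 1
        xs = np.rint(np.linspace(x0, x1, steps)).astype(int)
        ys = np.rint(np.linspace(y0, y1, steps)).astype(int)
        inside = (xs >= 0) & (xs < width) & (ys >= 0) & (ys < height)
        img[ys[inside], xs[inside]] = 255
    return img
```

For example, `draw_skeleton({"nose": (10, 10), "neck": (10, 20)})` draws a single vertical segment of 11 pixels in column 10; every other limb is skipped because its endpoints are missing.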

例えば、図３（１），（２）に示す元画像データ１３１（１），（２）に対して、抽出後の骨格座標をもとに計算すると、図４（１），（２）に示す骨格画像データ１３２（１），１３２（２）が生成される。なお、骨格画像データ１３２の背景は、例えば白色や黒色等にすればよい。 For example, computing from the skeleton coordinates extracted from the original image data 131(1) and 131(2) shown in FIGS. 3(1) and 3(2) yields the skeleton image data 132(1) and 132(2) shown in FIGS. 4(1) and 4(2). The background of the skeleton image data 132 may be, for example, white or black.

最後に、骨格画像生成プログラム１１２は、ステップＳ２２で生成した骨格画像データ１３２を記憶装置１３０へ格納する（Ｓ２３）。 Finally, the skeleton image generation program 112 stores the skeleton image data 132 generated in step S22 in the storage device 130 (S23).

＜挙動抽出処理＞ <Behavior extraction process>

図１１は、挙動画像生成プログラム１１３が実行する挙動画像データ生成処理を示すフローチャートである。ここでの動作主体は、中央演算処理装置１１０により実行される挙動画像生成プログラム１１３である。挙動画像データ生成処理では、図４に示す骨格画像群から、各部位の時間的変化を抽出し、挙動画像データ１３３として描画する。 FIG. 11 is a flowchart showing a behavior image data generation process executed by the behavior image generation program 113. The operating subject here is the behavior image generation program 113 executed by the central processing unit 110. In the behavior image data generation processing, the temporal change of each part is extracted from the skeleton image group shown in FIG. 4 and drawn as the behavior image data 133.

挙動画像生成プログラム１１３は、記憶装置１３０から骨格画像データ１３２を読み込む（Ｓ３１）。以下では、例えば、図４に示す骨格画像データ１３２（１），（２）のようなデータが読み込まれたものとして説明する。 The behavior image generation program 113 reads the skeleton image data 132 from the storage device 130 (S31). The following description assumes that data such as the skeleton image data 132(1) and 132(2) shown in FIG. 4 has been read.

挙動画像生成プログラム１１３は、各骨格画像データ１３２から、骨格の動きとしてのオプティカルフローを抽出し、挙動画像データ１３３を生成する（Ｓ３２）。ここでオプティカルフローとは、時間的に連続する画像の中で、物体の動きをベクトルで表現したものである。オプティカルフローの計算方法には種々あるが、例えばＬｕｃａｓ−Ｋａｎａｄｅ法を用いることができる。図４（１），（２）に示す骨格画像データ１３２（１），（２）を元にオプティカルフローを求めると、図５（１）に示す挙動画像データ１３３（１）が生成される。挙動画像データ内の矢印は、矢印の始点から終点に向かって、画像内の画素に動きがあったことを表している。 The behavior image generation program 113 extracts optical flow, representing the movement of the skeleton, from the skeleton image data 132 and generates the behavior image data 133 (S32). Optical flow is a vector representation of the movement of objects across temporally consecutive images. There are various methods for computing optical flow; for example, the Lucas-Kanade method can be used. Computing the optical flow from the skeleton image data 132(1) and 132(2) shown in FIGS. 4(1) and 4(2) generates the behavior image data 133(1) shown in FIG. 5(1). Each arrow in the behavior image data indicates that pixels in the image moved from the arrow's start point toward its end point.
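To show what the Lucas-Kanade method computes, here is a minimal single-scale estimate of the flow vector at one pixel between two grayscale frames. The function name and window size are assumptions, and the sketch omits the image pyramids and feature selection that practical implementations (for example OpenCV's `calcOpticalFlowPyrLK`) use. It assumes brightness constancy and a constant flow inside the window, then solves the resulting least-squares system.

```python
import numpy as np


def lucas_kanade_at(prev, curr, x, y, win=7):
    """Estimate the flow (u, v) at pixel (x, y) from frame `prev` to frame `curr`.

    The (2*win+1)^2 window around (x, y) must lie inside the image.
    """
    prev = np.asarray(prev, dtype=np.float64)
    curr = np.asarray(curr, dtype=np.float64)
    Iy, Ix = np.gradient(prev)  # spatial derivatives: axis 0 is y, axis 1 is x
    It = curr - prev            # temporal derivative
    sl = (slice(y - win, y + win + 1), slice(x - win, x + win + 1))
    # Each window pixel contributes one equation: Ix*u + Iy*v = -It
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)
    b = -It[sl].ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
```

For a smooth blob shifted one pixel to the right between frames, the estimate at the blob center comes out close to (1, 0). A behavior image like FIG. 5(1) would then draw an arrow from each sampled point along its (u, v).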

挙動画像生成プログラム１１３は、ステップＳ３２で生成した挙動画像データ１３３を記憶装置１３０へ格納する（Ｓ３３）。 The behavior image generation program 113 stores the behavior image data 133 generated in step S32 in the storage device 130 (S33).

＜シーケンス生成処理＞ <Sequence generation process>

図１２は、シーケンスデータ生成プログラム１１４が実行するシーケンスデータ生成処理を示すフローチャートである。ここでの動作主体は、中央演算処理装置１１０により実行されるシーケンスデータ生成プログラム１１４である。シーケンスデータ生成処理では、図７で述べた学習用シーケンスデータ１３５または図９で述べた推論用シーケンスデータ１３７を生成する。 FIG. 12 is a flowchart showing a sequence data generation process executed by the sequence data generation program 114. The operating subject here is the sequence data generation program 114 executed by the central processing unit 110. In the sequence data generation process, the learning sequence data 135 described in FIG. 7 or the inference sequence data 137 described in FIG. 9 is generated.

まず最初に、シーケンスデータ生成プログラム１１４は、元画像データ１３１と骨格画像データ１３２と挙動画像データ１３３と正解データ１３４とを記憶装置１３０から読み込む（Ｓ４１）。例えば、図３、図４、図５、図６で述べたデータが読み込まれたものとして、以下説明する。 First, the sequence data generation program 114 reads the original image data 131, the skeleton image data 132, the behavior image data 133, and the correct answer data 134 from the storage device 130 (S41). For example, it will be described below assuming that the data described in FIGS. 3, 4, 5, and 6 has been read.

シーケンスデータ生成プログラム１１４は、各画像データ１３１，１３２，１３３に一意に付与されている時系列を表す変数ｔに「１」を代入する（Ｓ４２）。シーケンスデータ生成プログラム１１４は、一つのシーケンスとして構成する所定時間内のデータを取得する（Ｓ４３）。ここでは、「ｔ」から「ｔ＋ｎ−１」までのデータを一つのシーケンスとして取り扱う。変数ｎは、一つのシーケンスの長さを表している。例えば、「ｔ＝１、ｎ＝３」の場合、一つのシーケンスとなるのは、時間が「１」、「２」、「３」のデータである。 The sequence data generation program 114 sets to 1 the variable t, which represents the time index uniquely assigned to each piece of image data 131, 132, 133 (S42). The sequence data generation program 114 then acquires the data within a predetermined time span that makes up one sequence (S43). Here, the data from time t to time t + n - 1 is treated as one sequence, where the variable n is the length of a sequence. For example, when t = 1 and n = 3, one sequence consists of the data at times 1, 2, and 3.

例えば図７では、元画像データのファイル名１３５３、骨格画像データのファイル名１３５４、挙動画像データのファイル名１３５５のうち、各ファイル名中の時間的順序を示す値が「０００１」、「０００２」、「０００３」であるデータが一つのシーケンスとして使用される。すなわちこの場合は「１」、「２」、「３」となる。 For example, in FIG. 7, the data whose original image file name 1353, skeleton image file name 1354, and behavior image file name 1355 contain the temporal-order values "0001", "0002", and "0003" are used as one sequence; that is, t takes the values 1, 2, and 3 in this case.

シーケンスデータ生成プログラム１１４は、シーケンスデータのシーケンス識別子（ｓｉｄ）と時間的順序識別子（ｔｉｄ）とを設定する（Ｓ４４）。シーケンス識別子には、各シーケンスデータを一意に識別する値を格納する。図７のシーケンス識別子１３５１に示すように、「１」から順に格納する。時間的順序識別子は、各シーケンスデータ内での順序を表す値である。図７の時間的順序識別子１３５２に示すように、時間が古いものから順に「１」、「２」、「３」のように設定される。 The sequence data generation program 114 sets the sequence identifier (sid) and the temporal order identifier (tid) of the sequence data (S44). The sequence identifier stores a value that uniquely identifies each sequence; as shown by the sequence identifier 1351 in FIG. 7, values are assigned sequentially starting from 1. The temporal order identifier represents the order within each sequence; as shown by the temporal order identifier 1352 in FIG. 7, it is set to 1, 2, and 3 in order from the oldest frame.

シーケンスデータ生成プログラム１１４は、各シーケンスデータの分類クラスを設定する（Ｓ４５）。分類クラスとは、各シーケンスにおける人間の行動パターンを表す識別子である。例えば、「歩いている人」は「０」、「走っている人」は「１」のように、分類クラスは設定される。 The sequence data generation program 114 sets a classification class for each sequence data (S45). The classification class is an identifier representing a human behavior pattern in each sequence. For example, the classification class is set such that "walking person" is "0" and "running person" is "1".

分類クラスは、学習処理時と推論処理時とで、それぞれ格納する値が異なる。推論処理時には、分類クラスの値を設定せず、空白にしておく。学習処理時には、正解データをもとに分類クラスの値を設定する。分類クラスの値の決め方には種々あるが、例えば、同一シーケンス内における多数決で分類クラスを決定する方法が考えられる。 The value to be stored in the classification class differs between the learning process and the inference process. At the time of inference processing, the value of the classification class is not set and is left blank. At the time of learning process, the value of the classification class is set based on the correct answer data. There are various methods for determining the value of the classification class, but for example, a method of determining the classification class by majority voting within the same sequence can be considered.

図７の「ｓｉｄ＝１」の場合で説明する。元画像データは、「ｒａｗ＿０００１．ｊｐｇ」、「ｒａｗ＿０００２．ｊｐｇ」、「ｒａｗ＿０００３．ｊｐｇ」の３つである。これら３つの元画像データを正解データと照合すると、すべて正解は「１」である。したがって、３つのファイルの多数決により、「ｓｉｄ＝１」のシーケンスの分類クラスは「１」となる。分類クラスを多数決で決定できない場合は、例えば、候補となる分類クラスの中からランダムで決定したり、またはそのシーケンスにおける最大の「ｔｉｄ」を持つ分類クラスを使用するなどすればよい。 The case of “sid = 1” in FIG. 7 will be described. There are three original image data, "raw_0001.jpg", "raw_0002.jpg", and "raw_0003.jpg". When these three original image data are collated with the correct answer data, the correct answer is "1". Therefore, the classification class of the sequence of "sid = 1" becomes "1" by the majority vote of the three files. If the classification class cannot be determined by majority vote, for example, it may be randomly determined from the candidate classification classes, or the classification class having the maximum "tid" in the sequence may be used.
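The labeling rule of S45 can be sketched as a majority vote over the per-frame ground-truth labels of one sequence. The function name is hypothetical. For ties, the sketch implements the second fallback the text mentions (use the class of the frame with the largest tid), interpreted here as "the newest frame whose label is among the tied classes"; the random fallback is omitted.

```python
from collections import Counter


def sequence_class(labels_by_tid):
    """Decide a sequence's class from (tid, class_id) pairs of its frames."""
    counts = Counter(label for _, label in labels_by_tid)
    top = max(counts.values())
    candidates = {label for label, c in counts.items() if c == top}
    if len(candidates) == 1:
        return candidates.pop()  # clear majority
    # Tie: scan frames from the largest tid downward and take the first tied label.
    for _, label in sorted(labels_by_tid, reverse=True):
        if label in candidates:
            return label
```

For the "sid = 1" sequence, `sequence_class([(1, 1), (2, 1), (3, 1)])` returns 1; with three different labels, the label of tid 3 wins the tie.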

続いて、シーケンスデータ生成プログラム１１４は、変数ｔに「ｔ＋ｎ」を代入する（Ｓ４６）。すなわち、次のシーケンスの開始の時間を設定する。 Subsequently, the sequence data generation program 114 substitutes “t + n” for the variable t (S46). That is, the start time of the next sequence is set.

シーケンスデータ生成プログラム１１４は、「ｔ＋ｎ−１」が最大の時間ｔｍａｘより大きいかどうか判定する（Ｓ４７）。この条件が成立する場合（Ｓ４７：ＹＥＳ）、シーケンスをこれ以上設定できないことを表す。条件が成立しない場合（Ｓ４７：ＮＯ）、シーケンスデータ生成プログラム１１４は、ステップＳ４３へ戻って、次のシーケンスを生成する。条件が成立する場合（Ｓ４７：ＹＥＳ）、ステップＳ４８へ進む。 The sequence data generation program 114 determines whether t + n - 1 exceeds the maximum time tmax (S47). If the condition is not satisfied (S47: NO), the program returns to step S43 and generates the next sequence. If the condition is satisfied (S47: YES), no further complete sequence can be formed, and the process proceeds to step S48.
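The loop S42 to S47 amounts to cutting the time axis into non-overlapping windows of length n and dropping a trailing partial window. A minimal sketch, with a hypothetical function name and dictionary rows: the class is left blank as for inference data. Note that the flowchart performs S43 before the S47 check, whereas this sketch checks first, so it returns no rows when tmax < n.

```python
def make_sequences(tmax, n):
    """Split times 1..tmax into sequences t..t+n-1 (S42-S47), class left blank."""
    rows = []
    t, sid = 1, 1                      # S42: start at time 1; first sequence gets sid 1
    while t + n - 1 <= tmax:           # S47, evaluated before each window here
        for tid, frame_t in enumerate(range(t, t + n), start=1):  # S43-S44
            rows.append({"sid": sid, "tid": tid, "t": frame_t, "class": None})  # S45 blank
        sid += 1
        t += n                         # S46: start time of the next sequence
    return rows
```

With tmax = 7 and n = 3, this yields sequences {1, 2, 3} and {4, 5, 6}; frame 7 is left over because 7 + 3 - 1 > 7.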

シーケンスデータ生成プログラム１１４は、生成されたシーケンスデータを記憶装置１３０へ格納する（Ｓ４８）。学習処理用にシーケンスデータ生成処理が呼び出された場合は、学習用シーケンスデータ１３５として記憶装置１３０に格納される。推論処理用にシーケンスデータ生成処理が呼び出された場合は、推論用シーケンスデータ１３７として記憶装置１３０へ格納される。 The sequence data generation program 114 stores the generated sequence data in the storage device 130 (S48). When the sequence data generation process is called for the learning process, it is stored in the storage device 130 as the learning sequence data 135. When the sequence data generation process is called for the inference process, it is stored in the storage device 130 as the inference sequence data 137.

＜学習処理＞ <Learning process>

図１３は、モデル生成プログラム１１５が実行するモデル生成処理（学習処理）を示すフローチャートである。ここでの動作主体は中央演算処理装置１１０により実行されるモデル生成プログラム１１５である。モデル生成処理では、図７で述べた学習用シーケンスデータ１３５を生成した後で、この学習用シーケンスデータ１３５から図８で述べたモデルデータ１３６を生成する。 FIG. 13 is a flowchart showing a model generation process (learning process) executed by the model generation program 115. The operating subject here is the model generation program 115 executed by the central processing unit 110. In the model generation process, after the learning sequence data 135 described in FIG. 7 is generated, the model data 136 described in FIG. 8 is generated from the learning sequence data 135.

モデル生成プログラム１１５は、シーケンスデータ生成プログラム１１４により学習用シーケンスデータ１３５を生成させる（Ｓ５１）。このステップＳ５１では、図１２で述べたシーケンスデータ生成処理が呼び出され、前述した処理が実行される。 The model generation program 115 causes the sequence data generation program 114 to generate the learning sequence data 135 (S51). In this step S51, the sequence data generation process described in FIG. 12 is called, and the above-described process is executed.

モデル生成プログラム１１５は、生成された学習用シーケンスデータ１３５を記憶装置１３０から読み込み（Ｓ５２）、機械学習により学習用シーケンスデータ１３５からモデルデータ１３６を生成する（Ｓ５３）。 The model generation program 115 reads the generated learning sequence data 135 from the storage device 130 (S52), and generates model data 136 from the learning sequence data 135 by machine learning (S53).

機械学習の手法は種々あるが、例えばディープラーニングを用いることができる。ディープラーニングを用いる場合、様々なモデルを定義可能である。図１４にニューラルネットワークの構成例を示す。 There are various machine learning methods, but for example, deep learning can be used. When using deep learning, various models can be defined. FIG. 14 shows a configuration example of the neural network.

図１４では、入力として、元画像データ１３１、骨格画像データ１３２、挙動画像データ１３３を、時間的順序識別子（ｔｉｄ）毎にひとつにまとめたマルチチャンネルの画像データとして、ＣＮＮ（ＣｏｎｖｏｌｕｔｉｏｎａｌＮｅｕｒａｌＮｅｔｗｏｒｋ）２１に入力して特徴を抽出し、さらにＬＳＴＭ（ＬｏｎｇＳｈｏｒｔ−ＴｅｒｍＭｅｍｏｒｙ）２２により時系列としての特徴抽出を行って、最終的な分類クラスを出力する（２３，２４）。ＣＮＮとＬＳＴＭとでの処理時には、活性化関数またはプーリングまたはドロップアウト等の処理を加えてもよい。 In FIG. 14, the original image data 131, the skeleton image data 132, and the behavior image data 133 are combined, for each temporal order identifier (tid), into a single piece of multi-channel image data, which is input to a CNN (Convolutional Neural Network) 21 to extract features. An LSTM (Long Short-Term Memory) 22 then extracts features across the time series, and the final classification class is output (23, 24). Activation functions, pooling, dropout, or similar processing may be added in the CNN and LSTM stages.
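The input preparation for FIG. 14 can be sketched with NumPy: for each tid, the three images are concatenated along the channel axis, and the tids are stacked along a leading time axis that the LSTM consumes after the CNN. The function name and the (T, H, W, C) layout are assumptions; the network itself is not shown.

```python
import numpy as np


def stack_sequence(raw_frames, skeleton_frames, behavior_frames):
    """Build the (T, H, W, C) input for one sequence; lists are ordered by tid."""
    if not (len(raw_frames) == len(skeleton_frames) == len(behavior_frames)):
        raise ValueError("each tid needs one raw, one skeleton and one behavior image")
    per_tid = []
    for raw, skel, beh in zip(raw_frames, skeleton_frames, behavior_frames):
        if beh.ndim == 2:               # grayscale behavior image stored as (H, W)
            beh = beh[..., np.newaxis]  # -> (H, W, 1)
        per_tid.append(np.concatenate([raw, skel, beh], axis=-1))
    return np.stack(per_tid, axis=0)    # (T, H, W, C_raw + C_skel + C_beh)
```

With three frames of 3-channel original images, 3-channel skeleton images, and 1-channel behavior images, the result has seven channels per tid.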

ディープラーニングの出力として、各シーケンスデータの分類クラスを設定する。図１４の構成は一例であり、様々な変更が可能である。図１４で述べたニューラルネットワークに学習用シーケンスデータ１３５を入力し、誤差逆伝播法等により学習させることにより、画像内の人間の行動パターンを推測可能なモデルデータ１３６が生成される。 The classification class of each sequence is set as the output of the deep learning model. The configuration of FIG. 14 is an example, and various modifications are possible. Model data 136 capable of estimating the behavior pattern of a person in an image is generated by inputting the learning sequence data 135 into the neural network described with reference to FIG. 14 and training it by backpropagation or a similar method.

図１３に戻る。モデル生成プログラム１１５は、ステップＳ５３で生成されたモデルデータ１３６を記憶装置１３０に格納する（Ｓ５４）。 Returning to FIG. 13, the model generation program 115 stores the model data 136 generated in step S53 in the storage device 130 (S54).

＜推論処理＞ <Inference processing>

図１５は、推論プログラム１１６の実行する推論処理を示すフローチャートである。動作主体は、中央演算処理装置１１０により実行される推論プログラム１１６である。推論処理では、図９で述べた推論用シーケンスデータ１３７を生成した後、この推論用シーケンスデータ１３７と図８で述べたモデルデータ１３６とをもとに、各シーケンスにおける分類クラスを推定する。 FIG. 15 is a flowchart showing the inference process executed by the inference program 116. The operating subject is the inference program 116 executed by the central processing unit 110. In the inference process, after the inference sequence data 137 described with reference to FIG. 9 is generated, the classification class of each sequence is estimated based on this inference sequence data 137 and the model data 136 described with reference to FIG. 8.

推論プログラム１１６は、シーケンスデータ生成プログラム１１４により推論用シーケンスデータ１３７を生成させる（Ｓ６１）。ステップＳ６１では、シーケンスデータ生成処理が呼び出され、前述の処理が実行される。 The inference program 116 causes the sequence data generation program 114 to generate inference sequence data 137 (S61). In step S61, the sequence data generation process is called and the above-mentioned process is executed.

推論プログラム１１６は、推論用シーケンスデータ１３７とモデルデータ１３６とを記憶装置１３０から読み込み（Ｓ６２）、推論処理により、推論用シーケンスデータ１３７をモデルデータ１３６へ入力して、各シーケンスの分類クラスを求める（Ｓ６３）。 The inference program 116 reads the inference sequence data 137 and the model data 136 from the storage device 130 (S62), inputs the inference sequence data 137 to the model data 136 by inference processing, and obtains the classification class of each sequence. (S63).

図１４で述べたニューラルネットワークに対し、学習時と同様に、元画像データと骨格画像データと挙動画像データとを時間的順序識別子（ｔｉｄ）毎のひとまとまりのマルチチャンネル画像データとして入力すると、ｓｏｆｔｍａｘ処理２３により、各シーケンスの分類クラスの確率が出力される。例えば、「走っている」を表す分類クラス「１」の確率が「０．９」であれば、入力されたデータ群に含まれる人間の行動を「走っている」と判定できる。行動の認識結果を示す識別子は、分類クラスに格納される。 When the original image data, the skeleton image data, and the behavior image data are input to the neural network described with reference to FIG. 14 as one set of multi-channel image data per temporal order identifier (tid), as during learning, the softmax processing 23 outputs the probability of each classification class for each sequence. For example, if the probability of the classification class "1", representing "running", is 0.9, the behavior of the person included in the input data group can be determined to be "running". The identifier indicating the behavior recognition result is stored in the classification class.
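The final step, turning the network's raw output scores for one sequence into class probabilities and picking the most probable class, can be sketched as a numerically stable softmax followed by argmax. The function name is hypothetical, and the sketch assumes the network exposes pre-softmax scores.

```python
import numpy as np


def classify(logits):
    """Return (class_id, probabilities) for one sequence's output scores."""
    z = np.asarray(logits, dtype=np.float64)
    z = z - z.max()                   # subtracting the max avoids overflow in exp
    p = np.exp(z) / np.exp(z).sum()   # softmax
    return int(p.argmax()), p
```

Feeding scores equal to `np.log([0.05, 0.9, 0.05])` reproduces the example in the text: class 1 ("running") is selected with probability 0.9, and 1 is what would be written to the classification class.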

推論プログラム１１６は、ステップＳ６３で生成された推論用シーケンスデータ１３７を記憶装置１３０に上書き保存する（Ｓ６４）。 The inference program 116 overwrites and saves the inference sequence data 137 generated in step S63 in the storage device 130 (S64).

以上説明したように、本実施例によれば、元画像データ１３１から骨格画像データ１３２および挙動画像データ１３３を生成し、機械学習によるモデルデータ１３６を生成することにより、時系列の画像データに写っている人間の行動を判定することができる。 As described above, according to this embodiment, the skeleton image data 132 and the behavior image data 133 are generated from the original image data 131, and the model data 136 is generated by machine learning, so that the behavior of a person appearing in time-series image data can be determined.

本実施例では、人間の行動の認識に際して骨格画像データ１３２を使用するため、人間とは無関係の背景の情報を排除できる。さらに、本実施例では、人間の骨格としての特徴を抽出するため、人間の行動を高精度に判定することができる。さらに、本実施例では、挙動画像データ１３３も使用するため、時系列的な人間の動きの特徴を抽出することができ、さらに判定精度を向上させることができる。 In this embodiment, since the skeleton image data 132 is used for recognizing human behavior, background information unrelated to humans can be excluded. Furthermore, in this embodiment, since the characteristics of the human skeleton are extracted, human behavior can be determined with high accuracy. Further, in this embodiment, since the behavior image data 133 is also used, the characteristics of human movement in time series can be extracted, and the determination accuracy can be further improved.

図１６，図１７を用いて第２実施例を説明する。本実施例を含む以下の各実施例では、第１実施例との相違を中心に述べる。本実施例では、各元画像データ１３１内の背景の時間変化が所定値以上の場合に、行動認識装置１Ａを作動させて、画像に写った人間の行動を認識する。本実施例に係る行動認識装置１Ａは、例えば、乗用車、商用車、建設機械などの各種移動体を運転する運転手の監視システムとして使用されてもよい。 A second embodiment will be described with reference to FIGS. 16 and 17. In this embodiment and each of the following embodiments, the description focuses on differences from the first embodiment. In this embodiment, the action recognition device 1A is operated to recognize the behavior of the person captured in the images when the temporal change of the background in the original image data 131 is equal to or greater than a predetermined value. The action recognition device 1A according to this embodiment may be used, for example, as a monitoring system for drivers of various moving bodies such as passenger cars, commercial vehicles, and construction machines.

行動認識装置１Ａには、センサ３１からのセンサデータが入力される。センサ３１は、行動認識装置１Ａの監視対象である人間が運転する移動体３０に設けられており、例えば、速度センサ、加速度センサ、位置センサなどが該当する。行動認識装置１Ａは、移動体３０に設けることもできるし、移動体３０の外部に設けることもできる。 Sensor data from a sensor 31 is input to the action recognition device 1A. The sensor 31 is provided on the moving body 30 driven by the person monitored by the action recognition device 1A, and is, for example, a speed sensor, an acceleration sensor, or a position sensor. The action recognition device 1A can be provided either on the moving body 30 or outside it.

行動認識装置１Ａを例えば移動体３０の外部にあるサーバに設ける場合、移動体３０のセンサ３１からのセンサデータとカメラ１４０等で撮影した元画像データとを通信ネットワークを介してサーバ内の行動認識装置１Ａへ送信する。行動認識装置１Ａは、移動体３０から受信したセンサデータおよび元画像データに基づいて、運転手の行動を認識し、その認識結果（分類クラス）を通信ネットワークを介して移動体３０へ送信する。移動体３０内の情報出力装置は、行動の認識結果に応じた警報を出力する。 When the action recognition device 1A is provided, for example, on a server outside the moving body 30, the sensor data from the sensor 31 of the moving body 30 and the original image data captured by the camera 140 or the like are transmitted via a communication network to the action recognition device 1A in the server. The action recognition device 1A recognizes the driver's behavior based on the sensor data and original image data received from the moving body 30, and transmits the recognition result (classification class) to the moving body 30 via the communication network. An information output device in the moving body 30 outputs an alarm according to the behavior recognition result.

図１７は、行動認識装置１Ａを監視システムとして用いる場合の、行動監視処理Ｓ７０を示すフローチャートである。ここでは、移動体３０に行動認識装置１Ａが設けられている場合を例に挙げて説明する。 FIG. 17 is a flowchart showing the behavior monitoring process S70 when the behavior recognition device 1A is used as the monitoring system. Here, a case where the action recognition device 1A is provided on the moving body 30 will be described as an example.

行動認識装置１Ａは、センサ３１からセンサデータを取得すると（Ｓ７１）、移動体３０が移動中であるかをセンサデータに基づいて判断する（Ｓ７２）。ここでは、一例として、移動体３０が停止していない場合、すなわち移動体３０の速度が「０」を超えている場合に、「移動中である」と判断するものとする。これに代えて、任意の自然数に設定される所定速度以上の場合に、移動体３０が移動中であると判定することもできる。 When the action recognition device 1A acquires the sensor data from the sensor 31 (S71), the action recognition device 1A determines whether the moving body 30 is moving based on the sensor data (S72). Here, as an example, when the moving body 30 is not stopped, that is, when the speed of the moving body 30 exceeds "0", it is determined that the moving body 30 is "moving". Instead of this, it can be determined that the moving body 30 is moving when the speed is equal to or higher than a predetermined speed set to an arbitrary natural number.

移動体３０が移動中ではない場合（Ｓ７２：ＮＯ）、行動監視処理は終了する。これに対し、移動体３０が移動中の場合（Ｓ７２：ＹＥＳ）、行動認識装置１Ａは、第１実施例で述べたように、元画像データ１３１と骨格画像データ１３２と挙動画像データ１３３とに基づく推論用シーケンスデータ１３７をモデルデータ１３６に適用することにより、対象者である移動体３０の運転手の行動を認識する（Ｓ７３）。ここでは、行動の分類として、「前を向いている」、「よそ見をしている」、「スマートフォンを操作している」、「飲食している」、「下を向いている」などを挙げる。 If the moving body 30 is not moving (S72: NO), the behavior monitoring process ends. If the moving body 30 is moving (S72: YES), the behavior recognition device 1A recognizes the behavior of the driver of the moving body 30, who is the target person, by applying the inference sequence data 137 based on the original image data 131, the skeleton image data 132, and the behavior image data 133 to the model data 136, as described in the first embodiment (S73). Examples of behavior classes here include "looking forward", "looking away", "operating a smartphone", "eating or drinking", and "looking down".

行動認識装置１Ａは、ステップＳ７３で認識された行動が正常な行動であるか判定する（Ｓ７４）。ここでは、「前を向いている」が正常な行動としてあらかじめ設定されており、それ以外の行動は正常な行動ではないとして設定されているものとする。 The action recognition device 1A determines whether the action recognized in step S73 is a normal action (S74). Here, it is assumed that "looking forward" is preset as a normal behavior, and other behaviors are set as not normal behaviors.

行動認識装置１Ａは、運転手の行動が正常な行動であると判定すると（Ｓ７４：ＹＥＳ）、本処理を終了する。これに対し、行動認識装置１Ａは、運転手の行動が正常な行動ではないと判定すると（Ｓ７４：ＮＯ）、警報を出力する（Ｓ７５）。警報は、例えば、カーナビゲーションシステムなどの移動体３０に搭載された情報出力装置を通じて出力することができる。 When the action recognition device 1A determines that the driver's action is a normal action (S74: YES), the action recognition device 1A ends this process. On the other hand, when the behavior recognition device 1A determines that the driver's behavior is not a normal behavior (S74: NO), the behavior recognition device 1A outputs an alarm (S75). The alarm can be output, for example, through an information output device mounted on the moving body 30 such as a car navigation system.
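The decision logic of steps S72 to S75 can be sketched as two small functions. The names, the string labels, and the way the recognized behavior is passed in (rather than computed by the model) are assumptions for illustration. The two movement rules mirror the text: speed above 0 by default, or speed at or above a preset threshold.

```python
NORMAL_BEHAVIORS = {"looking forward"}  # preset as the only normal behavior in the text


def is_moving(speed, threshold=None):
    """S72: moving if speed > 0, or speed >= threshold when a threshold is given."""
    if threshold is None:
        return speed > 0
    return speed >= threshold


def should_alarm(speed, behavior, threshold=None):
    """Return True if an alarm should be output (S75) for the recognized behavior (S73)."""
    if not is_moving(speed, threshold):
        return False                         # S72: NO -> monitoring ends
    return behavior not in NORMAL_BEHAVIORS  # S74: NO -> alarm
```

For example, `should_alarm(40, "operating a smartphone")` returns True, while the same behavior at speed 0 raises no alarm because monitoring ends at S72.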

このように構成される本実施例によれば、移動体３０が移動中に、運転手の行動を認識し、その認識結果に応じた情報を出力することができる。第１実施例で述べたように、行動認識装置１Ａは、元画像データだけでなく骨格画像データおよび挙動画像データも利用して運転手の行動を認識するため、移動体３０が移動して運転手の背景が変化する場合であっても、運転手の行動を適切に認識することができる。 According to this embodiment configured as described above, the driver's behavior can be recognized while the moving body 30 is moving, and information corresponding to the recognition result can be output. As described in the first embodiment, the behavior recognition device 1A uses not only the original image data but also the skeleton image data and the behavior image data to recognize the driver's behavior, so the driver's behavior can be appropriately recognized even when the moving body 30 moves and the background behind the driver changes.

図１８，図１９を用いて第３実施例を説明する。本実施例に係る行動認識装置１Ｂは、元画像データ１３１から対象の人物（ここでは運転手）の視線を検出し、検出された視線を対象人物の行動の判定に利用する。 A third embodiment will be described with reference to FIGS. 18 and 19. The behavior recognition device 1B according to the present embodiment detects the line of sight of the target person (here, the driver) from the original image data 131, and uses the detected line of sight to determine the behavior of the target person.

Like the device of the second embodiment, the behavior recognition device 1B shown in FIG. 18 is used as a monitoring system that monitors the behavior of the driver of the moving body 30. Sensor data from the sensor 31 is input to the behavior recognition device 1B. The behavior recognition device 1B is further provided with a line-of-sight analysis unit 17 that detects the line of sight by analyzing the original image data 131. The line-of-sight analysis unit 17 is a function realized by the central processing unit 110 executing a predetermined computer program (a line-of-sight analysis program, not shown).

FIG. 19 is a flowchart of the behavior monitoring process S70B according to this embodiment. This process includes all of steps S71 to S74 described with reference to FIG. 17. In place of step S75 of FIG. 17, this embodiment outputs one of several types of alarm (S78, S79). It also adds the new steps S76 and S77.

The behavior recognition device 1B determines whether the driver's behavior, obtained by applying the inference sequence data 137 to the model data 136, is normal (S74). If it determines that the behavior is not normal (S74: NO), it reads the result of the driver's line-of-sight analysis from the line-of-sight analysis unit 17 (S76).

The behavior recognition device 1B then determines whether the analyzed line of sight is normal (S77). Here, a normal line of sight is one directed in the traveling direction of the moving body 30. Even if the driver's behavior is not normal (S74: NO), as long as the line of sight faces the traveling direction (S77: YES), the behavior recognition device 1B judges that a certain level of safety is maintained and outputs an alarm calling for attention (S78).
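The text defines a normal line of sight as one that faces the traveling direction, and claim 5 describes this as the gaze lying "within a preset predetermined range". One way to express that range is as a maximum angle between a gaze vector and the direction of travel. The following is a minimal sketch under that assumption. The vector representation, `angle_between_deg`, `gaze_is_normal`, and the 15-degree tolerance are hypothetical.

```python
import math
from typing import Sequence

# Hypothetical tolerance; the text only says the gaze should face the travel direction.
MAX_GAZE_ANGLE_DEG = 15.0


def angle_between_deg(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle between two non-zero vectors, in degrees."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))  # clamp rounding error
    return math.degrees(math.acos(cos_theta))


def gaze_is_normal(gaze_dir: Sequence[float], travel_dir: Sequence[float],
                   max_angle_deg: float = MAX_GAZE_ANGLE_DEG) -> bool:
    """Step S77: the gaze counts as normal if it points roughly along the direction of travel."""
    return angle_between_deg(gaze_dir, travel_dir) <= max_angle_deg
```

The clamp guards `math.acos` against cosine values that drift slightly outside [-1, 1] through floating-point rounding.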

On the other hand, when the driver's behavior is not normal (S74: NO) and the line of sight is not normal either (S77: NO), an alarm indicating a warning is output to the driver to improve safety (S79). The caution or warning alarm may be output only to the driver in the moving body 30, or it may be output to a management system that manages the moving body 30 or to an information processing device owned by an administrator.
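The branch structure of steps S74 and S76 to S79 in process S70B can be sketched as a small selection function. This is a sketch under the assumption that the behavior judgment arrives as a boolean and that the gaze result is read lazily, matching the flow in which S76 runs only after S74: NO. `Alarm` and `select_alarm` are hypothetical names.

```python
from enum import Enum
from typing import Callable, Optional


class Alarm(Enum):
    CAUTION = "caution"  # S78: behavior not normal, but gaze faces the travel direction
    WARNING = "warning"  # S79: behavior not normal and gaze not normal either


def select_alarm(action_is_normal: bool,
                 read_gaze_is_normal: Callable[[], bool]) -> Optional[Alarm]:
    """Steps S74 and S76-S79 of process S70B; None means no alarm (process ends)."""
    if action_is_normal:           # S74: YES
        return None
    # S74: NO -> S76: read the gaze analysis result only now, then judge it (S77)
    if read_gaze_is_normal():      # S77: YES
        return Alarm.CAUTION       # S78
    return Alarm.WARNING           # S79
```

Taking the gaze check as a callable rather than a precomputed flag means the line-of-sight result is consulted only when the behavior has already been judged abnormal, as in the flowchart.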

With this configuration, when the driver's behavior is determined not to be normal, the alarm is chosen with the driver's line of sight taken into account. This suppresses warnings in situations where safety is actually ensured, reduces the sense of incongruity or discomfort given to the driver, and improves usability.

The present invention is not limited to the embodiments as they are. At the implementation stage, the components can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the components disclosed in the embodiments. For example, some components may be removed from all the components shown in an embodiment, and components from different embodiments may be combined as appropriate.

Some or all of the configurations, functions, processing units, processing means, and the like shown in the embodiments may be realized in hardware, for example by designing them as integrated circuits. They may also be realized in software, with a processor interpreting and executing programs that implement the respective functions. Information such as the programs, tables, and files that implement each function can be stored in a recording or storage device such as a memory, a hard disk, or an SSD (Solid State Drive), or on a recording or storage medium such as an IC card, an SD card, or a DVD.

In the embodiments described above, the control lines and information lines shown are those considered necessary for the explanation. Not all control lines and information lines of an actual product are necessarily shown. All components may be interconnected.

Each component of the present invention can be selected arbitrarily, and an invention having the selected configuration is also included in the present invention. Furthermore, the configurations described in the claims can be combined in ways other than the combinations explicitly stated in the claims.

1, 1A, 1B: Behavior recognition device; 11: Original image acquisition unit; 12: Skeleton image generation unit; 13: Behavior image generation unit; 14: Sequence data generation unit; 15: Model generation unit; 16: Inference unit; 30: Moving body; 31: Sensor; 110: Central processing unit; 120: Input/output device; 130: Storage device; 131: Original image data; 132: Skeleton image data; 133: Behavior image data; 134: Correct answer data; 135: Training sequence data; 136: Model data; 137: Inference sequence data

Claims

A behavior analyzer that analyzes the behavior of a person included in an image, comprising:
an original image acquisition unit that acquires a plurality of pieces of original image data captured at different times;
a skeleton image generation unit that generates skeleton image data of the person in each piece of the original image data;
a behavior image generation unit that generates, based on each piece of the skeleton image data, behavior image data indicating a temporal change of the person's skeleton; and
a model generation unit that generates, based on the original image data, the skeleton image data, and the behavior image data, a predetermined model capable of learning and inferring the person's behavior pattern.

Further comprising an inference unit that estimates, based on a plurality of pieces of original image data to be analyzed and the predetermined model, the behavior of the person in each piece of the original image data to be analyzed, and outputs the estimation result,
The behavior analyzer according to claim 1.

When the temporal change of the background in the original image data is equal to or greater than a predetermined value, the inference unit estimates the behavior of the person included in each piece of original image data to be analyzed, based on the original image data to be analyzed, the plurality of skeleton image data generated by the skeleton image generation unit from that original image data, the plurality of behavior image data generated by the behavior image generation unit from the skeleton image data, and the predetermined model, and outputs the estimation result,
The behavior analyzer according to claim 2.

Further comprising a line-of-sight analysis unit that analyzes the line of sight of the person included in each piece of original image data to be analyzed, wherein
the inference unit estimates the behavior of the person based on the analyzed line of sight, each piece of original image data to be analyzed, and the predetermined model, and outputs the estimation result,
The behavior analyzer according to claim 2.

The inference unit:
when the temporal change of the background in the original image data is equal to or greater than a predetermined value, estimates the behavior of the person based on the original image data to be analyzed, the plurality of skeleton image data generated by the skeleton image generation unit from that original image data, the plurality of behavior image data generated by the behavior image generation unit from the skeleton image data, and the predetermined model, and outputs the estimation result; and
when the estimation result does not indicate a normal state, outputs a first alarm if the analyzed line of sight is within a preset predetermined range, and outputs a second alarm if the analyzed line of sight is outside the predetermined range,
The behavior analyzer according to claim 4.

Further comprising an arithmetic unit and a storage device, wherein
the original image acquisition unit, the skeleton image generation unit, the behavior image generation unit, and the model generation unit are realized by the arithmetic unit executing a predetermined computer program stored in the storage device, and
the original image data, the skeleton image data, the behavior image data, and the predetermined model are stored in the storage device,
The behavior analyzer according to claim 1.

A behavior analysis method for analyzing, using a computer, the behavior of a person included in an image, wherein
the computer:
acquires a plurality of pieces of original image data captured at different times;
generates skeleton image data of the person in each piece of the original image data;
generates, based on each piece of the skeleton image data, behavior image data indicating a temporal change of the person's skeleton;
generates, based on the original image data, the skeleton image data, and the behavior image data, a predetermined model capable of learning and inferring the person's behavior pattern;
acquires a plurality of pieces of original image data to be analyzed and the predetermined model; and
estimates the behavior of the person in each piece of original image data to be analyzed and outputs the estimation result.
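The first steps of the method claim (original frames, then skeleton images, then behavior images showing how the skeleton changes over time) can be illustrated with a toy data-preparation sketch. The claims do not say how a behavior image is computed. This sketch assumes binary skeleton images and takes the absolute difference of consecutive skeleton images as one simple stand-in. `behavior_image`, `build_samples`, and the `to_skeleton` callback (a placeholder for whatever pose estimator produces skeleton images) are hypothetical. Model training and inference are not shown.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]  # assumption: binary skeleton image, 1 = skeleton pixel


def behavior_image(prev_skel: Image, cur_skel: Image) -> Image:
    """Hypothetical behavior image: pixels where the skeleton changed between two frames."""
    return [[abs(a - b) for a, b in zip(r0, r1)] for r0, r1 in zip(prev_skel, cur_skel)]


def build_samples(originals: List[Image],
                  to_skeleton: Callable[[Image], Image]) -> List[Tuple[Image, Image, Image]]:
    """Pair each frame (from the second on) with its skeleton and behavior image."""
    skeletons = [to_skeleton(img) for img in originals]
    return [
        (originals[t], skeletons[t], behavior_image(skeletons[t - 1], skeletons[t]))
        for t in range(1, len(originals))
    ]
```

Each resulting triple corresponds to the three kinds of data the claims feed into the predetermined model. A real system would use a learned pose estimator for `to_skeleton` and would likely encode temporal change more richly than a single frame difference.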