JP2024044186A - Estimation device, estimation method and program

Info

Publication number: JP2024044186A
Application number: JP2022149569A
Authority: JP
Inventors: ティズインティ; Thi Zin Thi; ティンパイ; Ting Pai; テイエ; Ye Tae
Original assignee: University of Miyazaki NUC
Current assignee: University of Miyazaki NUC
Priority date: 2022-09-20
Filing date: 2022-09-20
Publication date: 2024-04-02

Abstract

[Problem] To provide an estimation device, an estimation method, and a program capable of estimating a state with higher accuracy. [Solution] In an estimation system 1 including an estimation device 10, a camera 20, a display device 30, and an estimation model generation device 50, the estimation device 10 estimates an observation code based on features extracted from an imaging target contained in a depth image acquired from the camera 20, and uses a hidden Markov model to estimate the state that follows a state of the imaging target, based on the observation code and that state. The image may be an image obtained by colorizing the depth image. [Selected Figure] FIG. 1

Description

The present invention relates to an estimation device, an estimation method, and a program.

To prevent accidents involving elderly people and to support their health, methods for recognizing the condition of elderly people using cameras and similar devices are being researched and developed. For example, Non-Patent Document 1 describes acquiring images of a room with a depth camera and recognizing the condition of an elderly person in the room.

Thi Thi Zin, Ye Htet, Y. Akagi, H. Tamura, K. Kondo, S. Araki, E. Chosa, 2021. Real-Time Action Recognition System for Elderly People Using Stereo Depth Camera. Sensors, 21(17), p.5895.
Thi Thi Zin, Ye Htet, Y. Akagi, H. Tamura, K. Kondo, S. Araki, 2020, October. Elderly Monitoring and Action Recognition System Using Stereo Depth Camera. In 2020 IEEE 9th Global Conference on Consumer Electronics (GCCE) (pp. 316-317). IEEE.
Swe Nwe Nwe Htun, Thi Thi Zin and Pyke Tin, 2020. Image processing technique and hidden Markov model for an elderly care monitoring system. Journal of Imaging, 6(6), p.49.
Buzzelli, M., Albe, A. and Ciocca, G., 2020. A vision-based system for monitoring elderly people at home. Applied Sciences, 10(1), p.374.
Hu, R., Michel, B., Russo, D., Mora, N., Matrella, G., Ciampolini, P., Cocchi, F., Montanari, E., Nunziata, S. and Brunschwiler, T., 2020. An unsupervised behavioral modeling and alerting system based on passive sensing for elderly care. Future Internet, 13(1), p.6.
Rajput, A.S., Raman, B. and Imran, J., 2020. Privacy-preserving human action recognition as a remote cloud service using RGB-D sensors and deep CNN. Expert Systems with Applications, 152, p.113349.
Jalal, A., Kamal, S. and Kim, D., 2014. A depth video sensor-based life-logging human activity recognition system for elderly care in smart indoor environments. Sensors, 14(7), pp.11735-11759.

However, the method described in Non-Patent Document 1 recognizes the condition of elderly people with low accuracy, and a method that recognizes the condition more accurately is needed.
An object of the present invention is to provide an estimation device, an estimation method, and a program capable of estimating a state with higher accuracy.

One aspect of the present invention is an estimation device that estimates an observation code based on features extracted from an imaging target included in an image, and uses a hidden Markov model to estimate the state that follows a state of the imaging target, based on the observation code and that state.

According to the present invention, a state can be estimated with higher accuracy.

FIG. 1 is a diagram showing the configuration of the estimation system 1.
FIG. 2 is a diagram showing an example of the configuration of the estimation device 10.
FIG. 3 is a diagram showing a detection result.
FIG. 4 is a diagram showing normalized images.
FIG. 5 is a diagram showing the method of calculating HOG by the feature extraction unit 13.
FIG. 6 is a diagram showing an example of an HMM.
FIG. 7 is a diagram showing a configuration example of the estimation model generation device 50.
FIG. 8 is a diagram showing the estimation method used by the estimation device 10.
FIG. 9 is an example of a GUI displayed on the display device 30.
FIG. 10 is a flowchart showing the operation of the estimation device 10.
FIG. 11 is a flowchart showing the operation of the estimation model generation device 50.
FIG. 12 is a diagram showing the estimation accuracy of the estimation device 10.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
FIG. 1 is a diagram showing the configuration of the estimation system 1. The estimation system 1 includes an estimation device 10, a camera 20, and a display device 30.
The camera 20 photographs a predetermined area and is installed so as to photograph a subject 40. The subject 40 is a human being, for example an elderly person or a caregiver. The camera 20 is a depth camera and detects the depth of the photographed area.

The estimation device 10 estimates the state of the subject 40 photographed by the camera 20 and outputs the estimation result to the display device 30.

The display device 30 displays the estimation result input from the estimation device 10.

FIG. 2 is a diagram showing an example of the configuration of the estimation device 10. The estimation device 10 includes an image acquisition unit 11, a detection unit 12, a feature extraction unit 13, a state estimation unit 14, an output unit 15, a recording unit 16, and a storage unit 17.

The image acquisition unit 11 acquires depth images photographed by the camera 20, for example depth images taken at predetermined time intervals. The image acquisition unit 11 may instead acquire a video shot by the camera 20 and obtain depth images by cutting out frames at predetermined time intervals.
The image acquisition unit 11 may convert an acquired depth image into a colorized image. The depth in the depth image corresponds to a color in the hue color space of the colorized image. Because depth images are commonly saved as CSV files, converting them to colorized images reduces the image size.
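The depth-to-hue conversion described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `colorize_depth`, the sensor range `d_min`/`d_max`, and the choice to use two thirds of the hue circle (red for near, blue for far) are all assumptions made for the example.

```python
import colorsys

def colorize_depth(depth, d_min, d_max):
    """Map each depth value to an RGB pixel whose hue encodes the depth.

    depth is a 2-D list of numbers; d_min and d_max are an assumed sensor range.
    """
    span = float(d_max - d_min)
    image = []
    for row in depth:
        out_row = []
        for d in row:
            # Clamp to the assumed range, then scale to [0, 1].
            t = min(max((d - d_min) / span, 0.0), 1.0)
            # Hue runs from 0 (red, near) to 2/3 (blue, far); an illustrative choice.
            r, g, b = colorsys.hsv_to_rgb(t * 2.0 / 3.0, 1.0, 1.0)
            out_row.append((round(r * 255), round(g * 255), round(b * 255)))
        image.append(out_row)
    return image
```

Storing three 8-bit channels per pixel instead of one text field per pixel is one way the colorized image can end up smaller than a CSV dump of raw depth values.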

The detection unit 12 detects the subject 40 in the image acquired by the image acquisition unit 11. The detection unit 12 uses a detection model that takes an image as input and outputs whether a person appears in the image and, if so, the position of the person. The detection unit 12 detects the subject 40 by inputting the image acquired by the image acquisition unit 11 into the detection model.
The detection model is generated by machine learning using a dataset in which each image is associated with whether a person appears in it and the position of that person.

The detection model is, for example, YOLOv5. The detection unit 12 detects the subject 40 by, for example, inputting a colorized depth image into YOLOv5. FIG. 3 is a diagram showing a detection result. In the colorized depth image, a rectangular box drawn around the subject 40 indicates that the subject has been detected. The detection model is not limited to YOLO; any model that can detect the subject in the image may be used.

The feature extraction unit 13 extracts features of the subject 40 detected by the detection unit 12. An example of the extraction method follows.
The feature extraction unit 13 cuts out the subject 40 using the shape (for example, a rectangle) that surrounds the detected subject 40. The feature extraction unit 13 then normalizes the cut-out images so that the images cut out from every frame have the same size. FIG. 4 is a diagram showing normalized images.
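The crop-and-normalize step can be sketched as below. The text does not say how the resizing is done, so nearest-neighbour sampling is used here purely as an assumption; the function name `crop_and_resize` and the `(x0, y0, x1, y1)` box layout are likewise hypothetical.

```python
def crop_and_resize(image, box, out_h, out_w):
    """Crop box = (x0, y0, x1, y1) from a 2-D list and resize it.

    Resizing uses nearest-neighbour sampling (an assumption; the source
    only states that all cut-out images are brought to the same size).
    """
    x0, y0, x1, y1 = box
    h, w = y1 - y0, x1 - x0
    return [
        # Each output pixel copies the source pixel at the proportional position.
        [image[y0 + (r * h) // out_h][x0 + (c * w) // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]
```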

The feature extraction unit 13 extracts DMA (Depth Motion Appearance) and DMH (Depth Motion History) from the normalized images cut out from temporally consecutive frames. The feature extraction unit 13 then calculates the HOG (Histogram of Oriented Gradients) of the extracted DMA and of the extracted DMH to obtain histograms that represent the features, and concatenates the two HOGs. FIG. 5 is a diagram showing the method of calculating HOG by the feature extraction unit 13. For example, the feature extraction unit 13 extracts DMA and DMH from the normalized images cut out from five frames, calculates HOG_DMA and HOG_DMH (the HOGs of DMA and DMH, respectively), and concatenates HOG_DMA and HOG_DMH to generate a Concatenated Histogram.
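The HOG-and-concatenate step can be sketched as follows. The text does not define how DMA and DMH are computed, so the sketch takes them as given 2-D arrays. The HOG here is deliberately simplified to a single cell with nine unsigned-orientation bins and L2 normalization; a full HOG uses a grid of cells with block normalization, and the cell layout used in the patent is not stated. The function names are hypothetical.

```python
import math

def hog_single_cell(img, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    Simplified HOG: the whole image is treated as one cell (an assumption).
    """
    h, w = len(img), len(img[0])
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central-difference gradients.
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle / (180.0 / n_bins)) % n_bins] += mag
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist

def concatenated_histogram(dma, dmh, n_bins=9):
    """Concatenate HOG_DMA and HOG_DMH into one feature vector."""
    return hog_single_cell(dma, n_bins) + hog_single_cell(dmh, n_bins)
```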

The state estimation unit 14 estimates the state of the subject 40 from the features extracted by the feature extraction unit 13, using the estimation model stored in the storage unit 17. The estimation model includes an HMM (hidden Markov model). FIG. 6 is a diagram showing an example of an HMM. S denotes the state of the subject 40. For example, S_1 denotes "moving", S_2 "sitting in a wheelchair", S_3 "standing up", S_4 "sitting down", and S_5 "lying down". The probability of a transition from state S_i to state S_j is given by the transition probability a_ij. A = {a_ij} (1 ≤ i, j ≤ 5) is called the transition probability distribution.
v_i is an observation code. The probability of outputting observation code v_k in state S_j is given by the output probability b_j(k). B = {b_j(k)} (1 ≤ j, k ≤ 5) is called the output probability distribution.
The estimation model generation device 50 generates an estimation model including the HMM and outputs it to the estimation device 10. The estimation device 10 stores the estimation model in the storage unit 17.
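In standard HMM notation, the two distributions defined above can be written as follows. The symbols q_t (state at time step t) and o_t (observation code at time step t) are introduced here only for illustration and do not appear in the original text; the row-sum constraints are the usual conditions for a stochastic matrix.

```latex
\begin{align}
A &= \{a_{ij}\}, & a_{ij} &= P(q_{t+1} = S_j \mid q_t = S_i), & \sum_{j=1}^{5} a_{ij} &= 1, & 1 \le i, j \le 5,\\
B &= \{b_j(k)\}, & b_j(k) &= P(o_t = v_k \mid q_t = S_j), & \sum_{k=1}^{5} b_j(k) &= 1, & 1 \le j, k \le 5.
\end{align}
```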

The method by which the estimation model generation device 50 generates the estimation model is described below. FIG. 7 is a diagram showing a configuration example of the estimation model generation device 50. The estimation model generation device 50 includes a state sequence acquisition unit 51, a transition probability distribution calculation unit 52, an observation code assignment model generation unit 53, an observation code assignment unit 54, an output probability distribution calculation unit 55, and an estimation model output unit 56.

The state sequence acquisition unit 51 acquires data indicating state transitions, that is, data indicating how the state of the subject 40 changes. The transition probability distribution calculation unit 52 calculates the transition probability distribution A of the HMM from this data. For example, the estimation model generation device 50 calculates the transition probability distribution A by calculating all co-occurrence matrices from state S_i to state S_j.
The observation code assignment model generation unit 53 uses a dataset (Dataset-X) containing data in which each state S is associated with its features to generate an observation code assignment model, which assigns an observation code v to a feature based on that feature. The observation code assignment unit 54 uses the observation code assignment model to assign observation codes to a second dataset (Dataset-Y), which also contains data associating states S with features but differs from Dataset-X. The output probability distribution calculation unit 55 calculates a normal distribution from the features for each assigned observation code and calculates the output probability distribution B.
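One common way to turn state-transition data into the transition probability distribution A is to count consecutive state pairs and normalize each row; the sketch below assumes this reading of the co-occurrence computation. The function name, the 0-based state indices, and the uniform fallback for states with no outgoing transitions are assumptions of the example.

```python
def transition_matrix(sequences, n_states):
    """Estimate A from labelled state sequences by counting transitions.

    sequences is a list of lists of 0-based state indices.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for seq in sequences:
        # Count each consecutive (current, next) pair.
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total:
            matrix.append([c / total for c in row])
        else:
            # State never left in the data: fall back to a uniform row (assumption).
            matrix.append([1.0 / n_states] * n_states)
    return matrix
```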

A method of calculating the output probability distribution B by calculating mean HOGs is described here. The output probability distribution B is calculated for each state S as follows.
First, the HOG histograms calculated from Dataset-X are averaged to obtain the mean HOG feature of each state (M_1H, M_2H, M_3H, M_4H, M_5H). Next, for each state S, an observed code v_i is assigned to each HOG calculated from Dataset-Y. The index i of the code v_i equals the index i of the mean HOG feature vector M_iH that lies at the smallest distance from the HOG in question (denoted IH).
Next, the length (Euclidean norm, or vector 2-norm) of each labeled HOG is calculated. Then, for each label attached to the HOGs, the normal distribution of the HOG length is calculated. The normal distribution here is, for example, a probability density function with mean 0 and standard deviation 1. This yields as many normal distributions as there are HOGs calculated from Dataset-Y (for example, 100), each carrying one of the five labels. The normal distributions with the same label are then added together to obtain five normal distributions, one per label. Finally, the five summed normal distributions are divided by the sum of all 100 normal distributions; normalizing the five per-label sums in this way gives the output probabilities for one state. Performing this calculation for all five states gives the output probability distribution B, which indicates the probability that each observation code v is output from each state S.
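The mean-HOG procedure above can be sketched as follows, under stated assumptions: `dataset_x[s]` and `dataset_y[s]` are hypothetical lists of HOG vectors for state `s`, Euclidean distance is used to find the nearest mean, and "the normal distribution of the HOG length" is read as the standard normal density evaluated at the vector's Euclidean norm. The uniform fallback when every density underflows to zero is an addition for numerical safety.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def nearest_mean_code(hog, means):
    """Index i of the mean HOG M_iH closest to the given HOG."""
    return min(range(len(means)), key=lambda i: math.dist(hog, means[i]))

def std_normal_pdf(x):
    """Density of the normal distribution with mean 0 and standard deviation 1."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def emission_row(hogs, means):
    """Output probabilities b_j(1..K) for one state from its Dataset-Y HOGs."""
    weights = [0.0] * len(means)
    for hog in hogs:
        code = nearest_mean_code(hog, means)
        length = math.sqrt(sum(v * v for v in hog))
        # Add this HOG's density value to the sum for its label.
        weights[code] += std_normal_pdf(length)
    total = sum(weights)
    if total == 0.0:
        return [1.0 / len(means)] * len(means)  # safety fallback (assumption)
    return [w / total for w in weights]

def emission_matrix(dataset_x, dataset_y):
    """Output probability distribution B, one row per state."""
    means = [mean_vector(vectors) for vectors in dataset_x]
    return [emission_row(hogs, means) for hogs in dataset_y]
```

Note that with a zero-mean density, HOGs with a small norm contribute more weight than HOGs with a large norm, so each row is a weighted rather than a plain frequency count of the assigned codes.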

Table 1 shows the method of calculating the output probability distribution B by calculating mean HOGs.

Alternatively, the output probability distribution B may be calculated using k-NN (k-Nearest Neighbors). First, the HOGs calculated from Dataset-X are used to train a k-NN classifier to classify HOGs into five classes (C_1, C_2, C_3, C_4, C_5). The trained k-NN classifier then classifies the HOGs calculated from Dataset-Y into five classes (v_1, v_2, v_3, v_4, v_5). The classification criteria for the five classes (v_1, v_2, v_3, v_4, v_5) are the same as those for the five classes (C_1, C_2, C_3, C_4, C_5).
Next, the length of each labeled HOG is calculated, and for each label attached to the HOGs, the normal distribution of the HOG length is calculated. The normal distribution here is, for example, a probability density function with mean 0 and standard deviation 1. This yields as many normal distributions as there are HOGs calculated from Dataset-Y (for example, 100), each carrying one of the five labels. The normal distributions with the same label are then added together to obtain five normal distributions, one per label. Finally, the five summed normal distributions are divided by the sum of all 100 normal distributions; normalizing the five per-label sums in this way gives the output probabilities for one state. Performing this calculation for all five states gives the output probability distribution B.
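The k-NN variant differs from the mean-HOG method only in how the observation code is assigned. A minimal sketch, assuming a plain majority vote over Euclidean nearest neighbours: the value `k=3`, the function names, and the 0-based code indices are illustrative choices, since the text does not specify them.

```python
import math
from collections import Counter

def knn_code(hog, train_hogs, train_codes, k=3):
    """Assign an observation code by majority vote among the k nearest training HOGs."""
    nearest = sorted(
        range(len(train_hogs)), key=lambda i: math.dist(hog, train_hogs[i])
    )[:k]
    votes = Counter(train_codes[i] for i in nearest)
    return votes.most_common(1)[0][0]

def emission_row_knn(hogs, train_hogs, train_codes, n_codes, k=3):
    """Output probabilities for one state, with codes assigned by k-NN."""
    weights = [0.0] * n_codes
    for hog in hogs:
        code = knn_code(hog, train_hogs, train_codes, k)
        length = math.sqrt(sum(v * v for v in hog))
        # Standard normal density (mean 0, standard deviation 1) at the HOG length.
        weights[code] += math.exp(-0.5 * length * length) / math.sqrt(2.0 * math.pi)
    total = sum(weights)
    if total == 0.0:
        return [1.0 / n_codes] * n_codes  # safety fallback (assumption)
    return [w / total for w in weights]
```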

Table 2 shows the method of calculating the output probability distribution B using k-NN.

Alternatively, the output probability distribution B may be calculated using an SVM (Support Vector Machine). First, the HOGs calculated from Dataset-X are used to train an SVM to classify HOGs into five classes (C_1, C_2, C_3, C_4, C_5). The trained SVM then classifies the HOGs calculated from Dataset-Y into five classes (v_1, v_2, v_3, v_4, v_5). The classification criteria for the five classes (v_1, v_2, v_3, v_4, v_5) are the same as those for the five classes (C_1, C_2, C_3, C_4, C_5).
Next, the length of each labeled HOG is calculated, and for each label attached to the HOGs, the normal distribution of the HOG length is calculated. The normal distribution here is, for example, a probability density function with mean 0 and standard deviation 1. This yields as many normal distributions as there are HOGs calculated from Dataset-Y (for example, 100), each carrying one of the five labels. The normal distributions with the same label are then added together to obtain five normal distributions, one per label. Finally, the five summed normal distributions are divided by the sum of all 100 normal distributions; normalizing the five per-label sums in this way gives the output probabilities for one state. Performing this calculation for all five states gives the output probability distribution B.

Table 3 shows the method of calculating the output probability distribution B using an SVM.

The initial state probabilities π of the HMM may all be equal. The above method determines the parameters A and B of the HMM in the estimation model.

The estimation model output unit 56 outputs the estimation model generated by determining the parameters A and B to the estimation device 10. The estimation model is stored in the storage unit 17.

An example of how the state estimation unit 14 estimates the state of the subject 40 is described below. First, the state estimation unit 14 estimates the observation code v from the HOG extracted by the feature extraction unit 13. The observation code v is estimated from the HOG by, for example, the mean-HOG method, the k-NN method, or the SVM method described above. The state estimation unit 14 then uses the HMM to estimate the next state S based on the observation code and the current state S. FIG. 8 is a diagram showing the estimation method used by the estimation device 10. The features (HOG) and observation codes are calculated once per group of frames (for example, every five frames), but the state S may be estimated for each frame. The states S may also be estimated together from a sequence of observation codes by using the HMM.
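Both modes of estimation mentioned above can be sketched with a small HMM. The text does not give the exact decision rule, so this example assumes two standard choices: for a single step, pick the state j that maximizes a_ij · b_j(k) given the current state i and observation code k; for a whole sequence of codes, use the Viterbi algorithm with uniform initial probabilities π. Function names and 0-based indices are hypothetical.

```python
def next_state(a, b, current, code):
    """One-step estimate: argmax over j of a[current][j] * b[j][code]."""
    scores = [a[current][j] * b[j][code] for j in range(len(a))]
    return max(range(len(a)), key=scores.__getitem__)

def viterbi(a, b, codes, pi=None):
    """Most likely state sequence for a sequence of observation codes."""
    n = len(a)
    pi = pi or [1.0 / n] * n  # uniform initial state probabilities by default
    delta = [pi[j] * b[j][codes[0]] for j in range(n)]
    back = []
    for code in codes[1:]:
        # Best predecessor of each state j under the current scores.
        prev = [max(range(n), key=lambda i: delta[i] * a[i][j]) for j in range(n)]
        delta = [delta[prev[j]] * a[prev[j]][j] * b[j][code] for j in range(n)]
        back.append(prev)
    # Backtrack from the best final state.
    state = max(range(n), key=delta.__getitem__)
    path = [state]
    for prev in reversed(back):
        state = prev[state]
        path.append(state)
    return path[::-1]
```

For long sequences the products in `delta` shrink toward zero, so a practical version would work with log probabilities; the plain products are kept here for readability.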

The output unit 15 outputs the result estimated by the state estimation unit 14. The output unit 15 may also output the image acquired by the image acquisition unit 11, the detection result of the detection unit 12, or the features extracted by the feature extraction unit 13.

The recording unit 16 records the result estimated by the state estimation unit 14 in the storage unit 17. The recording unit 16 may record the image acquired by the image acquisition unit 11, the detection result of the detection unit 12, or the features extracted by the feature extraction unit 13 in the storage unit 17 in association with the corresponding estimation result. When multiple cameras are used, the recording unit 16 may record each image in association with the person being photographed or the room being photographed. The output unit 15 may also output the data recorded in the storage unit 17.

FIG. 9 is an example of a GUI displayed on the display device 30. When a patient's name and a time are entered in area 111, the image corresponding to that person and time is displayed in area 112, and the state corresponding to that patient and time is displayed in area 113. A start time and an end time may be entered in area 111, in which case images may be displayed in succession in area 112 and states may be displayed in succession in area 113. Area 113 may show the states from the start time to the end time as a graph, and the number of states shown in area 113 may increase as images are displayed in succession in area 112.

FIG. 10 is a flowchart showing the operation of the estimation device 10. First, the image acquisition unit 11 acquires an image (step S11). The detection unit 12 then detects the subject 40 in the image (step S12). The feature extraction unit 13 extracts features of the imaging target (step S13). The state estimation unit 14 estimates the state from the features using the estimation model (step S14). The output unit 15 outputs the estimation result (step S15).

FIG. 11 is a flowchart showing the operation of the estimation model generation device 50. The estimation model generation device 50 calculates the transition probability distribution A of the HMM by aggregating the data indicating state transitions (step S21). The estimation model generation device 50 then generates the observation code assignment model by learning with Dataset-X (step S22) and assigns observation codes to the data included in Dataset-Y (step S23). Finally, the estimation model generation device 50 calculates a normal distribution from the features for each observation code and calculates the output probability distribution B (step S24).

(Experimental results)
The accuracy of state estimation by the estimation device 10 was verified experimentally. The experiment used data associating images of the interior of a room with the state of the person captured in each image. The state associated with an image is the state judged by a human viewing that image. The estimation accuracy of the estimation device 10 was defined as the proportion of all images for which the state associated with the image matched the state estimated from the image by the estimation device.
FIG. 12 is a diagram showing the estimation accuracy of the estimation device 10. Verification used images captured in three rooms. The Room1 images come from 22 continuous recording sessions in total, the longest lasting 12.60 hours. The Room2 images come from 10 continuous recording sessions in total, the longest lasting 12.19 hours. The Room3 images come from 27 continuous recording sessions in total, the longest lasting 11.91 hours.
Three methods of generating the estimation model were tested: Mean+HMM, kNN+HMM, and SVM+HMM, which correspond to the methods shown in Table 1, Table 2, and Table 3, respectively. Averaged over the three rooms, all three methods achieved an estimation accuracy above 80%, and SVM+HMM achieved an estimation accuracy above 84%.
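The accuracy measure used in the experiment (the share of images whose estimated state matches the human-labeled state) can be sketched as below. The function name and the list-based inputs are assumptions of the example; the sample values in the usage line are made up and are not experimental data.

```python
def estimation_accuracy(labelled, estimated):
    """Fraction of images whose estimated state equals the human-labelled state."""
    if len(labelled) != len(estimated):
        raise ValueError("both sequences must cover the same images")
    matches = sum(1 for truth, guess in zip(labelled, estimated) if truth == guess)
    return matches / len(labelled)

# Illustrative values only: three of four states agree.
print(estimation_accuracy([0, 1, 2, 2], [0, 1, 1, 2]))
```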

Thus, according to the present embodiment, using a hidden Markov model improves the estimation accuracy.

(Other embodiments)
Although one embodiment of the present invention has been described in detail above with reference to the drawings, the specific configuration is not limited to the one described, and various design changes and the like are possible without departing from the gist of the present invention.

Part or all of the estimation device 10 and the estimation model generation device 50 in the embodiment described above may be realized by a computer. In that case, a program for realizing the functions may be recorded on a computer-readable recording medium, and the functions may be realized by having a computer system read and execute the program recorded on that medium. The term "computer system" here includes an OS and hardware such as peripheral devices. The term "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The term "computer-readable recording medium" may further include a medium that holds a program dynamically for a short time, such as a communication line used when a program is transmitted over a network such as the Internet or a communication line such as a telephone line, and a medium that holds a program for a certain period of time, such as volatile memory inside a computer system that serves as the server or client in that case. The program may realize only part of the functions described above, or may realize those functions in combination with a program already recorded in the computer system. Part or all of the estimation device 10 and the estimation model generation device 50 may also be realized using a programmable logic device such as an FPGA (Field Programmable Gate Array).

1 estimation system, 10 estimation device, 20 camera, 30 display device, 40 subject, 50 estimation model generation device, 11 image acquisition unit, 12 detection unit, 13 feature extraction unit, 14 state estimation unit, 15 output unit, 16 storage unit, 51 state sequence acquisition unit, 52 transition probability distribution calculation unit, 53 observation code assignment model generation unit, 54 observation code assignment unit, 55 output probability distribution calculation unit, 56 estimation model output unit

Claims

1. An estimation device that estimates an observation code based on features extracted from an imaging target included in an image, and estimates a next state following a state of the imaging target using a hidden Markov model, based on the observation code and the state of the imaging target.

2. The estimation device according to claim 1, wherein the image is a colorized image of a depth image.

3. The estimation device according to claim 1 or 2, wherein the image is displayed together with the corresponding estimated state.

4. An estimation method comprising: estimating an observation code based on features extracted from an imaging target included in an image; and estimating a next state following a state of the imaging target using a hidden Markov model, based on the observation code and the state of the imaging target.

5. A program that causes a computer to execute the method according to claim 4.