JP6879388B2

JP6879388B2 - Alertness estimation device, alertness estimation method, and program

Info

Publication number: JP6879388B2
Application number: JP2019567823A
Authority: JP
Inventors: 剛範辻川
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2018-01-29
Filing date: 2018-01-29
Publication date: 2021-06-02
Anticipated expiration: 2038-01-29
Also published as: WO2019146123A1; JPWO2019146123A1

Description

The present invention relates to an alertness estimation device and an alertness estimation method for estimating alertness, which represents a person's state of arousal, and further relates to a program for realizing them.

In recent years, the working-age population has been shrinking because of the declining birthrate and aging population, and the labor shortage is worsening. Under these circumstances, attempts to replace part of the work that people have done so far with robots or AI (artificial intelligence) are increasing. However, work that requires intellectual labor is difficult to hand over to robots or AI. It will therefore be essential for people to maintain and improve the productivity of their intellectual labor.

Unlike machines, people feel drowsy (a state of low arousal) or feel stress (a state of hyperarousal). In other words, the productivity of a person's intellectual labor changes with the person's mental and physical state of arousal. To improve that productivity, it is therefore important to keep the state of arousal at just the right level.

One way to keep a person's state of arousal at the right level during intellectual work is to detect the person's current state of arousal and to control the office environment, such as temperature, humidity, and illuminance, according to the detected state. For this approach in particular, it is important to detect the person's state of arousal accurately.

For example, Patent Document 1 discloses a device that estimates a person's alertness from the degree of eye opening. The device disclosed in Patent Document 1 obtains the driver's eye-open time from camera images sent at a set frame rate, calculates the variation in the obtained eye-open times, and estimates the driver's alertness from that variation.

Non-Patent Document 1 discloses a device that estimates a person's alertness (stress) from facial images. The device disclosed in Non-Patent Document 1 calculates a low-frequency HRV (Heart Rate Variability) component and the respiratory rate from camera images of a person's face sent at a set frame rate, and inputs the calculated values into a statistical model to estimate the person's alertness.

Patent Document 1: International Publication No. 2010/092860

Non-Patent Document 1: Daniel McDuff, Sarah Gontarek, and Rosalind Picard, "Remote Measurement of Cognitive Stress via Heart Rate Variability", EMBC 2014

However, both the device disclosed in Patent Document 1 and the device disclosed in Non-Patent Document 1 need to set a high frame rate for the camera images and process them at that rate in order to maintain the accuracy of alertness estimation. This places a heavy processing load on the device.

An example of an object of the present invention is to solve the above problem and to provide an alertness estimation device, an alertness estimation method, and a program that can estimate a person's alertness accurately while reducing the processing load.

To achieve the above object, an alertness estimation device according to one aspect of the present invention is a device for estimating the alertness of a user, and is characterized by comprising:
an image data acquisition unit that acquires image data including a face image of the user at a set frame rate;
a time-series data extraction unit that extracts time-series data indicating biological information of the user from the image data acquired at the set frame rate;
a data processing unit that interpolates the time-series data so that the number of samples of the extracted time-series data becomes a set value; and
an alertness estimation unit that estimates the alertness of the user by inputting the interpolated time-series data into a learning model constructed using a convolutional neural network.

To achieve the above object, an alertness estimation method according to one aspect of the present invention is a method for estimating the alertness of a user, and is characterized by comprising:
(a) a step of acquiring image data including a face image of the user at a set frame rate;
(b) a step of extracting time-series data indicating biological information of the user from the image data acquired at the set frame rate;
(c) a step of interpolating the time-series data so that the number of samples of the extracted time-series data becomes a set value; and
(d) a step of estimating the alertness of the user by inputting the interpolated time-series data into a learning model constructed using a convolutional neural network.

Furthermore, to achieve the above object, a program according to one aspect of the present invention is a program for estimating the alertness of a user by a computer, the program causing the computer to execute:
(a) a step of acquiring image data including a face image of the user at a set frame rate;
(b) a step of extracting time-series data indicating biological information of the user from the image data acquired at the set frame rate;
(c) a step of interpolating the time-series data so that the number of samples of the extracted time-series data becomes a set value; and
(d) a step of estimating the alertness of the user by inputting the interpolated time-series data into a learning model constructed using a convolutional neural network.

As described above, according to the present invention, a person's alertness can be estimated accurately while the processing load is reduced.

FIG. 1 is a block diagram showing the configuration of the alertness estimation device according to the first embodiment of the present invention.
FIG. 2 is a diagram showing an example of the interpolation of time-series data performed in the first embodiment of the present invention.
FIG. 3 is a diagram showing an example of the convolutional neural network used in the first embodiment of the present invention.
FIG. 4 is a flow chart showing the operation of the alertness estimation device 10 according to the first embodiment of the present invention.
FIG. 5 is a block diagram showing the configuration of the alertness estimation device according to the second embodiment of the present invention.
FIG. 6 is a flow chart showing the operation of the alertness estimation device 30 according to the second embodiment of the present invention.
FIG. 7 is a block diagram showing an example of a computer that realizes the alertness estimation device according to the first and second embodiments of the present invention.

(Embodiment 1)
Hereinafter, the alertness estimation device, the alertness estimation method, and the program according to the first embodiment of the present invention will be described with reference to FIGS. 1 to 4.

[Device configuration]
First, the configuration of the alertness estimation device according to the first embodiment will be described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the alertness estimation device according to the first embodiment of the present invention.

The alertness estimation device 10 according to the first embodiment, shown in FIG. 1, is a device for estimating the alertness of a user. As shown in FIG. 1, the alertness estimation device 10 includes an image data acquisition unit 11, a time-series data extraction unit 12, a data processing unit 13, and an alertness estimation unit 14.

Of these, the image data acquisition unit 11 acquires image data including the user's face image at a set frame rate. The time-series data extraction unit 12 extracts time-series data indicating the user's biological information from the image data acquired at the set frame rate.

The data processing unit 13 interpolates the time-series data so that the number of samples of the extracted time-series data becomes a set value. The alertness estimation unit 14 inputs the interpolated time-series data into a learning model constructed using a convolutional neural network and estimates the user's alertness.

As described above, in the first embodiment, the time-series data extracted from the image data can be interpolated up to the required number of samples, so the frame rate of the image data can be kept low from the outset. According to the first embodiment, a person's alertness can therefore be estimated accurately while the processing load on the device is reduced.

Next, the configuration of the alertness estimation device 10 according to the first embodiment will be described more specifically. As shown in FIG. 1, in the first embodiment the alertness estimation device 10 is connected to an external imaging device 20.

The imaging device 20 is a digital camera and outputs image data at a set frame rate. The imaging device 20 is positioned so that it can capture the user's face. The image data acquisition unit 11 acquires the image data output from the imaging device 20.

In the first embodiment, examples of the biological information indicated by the time-series data include information indicating the user's degree of eye opening and closing, information indicating the direction of the line of sight, information indicating the orientation of the face, information indicating the pulse wave, information indicating the blood flow, and information indicating the degree of mouth opening and closing. In the first embodiment, the time-series data extraction unit 12 extracts this information from the image data for each frame and generates the time-series data.

Specifically, the time-series data extraction unit 12, for example, detects both eyes of the user in the image data, obtains the degree of opening and closing from the size of the detected eyes, and generates time-series data of information indicating the degree of eye opening and closing. The time-series data extraction unit 12 also detects the center position of each of the user's eyes in the image data, calculates the direction of the line of sight from the detected center positions, and generates time-series data of information indicating the calculated direction of the line of sight.

Furthermore, the time-series data extraction unit 12 detects the center line and the contour of the user's face in the image data, calculates the orientation of the user's face from the positional relationship between the detected center line and contour, and generates time-series data of information indicating the calculated face orientation. The time-series data extraction unit 12 also detects the user's mouth in the image data, obtains the degree of opening and closing from the size of the detected mouth, and generates time-series data of information indicating the degree of mouth opening and closing.

The time-series data extraction unit 12 also calculates the user's pulse wave or blood flow by using the property that hemoglobin in the blood absorbs the green component of light (see, for example, the reference below). Specifically, the time-series data extraction unit 12 identifies the region of the user's skin in the image data and calculates the luminance value of each of the R, G, and B channels in the identified region. The time-series data extraction unit 12 then calculates the user's pulse wave or blood flow by exploiting the fact that more green light is absorbed, and the G luminance value falls, as blood flow increases, and generates time-series data of information indicating the pulse wave or the blood flow.

Reference: Terumi Umematsu (梅松旭美) and Takenori Tsujikawa (辻川剛範), "High-precision heart rate estimation method from facial video based on ICA-R", NEC Data Science Research Laboratories, The Institute of Electronics, Information and Communication Engineers (IEICE), IEICE Technical Report, 2017
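The green-channel idea described above can be illustrated with a minimal sketch. It is not the ICA-R method of the cited reference: it simply averages the G value over a skin region in each frame and flips the sign so that higher blood flow gives a larger value. The frame layout (rows of `(R, G, B)` tuples), the boolean skin mask, and the function names `mean_green` and `pulse_series` are assumptions made for illustration only.

```python
def mean_green(frame, skin_mask):
    """Average G value over the pixels selected by skin_mask.

    frame: rows of (R, G, B) tuples (assumed layout).
    skin_mask: rows of booleans, True where the pixel is skin (assumed input).
    """
    total, count = 0.0, 0
    for row, mask_row in zip(frame, skin_mask):
        for (_r, g, _b), is_skin in zip(row, mask_row):
            if is_skin:
                total += g
                count += 1
    if count == 0:
        raise ValueError("skin mask selects no pixels")
    return total / count


def pulse_series(frames, skin_masks):
    """One value per frame: mean-removed, sign-flipped green luminance.

    More blood flow absorbs more green light, so G drops; flipping the
    sign makes the series rise when blood flow increases.
    """
    greens = [mean_green(f, m) for f, m in zip(frames, skin_masks)]
    baseline = sum(greens) / len(greens)
    return [baseline - g for g in greens]


# Two tiny 1x2 frames; the second frame is darker in G.
frames = [
    [[(10, 100, 10), (10, 120, 10)]],
    [[(10, 90, 10), (10, 110, 10)]],
]
masks = [[[True, True]], [[True, True]]]
series = pulse_series(frames, masks)
```

A practical extractor would also need skin detection and band-pass filtering around plausible heart rates; those steps are outside this sketch.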

In the first embodiment, the data processing unit 13 interpolates the time-series data by upsampling it, as shown in FIG. 2. FIG. 2 is a diagram showing an example of the interpolation of time-series data performed in the first embodiment of the present invention. In the example of FIG. 2, the frame rate originally required for the time-series data is R, and the actual frame rate of the time-series data is R/2. Each circle in the figure represents one sample of the time-series data.

As shown in FIG. 2, the data processing unit 13 adds a new sample between each pair of consecutive samples so that the number of samples becomes the set value, that is, so that the frame rate changes from R/2 to R.

The new samples are added by, for example, linear interpolation or spline interpolation. Alternatively, a neural network may be constructed using data with a frame rate of R/2 and data with a frame rate of R as training data, and the interpolation may be performed by this neural network. After that, as shown in FIG. 2, the upsampled time-series data is input to the convolutional neural network by the alertness estimation unit 14.
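As a minimal sketch of the linear-interpolation option, the function below stretches a series to a given number of samples. The name `upsample_linear` and the choice to map the first and last input samples onto the first and last output samples are assumptions; the description only states that the number of samples should reach a set value.

```python
def upsample_linear(samples, target_count):
    """Linearly interpolate `samples` so the result has `target_count` points.

    The first and last input samples are preserved (an assumption of this
    sketch); intermediate outputs are weighted averages of their two
    nearest input neighbours.
    """
    if len(samples) < 2 or target_count < 2:
        raise ValueError("need at least two input and two output samples")
    last = len(samples) - 1
    out = []
    for i in range(target_count):
        pos = i * last / (target_count - 1)   # position on the input index axis
        lo = min(int(pos), last - 1)          # left neighbour
        frac = pos - lo                       # distance from the left neighbour
        out.append(samples[lo] * (1.0 - frac) + samples[lo + 1] * frac)
    return out


# Three samples captured at R/2 become five samples at R:
# one new sample is inserted between each consecutive pair.
upsampled = upsample_linear([0.0, 2.0, 4.0], 5)
```

With N samples at R/2, asking for 2N - 1 output samples reproduces the case in FIG. 2, where one new sample is placed midway between each consecutive pair.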

In addition, the data processing unit 13 can perform various kinds of signal processing, such as noise removal, interpolation of missing data, and removal of outliers. Such processing can be expected to improve the accuracy of the subsequent alertness estimation by the alertness estimation unit 14.
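The description does not say how outliers or missing samples are handled, so the following is only one possible sketch. It assumes missing samples are marked as `None`, flags outliers with a median-absolute-deviation rule (threshold `k` is a hypothetical parameter), and fills gaps by linear interpolation between the nearest valid neighbours.

```python
from statistics import median


def drop_outliers(samples, k=3.0):
    """Mark samples farther than k * MAD from the median as missing (None)."""
    med = median(samples)
    mad = median(abs(v - med) for v in samples)
    if mad == 0:
        return list(samples)   # no spread: nothing can be flagged
    return [v if abs(v - med) <= k * mad else None for v in samples]


def fill_missing(samples):
    """Replace None entries by interpolating between valid neighbours.

    Gaps at either end are filled with the nearest valid value.
    """
    known = [i for i, v in enumerate(samples) if v is not None]
    if not known:
        raise ValueError("no valid samples")
    out = list(samples)
    for i, v in enumerate(samples):
        if v is not None:
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:
            out[i] = samples[right]
        elif right is None:
            out[i] = samples[left]
        else:
            t = (i - left) / (right - left)
            out[i] = samples[left] * (1 - t) + samples[right] * t
    return out


# A spike at the end is flagged and then replaced by its neighbour's value.
cleaned = fill_missing(drop_outliers([1.0, 1.1, 0.9, 1.0, 9.0]))
```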

In the first embodiment, when the time-series data indicates two or more kinds of biological information, the learning model has a layer for performing convolution for each kind of biological information. The estimation process performed by the alertness estimation unit 14 in the first embodiment is described below in detail with reference to FIG. 3.

FIG. 3 is a diagram showing an example of the convolutional neural network used in the first embodiment of the present invention. In the example of FIG. 3, the time-series data indicates three kinds of information: information indicating the degree of eye opening and closing, information indicating the direction of the line of sight, and information indicating the orientation of the face. Convolution is performed separately for each kind of information. Each time series is assumed to be normalized to a range such as 0 to 1 or -1 to +1.
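The normalization mentioned above could be done in several ways; the description only gives the target ranges. A minimal sketch using min-max scaling (an assumption, as is the function name `normalize`) is:

```python
def normalize(samples, low=0.0, high=1.0):
    """Min-max scale one time series into the range [low, high]."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        # A constant series has no spread; place it at the middle of the range.
        return [(low + high) / 2.0 for _ in samples]
    scale = (high - low) / (hi - lo)
    return [low + (v - lo) * scale for v in samples]


unit_range = normalize([2.0, 4.0, 6.0])               # 0 to 1
signed_range = normalize([2.0, 4.0, 6.0], -1.0, 1.0)  # -1 to +1
```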

As shown in FIG. 3, let the size of the time-series data indicating the degree of eye opening and closing be (D_EC, W_T). D_EC is the number of time series indicating the degree of eye opening and closing. For example, when time-series data on the opening and closing of the right eye and of the left eye are both input, D_EC = 2. W_T is the time window width used for alertness estimation. For example, when 10 s of data is used, W_T = 10R, where R is the frame rate.

First, the alertness estimation unit 14 inputs the time-series data of size (D_EC, W_T) to a convolutional layer and, for each window smaller than (D_EC, W_T), convolves a weight filter, adds a bias, and passes the result through an activation function to obtain an output. The alertness estimation unit 14 then shifts the window position and performs the same operation using different weight filters and biases, obtaining an output of size (D_{EC_C1}, W_{EC_C1}). Examples of the activation function include ReLU (Rectified Linear Unit).

Next, in a pooling layer, the alertness estimation unit 14 performs pooling on the input data of size (D_{EC_C1}, W_{EC_C1}) for each window smaller than (D_{EC_C1}, W_{EC_C1}). For example, in the commonly used max pooling, the maximum value within the window is kept. The alertness estimation unit 14 then shifts the window position and performs the same pooling, obtaining an output of size (D_{EC_P1}, W_{EC_P1}).

In the next convolutional layer, the alertness estimation unit 14 likewise convolves a filter, adds a bias, and applies the activation function to the input data of size (D_{EC_P1}, W_{EC_P1}) for each window smaller than (D_{EC_P1}, W_{EC_P1}). Here too, the alertness estimation unit 14 shifts the window position and performs the same processing, obtaining an output of size (D_{EC_C2}, W_{EC_C2}).

In the next pooling layer, the alertness estimation unit 14 likewise performs pooling on the input data of size (D_{EC_C2}, W_{EC_C2}) for each window smaller than (D_{EC_C2}, W_{EC_C2}). The alertness estimation unit 14 then shifts the window position and performs the same pooling, obtaining an output of size, for example, (D_{EC_P2}, 1).

In the first embodiment, the alertness estimation unit 14 performs the same processing on the time-series data indicating the direction of the line of sight and on the time-series data indicating the orientation of the face, obtaining, for example, an output of size (D_{EG_P2}, 1) and an output of size (D_{HP_P2}, 1).

Next, as the "concatenate & flatten" process (see FIG. 3), the alertness estimation unit 14 concatenates the outputs for the individual time series and flattens them. The alertness estimation unit 14 thereby obtains an output of size (1, D_{EC_P2} + D_{EG_P2} + D_{HP_P2}).

After that, in the fully connected layer, the alertness estimation unit 14 convolves a weight filter over all of the input data of size (1, D_{EC_P2} + D_{EG_P2} + D_{HP_P2}), adds a bias, passes the result through an activation function, and obtains the alertness estimate as the output.
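The layer sequence walked through above (convolution, pooling, convolution, pooling per kind of biological information, then concatenation and a fully connected layer) can be sketched as a plain-Python forward pass. This is an illustration under assumptions, not the learned model: the filter width, channel counts, pooling size, random weights, and the sigmoid output activation are all hypothetical, and `random_branch`, `branch`, and `estimate` are names invented for this sketch.

```python
import math
import random


def conv1d_relu(x, weights, biases):
    """x: [in_ch][width]; weights: [out_ch][in_ch][k] -> [out_ch][width - k + 1]."""
    k = len(weights[0][0])
    width = len(x[0])
    out = []
    for w_out, b in zip(weights, biases):
        row = []
        for t in range(width - k + 1):        # shift the window along time
            s = b
            for c, w_in in enumerate(w_out):
                for j in range(k):
                    s += w_in[j] * x[c][t + j]
            row.append(max(0.0, s))           # ReLU activation
        out.append(row)
    return out


def max_pool(x, size):
    """Non-overlapping max pooling along the time axis."""
    return [[max(row[t:t + size]) for t in range(0, len(row) - size + 1, size)]
            for row in x]


def branch(x, p):
    """Conv -> pool -> conv -> pool over one kind of biological information."""
    h = conv1d_relu(x, p["w1"], p["b1"])
    h = max_pool(h, p["pool"])
    h = conv1d_relu(h, p["w2"], p["b2"])
    h = max_pool(h, len(h[0]))                # pool the whole width down to 1
    return [row[0] for row in h]


def estimate(inputs, branch_params, fc_w, fc_b):
    """Concatenate the branch outputs and apply one fully connected unit."""
    features = []
    for x, p in zip(inputs, branch_params):
        features.extend(branch(x, p))         # concatenate & flatten
    z = fc_b + sum(w * f for w, f in zip(fc_w, features))
    return 1.0 / (1.0 + math.exp(-z))         # sigmoid (assumed output activation)


def random_branch(rng, in_ch, ch1, ch2, k=3, pool=2):
    """Hypothetical untrained parameters, for shape illustration only."""
    def filters(n_out, n_in):
        return [[[rng.uniform(-0.5, 0.5) for _ in range(k)]
                 for _ in range(n_in)] for _ in range(n_out)]
    return {"w1": filters(ch1, in_ch), "b1": [0.0] * ch1,
            "w2": filters(ch2, ch1), "b2": [0.0] * ch2, "pool": pool}


rng = random.Random(0)
dims = (2, 2, 3)                               # assumed D_EC, D_EG, D_HP
inputs = [[[rng.random() for _ in range(20)] for _ in range(d)] for d in dims]
params = [random_branch(rng, d, 4, 3) for d in dims]
fc_w = [rng.uniform(-0.5, 0.5) for _ in range(9)]   # 3 branches x 3 features
score = estimate(inputs, params, fc_w, 0.0)
```

With a window of W_T = 20 samples, the widths in each branch go 20, 18 (filter width 3), 9 (pooling size 2), 7 (filter width 3), and finally 1, matching the (D, 1) shape described for the second pooling layer.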

In the first embodiment, the learning model uses a convolutional neural network, and the weight filters and biases of its convolutional layers are learned in advance by deep learning using sample data to which ground-truth alertness labels have been assigned. The learning can be performed with, for example, error backpropagation so that the difference between the ground-truth alertness label and the estimated alertness becomes small.

Furthermore, overfitting can be prevented during learning by using Dropout, in which weights and biases are set to zero with a certain probability. In the first embodiment, the learning model may also be constructed by taking a plurality of time series with different frame rates (for example, frame rates of R, R/2, R/3, R/6, and R/10), interpolating each so that the number of samples becomes the set value, and inputting the resulting data into the convolutional neural network as training data. This enables more precise modeling.
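One way to picture the multi-frame-rate training data is to start from a series recorded at frame rate R, thin it out to emulate R/2, R/3, R/6, and R/10, and interpolate each version back to the same number of samples. The sketch below does this with linear interpolation; deriving the low-rate series by decimation, and the helper names `decimate`, `resample_linear`, and `multi_rate_examples`, are assumptions made for illustration.

```python
def decimate(samples, factor):
    """Keep every `factor`-th sample, emulating a frame rate of R / factor."""
    return samples[::factor]


def resample_linear(samples, target_count):
    """Linearly interpolate `samples` to `target_count` points (endpoints kept)."""
    last = len(samples) - 1
    out = []
    for i in range(target_count):
        pos = i * last / (target_count - 1)
        lo = min(int(pos), last - 1)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[lo + 1] * frac)
    return out


def multi_rate_examples(samples, label, factors=(1, 2, 3, 6, 10)):
    """Build one (interpolated series, label) training pair per frame rate."""
    target = len(samples)
    return [(resample_linear(decimate(samples, f), target), label)
            for f in factors]


# 31 samples at rate R, so every decimated version ends on the last sample.
original = [float(i) for i in range(31)]
examples = multi_rate_examples(original, label=0.8)
```

Every pair has the same input length, so a single network can be trained on all of them regardless of the frame rate at which the series was captured.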

The configuration of the convolutional neural network in the learning model used in the first embodiment is not particularly limited. For example, the learning model may have a configuration in which the second pooling layer is removed and, in its place, one more fully connected layer is added after the concatenate & flatten stage. Various other modifications may also be made to the learning model.

[Device operation]
Next, the operation of the alertness estimation device 10 according to the first embodiment will be described with reference to FIG. 4. FIG. 4 is a flow chart showing the operation of the alertness estimation device 10 according to the first embodiment of the present invention. The following description refers to FIGS. 1 to 3 as appropriate. In the first embodiment, the alertness estimation method is carried out by operating the alertness estimation device 10. The following description of the operation of the alertness estimation device 10 therefore also serves as the description of the alertness estimation method according to the first embodiment.

As shown in FIG. 4, when image data is output from the imaging device 20, the image data acquisition unit 11 acquires the output image data and holds it (step S1).

Next, the image data acquisition unit 11 determines whether the number of held frames of image data has reached a predetermined value (step S2). If the determination in step S2 shows that the number of frames has not reached the predetermined value, the image data acquisition unit 11 executes step S1 again. If the number of frames has reached the predetermined value, the image data acquisition unit 11 passes the held image data to the time-series data extraction unit 12.

Next, on receiving the image data, the time-series data extraction unit 12 extracts time-series data indicating the user's biological information from the image data acquired in step S1 (step S3). When the image data contains a plurality of users, the time-series data extraction unit 12 can also extract time-series data for each user in step S3.

Next, the data processing unit 13 interpolates the time-series data so that the number of samples of the time-series data extracted in step S3 becomes the set value (step S4).

Next, the alertness estimation unit 14 inputs the time-series data interpolated in step S4 into the learning model constructed using the convolutional neural network and estimates the user's alertness (step S5).

Specifically, as shown in FIG. 3, when the time-series data indicates the degree of eye opening and closing, the direction of the line of sight, and the orientation of the face, the alertness estimation unit 14 performs convolution for each kind of information and estimates the alertness. After step S5 is executed, steps S1 to S5 are executed again, so that the user's alertness is estimated continuously.
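The flow of steps S1 to S5 can be sketched as a small generator that buffers frames and emits one estimate each time enough frames have been collected. The callables `extract`, `interpolate`, and `model` stand in for units 12, 13, and 14 and are placeholders, and clearing the buffer after each estimate is an assumption; the description only says that steps S1 to S5 are repeated.

```python
def estimate_stream(frame_source, frames_needed, extract, interpolate,
                    model, target_count):
    """Yield one alertness estimate per block of `frames_needed` frames."""
    buffer = []
    for frame in frame_source:
        buffer.append(frame)                         # S1: acquire and hold
        if len(buffer) < frames_needed:              # S2: enough frames yet?
            continue
        series = extract(buffer)                     # S3: time-series extraction
        series = interpolate(series, target_count)   # S4: interpolation
        yield model(series)                          # S5: estimation
        buffer = []                                  # start the next cycle (assumed)


# Stub components, purely to exercise the control flow.
def stub_extract(frames):
    return [float(f) for f in frames]


def stub_interpolate(series, n):
    return series + [series[-1]] * (n - len(series))


def stub_model(series):
    return sum(series) / len(series)


results = list(estimate_stream(range(6), 3, stub_extract,
                               stub_interpolate, stub_model, 4))
```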

The alertness estimation device 10 also inputs the estimated alertness to, for example, the control system of an air conditioner or the operation system of a vehicle. Each system can then perform optimization control based on the user's alertness.

[Effects of Embodiment 1]
As described above, in the first embodiment, the number of samples is interpolated not in the image data but in the time-series data extracted from the images, so the frame rate of the image data can be kept low from the outset. This lightens the processing by the time-series data extraction unit 12, which is the heavy step of extracting time-series data from image data. As a result, a person's alertness can be estimated accurately while the processing load on the device as a whole is reduced. In the first embodiment, the accuracy of alertness estimation can be improved further by modeling the convolutional neural network with data obtained by interpolating input time series of several frame rates. Because a plurality of kinds of biological information can be used as the time-series data, the accuracy of alertness estimation can be improved still further.

[Program]
The program according to the first embodiment may be any program that causes a computer to execute steps S1 to S5 shown in FIG. 4. Installing this program on a computer and executing it realizes the alertness estimation device 10 and the alertness estimation method according to the first embodiment. In this case, the processor of the computer functions as the image data acquisition unit 11, the time-series data extraction unit 12, the data processing unit 13, and the alertness estimation unit 14, and performs the processing.

The program according to the first embodiment may also be executed by a computer system built from a plurality of computers. In this case, for example, each computer may function as one of the image data acquisition unit 11, the time-series data extraction unit 12, the data processing unit 13, and the alertness estimation unit 14.

(Embodiment 2)
Next, the alertness estimation device according to the second embodiment of the present invention will be described with reference to FIGS. 5 and 6.

[Device configuration]
First, the configuration of the alertness estimation device according to the second embodiment will be described with reference to FIG. 5. FIG. 5 is a block diagram showing the configuration of the alertness estimation device according to the second embodiment of the present invention.

As shown in FIG. 5, the alertness estimation device 30 according to the second embodiment has the same configuration as the alertness estimation device 10 according to the first embodiment shown in FIG. 1 and additionally includes a frame rate adjustment unit 31. The following description focuses on the differences from the first embodiment.

The frame rate adjustment unit 31 adjusts the frame rate according to the alertness level estimated by the alertness estimation unit 14. After adjusting the frame rate, the frame rate adjustment unit 31 instructs the imaging device 20, which outputs the image data, to use the adjusted frame rate. Alternatively, the frame rate adjustment unit 31 may give the adjusted frame rate to the image data acquisition unit 11 and the time-series data extraction unit 12 instead of the imaging device 20. The frame rate adjustment unit 31 can also adjust the frame rate according to the biological information indicated by the extracted time-series data.

Specifically, when the alertness level is steady, the frame rate adjustment unit 31 sets the frame rate low, which reduces the processing load on the alertness estimation device 30. Conversely, when the alertness level is changing greatly (that is, when the range of the change exceeds a predetermined range), the frame rate adjustment unit 31 sets the frame rate high, which improves the accuracy of the alertness estimate.
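The adjustment rule above can be sketched as a simple threshold on the spread of recent estimates. The text does not give concrete frame rates, the size of the predetermined range, or how many past estimates are considered, so `LOW_FPS`, `HIGH_FPS`, `MAX_STEADY_RANGE`, and the function name `adjust_frame_rate` are all illustrative assumptions.

```python
from typing import Sequence

LOW_FPS = 5.0           # hypothetical frame rate while alertness is steady
HIGH_FPS = 30.0         # hypothetical frame rate while alertness is changing
MAX_STEADY_RANGE = 0.2  # hypothetical "predetermined range" of variation


def adjust_frame_rate(recent_alertness: Sequence[float]) -> float:
    """Choose a frame rate from the spread of recent alertness estimates."""
    if not recent_alertness:
        raise ValueError("need at least one alertness estimate")
    variation = max(recent_alertness) - min(recent_alertness)
    # A spread beyond the predetermined range counts as a large change.
    return HIGH_FPS if variation > MAX_STEADY_RANGE else LOW_FPS
```

For example, `adjust_frame_rate([0.50, 0.52, 0.49])` selects the low rate, while `adjust_frame_rate([0.2, 0.7])` selects the high rate.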

[Device operation]
Next, the operation of the alertness estimation device 30 according to the second embodiment will be described with reference to FIG. 6. FIG. 6 is a flowchart showing the operation of the alertness estimation device 30 according to the second embodiment of the present invention. The following description refers to FIG. 5 as appropriate. In the second embodiment, the alertness estimation method is carried out by operating the alertness estimation device 30. The description of the alertness estimation method according to the second embodiment is therefore replaced by the following description of the operation of the alertness estimation device 30.

As shown in FIG. 6, when image data is output from the imaging device 20, the image data acquisition unit 11 acquires the output image data and holds it (step S11).

Next, the image data acquisition unit 11 determines whether the number of held images has reached a predetermined value (step S12). If the determination in step S12 shows that the number of images has not reached the predetermined value, the image data acquisition unit 11 executes step S11 again. If the determination in step S12 shows that the number of images has reached the predetermined value, the image data acquisition unit 11 passes the held image data to the time-series data extraction unit 12.
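Steps S11 and S12 amount to buffering frames until a fixed count is reached. The sketch below shows one way to express that, with the class name `FrameBuffer` and its methods being hypothetical. Clearing the buffer after the batch is handed over is also an assumption, since the text does not say whether held images are discarded or retained.

```python
from typing import Any, List, Optional


class FrameBuffer:
    """Hold frames until a predetermined number has been collected (S11, S12)."""

    def __init__(self, required_count: int) -> None:
        if required_count < 1:
            raise ValueError("required_count must be at least 1")
        self.required_count = required_count
        self._frames: List[Any] = []

    def add(self, frame: Any) -> Optional[List[Any]]:
        """Store one frame; return the full batch once enough frames are held."""
        self._frames.append(frame)
        if len(self._frames) < self.required_count:
            return None  # not enough frames yet: keep acquiring (back to S11)
        batch, self._frames = self._frames, []
        return batch  # enough frames: pass the batch to the extraction stage
```

A caller would feed each frame from the imaging device to `add` and forward the returned batch to the time-series extraction stage whenever the result is not `None`.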

Next, on receiving the image data, the time-series data extraction unit 12 extracts time-series data indicating the user's biological information from the image data acquired in step S11 (step S13). When the image data contains a plurality of users, the time-series data extraction unit 12 can also extract time-series data for each user in step S13.
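Per-user extraction can be pictured as grouping per-frame measurements by user. The sketch below assumes that an upstream stage, not shown and not described in this section, has already detected each face, assigned it a stable user identifier, and measured one value per frame. The function name `series_per_user` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def series_per_user(
    frames: Iterable[Iterable[Tuple[str, float]]]
) -> Dict[str, List[float]]:
    """Group per-frame measurements into one time series per user.

    `frames` is ordered by capture time; each frame yields (user_id, value) pairs.
    """
    series: Dict[str, List[float]] = defaultdict(list)
    for detections in frames:
        for user_id, value in detections:
            series[user_id].append(value)
    return dict(series)


# Hypothetical eye-openness measurements for two users over three frames;
# user "B" is not detected in the last frame.
frames = [
    [("A", 0.9), ("B", 0.4)],
    [("A", 0.8), ("B", 0.5)],
    [("A", 0.2)],
]
per_user = series_per_user(frames)
```

Users detected in fewer frames end up with shorter series, which is one situation where interpolating each series to a set number of samples is useful.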

Next, the data processing unit 13 interpolates the time-series data extracted in step S13 so that its number of samples equals the set value (step S14).

Next, the alertness estimation unit 14 inputs the time-series data interpolated in step S14 into a learning model built using a convolutional neural network and estimates the user's alertness level (step S15).

The user's alertness level is estimated by executing steps S11 to S15 above. Steps S11 to S15 are the same as steps S1 to S5 shown in FIG. 4.

After step S15 is executed, the frame rate adjustment unit 31 adjusts the frame rate according to the alertness level estimated in step S15 (step S16). The frame rate adjustment unit 31 then instructs the imaging device 20 to use the frame rate adjusted in step S16 (step S17).

After step S17 is executed, the imaging device 20 outputs image data at the instructed frame rate. Steps S11 to S17 are then executed again; this time, time-series data is generated at the instructed frame rate and the alertness level is estimated anew. In the second embodiment as well, the repeated execution of steps S11 to S15 means that the user's alertness level is estimated continuously.
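The feedback between estimation and capture in steps S11 to S17 can be sketched as a loop in which each cycle's chosen frame rate is used by the next capture. The function `run_cycles` and all of the stand-in stages below are hypothetical placeholders written for this illustration; they do not reproduce the actual extraction, interpolation, or learning model.

```python
from typing import Callable, List, Sequence, Tuple


def run_cycles(
    capture: Callable[[float], Sequence[float]],
    extract: Callable[[Sequence[float]], List[float]],
    interpolate: Callable[[List[float]], List[float]],
    estimate: Callable[[List[float]], float],
    adjust: Callable[[List[float]], float],
    fps: float,
    cycles: int,
) -> List[Tuple[float, float]]:
    """Run the acquire, extract, interpolate, estimate, adjust cycle `cycles` times."""
    history: List[float] = []
    log: List[Tuple[float, float]] = []
    for _ in range(cycles):
        frames = capture(fps)        # S11-S12: acquire a batch at the current rate
        series = extract(frames)     # S13: biological time-series data
        fixed = interpolate(series)  # S14: bring to the set number of samples
        level = estimate(fixed)      # S15: alertness level
        history.append(level)
        fps = adjust(history)        # S16: choose the next frame rate
        log.append((level, fps))     # S17: the next capture uses this rate
    return log


# Stand-in stages, for illustration only.
requested_rates: List[float] = []


def fake_capture(rate: float) -> List[float]:
    requested_rates.append(rate)
    return [0.5] * int(rate)  # one synthetic measurement per frame


def fake_extract(frames: Sequence[float]) -> List[float]:
    return list(frames)


def fake_interpolate(series: List[float]) -> List[float]:
    return series  # placeholder: a real stage would resample to the set value


def fake_estimate(series: List[float]) -> float:
    return sum(series) / len(series)  # placeholder for the learning model


def fake_adjust(history: List[float]) -> float:
    return 10.0 if len(history) < 2 else 5.0  # arbitrary rule for the demo


log = run_cycles(fake_capture, fake_extract, fake_interpolate,
                 fake_estimate, fake_adjust, fps=30.0, cycles=3)
```

In this demo the capture stage is asked for 30, then 10, then 5 frames per second, showing that each adjustment takes effect on the following cycle.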

In the second embodiment as well, the alertness estimation device 30 inputs the estimated alertness level to a control system for an air conditioner, a vehicle operation system, or the like. Each such system can then perform optimized control based on the user's alertness level.

[Effects of Embodiment 2]
As described above, in the second embodiment the frame rate of the image data can be adjusted. According to the second embodiment, an appropriate frame rate can be set according to the accuracy required of the alertness estimate. The second embodiment also provides the same effects as the first embodiment.

[Program]
The program according to the second embodiment may be any program that causes a computer to execute steps S11 to S17 shown in FIG. 6. Installing this program on a computer and executing it realizes the alertness estimation device 30 and the alertness estimation method according to the second embodiment. In this case, the processor of the computer functions as the image data acquisition unit 11, the time-series data extraction unit 12, the data processing unit 13, the alertness estimation unit 14, and the frame rate adjustment unit 31, and performs the processing.

The program according to the second embodiment may also be executed by a computer system built from a plurality of computers. In this case, for example, each computer may function as one of the image data acquisition unit 11, the time-series data extraction unit 12, the data processing unit 13, the alertness estimation unit 14, and the frame rate adjustment unit 31.

(Physical configuration)
A computer that realizes the alertness estimation device by executing the program according to the first or second embodiment will now be described with reference to FIG. 7. FIG. 7 is a block diagram showing an example of a computer that realizes the alertness estimation device according to the first and second embodiments of the present invention.

As shown in FIG. 7, the computer 110 includes a CPU (Central Processing Unit) 111, a main memory 112, a storage device 113, an input interface 114, a display controller 115, a data reader/writer 116, and a communication interface 117. These units are connected to one another via a bus 121 so that they can exchange data. The computer 110 may include a GPU (Graphics Processing Unit) or an FPGA (Field-Programmable Gate Array) in addition to, or in place of, the CPU 111.

The CPU 111 loads the program (code) according to the present embodiment, which is stored in the storage device 113, into the main memory 112 and executes it in a predetermined order to carry out various operations. The main memory 112 is typically a volatile storage device such as a DRAM (Dynamic Random Access Memory). The program according to the present embodiment is provided stored on a computer-readable recording medium 120. The program may also be distributed over the Internet, to which the computer is connected via the communication interface 117.

Specific examples of the storage device 113 include a hard disk drive and a semiconductor storage device such as a flash memory. The input interface 114 mediates data transmission between the CPU 111 and input devices 118 such as a keyboard and a mouse. The display controller 115 is connected to a display device 119 and controls what is displayed on the display device 119.

The data reader/writer 116 mediates data transmission between the CPU 111 and the recording medium 120, reading the program from the recording medium 120 and writing the results of processing in the computer 110 to the recording medium 120. The communication interface 117 mediates data transmission between the CPU 111 and other computers.

Specific examples of the recording medium 120 include general-purpose semiconductor storage devices such as CF (Compact Flash (registered trademark)) and SD (Secure Digital), magnetic recording media such as a flexible disk, and optical recording media such as a CD-ROM (Compact Disk Read Only Memory).

The alertness estimation device according to the present embodiment can also be realized by using hardware corresponding to each unit instead of a computer on which the program is installed. Furthermore, part of the alertness estimation device may be realized by the program and the remainder by hardware.

Some or all of the embodiments described above can be expressed as (Appendix 1) to (Appendix 18) below, but are not limited to the following description.

(Appendix 1)
A device for estimating the alertness level of a user, comprising:
an image data acquisition unit that acquires image data including a face image of the user at a set frame rate;
a time-series data extraction unit that extracts time-series data indicating biological information of the user from the image data acquired at the set frame rate;
a data processing unit that interpolates the extracted time-series data so that its number of samples equals a set value; and
an alertness estimation unit that inputs the interpolated time-series data into a learning model built using a convolutional neural network and estimates the alertness level of the user.

(Appendix 2)
The alertness estimation device according to Appendix 1,
wherein the learning model is built by interpolating a plurality of pieces of time-series data having different frame rates so that the number of samples of each equals the set value, and inputting the resulting data into the convolutional neural network as training data.

(Appendix 3)
The alertness estimation device according to Appendix 1 or 2,
wherein the biological information indicated by the time-series data is at least one of the following for the user: information indicating the degree of eye opening and closing, information indicating the gaze direction, information indicating the face orientation, information indicating a pulse wave, information indicating blood flow, and information indicating the degree of mouth opening and closing.

(Appendix 4)
The alertness estimation device according to Appendix 3,
wherein, when the biological information indicated by the time-series data comprises two or more types of information, the learning model has a layer for performing convolution for each type of biological information.

(Appendix 5)
The alertness estimation device according to any one of Appendices 1 to 4,
further comprising a frame rate adjustment unit that adjusts the frame rate according to the estimated alertness level.

(Appendix 6)
The alertness estimation device according to Appendix 5,
wherein the frame rate adjustment unit further adjusts the frame rate according to the biological information indicated by the extracted time-series data.

(Appendix 7)
A method for estimating the alertness level of a user, comprising:
(a) a step of acquiring image data including a face image of the user at a set frame rate;
(b) a step of extracting time-series data indicating biological information of the user from the image data acquired at the set frame rate;
(c) a step of interpolating the extracted time-series data so that its number of samples equals a set value; and
(d) a step of inputting the interpolated time-series data into a learning model built using a convolutional neural network and estimating the alertness level of the user.

(Appendix 8)
The alertness estimation method according to Appendix 7,
wherein the learning model is built by interpolating a plurality of pieces of time-series data having different frame rates so that the number of samples of each equals the set value, and inputting the resulting data into the convolutional neural network as training data.

(Appendix 9)
The alertness estimation method according to Appendix 7 or 8,
wherein the biological information indicated by the time-series data is at least one of the following for the user: information indicating the degree of eye opening and closing, information indicating the gaze direction, information indicating the face orientation, information indicating a pulse wave, information indicating blood flow, and information indicating the degree of mouth opening and closing.

(Appendix 10)
The alertness estimation method according to Appendix 9,
wherein, when the biological information indicated by the time-series data comprises two or more types of information, the learning model has a layer for performing convolution for each type of biological information.

(Appendix 11)
The alertness estimation method according to any one of Appendices 7 to 10,
further comprising (e) a step of adjusting the frame rate according to the estimated alertness level.

(Appendix 12)
The alertness estimation method according to Appendix 11,
wherein, in step (e), the frame rate is further adjusted according to the biological information indicated by the extracted time-series data.

(Appendix 13)
A program for estimating the alertness level of a user by means of a computer, the program causing the computer to execute:
(a) a step of acquiring image data including a face image of the user at a set frame rate;
(b) a step of extracting time-series data indicating biological information of the user from the image data acquired at the set frame rate;
(c) a step of interpolating the extracted time-series data so that its number of samples equals a set value; and
(d) a step of inputting the interpolated time-series data into a learning model built using a convolutional neural network and estimating the alertness level of the user.

(Appendix 14)
The program according to Appendix 13,
wherein the learning model is built by interpolating a plurality of pieces of time-series data having different frame rates so that the number of samples of each equals the set value, and inputting the resulting data into the convolutional neural network as training data.

(Appendix 15)
The program according to Appendix 13 or 14,
wherein the biological information indicated by the time-series data is at least one of the following for the user: information indicating the degree of eye opening and closing, information indicating the gaze direction, information indicating the face orientation, information indicating a pulse wave, information indicating blood flow, and information indicating the degree of mouth opening and closing.

(Appendix 16)
The program according to Appendix 15,
wherein, when the biological information indicated by the time-series data comprises two or more types of information, the learning model has a layer for performing convolution for each type of biological information.

(Appendix 17)
The program according to any one of Appendices 13 to 16,
further causing the computer to execute (e) a step of adjusting the frame rate according to the estimated alertness level.

(Appendix 18)
The program according to Appendix 17,
wherein, in step (e), the frame rate is further adjusted according to the biological information indicated by the extracted time-series data.

The present invention has been described above with reference to the embodiments, but the present invention is not limited to those embodiments. Various changes that a person skilled in the art can understand may be made to the configuration and details of the present invention within its scope.

As described above, the present invention makes it possible to estimate a person's alertness level accurately while reducing the processing load. The present invention is useful for various systems that need to estimate a person's alertness level, such as air conditioning systems and operation systems for automobiles and other vehicles.

10 Alertness estimation device (Embodiment 1)
11 Image data acquisition unit
12 Time-series data extraction unit
13 Data processing unit
14 Alertness estimation unit
20 Imaging device
30 Alertness estimation device (Embodiment 2)
31 Frame rate adjustment unit
110 Computer
111 CPU
112 Main memory
113 Storage device
114 Input interface
115 Display controller
116 Data reader/writer
117 Communication interface
118 Input device
119 Display device
120 Recording medium
121 Bus

Claims

A device for estimating the alertness level of a user, comprising:
an image data acquisition unit that acquires image data including a face image of the user at a set frame rate;
a time-series data extraction unit that extracts time-series data indicating biological information of the user from the image data acquired at the set frame rate;
a data processing unit that interpolates the extracted time-series data so that its number of samples equals a set value; and
an alertness estimation unit that inputs the interpolated time-series data into a learning model built using a convolutional neural network and estimates the alertness level of the user.

The alertness estimation device according to claim 1,
wherein the learning model is built by interpolating a plurality of pieces of time-series data having different frame rates so that the number of samples of each equals the set value, and inputting the resulting data into the convolutional neural network as training data.

The alertness estimation device according to claim 1 or 2,
wherein the biological information indicated by the time-series data is at least one of the following for the user: information indicating the degree of eye opening and closing, information indicating the gaze direction, information indicating the face orientation, information indicating a pulse wave, information indicating blood flow, and information indicating the degree of mouth opening and closing.

The alertness estimation device according to claim 3,
wherein, when the biological information indicated by the time-series data comprises two or more types of information, the learning model has a layer for performing convolution for each type of biological information.

The alertness estimation device according to any one of claims 1 to 4,
further comprising a frame rate adjustment unit that adjusts the frame rate according to the estimated alertness level.

The alertness estimation device according to claim 5,
wherein the frame rate adjustment unit further adjusts the frame rate according to the biological information indicated by the extracted time-series data.

A method for estimating the alertness level of a user, comprising:
(a) a step of acquiring image data including a face image of the user at a set frame rate;
(b) a step of extracting time-series data indicating biological information of the user from the image data acquired at the set frame rate;
(c) a step of interpolating the extracted time-series data so that its number of samples equals a set value; and
(d) a step of inputting the interpolated time-series data into a learning model built using a convolutional neural network and estimating the alertness level of the user.

The alertness estimation method according to claim 7,
wherein the learning model is built by interpolating a plurality of pieces of time-series data having different frame rates so that the number of samples of each equals the set value, and inputting the resulting data into the convolutional neural network as training data.

The alertness estimation method according to claim 7 or 8,
wherein the biological information indicated by the time-series data is at least one of the following for the user: information indicating the degree of eye opening and closing, information indicating the gaze direction, information indicating the face orientation, information indicating a pulse wave, information indicating blood flow, and information indicating the degree of mouth opening and closing.

The alertness estimation method according to claim 9,
wherein, when the biological information indicated by the time-series data comprises two or more types of information, the learning model has a layer for performing convolution for each type of biological information.

The alertness estimation method according to any one of claims 7 to 10,
further comprising (e) a step of adjusting the frame rate according to the estimated alertness level.

The alertness estimation method according to claim 11,
wherein, in step (e), the frame rate is further adjusted according to the biological information indicated by the extracted time-series data.

A program for estimating the alertness level of a user by means of a computer, the program causing the computer to execute:
(a) a step of acquiring image data including a face image of the user at a set frame rate;
(b) a step of extracting time-series data indicating biological information of the user from the image data acquired at the set frame rate;
(c) a step of interpolating the extracted time-series data so that its number of samples equals a set value; and
(d) a step of inputting the interpolated time-series data into a learning model built using a convolutional neural network and estimating the alertness level of the user.

The program according to claim 13,
wherein the learning model is built by interpolating a plurality of pieces of time-series data having different frame rates so that the number of samples of each equals the set value, and inputting the resulting data into the convolutional neural network as training data.

The program according to claim 13 or 14,
wherein the biological information indicated by the time-series data is at least one of the following for the user: information indicating the degree of eye opening and closing, information indicating the gaze direction, information indicating the face orientation, information indicating a pulse wave, information indicating blood flow, and information indicating the degree of mouth opening and closing.

The program according to claim 15,
wherein, when the biological information indicated by the time-series data comprises two or more types of information, the learning model has a layer for performing convolution for each type of biological information.

The program according to any one of claims 13 to 16,
further causing the computer to execute (e) a step of adjusting the frame rate according to the estimated alertness level.

The program according to claim 17,
wherein, in step (e), the frame rate is further adjusted according to the biological information indicated by the extracted time-series data.