JP2021058573A

JP2021058573A - Cognitive function prediction device, cognitive function prediction method, program and system

Info

Publication number: JP2021058573A
Application number: JP2020150612A
Authority: JP
Inventors: Yuki Hanabusa (花房祐希); Hiroshi Watanabe (渡辺斉志); Nobuto Murayama (村山宣人); Hiroki Tanaka (田中宏季); Satoru Nakamura (中村哲); Takashi Kudo (工藤喬); Hiroyoshi Adachi (足立浩祥)
Original assignee: Suntory Holdings Ltd
Current assignee: Suntory Holdings Ltd
Priority date: 2019-10-08
Filing date: 2020-09-08
Publication date: 2021-04-15
Anticipated expiration: 2040-09-08
Also published as: JP7390268B2

Abstract

To provide a device, method, program and system for predicting a condition or symptom caused by a decline in cognitive function.

SOLUTION: A cognitive function prediction device comprises: an information transmission unit 11 that transmits voice information and image information to a user; a measurement unit 12 that measures the user's voice and/or an image of the user; a feature data creation unit 131 that creates, based on the measurement results of the measurement unit, at least one kind of feature data selected from voice feature data, linguistic feature data and image feature data; a feature amount extraction unit 132 that extracts feature amounts from the feature data; and a cognitive function prediction unit 133 that predicts, based on the feature amounts, whether the user shows a tendency toward a condition or symptom caused by cognitive decline. The information transmitted by the information transmission unit includes an atypical question, and the atypical question includes a question about a past event selected according to the user's age.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a cognitive function prediction device, a cognitive function prediction method, a program and a system.

In recent years, populations in developed countries have been aging. In particular, according to a survey by the Ministry of Health, Labour and Welfare, Japan's aging rate (the proportion of the total population aged 65 and over) reached 27.5% in 2017, and the country has entered a super-aged society. The aging rate is expected to keep rising, making medical care for the elderly an important issue, and dementia is one such issue. Dementia refers to a condition in which cognitive functions such as memory, thinking and behavior decline to the point of interfering with daily life. No effective treatment has yet been established for conditions or symptoms caused by cognitive decline, including dementia, so post-diagnosis support, such as helping patients and their families plan for the future, is needed. Early detection of conditions or symptoms caused by cognitive decline is therefore important.

Early detection of conditions or symptoms caused by cognitive decline is performed by combining neuropsychological tests, blood tests, brain imaging and the like. However, these examinations include invasive procedures that cause the subject anxiety and stress and thus place a heavy burden on the subject. A non-invasive, easy-to-use detection method is therefore needed, and many methods for detecting conditions or symptoms caused by cognitive decline non-invasively and easily have been proposed.

Proposed methods include studies using linguistic information and studies using voice information (for example, Non-Patent Documents 1, 2 and 3). Most of these analyze one-sided speech, such as descriptions of pictures or utterances made during neuropsychological tests. Methods that detect these conditions interactively using an agent have also been proposed (Non-Patent Documents 4 and 5). In these methods, three questions based on neuropsychological tests are prepared, and detection is attempted from the voice and linguistic information in the responses. However, elderly users may memorize neuropsychological test questions and prepare their answers in advance.

Non-Patent Document 1: Aramaki, E., Shikata, S., Miyabe, M. and Kinoshita, A.: Vocabulary size in speech may be an early indicator of cognitive impairment, PLoS ONE, Vol. 11, No. 5, p. e0155195 (2016)
Non-Patent Document 2: McKhann, G. M., Knopman, D. S., Chertkow, H., Hyman, B. T., Jack, C. R., Kawas, C. H., Klunk, W. E., Koroshetz, W. J., Manly, J. J. and Mayeux, R.: The diagnosis of dementia due to Alzheimer's disease: Recommendations from the National Institute on Aging-Alzheimer's Association workgroups on diagnostic guidelines for Alzheimer's disease, Alzheimer's & Dementia: The Journal of the Alzheimer's Association, Vol. 7, No. 3, pp. 263-269 (2011)
Non-Patent Document 3: Roark, B., Mitchell, M., Hosom, J.-P., Hollingshead, K. and Kaye, J.: Spoken language derived measures for detecting mild cognitive impairment, IEEE Transactions on Audio, Speech, and Language Processing, Vol. 19, No. 7, pp. 2081-2090 (2011)
Non-Patent Document 4: Tanaka, H., Adachi, H., Ukita, N., Ikeda, M., Kazui, H., Kudo, T. and Nakamura, S.: Detecting Dementia Through Interactive Computer Avatars, IEEE Journal of Translational Engineering in Health and Medicine, Vol. 5, pp. 1-11 (2017)
Non-Patent Document 5: Mirheidari, B., Blackburn, D., Harkness, K., Walker, T., Venneri, A., Reuber, M. and Christensen, H.: An avatar-based system for identifying individuals likely to develop dementia, Proc. Interspeech 2017, pp. 3147-3151 (2017)

Detecting conditions or symptoms caused by cognitive decline at an early stage requires regular, long-term monitoring. When such conditions or symptoms are detected interactively as described above, the questions must therefore be ones that the user cannot prepare for in advance.

An object of the present invention is to provide a device, a method, a program and a system for predicting a condition or symptom caused by a decline in cognitive function.

(1) A cognitive function prediction device comprising: an information transmission unit for transmitting voice information and image information to a user; a measurement unit for measuring a user voice and/or a user image; a feature data creation unit for creating, based on the measurement results obtained by the measurement unit, at least one type of feature data selected from the group consisting of voice feature data, linguistic feature data and image feature data; a feature amount extraction unit for extracting a feature amount from the feature data obtained by the feature data creation unit; and a cognitive function prediction unit configured to predict, based on at least one type of feature amount obtained by the feature amount extraction unit, whether or not the user has a tendency toward a condition or symptom caused by cognitive decline, wherein the information transmitted from the information transmission unit includes an atypical question, and the atypical question includes a question about a past event corresponding to the user's age.
(2) The cognitive function prediction device according to (1), further comprising a question creation unit configured to create a question based on a response from the user, wherein the atypical question includes a question created by the question creation unit.
(3) The cognitive function prediction device according to (1) or (2), wherein the feature amount extraction unit is configured to extract the feature amount based on at least one of the voice feature data and the linguistic feature data created by the feature data creation unit from the user voice obtained by the measurement unit.
(4) The cognitive function prediction device according to (1) or (2), wherein the feature amount extraction unit is configured to extract the feature amount based on: at least one type of image feature data, selected from the group consisting of a gaze pattern, Facial Action Coding System features and facial landmark features, created by the feature data creation unit from the user image obtained by the measurement unit; and at least one of the voice feature data and the linguistic feature data created by the feature data creation unit from the user voice obtained by the measurement unit.
(5) The cognitive function prediction device according to any one of (1) to (4), wherein the cognitive function prediction unit is configured to predict, based on the feature amount extracted by the feature amount extraction unit, whether or not the user has a tendency toward a condition or symptom caused by cognitive decline, using at least one method selected from the group consisting of an SVM (Support Vector Machine), logistic regression analysis and deep learning.
(6) A cognitive function prediction method comprising: an information transmission step of transmitting voice information and image information to a user; a measurement step of measuring a user voice and/or a user image; a feature data creation step of creating, based on the measurement results, at least one type of feature data selected from the group consisting of voice feature data, linguistic feature data and image feature data; a feature amount extraction step of extracting a feature amount from the feature data obtained in the feature data creation step; and a cognitive function prediction step of predicting, based on at least one type of feature amount obtained in the feature amount extraction step, whether or not the user has a tendency toward a condition or symptom caused by cognitive decline, wherein the information transmitted in the information transmission step includes an atypical question, and the atypical question includes a question about a past event corresponding to the user's age.
(7) A program for causing a computer to execute the cognitive function prediction method according to (6).
(8) A cognitive function prediction system comprising: a user terminal including an information transmission unit for transmitting voice information and image information to a user and a measurement unit for measuring a user voice and/or a user image; and a cognitive function prediction device including a feature data creation unit for creating, based on the measurement results of the measurement unit, at least one type of feature data selected from the group consisting of voice feature data, linguistic feature data and image feature data, a feature amount extraction unit for extracting a feature amount from the feature data obtained by the feature data creation unit, and a cognitive function prediction unit configured to predict, based on at least one type of feature amount extracted by the feature amount extraction unit, whether or not the user has a tendency toward a condition or symptom caused by cognitive decline, wherein the information transmitted from the information transmission unit includes an atypical question, and the atypical question includes a question about a past event corresponding to the user's age.
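Claim (5) names an SVM, logistic regression analysis and deep learning as candidate predictors but gives no training details. The sketch below illustrates only the logistic regression option, trained by plain batch gradient descent in standard-library Python so that it runs without extra packages. The class name `LogisticPredictor`, the learning rate, the epoch count, the 0.5 threshold and the two toy features are hypothetical choices, not values from this publication; an SVM or a neural network could sit behind the same `fit`/`predict` interface.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticPredictor:
    """Binary logistic regression trained by batch gradient descent.

    Hypothetical label encoding: 1 = tendency toward a condition or symptom
    caused by cognitive decline, 0 = no such tendency.
    """

    def __init__(self, lr: float = 0.1, epochs: int = 2000) -> None:
        self.lr = lr
        self.epochs = epochs
        self.w: List[float] = []
        self.b = 0.0

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "LogisticPredictor":
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for xi, yi in zip(X, y):
                err = self.predict_proba(xi) - yi  # gradient of the log loss w.r.t. the logit
                for j in range(d):
                    grad_w[j] += err * xi[j]
                grad_b += err
            # Average the gradient over the batch and take one descent step
            self.w = [wj - self.lr * gj / n for wj, gj in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def predict_proba(self, x: Sequence[float]) -> float:
        return sigmoid(sum(wj * xj for wj, xj in zip(self.w, x)) + self.b)

    def predict(self, x: Sequence[float], threshold: float = 0.5) -> bool:
        return self.predict_proba(x) >= threshold
```

In use, each row of `X` would hold the feature amounts extracted for one session. For example, with made-up rows such as `[mean_pause_s, type_token_ratio]`, `LogisticPredictor().fit([[0.2, 1.0], [0.3, 0.9], [1.5, 0.3], [1.8, 0.2]], [0, 0, 1, 1])` learns to flag the long-pause, low-variety rows; these numbers are illustrative and are not data from the patent.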

According to the present invention, conditions or symptoms caused by cognitive decline can be predicted easily at an early stage.

FIG. 1 is a diagram for explaining a cognitive function prediction device according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining an information transmission unit and a measurement unit in an embodiment of the present invention.
FIG. 3 is a diagram for explaining a cognitive function prediction system according to an embodiment of the present invention.
FIG. 4 is a flowchart for explaining a cognitive function prediction method according to an embodiment of the present invention.
FIG. 5 shows examples of questions used in the cognitive function prediction device according to an embodiment of the present invention.
FIG. 6 is a table showing the feature amounts extracted by the feature amount extraction unit of the control unit.
FIG. 7 is a graph showing the results of verifying the accuracy with which the cognitive function prediction device according to an embodiment of the present invention determines dementia, one type of condition or symptom caused by cognitive decline. FIG. 7(a) shows the verification results for prediction accuracy when dementia is predicted using an SVM as the determination method, and FIG. 7(b) shows the verification results when dementia is predicted using logistic regression analysis.
FIG. 8 is a graph showing the results of verifying the accuracy of dementia determination, dementia being one type of condition or symptom caused by cognitive decline, using only the image feature data used in an embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings. The present invention is not limited to the following description and may be modified as appropriate without departing from its gist.

[Cognitive function prediction device]
FIG. 1 is a diagram for explaining a cognitive function prediction device according to an embodiment of the present invention.
As shown in FIG. 1, the cognitive function prediction device 10 includes an information transmission unit 11, a measurement unit 12, a control unit 13, an external information input unit 21 and a database 31. The control unit 13 includes a feature data creation unit 131, a feature amount extraction unit 132, a cognitive function prediction unit 133 and a question creation unit 134. The cognitive function prediction device 10 of this embodiment is configured to hold a dialogue with the user and to predict, based on the user's responses in that dialogue, whether the user has a condition or symptom caused by cognitive decline.

The information transmission unit 11 transmits voice information and image information to the user of the cognitive function prediction device 10 (hereinafter simply "the user") and consists of, for example, a display unit 111 such as a display and an audio output unit 112 such as a speaker. An example of the information transmission unit 11 is described with reference to FIG. 2, a schematic view showing an example of the information transmission unit and the measurement unit.

As shown in FIG. 2, the information transmission unit 11 includes a display unit 111 that displays images and an audio output unit 112 that outputs audio. The display unit 111 can display, for example, an image imitating a person (avatar) 111a and/or text 111b. To hold a dialogue with the user, the information transmission unit 11 can output questions to the user as speech through the audio output unit 112. When the avatar 111a is displayed on the display unit 111, its mouth and facial expression can be made to move in time with the voice information output by the audio output unit 112, creating an environment in which the user feels as if talking with the avatar. This can ease the discomfort and tension of a user answering the questions transmitted from the information transmission unit 11.
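The publication says the avatar's mouth and expression move in time with the output voice but does not say how. One simple, commonly used approach, assumed here rather than taken from the source, is amplitude-driven lip movement: compute the RMS loudness of the speech audio for each video frame and map it to a mouth-opening value between 0 (closed) and 1 (fully open). The function name, the 30 fps frame rate and the `full_open_rms` saturation level are hypothetical.

```python
import math
from typing import List, Sequence


def mouth_openness(samples: Sequence[float], sample_rate: int,
                   fps: int = 30, full_open_rms: float = 0.3) -> List[float]:
    """Return one mouth-opening value in [0.0, 1.0] per video frame.

    samples: mono PCM audio of the avatar's speech, scaled to [-1.0, 1.0].
    """
    hop = max(1, sample_rate // fps)  # audio samples that fall within one video frame
    openness = []
    for start in range(0, len(samples), hop):
        chunk = samples[start:start + hop]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        # Louder frames open the mouth wider; clamp so loud speech saturates at 1.0
        openness.append(min(1.0, rms / full_open_rms))
    return openness
```

The per-frame values would then drive a mouth blend shape or sprite of the avatar 111a; a real renderer would typically also smooth them over a few frames to avoid jitter.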

The avatar 111a displayed on the display unit 111 can also be set according to the user's preference; for example, an image imitating a family member, pet, friend, celebrity, animal or character can be set as the avatar. Displaying an avatar 111a that matches the user's preference encourages regular use of the cognitive function prediction device 10, so that conditions or symptoms caused by cognitive decline can be predicted at an early stage.

Likewise, the voice output from the audio output unit 112 can be set according to the user's preference, for example its pitch, timbre, speaking style, speed and volume. This allows the voice best suited to the user's chosen avatar 111a to be set, further easing the user's discomfort and tension and making the speech easier to hear.

The image data displayed on the display unit 111 and the audio data output from the audio output unit 112 are supplied by the control unit 13.
For example, the user can input preferred image data and audio data through the external information input unit 21 and/or the sound collecting unit 121, and this data is stored in the database 31. The control unit 13 can read the stored data from the database 31 and supply the image data to the display unit 111 and the audio data to the audio output unit 112.
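As a minimal sketch of how user-chosen avatar and voice settings could be kept in the database 31 and read back by the control unit 13, the example below uses Python's built-in `sqlite3` module. The table name `avatar_prefs`, its columns and the `AvatarPrefs` fields are hypothetical; the publication only says that preferred image and audio data are stored and later read out.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class AvatarPrefs:
    image_path: str           # hypothetical: path to the user-supplied avatar image
    voice_path: str           # hypothetical: path to the user-supplied voice data
    speech_rate: float = 1.0  # playback speed multiplier
    volume: float = 1.0       # linear gain


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        """CREATE TABLE IF NOT EXISTS avatar_prefs (
               user_id TEXT PRIMARY KEY,
               image_path TEXT NOT NULL,
               voice_path TEXT NOT NULL,
               speech_rate REAL NOT NULL,
               volume REAL NOT NULL)"""
    )


def save_prefs(conn: sqlite3.Connection, user_id: str, prefs: AvatarPrefs) -> None:
    # Insert or overwrite this user's settings
    conn.execute(
        "INSERT OR REPLACE INTO avatar_prefs VALUES (?, ?, ?, ?, ?)",
        (user_id, prefs.image_path, prefs.voice_path, prefs.speech_rate, prefs.volume),
    )
    conn.commit()


def load_prefs(conn: sqlite3.Connection, user_id: str) -> Optional[AvatarPrefs]:
    row = conn.execute(
        "SELECT image_path, voice_path, speech_rate, volume FROM avatar_prefs WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    return AvatarPrefs(*row) if row else None  # None if the user has no saved settings
```

`sqlite3.connect(":memory:")` is convenient for trying this out; a device would point the connection at a file on its storage instead.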

The measurement unit 12 includes a sound collecting unit 121, composed of, for example, a microphone, and an imaging unit 122, composed of, for example, a CCD (Charge Coupled Device) image sensor or a CMOS (Complementary Metal Oxide Semiconductor) image sensor. The sound collecting unit 121 collects the voice of the user, and the imaging unit 122 captures the user's facial expression and the like. The sound collecting unit 121 is preferably placed where it can easily pick up the user's voice when the user faces the display unit 111 shown in FIG. 2; likewise, the imaging unit 122 is preferably placed where it can easily capture the user's facial expression when the user faces the display unit 111. Specifically, the sound collecting unit 121 and the imaging unit 122 are preferably provided, for example, in the bezel surrounding the display unit 111.

The external information input unit 21 shown in FIG. 1 may be any device that accepts a predetermined input operation, for example an input interface such as a mouse, a keyboard or a touch panel.

The control unit 13 includes the feature data creation unit 131, the feature amount extraction unit 132 and the cognitive function prediction unit 133. The control unit 13 supplies the information transmission unit 11 with the data it needs to convey information to the user (image data displayed on the display unit 111 and audio data output from the audio output unit 112). Based on the measurement results of the measurement unit 12 (audio data from the sound collecting unit 121 and/or image data from the imaging unit 122), the feature data creation unit 131 of the control unit 13 creates at least one type of feature data selected from the group consisting of voice feature data, linguistic feature data and image feature data. The feature amount extraction unit 132 then extracts feature amounts from the feature data created by the feature data creation unit 131, and the cognitive function prediction unit 133 predicts, based on at least one type of feature amount obtained by the feature amount extraction unit 132, whether the user has a tendency toward a condition or symptom caused by cognitive decline.
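The data flow just described, from measurement results through feature data and feature amounts to a prediction, can be sketched as a chain of plain functions. This is an illustration under stated assumptions, not the patented implementation: the user's speech is assumed to be already transcribed and segmented into pauses, and the two feature amounts (type-token ratio, a simple vocabulary-richness measure, and mean pause length) are examples chosen here, since the actual feature amounts are not listed in this text. The predictor is passed in as any callable, such as a trained classifier.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class Measurement:
    """Hypothetical pre-processed output of the measurement unit 12."""
    transcript: str                 # speech-recognition text of the user's answer (assumed)
    pause_durations_s: List[float]  # silent gaps between utterances, in seconds (assumed)


@dataclass
class FeatureData:
    tokens: List[str]    # linguistic feature data
    pauses: List[float]  # voice feature data


def create_feature_data(m: Measurement) -> FeatureData:
    # Role of the feature data creation unit 131
    return FeatureData(tokens=m.transcript.lower().split(), pauses=list(m.pause_durations_s))


def extract_feature_amounts(fd: FeatureData) -> Dict[str, float]:
    # Role of the feature amount extraction unit 132
    ttr = len(set(fd.tokens)) / len(fd.tokens) if fd.tokens else 0.0
    return {"type_token_ratio": ttr, "mean_pause_s": mean(fd.pauses) if fd.pauses else 0.0}


def predict(m: Measurement, predictor: Callable[[Dict[str, float]], bool]) -> bool:
    # Role of the cognitive function prediction unit 133: True = tendency present
    return predictor(extract_feature_amounts(create_feature_data(m)))
```

For a quick check, a placeholder rule such as `lambda f: f["mean_pause_s"] > 0.8 and f["type_token_ratio"] < 0.8` can stand in for the predictor; its thresholds are arbitrary and carry no clinical meaning.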

The control unit 13 is a so-called computer with a known configuration that includes an arithmetic unit such as a CPU (Central Processing Unit) (not shown), a data memory, a data storage memory and a working memory. The feature data creation unit 131, the feature amount extraction unit 132 and the cognitive function prediction unit 133 each comprise hardware and a program that run on the computer to perform a predetermined function.

The control unit 13 can adjust the volume and/or speed of the voice information output from the audio output unit 112 to make it easier for the user to understand or to suit the user's preference. At the user's option, the control unit 13 can also attach text 111b to the voice information.

The database 31 consists of a storage device such as a hard disk or semiconductor memory and records the data needed for the various processes performed by the control unit 13 (for example, data for the questions posed to the user and data for displaying the avatar 111a).

The information transmission unit 11 is configured to transmit information including atypical questions to the user. In this specification, an "atypical question" is any question other than a standard question, a standard question being one that is asked every time the user uses the cognitive function prediction device.
Examples of standard questions include questions about the date and time, the current location and simple calculations, as found in neuropsychological tests such as the Hasegawa Dementia Scale-Revised (HDS-R) and the MMSE. Atypical questions include questions randomly selected from a plurality of predetermined questions, questions about past events corresponding to the user's age, and questions created based on the user's responses. The atypical questions transmitted from the information transmission unit 11 to the user always include a question about a past event corresponding to the user's age. The atypical questions may also be a set of three questions randomly selected from an atypical question set of at least 10 questions.
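The selection rule above can be sketched as follows, under one assumed reading of how its two statements combine: two questions are drawn at random from a pool of at least 10 atypical questions, and the age-based past-event question is always added as the third. The `Question` type, its `kind` tags and the function name are hypothetical; a seeded `random.Random` can be passed in so that sessions are reproducible during testing.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Question:
    text: str
    kind: str  # hypothetical tag, e.g. "general" or "past_event"


def select_atypical(pool: List[Question], past_event: Question, n: int = 3,
                    rng: Optional[random.Random] = None) -> List[Question]:
    """Pick n atypical questions that always include the age-based past-event question."""
    if len(pool) < 10:
        raise ValueError("the atypical question pool should contain at least 10 questions")
    rng = rng or random.Random()
    candidates = [q for q in pool if q != past_event]
    chosen = rng.sample(candidates, n - 1) + [past_event]
    rng.shuffle(chosen)  # avoid always asking the past-event question last
    return chosen
```

Because each session draws a different subset, the user sees a changing mix of questions, which is the property the publication relies on to prevent answers from being prepared in advance.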

The question about a past event corresponding to the user's age preferably concerns events from when the user was between the ages of 10 and 49, that is, in their 10s to 40s (for example, happenings, incidents, festivals, phenomena or trends of that time). Including such questions among the atypical questions may improve the accuracy of predicting conditions or symptoms caused by cognitive decline.
The user's age at the time of use may be confirmed by the control unit 13 at each use through the external information input unit 21 or through the sound collecting unit 121 or imaging unit 122 of the measurement unit 12. Alternatively, the user's age may be entered in advance, during initial setup when the user starts using the cognitive function prediction device 10, through the external information input unit 21, the sound collecting unit 121 or the imaging unit 122, and the control unit 13 may calculate the user's age at the time of use from this initial user information stored in the database 31.
The question about a past event may be selected, based on the user's age as confirmed or calculated by the control unit 13, from past events stored in advance in the database. When the cognitive function prediction device 10 can connect to a network such as the Internet, the control unit 13 can also select or create questions about past events from the period when the user was of a particular age, using web information or information obtained from external databases.
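A sketch of the age handling described above, assuming the initial setup stores a birth date (the text says only that the age is entered in advance): the user's age at the time of use is derived from that date, and stored past events are filtered to the calendar years in which the user turned 10 to 49. The function names and the event dictionary are hypothetical, and the example events in the usage note are illustrative only.

```python
from datetime import date
from typing import Dict, List


def age_on(birth: date, today: date) -> int:
    """Age in completed years on `today`."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)


def events_for_user(birth: date, events_by_year: Dict[int, List[str]],
                    min_age: int = 10, max_age: int = 49) -> List[str]:
    """Events from calendar years in which the user turned between min_age and max_age."""
    first, last = birth.year + min_age, birth.year + max_age
    return [event
            for year in sorted(events_by_year)
            if first <= year <= last
            for event in events_by_year[year]]
```

For a user born on 1 June 1945, `events_for_user` keeps events such as the 1964 Tokyo Olympics and the 1970 Osaka Expo but drops anything after 1994; the chosen event would then be turned into a question by the question logic.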

The information transmission unit 11 is preferably configured to transmit at least one standard question selected from the date and time, the location and personal relationships, together with at least one atypical question about events from when the user was in their 10s to 40s. Standard questions about the date and time, location and personal relationships are answered using orientation and short-term (immediate) memory, whereas questions about events from the user's 10s to 40s are answered using long-term (remote) memory. As conditions or symptoms caused by cognitive decline progress, short-term memory is known to decline first, followed by long-term memory. Asking the user a combination of these two kinds of questions therefore makes it possible to gauge how far the early symptoms of such a condition or symptom have progressed.
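One way to assemble a session that follows this preference is sketched below. Each question carries a tag for the kind of memory it mainly draws on, so answers can later be tallied separately for orientation/short-term memory and for long-term (remote) memory. The question wording, the `FIXED_QUESTIONS` table and the `SessionItem` type are hypothetical illustrations, not content from the publication.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical wording for the standard (fixed) questions
FIXED_QUESTIONS = {
    "date": "What is today's date?",
    "place": "Where are you right now?",
    "relationship": "Who do you live with?",
}


@dataclass(frozen=True)
class SessionItem:
    text: str
    memory: str  # "short_term" (orientation / immediate) or "long_term" (remote)


def build_session(fixed_keys: List[str], past_event_questions: List[str]) -> List[SessionItem]:
    """Combine at least one fixed question with at least one past-event question."""
    if not fixed_keys or not past_event_questions:
        raise ValueError("need at least one fixed and one past-event question")
    items = [SessionItem(FIXED_QUESTIONS[k], "short_term") for k in fixed_keys]
    items += [SessionItem(q, "long_term") for q in past_event_questions]
    return items
```

Keeping the memory tag on each item means a later scoring step can report, for example, that short-term answers have degraded while long-term answers remain intact, which is the contrast the paragraph above describes.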

The control unit 13 may further include a question creation unit 134 configured to create free questions based on the user's responses. Like the feature data creation unit 131, the feature amount extraction unit 132, and the cognitive function prediction unit 133, the question creation unit 134 comprises hardware and a program that run on a computer and perform a predetermined function.
A question produced by the question creation unit 134 (hereinafter simply a free question) may be created from the user's response, or it may be selected from questions prepared in advance according to the user's response. For example, a free question may be one that the question creation unit 134 of the control unit 13 selects, based on the user's response, from a plurality of questions stored in the database in advance, or one that it creates by combining data stored in the database in advance according to the user's response. When the cognitive function prediction device 10 can be connected to a network such as the Internet, the question creation unit 134 may also select or create a free question from web information and information acquired from external databases, based on the user's response.
When the information transmission unit 11 delivers free questions based on the user's responses, the user feels as if they are conversing with the avatar 111a while using the cognitive function prediction device 10. This eases the user's nervousness, so the user can be measured in a natural state, which improves the accuracy of cognitive function prediction.
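One of the options above, selecting a stored question based on the user's response, can be sketched with simple keyword overlap. The scoring rule, the contents of `QUESTION_BANK`, and the function name are hypothetical: the source does not say how the question creation unit 134 matches responses to questions. Whitespace splitting is also an assumption; a Japanese response would first need morphological analysis.

```python
from __future__ import annotations

# Hypothetical question bank: each stored question is tagged with keywords.
QUESTION_BANK = {
    "What dishes do you like to cook?": {"cooking", "food", "kitchen"},
    "Which team do you support?": {"baseball", "team", "game"},
    "Where did you travel recently?": {"trip", "travel", "visit"},
}


def choose_free_question(response: str, bank: dict[str, set[str]]) -> str | None:
    """Return the stored question whose keywords overlap most with the response.

    Returns None when no keyword matches, so the caller can fall back to
    another source of questions (e.g. a standard or atypical question).
    """
    words = {w.strip(".,!?").lower() for w in response.split()}
    best_question, best_overlap = None, 0
    for question, keywords in bank.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best_question, best_overlap = question, overlap
    return best_question
```

For the response "I watched a baseball game yesterday", the sketch picks "Which team do you support?" because two of its keywords appear.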

Information about the user's hobbies, interests, and the like may be obtained, either when the user uses the cognitive function prediction device 10 or during its initial setup, via the external information input unit 21 or via the sound collecting unit 121 and imaging unit 122 of the measurement unit 12, and stored by the control unit 13 in the database 31. When the question creation unit 134 creates a free question, it can read the user's hobbies and interests from the database 31 and restrict the free question to those fields. This raises the user's interest in the questions delivered by the cognitive function prediction device 10 and encourages regular use of the device, which makes it possible to predict cognitive decline at an earlier stage.
In addition, the cognitive function prediction unit 133 may be able to predict conditions or symptoms caused by cognitive decline more accurately by detecting the difference in feature amounts between responses to questions in the user's fields of hobby or interest and responses to questions in other fields.
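The comparison in the preceding paragraph could, under simple assumptions, be computed as the mean of one feature amount (for instance response latency in seconds) for interest-field questions minus its mean for other questions. The record layout and the function name `interest_gap` are hypothetical, and the source does not specify how such a difference would enter the prediction.

```python
from __future__ import annotations

from statistics import mean


def interest_gap(records: list[tuple[bool, float]]) -> float | None:
    """Mean feature value on interest-field questions minus that on other questions.

    Each record is (is_interest_field, feature_value). Returns None if either
    group is empty, because the difference is undefined in that case.
    """
    interest = [value for is_interest, value in records if is_interest]
    other = [value for is_interest, value in records if not is_interest]
    if not interest or not other:
        return None
    return mean(interest) - mean(other)
```

A negative gap in latency would mean the user answers faster on topics they care about.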

The feature amount extraction unit 132 of the control unit 13 is preferably configured to extract feature amounts based on at least one of the voice feature data and the linguistic feature data created by the feature data creation unit 131 from the user voice obtained by the measurement unit 12. More preferably, the feature amount extraction unit 132 extracts feature amounts based on both the voice feature data and the linguistic feature data. In other words, it is preferable to extract two or more kinds of feature amounts: at least one based on the voice feature data and at least one based on the linguistic feature data.
Extracting feature amounts from both the voice feature data and the linguistic feature data created by the feature data creation unit 131 makes it possible to distinguish, with high accuracy, users without any condition or symptom caused by cognitive decline from users with a mild condition or symptom caused by cognitive decline. In particular, when the condition or symptom is dementia, non-dementia users can be distinguished from users with mild cognitive impairment (MCI) with high accuracy (90% or more).
Examples of feature amounts extracted from the voice feature data include pitch (fundamental frequency), voice volume (power), voice quality, reaction time, and pauses (intervals between utterances).
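Two of the timing features listed above, reaction time and pauses, can be sketched from voice-activity segments. This assumes some voice activity detector has already produced (start, end) times in seconds for the user's speech. The 0.5 s minimum pause length and the function name are illustrative assumptions, not values from the source.

```python
from __future__ import annotations


def timing_features(question_end: float,
                    segments: list[tuple[float, float]],
                    min_pause: float = 0.5) -> dict[str, float]:
    """Reaction time and pause statistics from (start, end) speech segments."""
    if not segments:
        raise ValueError("no speech segments")
    segments = sorted(segments)
    # Reaction time: delay between the end of the question and the first speech onset.
    reaction_time = segments[0][0] - question_end
    # Pauses: silent gaps between consecutive segments that last at least min_pause.
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(segments, segments[1:])]
    pauses = [g for g in gaps if g >= min_pause]
    return {
        "reaction_time": reaction_time,
        "pause_count": float(len(pauses)),
        "total_pause": sum(pauses),
    }
```

With the question ending at 10.0 s and speech at 11.5 to 13.0, 13.2 to 15.0 and 16.0 to 18.0 s, the reaction time is 1.5 s. Only the 1.0 s gap counts as a pause; the 0.2 s gap is treated as continuous speech.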
Examples of feature amounts extracted from the linguistic feature data include the number of tokens (morphemes), fillers, the type-token ratio (TTR), part-of-speech information (the numbers of nouns, verbs, adjectives, adverbs, and so on), syntactic complexity, vocabulary choice, and vocabulary difficulty. The type-token ratio is the number of types divided by the number of tokens. Here the number of tokens is the total number of words in the speech the user produces when responding, and the number of types is the number of distinct words in that speech (duplicates not counted).
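Following the definition above (TTR = number of types / number of tokens), here is a minimal sketch of three linguistic features. It assumes the response has already been split into tokens; for Japanese this would typically be done by a morphological analyzer. The filler inventory is a hypothetical example.

```python
from __future__ import annotations

# Hypothetical filler inventory; a real system would use a language-specific list.
FILLERS = {"uh", "um", "well", "えー", "あの"}


def linguistic_features(tokens: list[str]) -> dict[str, float]:
    """Token count, filler count and type-token ratio (types / tokens)."""
    n_tokens = len(tokens)
    n_types = len(set(tokens))  # distinct words; repeated words counted once
    n_fillers = sum(1 for t in tokens if t in FILLERS)
    ttr = n_types / n_tokens if n_tokens else 0.0
    return {"tokens": float(n_tokens), "fillers": float(n_fillers), "ttr": ttr}
```

For the eight tokens "um I went to the expo the expo" there are six types, so the TTR is 6 / 8 = 0.75. A lower TTR indicates more repetitive vocabulary.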

The feature amount extraction unit 132 of the control unit 13 is preferably configured to extract feature amounts based on two inputs. The first is at least one kind of image feature data selected from the group consisting of gaze patterns, the Facial Action Coding System (FACS), and facial landmark features, created by the feature data creation unit 131 from the user image obtained by the measurement unit 12. The second is at least one of the voice feature data and the linguistic feature data created by the feature data creation unit 131 from the user voice obtained by the measurement unit 12. In this case, two or more kinds of feature amounts are extracted in total: one from the image feature data and one from the voice feature data and/or the linguistic feature data. More preferably, feature amounts are extracted from the image feature data together with both the voice feature data and the linguistic feature data. In that case, three or more kinds of feature amounts are extracted in total, from the image, voice, and linguistic feature data respectively.
Extracting feature amounts from image feature data in addition to voice and/or linguistic feature data enables more accurate cognitive function prediction. For some users, the voice feature data also contains noise such as external sounds, and using feature amounts extracted from the image feature data as well may improve prediction accuracy in such cases.
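One simple way to combine image, voice, and linguistic feature amounts is early fusion, which concatenates the per-modality features into a single named vector before prediction. The sketch assumes that a modality judged unusable (for example, audio that is too noisy) is passed as None and skipped. The source does not state how the feature amounts are combined, so the fusion scheme and names are assumptions.

```python
from __future__ import annotations


def fuse_features(modalities: dict[str, dict[str, float] | None]
                  ) -> tuple[list[str], list[float]]:
    """Early fusion: concatenate per-modality features into one named vector.

    Modalities whose value is None (e.g. audio judged too noisy) are skipped,
    so the remaining modalities still yield a usable vector. Keys are sorted,
    so the layout is identical whenever the same modalities are present.
    """
    names: list[str] = []
    values: list[float] = []
    for modality in sorted(modalities):
        features = modalities[modality]
        if features is None:
            continue
        for name in sorted(features):
            names.append(f"{modality}.{name}")
            values.append(features[name])
    return names, values
```

Because skipping a modality changes the vector layout, a classifier would need to be trained separately for each combination of modalities it is expected to receive.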

The image feature data is created using known image processing and/or image recognition methods, for example with Intel® OpenCV (Open Source Computer Vision Library) or the free tool OpenFace. A face recognition program may also be built, for example, from an object detection program registered in OpenCV, and used to perform image processing and/or image recognition. OpenCV is not required, however. An existing program, or a chip equipped with an existing image recognition circuit, may be used instead.
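As one concrete example of a facial landmark feature, the eye aspect ratio (EAR), a common blink-related measure, can be computed from six eye landmarks. These could come from tools such as OpenFace or dlib's 68-point model. Choosing EAR here is an illustrative assumption; the source names facial landmark features in general without specifying which ones.

```python
from __future__ import annotations

from math import dist


def eye_aspect_ratio(p: list[tuple[float, float]]) -> float:
    """EAR = (|p2-p6| + |p3-p5|) / (2 |p1-p4|) for six eye landmarks p1..p6.

    p1 and p4 are the eye corners; p2, p3 lie on the upper lid and p6, p5 on
    the lower lid. The ratio falls towards 0 as the eye closes.
    """
    if len(p) != 6:
        raise ValueError("expected six eye landmarks")
    p1, p2, p3, p4, p5, p6 = p
    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))
```

Tracking this value frame by frame gives a blink time series from which rates or durations could be derived as feature amounts.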

When the feature amount extraction unit 132 extracts feature amounts, it may update them by a method such as maximum likelihood linear regression (MLLR) to absorb individual differences. MLLR adaptively modifies the feature distribution for an individual by maximum likelihood estimation, using a small amount of that individual's data together with the features of the many people collected so far. Absorbing individual differences during feature extraction in this way enables more accurate cognitive function prediction.
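Full MLLR adapts Gaussian mean vectors with an affine transform estimated by maximum likelihood. The sketch below makes strong simplifying assumptions: one-dimensional features, one Gaussian per class with a shared variance, and known class assignments for the user's adaptation data. Under those assumptions, maximizing the Gaussian log-likelihood is equivalent to minimizing squared error, so the transform reduces to ordinary least squares. The class labels and function names are hypothetical.

```python
from __future__ import annotations


def mllr_1d(pairs: list[tuple[float, float]]) -> tuple[float, float]:
    """Estimate (a, b) so that adapted mean = a * population_mean + b.

    `pairs` holds (population mean of the assigned class, observed user value).
    With a shared variance, the ML estimate is the least-squares fit.
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("need adaptation data")
    mean_mu = sum(mu for mu, _ in pairs) / n
    mean_x = sum(x for _, x in pairs) / n
    var_mu = sum((mu - mean_mu) ** 2 for mu, _ in pairs)
    if var_mu == 0.0:
        # Only one distinct class mean observed: fall back to a pure shift.
        return 1.0, mean_x - mean_mu
    cov = sum((mu - mean_mu) * (x - mean_x) for mu, x in pairs)
    a = cov / var_mu
    return a, mean_x - a * mean_mu


def adapt(population_means: dict[str, float], a: float, b: float) -> dict[str, float]:
    """Apply the shared transform to every class mean, including unseen classes."""
    return {k: a * mu + b for k, mu in population_means.items()}
```

Because one transform is shared across all classes, a few observations from the user are enough to shift every population mean, including classes for which the user has provided no data yet.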

The information transmission unit 11 can display the result of predicting, for the user, a condition or symptom caused by cognitive decline (the cognitive function prediction result) obtained by the cognitive function prediction unit 133 of the control unit 13.
In this case, the control unit 13 is configured to send the cognitive function prediction result obtained by the cognitive function prediction unit 133 to the information transmission unit 11.

The control unit 13 can also print or transmit the cognitive function prediction result obtained by the cognitive function prediction unit 133 via devices not shown in the figures, for example a printing device such as a printer, or a communication device that can connect to the Internet or a LAN (Local Area Network).
For example, the control unit 13 can transmit the cognitive function prediction result obtained by the cognitive function prediction unit 133 via a communication means to the user's relatives, the user's regular medical institution, or the like.
The printing device, communication device, and the like may be included in the cognitive function prediction device 10 of the present invention, or may be connected to the cognitive function prediction device 10 of the present invention via a communication line.

The configuration of the cognitive function prediction device 10 described above may be implemented on, for example, a personal computer, smartphone, tablet, or television used by the user. Embodiments of the present invention are also not limited to a single integrated device containing all the components, as in the cognitive function prediction device 10 described above, and may consist of multiple devices. For example, a user terminal including the information transmission unit 11 and the measurement unit 12 may be separate from a cognitive function prediction device that processes the measurement results obtained by the measurement unit 12 to predict conditions or symptoms caused by cognitive decline.

[Cognitive function prediction system]
FIG. 3 illustrates a cognitive function prediction system according to an embodiment of the present invention, in which the information transmission unit 11 and the measurement unit 12 of the cognitive function prediction device 10 shown in FIG. 1 are configured separately from the rest of the device.
As shown in FIG. 3, the cognitive function prediction system 40 comprises two separate units. The first is a user terminal 50 having the information transmission unit 11 for transmitting voice information and image information to the user, the measurement unit 12 for measuring the user voice and/or user image, and the external information input unit 21. The second is a cognitive function prediction device 60 comprising the following:
- the feature data creation unit 131, which creates at least one kind of feature data selected from the group consisting of voice feature data, linguistic feature data, and image feature data, based on the measurement results from the measurement unit 12 of the user terminal 50;
- the feature amount extraction unit 132, which extracts feature amounts from the feature data obtained by the feature data creation unit 131;
- the cognitive function prediction unit 133, which is configured to predict, based on the feature amounts obtained by the feature amount extraction unit 132, whether the user tends to have a condition or symptom caused by cognitive decline;
- the question creation unit 134; and
- the database 31.

When the user terminal 50 and the cognitive function prediction device 60 are separate in this way, they are connected by a communication line.

Any commonly used communication line, such as infrared, a short-range wireless line, a network line, a local area network line, or a telephone line, can be selected as appropriate. Whether the connection between the user terminal 50 and the cognitive function prediction device 60 is wired or wireless, each side can be provided with the transceivers and communication units needed to use the chosen communication line.
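When the user terminal 50 and the cognitive function prediction device 60 are separate, the measurement results must cross the communication line in some agreed format. The sketch below assumes a JSON message with base64-encoded audio. The field names and the `MeasurementMessage` class are hypothetical, and the transport itself (LAN, Internet, short-range wireless, and so on) is left out.

```python
from __future__ import annotations

import base64
import json
from dataclasses import dataclass


@dataclass
class MeasurementMessage:
    """Hypothetical message from the user terminal 50 to the prediction device 60."""
    user_id: str
    question_id: str
    audio: bytes  # recorded user voice for this question (e.g. WAV bytes)


def encode(msg: MeasurementMessage) -> bytes:
    """Serialize on the user terminal 50; binary audio is base64-encoded for JSON."""
    payload = {
        "user_id": msg.user_id,
        "question_id": msg.question_id,
        "audio_b64": base64.b64encode(msg.audio).decode("ascii"),
    }
    return json.dumps(payload).encode("utf-8")


def decode(data: bytes) -> MeasurementMessage:
    """Deserialize on the cognitive function prediction device 60."""
    payload = json.loads(data.decode("utf-8"))
    return MeasurementMessage(
        user_id=payload["user_id"],
        question_id=payload["question_id"],
        audio=base64.b64decode(payload["audio_b64"]),
    )
```

Keeping feature extraction on the device 60 side, as in FIG. 3, means the terminal only needs to record and send raw measurements.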

When the present embodiment is split into the user terminal 50 and the cognitive function prediction device 60 as in FIG. 3, any terminal the user already owns can serve as the user terminal 50, provided it has an information transmission unit 11 and a measurement unit 12 and can connect to a communication line. Examples include a smartphone, tablet, personal computer, or television. Because the user can keep using a familiar terminal, the psychological burden of using the cognitive function prediction system of this embodiment is reduced. This encourages regular use of the system, so that conditions or symptoms caused by cognitive decline can be predicted at an early stage.

[Cognitive function prediction method]
Next, a cognitive function prediction method according to one embodiment is described.
The cognitive function prediction method of this embodiment comprises the following steps:
- an information transmission step of transmitting voice information and image information to the user;
- a measurement step of measuring the user voice and/or user image;
- a feature data creation step of creating, based on the measurement results of the measurement step, at least one kind of feature data selected from the group consisting of voice feature data, linguistic feature data, and image feature data;
- a feature amount extraction step of extracting feature amounts from the feature data obtained in the feature data creation step; and
- a cognitive function prediction step of predicting, based on the feature amounts obtained in the feature amount extraction step, whether the user tends to have a condition or symptom caused by cognitive decline.

The information transmitted in the information transmission step includes an atypical question, and the atypical question includes a question about a past event according to the user's age.

FIG. 4 is a flowchart of a cognitive function prediction method according to an embodiment of the present invention. It shows an example of the operation of the cognitive function prediction device 10 of FIG. 1 and the cognitive function prediction system 40 of FIG. 3. For simplicity, FIG. 4 refers to the image data displayed by the display unit 111 of the information transmission unit 11 as the avatar 111a, and it describes the voice information output from the voice output unit 112 anthropomorphically, as if the avatar 111a were asking the questions.

When the control unit detects that the user has started the cognitive function prediction device, it transmits questions, including an atypical question, to the information transmission unit. These questions are delivered to the user through the information transmission unit as if the avatar were asking them, and the measurement unit measures the user's response (ST1). The measurement results are sent to the control unit, whose feature data creation unit creates feature data (ST2). Next, the feature amount extraction unit extracts feature amounts from the feature data (ST3). The cognitive function prediction unit then predicts from these feature amounts whether the user has a condition or symptom caused by cognitive decline (ST4). Finally, the prediction result obtained by the cognitive function prediction unit is passed from the control unit to the information transmission unit and output (ST5).

Although not shown in the flowchart of FIG. 4, the control unit 13 stores the following in the database 31 as appropriate: the measurement results obtained in ST1, the feature data obtained in ST2, the feature amounts obtained in ST3, and the cognitive function prediction result obtained in ST4.
The questions used in ST1 may be selected from question data stored in the database 31 in advance, or they may be created by the question creation unit 134 of the control unit 13 according to the user's responses.
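The flow of FIG. 4, including storage of each intermediate result, can be sketched as a chain of steps. Every step function is a hypothetical placeholder supplied by the caller, and a plain dictionary stands in for the database 31. The sketch fixes only the order ST1 to ST5 and the bookkeeping, not the actual feature extraction or prediction model.

```python
from __future__ import annotations

from typing import Any, Callable


def run_session(ask_and_measure: Callable[[str], Any],
                make_feature_data: Callable[[Any], Any],
                extract_features: Callable[[Any], list[float]],
                predict: Callable[[list[float]], bool],
                output: Callable[[bool], None],
                question: str,
                database: dict[str, Any]) -> bool:
    """Run ST1-ST5 once, storing each intermediate result in `database`."""
    measurement = ask_and_measure(question)         # ST1: ask, measure the response
    database["measurement"] = measurement
    feature_data = make_feature_data(measurement)   # ST2: create feature data
    database["feature_data"] = feature_data
    features = extract_features(feature_data)       # ST3: extract feature amounts
    database["features"] = features
    result = predict(features)                      # ST4: predict a tendency
    database["prediction"] = result
    output(result)                                  # ST5: output the result
    return result
```

Passing the steps in as functions keeps the same driver usable whether they run on a single device 10 or are split between the user terminal 50 and the prediction device 60.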

The cognitive function prediction result need not be output through the information transmission unit. As described above, it can also be output via a printing device or a communication device. The result may be output in ST5 at any time after ST4 is complete. For example, it may be provided to the user, the user's family, and/or the user's family doctor at a time chosen by any of them.

At least steps ST1 to ST4 of the flowchart described above are executed, for example, by a program running on a cognitive function prediction device configured as a computer.

In the embodiment described above, the system measures the user's responses to atypical questions, including questions about past events according to the user's age, and creates one or more kinds of feature data from the measurement results. It then predicts, from one or more feature amounts obtained from that feature data, whether the user has a condition or symptom caused by cognitive decline. Because the questions in this embodiment probe the user's orientation as well as short-term and long-term memory, cognitive function can be predicted with higher accuracy. Among conditions or symptoms caused by cognitive decline, this applies in particular to dementia.

In this specification, conditions or symptoms caused by cognitive decline include forgetfulness, memory decline, reduced concentration, reduced attention, impaired judgment, reduced spatial awareness, reduced neural activity, reduced neurotransmission function, reduced cognitive flexibility, impaired executive function, slower information processing, depression-like symptoms, and dementia (including dementia caused by diseases such as Alzheimer's disease).
The cognitive function prediction device in this specification is preferably a dementia prediction device for predicting dementia. Likewise, the cognitive function prediction method is preferably a dementia prediction method, and the cognitive function prediction system is preferably a dementia prediction system.
Throughout this specification, "a condition or symptom caused by cognitive decline" may be read as "dementia". Likewise, "cognitive function prediction device" may be read as "dementia prediction device", "cognitive function prediction method" as "dementia prediction method", and "cognitive function prediction system" as "dementia prediction system".

Next, an example of the operation of the cognitive function prediction device 10 according to an embodiment of the present invention is described. The operation described below is the same as that of the cognitive function prediction system 40 of FIG. 3, which comprises the user terminal 50 and the cognitive function prediction device 60 connected by a communication line. It can therefore equally be read as an example of the operation of the system 40.

Operation of the cognitive function prediction device 10 begins when, for example, the control unit 13 detects a predetermined input operation on the device by an operator (for example, a relative of the user) or by the user, such as an operation via the external information input unit 21 and/or the measurement unit 12.

As part of the initial setup, user information may be registered in the cognitive function prediction device 10. For example, when the control unit 13 detects that predetermined user information (name, age, and so on) has been input to the external information input unit 21 or the measurement unit 12, it registers that information by storing it in the database 31. The user information may also include, for example, data about the avatar 111a.
Biometric information for fingerprint, iris, vein, voiceprint, or face authentication may also be registered in advance as user information via the external information input unit 21 or the measurement unit 12. Once the user's biometric information has been registered, the control unit 13 can identify the user at the next use by detecting that information through the measurement unit 12 and/or the external information input unit 21. The cognitive function prediction device 10 can then be started smoothly.
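Voiceprint or face identification of a returning user is commonly done by comparing a new embedding against registered templates. This sketch assumes fixed-length embeddings have already been produced by some recognizer and uses cosine similarity with a hypothetical threshold of 0.8. Neither the embedding method nor the threshold comes from the source, and the function names are illustrative.

```python
from __future__ import annotations

from math import sqrt


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def identify_user(probe: list[float], templates: dict[str, list[float]],
                  threshold: float = 0.8) -> str | None:
    """Return the registered user whose template best matches `probe`,
    or None if no similarity reaches the threshold (treated as unknown)."""
    best_user, best_score = None, threshold
    for user_id, template in templates.items():
        score = cosine(probe, template)
        if score >= best_score:
            best_user, best_score = user_id, score
    return best_user
```

Returning None for an unknown user lets the device fall back to the initial registration flow described above.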

When the cognitive function prediction device 10 starts, the control unit 13 first reads the necessary image data and voice data from the database 31 and inputs them to the information transmission unit 11, so that the user perceives the avatar 111a as introducing itself (stating its name). For brevity, the following description covers only the anthropomorphic behavior of the avatar 111a. It omits the control unit 13 operations that produce this behavior, namely reading the necessary image and voice data from the database 31 and inputting them to the information transmission unit 11.

Having the avatar 111a introduce itself eases the discomfort and tension the user may feel when talking to a machine. If the avatar 111a then prompts the user to introduce themselves as well (for example, by asking "What is your name?"), this discomfort and tension can be eased further.

Next, the avatar 111a asks the user several questions, for example "What is today's date?", "Please tell me what you know about the Osaka Expo.", and "Please tell me about an event from your twenties that left an impression on you." These questions include fixed questions, such as the questions on the date and time, place, and calculation found in neuropsychological tests like the Hasegawa Dementia Scale and the MMSE. They also include atypical questions, which in turn include questions about past events according to the user's age.
For example, the avatar 111a randomly selects at least three questions from the question set as shown in FIG. 5 and asks questions. In FIG. 5, the questions related to Q5 to Q13 are set according to the age of the user. Since it corresponds to the question regarding the past event that was responded to, the question of any of Q5 to Q13 is included.
The questions by the avatar 111a are preferably at least 3 or more, including questions about past events according to the age of the user, and more preferably 5 or more.
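The selection rule above (draw several questions at random, but always include at least one age-matched past-event question from Q5 to Q13) can be sketched as follows. This is a minimal Python illustration, assuming the 19 questions of FIG. 5 are identified as `Q1` to `Q19`; the function name `select_questions` and the default of five questions are illustrative choices, not the patent's implementation.

```python
import random

# Hypothetical IDs for the age-matched past-event questions (Q5 to Q13 in FIG. 5).
AGE_MATCHED = {f"Q{i}" for i in range(5, 14)}


def select_questions(question_ids, n=5, rng=None):
    """Randomly pick n distinct questions, always including at least one
    age-matched past-event question."""
    rng = rng or random.Random()
    age_pool = [q for q in question_ids if q in AGE_MATCHED]
    if not age_pool:
        raise ValueError("question set has no age-matched past-event question")
    if not 1 <= n <= len(question_ids):
        raise ValueError("n must be between 1 and the number of questions")
    first = rng.choice(age_pool)                    # guarantees the constraint
    others = [q for q in question_ids if q != first]
    chosen = [first] + rng.sample(others, n - 1)    # fill the rest at random
    rng.shuffle(chosen)                             # randomize the asking order
    return chosen


questions = [f"Q{i}" for i in range(1, 20)]         # assumed IDs for the 19 questions
print(select_questions(questions, n=5, rng=random.Random(0)))
```

Forcing one age-matched question first and filling the rest at random keeps the rule simple; it does not sample uniformly over all valid question sets, which a production system might prefer.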

The measurement unit 12 measures the user's response from the time the avatar 111a asks a question until the user finishes answering. If the user gives no answer at all to a question (that is, if the control unit 13 detects that, after a question to the user has ended, the sound collection unit 121 has not picked up the user's voice for a predetermined time), the avatar 111a may move on to the next question without waiting for an answer. This predetermined time may be determined, for example, according to the user's conversational rhythm, or may be set uniformly to, for example, 15 seconds. The avatar 111a may also be set to wait longer for answers to questions that are harder for the user to answer.
In addition, if the user utters something in answer and then falls silent (that is, if the control unit 13 detects that, after a question to the user has ended, the sound collection unit 121 picked up the user's voice and then did not pick it up for a predetermined time), the avatar 111a may treat the user's answer as complete and proceed to the next action.
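The two silence rules above (skip a question if no voice is picked up for a predetermined time; treat an answer as complete once speech is followed by a predetermined silence) amount to a small state check over voice-activity frames. In this hypothetical sketch, the 15-second no-answer timeout comes from the text, while the 100 ms frame size and the 3-second end-of-answer silence are assumed values. Times are kept in integer milliseconds to avoid floating-point drift.

```python
def answer_status(voiced_frames, frame_ms=100, no_answer_ms=15_000,
                  end_silence_ms=3_000):
    """Classify the listening state after a question has ended.

    voiced_frames: one bool per frame (True = user voice picked up), starting
    at the end of the question. Returns (state, elapsed_ms), where state is
    "no_answer", "answer_complete" or "listening" (no decision yet).
    """
    heard_voice = False
    silent_ms = 0
    for i, voiced in enumerate(voiced_frames):
        elapsed_ms = (i + 1) * frame_ms
        if voiced:
            heard_voice = True
            silent_ms = 0                       # any speech resets the silence timer
            continue
        silent_ms += frame_ms
        if not heard_voice and silent_ms >= no_answer_ms:
            return "no_answer", elapsed_ms      # move on to the next question
        if heard_voice and silent_ms >= end_silence_ms:
            return "answer_complete", elapsed_ms  # answer treated as finished
    return "listening", len(voiced_frames) * frame_ms
```

A larger `no_answer_ms` can be passed for questions that are harder to answer, matching the option of letting the avatar 111a wait longer.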

The feature data creation unit 131 of the control unit 13 then creates, based on the measurement results of the measurement unit 12, at least one type of feature data selected from the group consisting of voice feature data, linguistic feature data and image feature data. Specifically, the feature data creation unit 131 applies various kinds of processing, such as voice activity detection and speech recognition, to the user voice data obtained by the sound collection unit 121 of the measurement unit 12 to prepare voice feature data and linguistic feature data. The feature data creation unit 131 also applies facial expression feature extraction, using a freely available tool such as OpenFace, to the user image data obtained by the imaging unit 122 of the measurement unit 12 to prepare data on image features. The feature data creation unit 131 preferably creates voice feature data and/or linguistic feature data together with image feature data, and more preferably creates all three: voice feature data, linguistic feature data and image feature data.

Next, the feature amount extraction unit 132 of the control unit 13 extracts feature amounts from the feature data created by the feature data creation unit 131. These feature amounts are described below with reference to the drawings. FIG. 6 is a table describing the feature amounts extracted by the feature amount extraction unit 132 of the control unit.

As shown in FIG. 6, the feature amount extraction unit 132 is configured to extract various feature amounts from the voice feature data, linguistic feature data and image feature data created by the feature data creation unit 131. The feature amount extraction unit 132 preferably extracts feature amounts from the voice feature data and/or the linguistic feature data together with feature amounts from the image feature data, and more preferably extracts feature amounts from all three types of feature data. Each feature amount shown in FIG. 6 is extracted, for example, by the processing described below.

From the voice feature data, feature amounts such as pitch, voice volume (power), voice quality, reaction time and pauses (speech intervals) can be extracted, for example, by using a speech analysis tool such as the Snack Sound Toolkit or openSMILE.
The feature amounts obtained from the voice feature data relate to the voice itself and can be calculated without analyzing the content of the user's speech (the morphemes, words and so on that it contains).
"Pitch" is the height of the voice, that is, its fundamental frequency. The pause features are the total number of silences lasting one second or longer within the user's utterance (response) and the length of the longest speech interval. The reaction time is the time difference between the end of the avatar 111a's question and the start of the user's response. For the fundamental frequency, the coefficient of variation, mean, maximum, median, minimum and range can be used as feature amounts. For voice volume (power), the mean, maximum and minimum can be used as feature amounts. Voice quality is expressed as the amplitude difference between the first harmonic (H1) and the third formant (A3) of the user's voice.

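The pitch statistics, reaction time and pause features described above can be computed from two intermediate results: a per-frame fundamental frequency track (as produced by a pitch tracker such as the Snack Sound Toolkit or openSMILE) and the start and end times of the user's speech segments (from voice activity detection). The following Python sketch assumes those inputs are already available. Unvoiced frames are assumed to be marked with an F0 of 0, and the coefficient of variation is assumed to use the population standard deviation. Voice volume and the H1-A3 voice-quality measure are omitted because they need the raw signal or spectrum.

```python
import statistics


def pitch_features(f0_hz):
    """Statistics over the voiced frames of an F0 track (unvoiced frames = 0)."""
    voiced = [f for f in f0_hz if f > 0]
    mean = statistics.mean(voiced)
    sd = statistics.pstdev(voiced)
    return {
        "f0_mean": mean,
        "f0_max": max(voiced),
        "f0_min": min(voiced),
        "f0_median": statistics.median(voiced),
        "f0_range": max(voiced) - min(voiced),
        "f0_sd": sd,
        "f0_cv": sd / mean,                          # coefficient of variation
    }


def timing_features(question_end_s, segments, min_pause_s=1.0):
    """segments: time-ordered (start_s, end_s) pairs of the user's speech."""
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(segments, segments[1:])]
    pauses = [g for g in gaps if g >= min_pause_s]   # silences of 1 s or longer
    return {
        "reaction_time": segments[0][0] - question_end_s,
        "pause_count": len(pauses),
        "pause_mean": statistics.mean(pauses) if pauses else 0.0,
        "pause_max": max(pauses, default=0.0),
    }
```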
From the linguistic feature data, Japanese morphological analysis can be performed, for example, by using a morphological analysis engine such as MeCab, and feature amounts such as the number of tokens (morphemes), fillers, the type-token ratio (TTR), part-of-speech information (the numbers of nouns, verbs, adjectives, adverbs and so on), syntactic complexity, vocabulary choice, vocabulary difficulty and speaking rate can be extracted.
That is, the feature amounts obtained from the linguistic feature data relate to the content of the user's speech and are extracted by analyzing that content (the morphemes, words and so on that it contains).
For each feature amount classified as a linguistic feature, the information on the morphemes and words contained in the speech is obtained, for example, by applying a well-known speech recognition method to the user's voice data to convert the speech into a character string, and then applying a well-known morpheme or word analysis method (for example, MeCab) to that character string.
The number of tokens is the total number of words in the user's speech. Fillers are counted as the number of words with no specific meaning, such as "uh" and "ah". "TTR" is the number of types (the number of distinct words in the user's speech, each word counted once) divided by the number of tokens. The more the user repeats the same words, the smaller the TTR becomes, because the number of tokens increases while the number of types does not. "Difficulty" is the median of the difficulty levels of all nouns, each level being determined from a predetermined dictionary that defines the difficulty of words numerically. "Speaking rate" is the subject's speaking time divided by the number of words (that is, the average time per word).
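Given the output of speech recognition followed by morphological analysis, most of the linguistic features above reduce to counting. This Python sketch assumes tokens arrive as `(surface, part_of_speech)` pairs with IPADIC-style Japanese POS labels as produced by MeCab (名詞 noun, 動詞 verb, 形容詞 adjective, 副詞 adverb, フィラー filler). The noun-difficulty dictionary is a hypothetical input. The sketch computes the speaking rate exactly as defined in the text (speaking time divided by the number of words) and, separately, tokens per second as used in the 21-feature set.

```python
from collections import Counter
import statistics


def linguistic_features(tokens, speaking_time_s, noun_difficulty=None):
    """tokens: (surface, pos) pairs, assuming IPADIC-style POS labels.
    noun_difficulty: hypothetical dict mapping a noun to a numeric level."""
    n_tokens = len(tokens)
    n_types = len({surface for surface, _ in tokens})   # distinct words
    pos = Counter(p for _, p in tokens)
    feats = {
        "tokens": n_tokens,
        "fillers": pos["フィラー"],
        "ttr": n_types / n_tokens if n_tokens else 0.0,
        "nouns": pos["名詞"],
        "verbs": pos["動詞"],
        "adjectives": pos["形容詞"],
        "adverbs": pos["副詞"],
        # Speaking time divided by the number of words, as defined above.
        "speaking_rate": speaking_time_s / n_tokens if n_tokens else 0.0,
        # Tokens per second of speaking time (used in the 21-feature set).
        "tokens_per_sec": n_tokens / speaking_time_s if speaking_time_s else 0.0,
    }
    if noun_difficulty is not None:
        levels = [noun_difficulty[s] for s, p in tokens
                  if p == "名詞" and s in noun_difficulty]
        feats["difficulty"] = statistics.median(levels) if levels else None
    return feats
```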

Image feature data include, for example, facial action units (based on the Facial Action Coding System, FACS), facial landmark features and gaze patterns obtained by using OpenFace, and feature amounts can be extracted from the FACS data, from the facial landmark features, from the gaze patterns and so on. The response time from the end of the avatar 111a's question until the user moves his or her mouth (mouth reaction time) may also be used as a feature amount extracted from the image feature data. The feature amounts obtained from the image feature data are extracted solely from the user's appearance, independently of the user's voice.
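The mouth reaction time can be read off a per-frame mouth-opening signal. A minimal sketch follows. It assumes the opening has already been computed for each video frame (for example, as the distance between upper and lower inner-lip landmarks from OpenFace, normalized by face size). It also assumes the mouth counts as moving once the opening rises a fixed amount above its median before the question ended. The 0.05 threshold is an arbitrary placeholder.

```python
import statistics


def mouth_reaction_time(times_s, mouth_open, question_end_s, min_rise=0.05):
    """Seconds from the end of the question until the mouth first opens.

    times_s: frame timestamps; mouth_open: per-frame opening (assumed to be a
    normalized inner-lip distance). Returns None if the mouth never moves.
    """
    before = [m for t, m in zip(times_s, mouth_open) if t <= question_end_s]
    baseline = statistics.median(before) if before else 0.0   # resting opening
    for t, m in zip(times_s, mouth_open):
        if t > question_end_s and m - baseline >= min_rise:
            return t - question_end_s
    return None
```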

The feature amounts are preferably extracted based on at least one of the voice feature data and the linguistic feature data created from the user voice obtained by the measurement unit 12, and more preferably based on both. Still more preferably, in addition to the feature amounts extracted from the voice feature data and/or the linguistic feature data, the feature amounts include feature amounts extracted from at least one type of image feature data (gaze patterns, Facial Action Coding System data or facial landmark features) created from the user image obtained by the measurement unit 12. Most preferably, the feature amounts include those extracted from the voice feature data, the linguistic feature data and the image feature data. This is because including feature amounts obtained from the image feature data among the feature amounts obtained by the feature amount extraction unit 132 improves the accuracy of cognitive function prediction.
The feature amount extraction unit 132 preferably extracts at least one voice feature amount selected from the group consisting of pitch (height of the voice, fundamental frequency), voice volume (power), voice quality, reaction time and pauses (speech intervals); at least one linguistic feature amount selected from the group consisting of the number of tokens (morphemes), fillers, the type-token ratio (TTR) and part-of-speech information (the numbers of nouns, verbs, adjectives, adverbs and so on); and at least one image feature amount selected from the group consisting of feature amounts extracted from facial action units, feature amounts extracted from facial landmark features and feature amounts extracted from gaze patterns. More preferably, ten or more types of feature amounts are extracted from among these, and these preferably include a feature amount from each of the above types of feature data. This improves the accuracy of the cognitive function prediction result, because each of these feature amounts is considered to better reflect the brain's information processing and cognitive activity during conversation.

When the avatar 111a asks a plurality of questions as described above, for example, ten types of feature amounts are calculated for each question, each type being averaged within that question, so that (number of questions) × 10 feature amounts are finally extracted.
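One reading of this averaging step is that each feature type is averaged over the measurements within a question and the per-question results are concatenated, giving (number of questions) × 10 values. The sketch below implements that reading. The input layout (one list of per-segment feature dictionaries per question) and the feature names are assumptions.

```python
import statistics


def per_question_vector(questions, feature_names):
    """questions: one list per question of per-segment feature dicts.
    Returns len(questions) * len(feature_names) values, question by question."""
    vector = []
    for segments in questions:
        for name in feature_names:
            # Average this feature type over the segments of one question.
            vector.append(statistics.mean(seg[name] for seg in segments))
    return vector


# Two questions and two feature types (the text uses ten); invented values.
q1 = [{"f0_mean": 1, "power_mean": 2}, {"f0_mean": 3, "power_mean": 4}]
q2 = [{"f0_mean": 5, "power_mean": 6}]
vector = per_question_vector([q1, q2], ["f0_mean", "power_mean"])
```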

The cognitive function prediction unit 133 of the control unit 13 predicts, based on at least one feature amount obtained by the feature amount extraction unit 132, whether the user shows a tendency toward a state or symptom caused by cognitive decline. Specifically, the feature amount extraction unit 132 extracts one or more feature amounts based on the measurement results of the measurement unit 12, and the cognitive function prediction unit 133 determines, based on those feature amounts, whether the user has a state or symptom caused by cognitive decline.
The cognitive function prediction unit 133 preferably makes this prediction based on ten or more types of feature amounts, and these more preferably include a feature amount from each of the above types of feature data, because this makes the cognitive function prediction highly accurate.

The cognitive function prediction unit 133 predicts whether the user has a state or symptom caused by cognitive decline by applying a predetermined determination method to the ten types (ten dimensions) of feature amounts described above. The prediction is preferably made using at least one discriminative model selected from the group consisting of an SVM (support vector machine), logistic regression analysis and deep learning. The determination method can be constructed as follows: each feature amount extracted by the feature amount extraction unit 132 is normalized to a minimum of 0 and a maximum of 1 and used as input to the discriminative model, and the model is trained to classify a group with a state or symptom caused by cognitive decline (for example, a dementia group) and a group without such a state or symptom (for example, a non-dementia group). The determination method is preferably constructed by training on ten or more types of feature amounts including the reaction time (a voice feature amount).
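The training procedure above (min-max normalization of each feature to the range 0 to 1, then a classifier separating a dementia group from a non-dementia group) is sketched below with logistic regression trained by plain batch gradient descent, using only the Python standard library. The learning rate, epoch count and toy data are arbitrary. The SVM and deep-learning options are not shown, and in practice a library such as scikit-learn (`MinMaxScaler`, `LogisticRegression`, `SVC`) would normally be used instead. The normalization bounds are fitted on training data only so that they can be reused unchanged on a held-out participant.

```python
import math


def fit_minmax(rows):
    """Per-feature (min, max) computed on training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_minmax(rows, bounds):
    """Scale each feature to [0, 1] using the training bounds."""
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, (lo, hi) in zip(row, bounds)] for row in rows]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean cross-entropy loss.
    labels: 1 = dementia group, 0 = non-dementia group."""
    w, b, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for row, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, row)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, row):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b)


# Invented toy data: [reaction_time_s, f0_max_hz] per participant.
X = [[1.0, 200.0], [1.2, 210.0], [3.0, 150.0], [3.5, 140.0]]
y = [0, 0, 1, 1]
bounds = fit_minmax(X)
model = train_logistic(apply_minmax(X, bounds), y)
```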

The control unit 13 then outputs the cognitive function prediction result data obtained by the cognitive function prediction unit 133 to the information transmission unit 11 and the like.

The results of verifying the cognitive function prediction accuracy of the cognitive function prediction device 10 are described below with reference to the drawings. FIG. 7 shows graphs of the results of verifying how accurately the cognitive function prediction device according to the embodiment of the present invention predicts dementia, which is one type of state or symptom caused by cognitive decline. FIG. 7(a) shows the verification results when an SVM is used as the determination method to predict dementia, and FIG. 7(b) shows the verification results when logistic regression analysis is used.

The graphs in FIGS. 7(a) and 7(b) show ROC curves and AUC (area under the curve) values calculated using leave-one-participant-out cross-validation. Leave-one-participant-out cross-validation is a verification method in which one sample is held out from a plurality of samples, a determination method is constructed using the remaining samples (training samples), and that determination method is applied to the held-out sample (test sample) to obtain a determination result; this is repeated so that every sample serves as the test sample exactly once, yielding a plurality of determination results. The verification results in FIGS. 7(a) and 7(b) were calculated by obtaining 21 types of feature amounts from each of about 24 participants, half in a dementia group and half in a non-dementia group, and treating each participant's set of 21 feature amounts as one sample.
The 21 feature amounts are: the time taken to respond to a question (reaction time); the total number of silences of one second or longer between utterances (pause count); the mean duration of the intervals of one second or longer between utterances (mean pause time); the maximum duration of those intervals (maximum pause time); the mean, minimum, maximum and median of the fundamental frequency; the difference between the maximum and minimum of the fundamental frequency; the standard deviation and coefficient of variation of the fundamental frequency; the mean and standard deviation of the voice volume; the length of the utterances (speaking time); the number of tokens in the several utterances; the total numbers of fillers, verbs, nouns, adjectives and adverbs in the utterances; and the number of tokens per unit of speaking time. In this example, the avatar 111a randomly selected, from the 19 questions shown in FIG. 5, a question set (five questions) containing at least one of the questions Q5 to Q13, which concern past events matched to the user's age, and asked each participant these questions. The several utterances mentioned above may be all of the user's answers to the plurality of questions from the avatar 111a (all of the user's response utterances), or the answers (response utterances) to at least three questions including a question about a past event matched to the user's age.
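Leave-one-participant-out cross-validation, as described above, is a loop that holds out each participant in turn. The Python sketch below is model-agnostic: `fit` and `score` are caller-supplied functions, and any normalization (such as min-max scaling) would have to happen inside `fit` so that the held-out participant never influences it. The toy model (a midpoint threshold on a single feature such as reaction time) and the numbers are purely illustrative and are not the patent's data.

```python
def leave_one_participant_out(samples, labels, fit, score):
    """Hold out each participant once; return one held-out score each.

    samples[i]: feature vector of participant i; labels[i]: 1 or 0.
    fit(train_samples, train_labels) -> model; score(model, sample) -> float.
    """
    held_out_scores = []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)          # trained without participant i
        held_out_scores.append(score(model, samples[i]))
    return held_out_scores


# Toy model (hypothetical): threshold halfway between the class means of
# the first feature; the score is the signed distance from that threshold.
def fit_midpoint(xs, ys):
    pos = [x[0] for x, y in zip(xs, ys) if y == 1]
    neg = [x[0] for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def score_midpoint(threshold, x):
    return x[0] - threshold


reaction_times = [[1.0], [1.2], [0.9], [3.0], [2.8], [3.3]]   # invented values
groups = [0, 0, 0, 1, 1, 1]
scores = leave_one_participant_out(reaction_times, groups,
                                   fit_midpoint, score_midpoint)
```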

An ROC curve connects the results of multiple verifications performed while varying the parameter that sets the boundary between positive and negative in a given determination method. The horizontal axis is the false positive rate (the proportion of actual negatives judged positive), and the vertical axis is the true positive rate (the proportion of actual positives judged positive). In the graphs of FIGS. 7(a) and 7(b), the diagonal dashed line on which the false positive rate equals the true positive rate corresponds to judging positive and negative completely at random. Accordingly, the further the ROC curve lies above this dashed line (in the region where the true positive rate exceeds that of random judgment), the higher the accuracy of the dementia determination. The AUC value is the area under the ROC curve; the closer it is to its maximum value of 1, the higher the accuracy of the dementia determination.
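The ROC curve and AUC described above can be computed directly from held-out scores and true labels. This sketch sweeps the threshold over every distinct score from high to low and records a (false positive rate, true positive rate) point after each step. It then integrates with the trapezoidal rule. Tied scores are moved across the threshold together. Labels use 1 for the dementia group and 0 for the non-dementia group, and the example scores are invented.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points as the threshold moves from the highest score down.
    labels: 1 = positive (dementia group), 0 = negative."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]                 # threshold above every score
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Move every sample tied at this score across the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC points by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]   # invented held-out scores
labels = [1, 1, 0, 1, 0, 0]
print(round(auc(roc_points(scores, labels)), 3))   # 0.889
```

For these example scores, 8 of the 9 positive-negative pairs are ranked correctly, so the AUC is 8/9; a random classifier would give about 0.5, which corresponds to the diagonal line in the figures.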

As shown in FIGS. 7(a) and 7(b), the ROC curves for both the SVM and logistic regression lie above the diagonal line described above and deviate substantially from it. Furthermore, the AUC values for both are very close to the maximum value of 1 (SVM: 0.95; logistic regression: 0.92). These results indicate that, in this verification, the cognitive function prediction device according to the embodiment of the present invention predicted dementia, one type of state or symptom caused by cognitive decline, with very high accuracy.

As described above, the cognitive function prediction device according to the embodiment of the present invention measures the user's responses to atypical questions, including questions about past events matched to the user's age. It creates one or more types of feature data from the measurement results and predicts, based on one or more feature amounts obtained from the feature data, whether the user has a state or symptom caused by cognitive decline. Because the questions in this embodiment probe both the user's short-term and long-term memory, states or symptoms caused by cognitive decline can be predicted with higher accuracy.

The verification results illustrated in FIGS. 7(a) and 7(b) were obtained by using the 21 feature amounts described above to determine whether the user has dementia, one type of state or symptom caused by cognitive decline. However, the accuracy of dementia determination can be improved without necessarily using all 21 feature amounts. In particular, the determination accuracy can be expected to improve if only those of the 21 feature amounts that contribute to determination accuracy are selectively used to determine whether the user has dementia. More generally, the accuracy of determining whether the user has a state or symptom caused by cognitive decline can be expected to improve if the feature amounts that contribute to that determination are selectively used.

The verification results illustrated in FIGS. 7(a) and 7(b) do not include feature amounts extracted from the image feature data, but such feature amounts may also be included. The graph in FIG. 8 shows, as in FIG. 7, ROC curves and AUC (area under the curve) values calculated using leave-one-participant-out cross-validation, and illustrates verification of the accuracy of determining dementia, one type of state or symptom caused by cognitive decline, using image feature data. The verification results in FIG. 8 were calculated by obtaining three types of feature amounts extracted from the image feature data from each of about 24 participants, half in a dementia group and half in a non-dementia group, and using each participant's set of three as the feature amounts.
The three feature amounts are the action units (AUs) extracted using the Facial Action Coding System, a feature amount extracted from the gaze pattern, and the mouth reaction time extracted from the facial landmark features.

As shown in FIG. 8, when logistic regression is applied to the combination of the three feature amounts extracted from the image feature data, the ROC curves for both of two different questions (Q1 and Q2) lie above the diagonal line and deviate substantially from it. The AUC values are also fairly close to the maximum value of 1 (Q1: 0.78; Q2: 0.82). This indicates that, even when only feature amounts extracted from the image feature data are used, the cognitive function prediction device according to the embodiment of the present invention predicts states or symptoms caused by cognitive decline, and specifically dementia, with high accuracy. Q1 and Q2 in FIG. 8 are fixed questions: Q1 is "What is today's date?" and Q2 is "Please tell me about happy memories you have had so far."
Therefore, combining feature amounts extracted from the image feature data with feature amounts extracted from the linguistic feature data and/or the voice feature data is considered to reduce the variation in cognitive function prediction accuracy caused by individual differences, enabling consistent cognitive function prediction.

The significance of each feature amount in predicting dementia, one type of state or symptom caused by cognitive decline, is described below.
In the verification results obtained with logistic regression shown in FIG. 7(b), the following five of the 21 feature amounts are important for detecting dementia: 1) reaction time, 2) the difference between the maximum and minimum of the fundamental frequency, 3) the maximum of the fundamental frequency, 4) mean pause time, and 5) the total number of verbs in the utterances. In particular, reaction time has the greatest influence on dementia prediction by logistic regression and is the most effective attribute for early detection of dementia. It is therefore also considered to be the most effective attribute for early detection of states or symptoms caused by cognitive decline.

また、本発明においては、アバター１１１ａが利用者に対して行う質問に非定型質問が含まれており、非定型質問は、利用者の年齢に応じた過去のイベントに関する質問を含む。認知機能の低下に起因する状態又は症状を有する患者の中でも初期段階に属する患者は、過去のイベントに関する質問に対する反応時間は、健常者の同質問に対する反応時間と統計的な有意差を示さない。そのため、本質問を含むことにより、認知機能の低下に起因する状態又は症状の進行度、なかでも認知症の進行度を予測することができる。
In addition to the above questions, it is preferable for the avatar 111a to ask the user questions for which reaction times differ statistically significantly between patients in the early stages of conditions or symptoms caused by cognitive decline and healthy subjects.
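The source does not name the statistical test used to establish a significant difference in reaction time. As one swapped-in option, a two-sided permutation test on the difference of group means can screen candidate questions; the significance level of 0.05, the number of permutations, and the function names are assumptions.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def permutation_p_value(a: Sequence[float], b: Sequence[float],
                        n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation test on |mean(a) - mean(b)|."""
    rng = random.Random(seed)  # seeded for reproducible screening
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of subjects into two groups
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p above 0


def select_questions(times: Dict[str, Tuple[Sequence[float], Sequence[float]]],
                     alpha: float = 0.05) -> List[str]:
    """Keep questions whose reaction times differ between early-stage patients
    (first tuple item) and healthy subjects (second tuple item)."""
    return [q for q, (patients, healthy) in times.items()
            if permutation_p_value(patients, healthy) < alpha]
```

When many candidate questions are screened at once, a multiple-comparison correction (for example, dividing `alpha` by the number of questions) would reduce the chance of keeping questions that only look discriminative by accident.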

In addition, it is preferable to determine the content of the questions according to the situation in which the cognitive function prediction device is expected to be used. For example, when a TV, personal computer, tablet, smartphone, or the like at home is expected to serve as the user terminal 50 in the cognitive function prediction system 40, a question that is easy to answer at home, such as "What did you eat for breakfast?", may be asked.
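Choosing questions by usage setting could be as simple as a lookup table, as sketched below. The setting names and the "clinic" and second "home" questions are hypothetical; the breakfast question and the two standard questions Q1 and Q2 come from the text.

```python
from typing import Dict, List

# Hypothetical question pools keyed by where the device is used.
QUESTION_POOLS: Dict[str, List[str]] = {
    "home": ["What did you eat for breakfast?",
             "What did you do yesterday evening?"],
    "clinic": ["How did you get here today?"],
    "common": ["What month and what day is it today?",
               "Tell me about the memories you have enjoyed so far."],
}


def questions_for(setting: str) -> List[str]:
    """Setting-specific questions first, then questions usable anywhere;
    an unknown setting falls back to the common pool only."""
    return QUESTION_POOLS.get(setting, []) + QUESTION_POOLS["common"]
```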

The above embodiment illustrates the case where the avatar 111a is an image (see FIG. 2), but the avatar may be a three-dimensional object such as a robot. However, an image avatar is preferable because it simplifies the configuration of the cognitive function prediction device.

The present invention can be used for a cognitive function prediction device, a cognitive function prediction system, a cognitive function prediction method, and a program for predicting whether or not a user has a condition or symptom caused by cognitive decline.

10, 60: Cognitive function prediction device
11: Information transmission unit
111: Display unit
111a: Avatar
111b: Text
112: Audio output unit
12: Measurement unit
121: Sound collection unit
122: Imaging unit
13: Control unit
131: Feature data creation unit
132: Feature amount extraction unit
133: Cognitive function prediction unit
134: Question creation unit
21: External information input unit
31: Database
40: Cognitive function prediction system
50: User terminal

Claims

A cognitive function prediction device comprising:
an information transmission unit configured to transmit voice information and image information to a user;
a measurement unit configured to measure a user voice and/or a user image;
a feature data creation unit configured to create at least one type of feature data selected from the group consisting of voice feature data, linguistic feature data, and image feature data, based on the measurement results obtained by the measurement unit;
a feature amount extraction unit configured to extract a feature amount from the feature data obtained by the feature data creation unit; and
a cognitive function prediction unit configured to predict, based on at least one feature amount obtained by the feature amount extraction unit, whether or not the user is prone to a condition or symptom caused by cognitive decline,
wherein the information transmitted from the information transmission unit includes an atypical question, and
the atypical question includes a question about a past event according to the user's age.

The cognitive function prediction device according to claim 1, further comprising a question creation unit configured to create questions based on the user's responses,
wherein the atypical question includes a question created by the question creation unit.

The cognitive function prediction device according to claim 1 or 2, wherein the feature amount extraction unit is configured to extract a feature amount based on at least one of the voice feature data and the linguistic feature data created by the feature data creation unit from the user voice obtained by the measurement unit.

The cognitive function prediction device according to claim 1 or 2, wherein the feature amount extraction unit is configured to extract feature amounts based on: at least one type of image feature data selected from the group consisting of a gaze pattern, a facial action coding system, and facial landmark features, created by the feature data creation unit from the user image obtained by the measurement unit; and at least one of the voice feature data and the linguistic feature data created by the feature data creation unit from the user voice obtained by the measurement unit.

The cognitive function prediction device according to any one of claims 1 to 4, wherein the cognitive function prediction unit is configured to predict, based on the feature amount extracted by the feature amount extraction unit, whether or not the user has a tendency toward a condition or symptom caused by cognitive decline, using at least one method selected from the group consisting of an SVM (support vector machine), logistic regression analysis, and deep learning.

A cognitive function prediction method comprising:
an information transmission step of transmitting voice information and image information to a user;
a measurement step of measuring a user voice and/or a user image;
a feature data creation step of creating at least one type of feature data selected from the group consisting of voice feature data, linguistic feature data, and image feature data, based on the measurement results obtained in the measurement step;
a feature amount extraction step of extracting a feature amount from the feature data obtained in the feature data creation step; and
a cognitive function prediction step of predicting, based on at least one feature amount obtained in the feature amount extraction step, whether or not the user is prone to a condition or symptom caused by cognitive decline,
wherein the information transmitted in the information transmission step includes an atypical question, and
the atypical question includes a question about a past event according to the user's age.

A program for causing a computer to execute the cognitive function prediction method according to claim 6.

A cognitive function prediction system comprising:
a user terminal including an information transmission unit configured to transmit voice information and image information to a user, and a measurement unit configured to measure a user voice and/or a user image; and
a cognitive function prediction device including a feature data creation unit configured to create at least one type of feature data selected from the group consisting of voice feature data, linguistic feature data, and image feature data based on the measurement results of the measurement unit, a feature amount extraction unit configured to extract a feature amount from the feature data obtained by the feature data creation unit, and a cognitive function prediction unit configured to predict, based on at least one feature amount extracted by the feature amount extraction unit, whether or not the user is prone to a condition or symptom caused by cognitive decline,
wherein the information transmitted from the information transmission unit includes an atypical question, and
the atypical question includes a question about a past event according to the user's age.