JP2020127703A

JP2020127703A - Estimation system of mild cognitive dysfunction and estimation device of mild cognitive dysfunction

Info

Publication number: JP2020127703A
Application number: JP2019128769A
Authority: JP
Inventors: 厳増岡; Gen Masuoka; 伯秀舞草; Norihide Maikusa
Original assignee: Nippontect Systems Co Ltd
Current assignee: Nippontect Systems Co Ltd
Priority date: 2018-07-24
Filing date: 2019-07-10
Publication date: 2020-08-27
Anticipated expiration: 2039-07-10
Also published as: JP6804779B2

Abstract

To estimate the presence or absence of mild cognitive dysfunction in a subject with a high correct answer rate. SOLUTION: An estimation system for mild cognitive dysfunction includes: a discriminator trained to output the presence or absence of mild cognitive dysfunction on the basis of learning data comprising, for healthy persons and persons with mild cognitive dysfunction, age, the correct-answer score for an answer to a question on date and time orientation, and one or more voice feature values extracted from audio data recorded during the answer; a data acquisition unit for acquiring subject information containing age information on a subject, information on the answer to the question, and the audio data recorded during the answer to the question; a data analysis unit for obtaining the correct-answer score for the answer on the basis of the subject information and extracting the one or more voice feature values from the audio data; and an estimation result output unit for inputting the subject's age, the correct-answer score, and the one or more voice feature values into the discriminator, and outputting, as the estimation result, the presence or absence of mild cognitive dysfunction of the subject output from the discriminator. SELECTED DRAWING: Figure 2

Description

The present invention relates to an estimation system and an estimation device for mild cognitive impairment, each of which estimates the presence or absence of mild cognitive impairment using voice data of a subject.

As society becomes super-aged, the number of people with dementia who visit medical institutions is increasing rapidly. According to the World Alzheimer Report 2015, there were 46.8 million people with dementia worldwide in 2015, and the number is expected to rise to 131.5 million by 2050. No definitive treatment for dementia has been established so far. However, detection at the stage where mild cognitive impairment is observed may slow the subsequent progression of dementia, so the development of methods for early detection of dementia is strongly desired.

Representative dementia screening methods used to date include the revised Hasegawa Dementia Scale (HDS-R) and the Mini-Mental State Examination (MMSE). These methods rely mainly on spoken questioning, and two problems have been pointed out: screening takes a long time, and it is difficult to administer unless the examiner is sufficiently trained. To address these problems, automatic identification techniques have been studied that estimate a specialist's diagnosis by machine learning using voice features extracted from a subject's voice data. These techniques do not establish whether a subject has dementia; that diagnosis should instead be made carefully by a specialist based on, for example, the Diagnostic and Statistical Manual of Mental Disorders-5 (DSM-5), the diagnostic manual of the American Psychiatric Association. Even so, informing the subject or the people around the subject at an early stage that cognitive impairment is suspected can lead to an early start of treatment.

For example, Non-Patent Document 1 (IPSJ SIG Technical Report Vol. 2017-SLP-117 No. 8, pp. 1-6 (2017)) reports experiments aimed at early detection of dementia. The subjects had Clinical Dementia Rating (CDR) scores of 0 (healthy), 0.5 (suspected dementia), and 1 (mild dementia). The specialist's diagnosis (CDR score 0, 0.5, or 1) was predicted with various machine learning methods from three inputs: the correctness of the subject's answer to each HDS-R question, voice features defined by speech-style analysis of the answer speech, and linguistic features defined from speech recognition results for the answer speech. In that document, the voice features are the feature set called emobase from openSMILE (see Non-Patent Document 2 (http://audeering.com/technology/opensmile)), which is widely used for emotion recognition and speech-style recognition from speech; the linguistic features are keyword features over recognition results from CSJ-Kaldi; and eight kinds of discriminators are used for machine learning.
emobase is a statistical voice feature set consisting of 988 (26 × 2 × 19) short-term voice features. It is obtained by computing 19 statistics for each of 26 features and for the time difference of each of those features. The 26 features are voice intensity, sound pressure, 12 mel-frequency cepstral coefficients, pitch, pitch envelope, voicing probability, 8 line spectral pairs, and zero-crossing rate. The 19 statistics are the maximum, minimum, range, mean, absolute values of the maximum and minimum, slope, intercept, and error of a linear approximation, error of a parabolic approximation, standard deviation, skewness and kurtosis of the distribution, the k-th quartiles (k = 1, 2, 3), and the interquartile ranges 1-2, 2-3, and 1-3.
Although accuracy varied with the discriminator used, identification using only the correct-answer score for the HDS-R question on date and time orientation achieved an accuracy of 0.56 to 0.63. In contrast, identification using only voice features extracted from the answer speech for that question achieved only 0.47 to 0.55, identification using only linguistic features extracted from the answer speech achieved only 0.43 to 0.56, and identification that added voice and linguistic features to the correct-answer score remained at 0.34 to 0.64. Judging from these results, adding voice features did not effectively improve the accuracy of CDR score identification by machine learning.
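The 988-feature count described above can be checked with a short sketch. This is only an illustration of the "features, plus their time differences, times statistics" structure: the function names are hypothetical, only five of the 19 statistics are implemented, the time difference is taken as a simple frame-to-frame difference (an assumption), and none of this is the openSMILE implementation.

```python
import statistics

N_FEATURES = 26    # short-term features listed in the text
N_CONTOURS = 2     # each feature plus its time difference
N_STATISTICS = 19  # statistics computed over every contour

def time_difference(contour):
    """Frame-to-frame difference of a per-frame contour (assumed definition)."""
    return [b - a for a, b in zip(contour, contour[1:])]

def some_statistics(contour):
    """Illustrative subset of the 19 statistics, not the full set."""
    return {
        "max": max(contour),
        "min": min(contour),
        "range": max(contour) - min(contour),
        "mean": statistics.fmean(contour),
        "stddev": statistics.pstdev(contour),
    }

total_features = N_FEATURES * N_CONTOURS * N_STATISTICS

pitch = [100.0, 110.0, 105.0, 120.0]  # made-up per-frame pitch values
pitch_stats = some_statistics(pitch)
delta_stats = some_statistics(time_difference(pitch))
```

Each of the 26 contours and each of their 26 difference contours would contribute 19 values in the full set, which is where 26 × 2 × 19 = 988 comes from.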

Non-Patent Document 3 (Proceedings of the 2018 Spring Meeting of the Acoustical Society of Japan, 1-Q-44 (2018)), although not limited to people with mild dementia, proposes an automatic identification technique that introduces the modulation spectrum, which is effective for the quality of synthetic speech in statistical parametric speech synthesis (see Non-Patent Document 4 (IEEE/ACM Trans. on Audio, Speech and Language Processing, Vol. 24, No. 4, pp. 755-767 (2016))). The modulation spectrum is defined as the log power spectrum of a voice feature time series, and is a long-term voice feature that represents the temporal variation of that time series on a Fourier basis. A modulation spectrum can be computed from the time course of any short-term voice feature. In that document, the modulation spectrum of the mel-frequency cepstral coefficients is adopted because of its high identification performance, and the low-order components of the cepstrum of that modulation spectrum (the modulation cepstrum), called modcep, are used as a voice feature set together with emobase described above.
A binary classification task between people with Alzheimer-type dementia and healthy people was performed using four kinds of discriminators and the voice features in these sets, extracted from voice data recorded while answering HDS-R questions. The identification rate was 47.2 to 54.2% with emobase, 51.9 to 56.5% with modcep, and 47.2 to 55.1% with the concatenated vector of emobase and modcep, and the document reports that modcep improved the identification rate. Even so, the identification rate of automatic identification using voice features alone remains at or below 57% in that document as well.
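As a rough illustration of the definitions above, the following sketch computes a log power spectrum of one feature trajectory and then takes low-order cepstral coefficients of it. It is a minimal sketch under stated assumptions: the function names, the naive DFT, the use of a DCT-II for the cepstrum step, the small `eps` floor, and the choice of four coefficients are all illustrative and are not taken from Non-Patent Document 3 or 4.

```python
import cmath
import math

def modulation_spectrum(series, eps=1e-12):
    """Log power spectrum of one feature trajectory (naive DFT, bins 0..n/2)."""
    n = len(series)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(series))
        spectrum.append(math.log(abs(coeff) ** 2 + eps))  # eps avoids log(0)
    return spectrum

def modulation_cepstrum(series, n_low=4):
    """Low-order DCT-II coefficients of the modulation spectrum (assumed form)."""
    spec = modulation_spectrum(series)
    m = len(spec)
    return [sum(s * math.cos(math.pi * q * (i + 0.5) / m)
                for i, s in enumerate(spec))
            for q in range(n_low)]

# Made-up trajectory: one coefficient over 16 frames, 2 cycles per window.
trajectory = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
ms = modulation_spectrum(trajectory)
mc = modulation_cepstrum(trajectory)
peak_bin = max(range(len(ms)), key=ms.__getitem__)
```

For this synthetic trajectory the energy sits in modulation bin 2, which is the kind of long-term temporal structure that frame-level statistics do not expose directly.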

Voice feature sets usable for automatic identification are not limited to emobase and modcep; many others are known, such as the Tsanas features and the YAAFE features (see, for example, Table 2 of Non-Patent Document 5 (PLoS ONE 12(10): e0185613 (2017))).

Non-Patent Document 1: IPSJ SIG Technical Report Vol. 2017-SLP-117 No. 8, pp. 1-6 (2017)
Non-Patent Document 2: http://audeering.com/technology/opensmile
Non-Patent Document 3: Proceedings of the 2018 Spring Meeting of the Acoustical Society of Japan, 1-Q-44 (2018)
Non-Patent Document 4: IEEE/ACM Trans. on Audio, Speech and Language Processing, Vol. 24, No. 4, pp. 755-767 (2016)
Non-Patent Document 5: PLoS ONE 12(10): e0185613 (2017)
Non-Patent Document 6: https://www.chiba.med.or.jp/personnel/nursing/download/text2012_10.pdf

Automatic identification techniques that estimate a specialist's diagnosis by machine learning on voice data are excellent in simplicity and speed, but as described above, the studies so far cannot be said to have achieved a satisfactory correct answer rate. In addition, because detection at the stage where mild cognitive impairment is observed may slow the subsequent progression of dementia, a technique that enables automatic identification at that stage is desired.

Accordingly, an object of the present invention is to provide an estimation system for mild cognitive impairment that estimates the presence or absence of mild cognitive impairment in a subject simply and quickly by using voice data, and that has an improved correct answer rate.

To find questions suited to identifying, in a short time and with a high correct answer rate, whether mild cognitive impairment is suspected, the inventors focused on the HDS-R, a representative dementia screening method with a long track record. The HDS-R questions and their correct-answer scores are shown below. The HDS-R has a maximum total score of 30 points, and a score of 20 or below is regarded as raising the suspicion of dementia. The rule for calculating the correct-answer score is defined separately for each question. For example, if the answer to question 1 is within ±2 years of the actual age, the correct-answer score is 1 point; otherwise the answer is treated as incorrect and the correct-answer score is 0 points.
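The question 1 rule stated above can be written directly as a function. The sketch below also includes a scorer for a date and time orientation question; its rule of one point per correct item (year, month, day, day of the week) is an assumption made for illustration, since the text here only says that each question has its own rule. All function and field names are hypothetical.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def score_age_answer(answered_age, actual_age):
    """Question 1: 1 point if the answer is within +/-2 years of the actual age."""
    return 1 if abs(answered_age - actual_age) <= 2 else 0

def score_date_orientation(answer, today):
    """Assumed rule: one point per correct item among year, month, day, weekday."""
    expected = {
        "year": today.year,
        "month": today.month,
        "day": today.day,
        "weekday": WEEKDAYS[today.weekday()],  # date.weekday(): Monday is 0
    }
    return sum(1 for key, value in expected.items() if answer.get(key) == value)

today = date(2019, 7, 10)
full = score_date_orientation(
    {"year": 2019, "month": 7, "day": 10, "weekday": "Wednesday"}, today)
partial = score_date_orientation(
    {"year": 2019, "month": 7, "day": 11, "weekday": "Thursday"}, today)
```

Because the expected values come from the calendar, a scorer of this kind needs no human examiner, only the current date.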

The inventors first selected, from among people diagnosed with dementia by a specialist, those with a total HDS-R score of 14 or more as "people with mild cognitive impairment". That is, as far as the present invention is concerned, the term "mild cognitive impairment" means "cognitive impairment observed in dementia with a total HDS-R score of 14 or more". The HDS-R is not intended for grading the severity of dementia. On the other hand, significant differences in total HDS-R score have been found between severity groups, and the mean score of the mild group is known to be 19 ± 5 points (see Non-Patent Document 6 (https://www.chiba.med.or.jp/personnel/nursing/download/text2012_10.pdf)). The present invention therefore defines the term "mild cognitive impairment" as above on the basis of this information.

Next, for healthy people and people with mild cognitive impairment, the inventors evaluated the correct answer rate of two kinds of automatic identification of healthy versus mild cognitive impairment: one using only the correct-answer score for each HDS-R question, and one using only voice features extracted from voice data recorded while answering each HDS-R question. The correct answer rate is the value calculated by the following formula.
Correct answer rate = (number of true positives + number of true negatives) / (number of subjects)
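The formula above translates into a few lines of code. The sketch below is only an illustration; the function name and the sample labels are made up, with `True` standing for "mild cognitive impairment present".

```python
def correct_answer_rate(predicted, actual):
    """(true positives + true negatives) / number of subjects."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    true_neg = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    return (true_pos + true_neg) / len(actual)

# Made-up labels for five subjects.
predicted = [True, False, True, False, False]
actual = [True, False, False, False, True]
rate = correct_answer_rate(predicted, actual)
```

Here one true positive and two true negatives out of five subjects give a rate of 0.6.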

As detailed below, in automatic identification using only the correct-answer score for each HDS-R question, the correct answer rate was highest for question 7, the question on delayed recall of three words, followed by question 2, the question on date and time orientation. In automatic identification using only voice features extracted from voice data recorded while answering each HDS-R question, the type of question had little effect and the correct answer rate was around 0.6. For questions 2 to 9, as in Non-Patent Document 1, the correct answer rate with voice features alone was lower than with the correct-answer score alone.

The correct answer rate of automatic identification using the correct-answer score for question 7, the question on delayed recall of three words, reached 0.9, an extremely high value. Question 7 is therefore very well suited to estimating the presence or absence of mild cognitive impairment. However, a relatively long time elapses between the start of this question and the answer, which is a problem for the simplicity and speed of evaluation. The inventors therefore turned to question 2, the question on date and time orientation, which had the next-highest correct answer rate in identification using the correct-answer score alone and which takes only a short time from question to answer. They studied how to improve the correct answer rate of automatic identification using voice data recorded while answering this question. The date and time orientation question is also suitable in another respect: even when the examination server for estimating the presence or absence of mild cognitive impairment and the subject terminal on which the subject enters spoken answers are connected through a communication line such as the Internet, the examination server can automatically judge whether the answer is correct.

As a result, the inventors found that the correct answer rate improves to a level comparable to that for question 7 when machine-learning-based automatic identification uses the subject's age in addition to the correct-answer score for the subject's answer to question 2 and the voice features extracted from voice data recorded while answering that question, and thereby completed the present invention. As detailed below, automatic identification using age alone gave a lower correct answer rate than automatic identification using only the correct-answer score for question 2. Nevertheless, introducing age contributed effectively to improving the correct answer rate. Neither automatic identification using the correct-answer score for question 2 together with voice features extracted from the answer voice data, nor automatic identification using the correct-answer score for question 2 together with age, yielded a satisfactory improvement in the correct answer rate.

Accordingly, the present invention first relates to
an estimation system for mild cognitive impairment that includes an examination server and a subject terminal connected via a communication line and that estimates the presence or absence of mild cognitive impairment in a subject,
wherein the examination server comprises:
a discriminator trained to output the presence or absence of mild cognitive impairment based on learning data consisting of, for each of healthy people and people with mild cognitive impairment, age, a correct-answer score for an answer to a question on date and time orientation, and one or more voice features extracted from voice data recorded while answering the question on date and time orientation;
a data acquisition unit that receives subject information transmitted from the subject terminal, the subject information including, for the subject, age information, information on the answer to the question on date and time orientation, and voice data recorded while answering the question on date and time orientation;
a data analysis unit that, based on the subject information, obtains the correct-answer score for the answer and extracts the one or more voice features from the voice data; and
an estimation result output unit that inputs the subject's age, the correct-answer score, and the one or more voice features into the discriminator, and transmits to the subject terminal, as an estimation result, the presence or absence of mild cognitive impairment in the subject output from the discriminator.
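The data flow through the components described above can be sketched as follows: age, correct-answer score, and voice features are combined into one input and passed to a trained discriminator. This is a minimal sketch under stated assumptions. The class and function names are hypothetical, and `placeholder_discriminator` is an untrained stand-in with arbitrary thresholds and no clinical meaning; it only marks where a trained model would be called.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class AnalyzedSubject:
    age: int                     # from the age information
    answer_score: int            # correct-answer score for the date/time question
    voice_features: List[float]  # one or more features from the answer audio

def build_feature_vector(subject: AnalyzedSubject) -> List[float]:
    """Concatenate age, score, and voice features into one input vector."""
    return [float(subject.age), float(subject.answer_score),
            *subject.voice_features]

def estimate(subject: AnalyzedSubject,
             discriminator: Callable[[Sequence[float]], bool]) -> bool:
    """Return the discriminator's output as the estimation result."""
    return bool(discriminator(build_feature_vector(subject)))

def placeholder_discriminator(x: Sequence[float]) -> bool:
    """Untrained stand-in; thresholds are arbitrary, for illustration only."""
    age, score = x[0], x[1]
    return age >= 75 and score <= 2

subject = AnalyzedSubject(age=80, answer_score=1, voice_features=[0.1, 0.2])
result = estimate(subject, placeholder_discriminator)
```

Passing the discriminator in as a callable keeps the data analysis step independent of whichever learning algorithm is chosen.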

The present invention also relates to
an estimation device for mild cognitive impairment that estimates the presence or absence of mild cognitive impairment in a subject, the device comprising:
a discriminator trained to output the presence or absence of mild cognitive impairment based on learning data consisting of, for each of healthy people and people with mild cognitive impairment, age, a correct-answer score for an answer to a question on date and time orientation, and one or more voice features extracted from voice data recorded while answering the question on date and time orientation;
a data acquisition unit that acquires subject information including, for the subject, age information, information on the answer to the question on date and time orientation, and voice data recorded while answering the question on date and time orientation;
a data analysis unit that, based on the subject information, obtains the correct-answer score for the answer and extracts the one or more voice features from the voice data; and
an estimation result output unit that inputs the subject's age, the correct-answer score, and the one or more voice features into the discriminator, and outputs, as an estimation result, the presence or absence of mild cognitive impairment in the subject output from the discriminator.

In the present invention, the question on date and time orientation is not limited to question 2 of the HDS-R. It may, for example, include a question about the season, and it may omit some of the questions about the year, month, day, and day of the week. The data acquisition unit may acquire the age information and the information on the answer to the question, which are needed to determine each subject's age and correct-answer score for the question on date and time orientation, either as character strings via a touch panel display, keyboard, or the like, or as voice data. When they are acquired as voice data, the data analysis unit determines the subject's age from the voice data recorded while answering a question asking the age, and determines the correct-answer score and extracts one or more voice features from the voice data recorded while answering the question on date and time orientation. The estimation result output unit then uses the resulting data to estimate the presence or absence of mild cognitive impairment in the subject.

The estimation system and the estimation device for mild cognitive impairment of the present invention make it possible to estimate the presence or absence of mild cognitive impairment in a subject simply, quickly, and with a high correct answer rate.

FIG. 1 is a diagram outlining the configuration of the estimation system for mild cognitive impairment of the first embodiment.
FIG. 2 is a functional block diagram of the estimation system for mild cognitive impairment shown in FIG. 1.
FIG. 3 is a flowchart showing the machine learning process of the discriminator shown in FIG. 2.
FIG. 4 is a flowchart showing the process of estimating mild cognitive impairment with the estimation system shown in FIG. 1.
FIG. 5 is a functional block diagram of the estimation device for mild cognitive impairment of the second embodiment.
FIG. 6 is a diagram comparing the correct answer rate of automatic identification using only the correct-answer score for each HDS-R question, the correct answer rate of automatic identification using only voice features extracted from voice data recorded while answering each question, and the correct answer rate of automatic identification using only age.
FIG. 7 is a diagram for the case where modcep is used as the voice feature set and a gradient boosting tree is used as the discriminator. It compares the correct answer rate of automatic identification using the correct-answer score for HDS-R question 2, the voice features extracted from the answer voice data, and age with the correct answer rates of automatic identification using the correct-answer score alone, the correct-answer score and voice features, and the correct-answer score and age.
FIG. 8 is a diagram making the same comparison as FIG. 7 for the case where emobase is used as the voice feature set and a gradient boosting tree is used as the discriminator.
FIG. 9 is a diagram making the same comparison as FIG. 7 for the case where the concatenated vector of emobase and modcep is used as the voice feature set and a gradient boosting tree is used as the discriminator.

Embodiments of the estimation system and the estimation device for mild cognitive impairment of the present invention are described below. The present invention is not limited to these embodiments, and modifications are possible without departing from the spirit of the invention.

First Embodiment
FIG. 1 is a diagram outlining the configuration of the estimation system 1 for mild cognitive impairment according to this embodiment. The estimation system 1 includes a subject terminal 20 for inputting and outputting information about a subject who uses the system 1, and an examination server 10 that estimates the presence or absence of mild cognitive impairment in the subject based on information provided from the subject terminal 20 and provides the estimation result to the subject terminal 20. The examination server 10 and the subject terminal 20 are connected by a communication line 30 such as the Internet. The examination server 10 is a general-purpose computer with an arithmetic processing unit, a communication unit, a storage unit, and the like, and operates as the examination server 10 in cooperation with software (a processing program) stored in the storage unit. The subject terminal 20 is a general-purpose computer with an arithmetic processing unit, a communication unit, a touch panel display 20a for data input and output, a microphone 20b for voice input, a speaker 20c for voice output, and the like, and operates as the subject terminal 20 by means of application software provided from the examination server 10 to the subject terminal 20.

FIG. 2 is a functional block diagram of the mild cognitive impairment estimation system 1 composed of the examination server 10, the subject terminal 20, and the communication line 30 shown in FIG. 1. For ease of understanding, only one subject terminal 20 is shown.

The subject terminal 20 has a data input unit 21 and an estimation result display unit 22 as essential components. The data input unit 21 presents a question asking the age of the subject and a question regarding orientation to date and time through the display on the touch panel display 20a and/or voice output from the speaker 20c. It acquires, via the touch panel display 20a, information about the subject's age and information about the subject's answer to the question regarding orientation to date and time, and acquires, via the microphone 20b, the voice data recorded while the subject answers that question. It then transmits the acquired data as subject information to the examination server 10 via the communication line 30. The estimation result display unit 22 presents to the subject the estimation result regarding the presence or absence of mild cognitive impairment in the subject, transmitted from the examination server 10 via the communication line 30, through the display on the touch panel display 20a and/or voice output from the speaker 20c.

The examination server 10 has, as essential components, a learning processing unit 11 having a learning data storage unit 12 and a discriminator 13; a data acquisition unit 14; a data analysis unit 15 having a voice feature extraction unit 16, a correct answer score confirmation unit 17, and an age information confirmation unit 18; and an estimation result output unit 19.

The learning processing unit 11 applies machine learning to the discriminator 13 so that, when the subject's age, the correct answer score for the question regarding orientation to date and time, and one or more voice features extracted from the voice data recorded while answering that question are input to the discriminator 13, the discriminator 13 can output the presence or absence of mild cognitive impairment in the subject. Identifying the subject's age, identifying the correct answer score for the question, and extracting the one or more voice features from the voice data recorded while answering the question are performed by the data analysis unit 15 described later. The learning data storage unit 12 stores, as learning data for machine learning, data for each healthy person and each person with mild cognitive impairment consisting of the age, the correct answer score for the same question regarding orientation to date and time as that presented to the subject, and the one or more voice features extracted from the voice data recorded while answering that question, in association with data on the presence or absence of mild cognitive impairment.

One or more voice features are extracted from the voice data recorded while answering the question regarding orientation to date and time, and any known voice feature set consisting of multiple types of voice features can be used without particular limitation. Examples of such voice feature sets include, in addition to emobase adopted in Non-Patent Documents 1 and 3 and modcep adopted in Non-Patent Document 3, avec2011, avec2013, emo_large, emobase2010, IS09_emotion, IS10_paraling, IS10_paraling_compat, IS11_speaker_state, IS12_speaker_trait, IS12_speaker_trait_compat, IS13_ComParE, Essentia descriptors, MPEG7 descriptors, KTU features, jAudio features, YAAFE features, and Tsanas features. A set consisting of a concatenated vector of the voice features contained in these sets, for example a concatenated vector of emobase and modcep, can also be used as the voice feature set.

As the discriminator, any known discriminator usable for a binary classification task can be used without particular limitation. Examples include discriminators used in supervised learning methods such as logistic regression, support vector machines, decision trees, random forests, gradient boosting trees, XG boosting (an ensemble learning method that combines random forests and gradient boosting trees), perceptrons, convolutional neural networks, recurrent neural networks, residual networks, naive Bayes, and the k-nearest neighbor method. Using a random forest, gradient boosting, or XG boosting as the discriminator is preferable because a high correct answer rate can be obtained even with a relatively small amount of learning data.

The learning processing unit 11 operates in a preparatory stage before the mild cognitive impairment estimation system 1 goes into operation, and serves to optimize the discriminator 13 by machine learning using the learning data. FIG. 3 is a flowchart showing the processing performed by the learning processing unit 11. In step S101, the learning processing unit 11 checks whether there is unprocessed learning data that has not yet been used for the learning process, that is, data for each healthy person and each person with mild cognitive impairment in which the age, the correct answer score for the question regarding orientation to date and time, the one or more voice features extracted from the voice data recorded while answering that question, and the presence or absence of mild cognitive impairment are associated with one another. If there is unprocessed learning data, the process proceeds to step S102; if there is none, the process proceeds to step S104 and the series of processes ends. Unprocessed learning data exists when learning data was stored in the learning data storage unit 12 before the first run of the learning processing unit 11, or when learning data was added to the learning data storage unit 12 after the first run. In that case, the learning processing unit 11 acquires the unprocessed learning data from the learning data storage unit 12 in step S102, and in step S103 performs the learning process so that the difference between the healthy/mild cognitive impairment result output by the discriminator 13 and the correct value is minimized. Prior to the learning process, the age, correct answer score, and each voice feature constituting the learning data are each subjected to a whitening process that sets the mean to 0 and the standard deviation to 1. After step S103, the learning processing unit 11 returns to step S101 and continues processing.
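The whitening step described above (mean 0, standard deviation 1 for each of age, correct answer score, and each voice feature) can be sketched as follows. This is only an illustration under assumptions: the function names, the sample rows, and the use of the population standard deviation are hypothetical choices that the description does not specify.

```python
from statistics import mean, pstdev

def fit_standardizer(rows):
    """Compute the per-column mean and standard deviation of the training rows."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Assumption: population standard deviation; a constant column falls back to 1.0
    # so that the division below never divides by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds

def standardize(row, means, stds):
    """Scale one row using statistics computed on the learning data."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]

# Hypothetical learning rows: [age, correct answer score, one voice feature]
train = [[72, 4, 0.31], [80, 2, 0.12], [68, 4, 0.27], [85, 1, 0.08]]
means, stds = fit_standardizer(train)
scaled = [standardize(row, means, stds) for row in train]
```

At estimation time, a subject's row would presumably be scaled with the same `means` and `stds` obtained from the learning data before being passed to the discriminator.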

The data acquisition unit 14 receives the subject information transmitted from the subject terminal 20 via the communication line 30, which consists of information about the subject's age, information about the subject's answer to the question regarding orientation to date and time, and the voice data recorded while the subject answers that question, and transmits it to the data analysis unit 15.

The data analysis unit 15 has a voice feature extraction unit 16, a correct answer score confirmation unit 17, and an age information confirmation unit 18. The voice feature extraction unit 16 extracts the one or more voice features from the voice data recorded while the subject answers the question regarding orientation to date and time. The correct answer score confirmation unit 17 identifies the correct answer score based on the information about the subject's answer to that question. The age information confirmation unit 18 identifies the subject's age based on the subject's age information. The data analysis unit 15 causes the voice feature extraction unit 16, the correct answer score confirmation unit 17, and the age information confirmation unit 18 to execute their respective processes, receives the results, and then notifies the estimation result output unit 19 that the processing has finished.

After receiving the end-of-processing signal from the data analysis unit 15, the estimation result output unit 19 reads the subject's age, the correct answer score, and the one or more voice features acquired by the data analysis unit 15, inputs the read data to the discriminator 13, causes the discriminator 13 to execute the binary healthy/mild cognitive impairment classification task for the subject, and transmits the output of the discriminator 13 to the subject terminal 20 as the estimation result.

Next, specific processing in the mild cognitive impairment estimation system 1 is described. FIG. 4 is a flowchart showing the process by which the estimation system 1 estimates the presence or absence of mild cognitive impairment in a subject. The description below assumes that the subject uses the estimation system 1 with the support of an assistant, but the subject can of course also use the system 1 alone.

When an assistant supporting a subject who wishes to use the mild cognitive impairment estimation system 1 operates the subject terminal 20 to access the system 1, the data input unit 21 of the subject terminal 20 explains the system 1 through the display on the touch panel display 20a and/or voice output from the speaker 20c, and then presents to the subject and the assistant, through the display 20a and/or voice output from the speaker 20c, a question asking the subject's age, such as "First, please tell me your age." or "First, please tell me your date of birth." (S1). In response, the assistant enters the information corresponding to the question about the subject's age into the data input unit 21 by following the input screen displayed on the touch panel display 20a (S2). The assistant should preferably confirm accurate information about the subject's age or date of birth in advance by referring to an insurance card or the like. Next, the data input unit 21 explains, through the display on the touch panel display 20a and/or voice output from the speaker 20c, "Please answer the next question aloud," and then presents to the subject and the assistant questions regarding orientation to date and time, such as "What year, month, and day is it today?" and "What day of the week is it?" (S3). In response, the subject inputs answers to these questions, such as "It is ○○ month ○○ day of Heisei ○○." and "It is ○ day.", into the data input unit 21 via the microphone 20b (S4). At this time, the assistant should preferably prompt the subject to answer, for example by pointing the microphone 20b toward the subject.

Next, the data input unit 21 further prompts, through the display on the touch panel display 20a and/or voice output from the speaker 20c, for input of information confirming the subject's answer to the question regarding orientation to date and time, with a message such as "For confirmation, please enter the answer to the previous question by following the screen." The assistant enters the information about the subject's answer into the data input unit 21 by following the input screen displayed on the touch panel display 20a (S5). The confirmation information may simply repeat the spoken answer, such as "Heisei ○○ year, ○○ month, ○○ day" and "○ day of the week", or it may record whether each part of the answer was right or wrong, for example "○, ○, ○, ×" when only one of the year, month, day, and day of the week was answered incorrectly. The data input unit 21 collects the information obtained in steps S2, S4, and S5 as subject information and transmits it to the examination server 10 via the communication line 30 (S6).

The data acquisition unit 14 of the examination server 10 receives the subject information from the subject terminal 20 and transmits it to the data analysis unit 15 (S7). The data analysis unit 15 first sends to the voice feature extraction unit 16 the voice data, contained in the subject information, that was recorded while answering the question regarding orientation to date and time, has it extract the one or more voice features for the subject from this voice data, and receives the extraction result (S8). The voice feature extraction unit 16 extracts voice features of the same types as the one or more voice features stored in the learning data storage unit 12, according to the extraction rules corresponding to those voice features. Next, the data analysis unit 15 sends to the correct answer score confirmation unit 17 the information, contained in the subject information, about the answer to the question regarding orientation to date and time, has it identify the subject's correct answer score, and receives the result (S9). The correct answer score confirmation unit 17 identifies the correct answer score according to a confirmation rule that depends on the format of the transmitted answer information. When question 2 of the HDS-R is adopted as the question regarding orientation to date and time, one point is assigned to each of the year, month, day, and day of the week, and the total for the correctly answered items is the correct answer score. If the answer information is transmitted in a format such as "Heisei ○○ year, ○○ month, ○○ day" and "○ day of the week", the year, month, day, and day of the week in the answer are compared with those of the examination date to judge each item right or wrong, and the correct answer score is identified from the result. If the answer information is transmitted in a format such as "○, ○, ○, ×", the number of correct marks (○) is identified as the correct answer score. Next, the data analysis unit 15 sends the age information contained in the subject information to the age information confirmation unit 18, has it identify the subject's age, and receives the result (S10). The age information confirmation unit 18 identifies the age according to a confirmation rule that depends on the format of the transmitted age information. If the age itself is transmitted, it is used as the subject's age as is; if the age information is transmitted as a date of birth, the age is identified from the difference between the examination date and the date of birth.
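The two confirmation rules for the correct answer score (S9) and the age rule (S10) can be sketched as below. This is a minimal illustration, not the processing program of the examination server 10: the function names and the dictionary format of the answer are hypothetical, and Gregorian years are used for simplicity even though the example answers in the text use era years.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def correct_answer_score(answer, exam_date):
    """Score an answer given as year/month/day/weekday: one point per matching item."""
    expected = {
        "year": exam_date.year,
        "month": exam_date.month,
        "day": exam_date.day,
        "weekday": WEEKDAYS[exam_date.weekday()],  # weekday(): Monday is 0
    }
    return sum(1 for key, value in expected.items() if answer.get(key) == value)

def score_from_marks(marks):
    """Score an answer given as right/wrong marks, e.g. [True, True, True, False]."""
    return sum(1 for mark in marks if mark)

def age_on(exam_date, birth_date):
    """Age in full years on the examination date, derived from the date of birth."""
    had_birthday = (exam_date.month, exam_date.day) >= (birth_date.month, birth_date.day)
    return exam_date.year - birth_date.year - (0 if had_birthday else 1)

exam = date(2019, 7, 10)
answer = {"year": 2019, "month": 7, "day": 9, "weekday": "Wednesday"}
print(correct_answer_score(answer, exam))   # 3 (the day is wrong)
print(score_from_marks([True, True, True, False]))  # 3
print(age_on(exam, date(1940, 7, 11)))      # 78 (birthday not yet reached)
```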

Next, when the data analysis unit 15 transmits a signal to the estimation result output unit 19 indicating that processing has finished, the estimation result output unit 19 inputs the subject's age, the correct answer score, and the one or more voice features acquired by the data analysis unit 15 to the trained discriminator 13 described above (S11), causes the discriminator 13 to execute the binary healthy/mild cognitive impairment classification task, receives the output of the discriminator 13 (S12), and transmits the output to the subject terminal 20 via the communication line 30 (S13).

The estimation result display unit 22 of the subject terminal 20 receives the output regarding the presence or absence of mild cognitive impairment in the subject (S14), presents it to the subject and the assistant as the estimation result of the mild cognitive impairment estimation system 1 through the display on the touch panel display 20a of the subject terminal 20 and/or voice output from the speaker 20c (S15), and ends the process. If the output indicates that the subject's cognitive function is healthy, an estimation result such as "There is no problem with your cognitive function." is presented to the subject and the assistant. If the output indicates that the subject has mild cognitive impairment, an estimation result such as "Your cognitive function may have declined. We recommend that you consult a specialist." is presented to the subject and the assistant.

Second Embodiment
The second embodiment of the present invention is a device for estimating mild cognitive impairment that works as a single device without a communication line. The mild cognitive impairment estimation device 40 is a general computer system having an arithmetic processing unit, a storage unit, a touch panel display for data input and output, a voice input microphone, a voice output speaker, and the like, and operates as the estimation device 40 in cooperation with software (a processing program) stored in the storage unit.

FIG. 5 is a functional block diagram of the mild cognitive impairment estimation device 40. The estimation device 40 has, as essential components, a learning processing unit 41 having a learning data storage unit 42 and a discriminator 43; a data acquisition unit 44; a data analysis unit 45 having a voice feature extraction unit 46, a correct answer score confirmation unit 47, and an age information confirmation unit 48; and an estimation result output unit 49. Of these, the learning processing unit 41 with its learning data storage unit 42 and discriminator 43 has the same function as the learning processing unit 11 with its learning data storage unit 12 and discriminator 13 in the examination server 10 of the estimation system 1 of the first embodiment. Likewise, the data analysis unit 45 with its voice feature extraction unit 46, correct answer score confirmation unit 47, and age information confirmation unit 48 has the same function as the data analysis unit 15 with its voice feature extraction unit 16, correct answer score confirmation unit 17, and age information confirmation unit 18 in the examination server 10 of the first embodiment.

The data acquisition unit 44 presents a question asking the subject's age and a question regarding orientation to date and time through the display on the touch panel display and/or voice output from the speaker. It acquires, via the touch panel display, information about the subject's age and information about the subject's answer to the question regarding orientation to date and time, acquires, via the microphone, the voice data recorded while the subject answers that question, and transmits the acquired data to the data analysis unit 45 as subject information. After receiving the end-of-processing signal from the data analysis unit 45, the estimation result output unit 49 reads the subject's age, the correct answer score for the question regarding orientation to date and time, and the one or more voice features extracted from the voice data recorded while answering that question, all acquired by the data analysis unit 45. It inputs the read data to the discriminator 43, causes the discriminator 43 to execute the binary healthy/mild cognitive impairment classification task for the subject, and presents the output of the discriminator 43 as the estimation result through the display on the touch panel display and/or voice output from the speaker.

The mild cognitive impairment estimation device 40 of the second embodiment is identical to the estimation system 1 of the first embodiment except for two points. First, the data acquisition unit 44 does not receive subject information provided from a subject terminal via a communication line, but acquires the subject information itself. Second, the estimation result output unit 49 does not transmit the estimation result regarding the presence or absence of mild cognitive impairment to a subject terminal via a communication line, but presents the estimation result itself through the touch panel display and/or speaker provided in the estimation device 40. Further description is therefore omitted.

Modification
In a modification of the estimation system 1 of the first embodiment and the estimation device 40 of the second embodiment, the data acquisition units 14 and 44 acquire both the subject's age information and the subject's answer to the question regarding orientation to date and time as voice data. In this case, in step S2 of FIG. 4, the process of obtaining the subject's age information by input from the touch panel display is replaced by a process of obtaining it by voice. The process of step S5 of FIG. 4, that is, confirming by input from the touch panel display the answer to the question regarding orientation to date and time after it has been obtained as voice data in step S4, is unnecessary. In this modification, the correct answer score confirmation units 17 and 47 apply a speech recognizer for the elderly, based on S-JNAS or the like, to the voice data recorded while the subject answers the question regarding orientation to date and time, convert the voice data into a character string, and then identify the subject's correct answer score. The age information confirmation units 18 and 48 apply a similar speech recognizer to the voice data recorded while answering the question about the subject's age, convert the voice data into a character string, and then identify the subject's age.

The present invention is described below using examples, but the present invention is not limited to these examples.

(1) Experimental conditions
The participants in this experiment and the distribution of their total HDS-R scores are shown below. The persons with mild cognitive impairment in the table are those who were diagnosed by a specialist as having dementia based on DSM-5 and whose HDS-R score is 14 to 25 points.

The voice data for the answers to each HDS-R question was recorded in an examination room of a medical institution, with either two people present (the recording participant, who was a healthy person or a person with mild cognitive impairment, and the diagnostician) or three people present (including a person accompanying the recording participant). The recordings had a sampling frequency of 48 kHz and were stored in RIFF WAV format; for voice analysis, they were downsampled to 16 kHz and normalized. The voice data of the recording participants was cut out manually from the recordings.

The correct answer rate of automatic discrimination between healthy persons and persons with mild cognitive impairment was evaluated using data consisting of, for each of the healthy persons and persons with mild cognitive impairment shown in Table 2, the age, the correct answer score for each HDS-R question, and the voice feature amounts extracted from the voice data recorded when answering each question. Three voice feature amount sets were used: emobase, adopted in Non-Patent Documents 1 and 3; modcep, adopted in Non-Patent Document 3 (with the Fourier transform length set to 2048 and the frame shift set to 5 msec; the number of types of voice feature amounts is 20); and a set consisting of the combined vector of emobase and modcep (hereinafter referred to as "emobase+modcep"). Three discriminators were used for automatic discrimination with data including voice feature amounts: gradient boosting tree (GBC), random forest (RF), and support vector machine (SVM). These discriminators were prepared using the scikit-learn library (https://github.com/scikit-learn/scikit-learn, version 0.19.1). To tune the hyperparameters of each discriminator, a grid search was used to optimize the cost parameter of the linear support vector machine, the cost parameter and gamma parameter of the RBF-kernel support vector machine, the number of estimators and the class weight of the random forest, and the number of estimators and the maximum depth of the gradient boosting tree.
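The hyperparameter tuning named above can be illustrated with scikit-learn's `GridSearchCV`. The sketch below is not the configuration used in the experiments: the source names the tuned parameters but not their search ranges, so every grid value here is hypothetical, and the data are synthetic stand-ins rather than the recorded participants.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data: 60 samples x 22 columns
# (for example age, correct answer score, and 20 voice feature amounts).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 22))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hypothetical grids over the parameters named in the text.
searches = {
    "GBC": (GradientBoostingClassifier(random_state=0),
            {"n_estimators": [20, 50], "max_depth": [2, 3]}),
    "RF": (RandomForestClassifier(random_state=0),
           {"n_estimators": [20, 50], "class_weight": [None, "balanced"]}),
    "SVM-linear": (SVC(kernel="linear"), {"C": [0.1, 1.0, 10.0]}),
    "SVM-RBF": (SVC(kernel="rbf"),
                {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1]}),
}

best_params = {}
for name, (estimator, grid) in searches.items():
    # Each grid point is scored by 5-fold cross-validated accuracy.
    search = GridSearchCV(estimator, grid, cv=5, scoring="accuracy")
    search.fit(X, y)
    best_params[name] = search.best_params_
```

After the loop, `best_params` holds the selected setting for each discriminator; with real data the grids would normally be wider.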

To evaluate automatic discrimination with the above discriminators, the healthy persons and persons with mild cognitive impairment shown in Table 2 were divided into five arbitrary groups. The data of four groups were used as learning data for training the discriminator, and the data of the remaining group were used to calculate the correct answer rate. Before the learning data were input to the discriminator, the age, correct answer score, and voice feature amount data were whitened. The correct answer rate was calculated by cross-validation, in which the evaluation was repeated five times while changing the group used for calculating the correct answer rate.
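The whitening and five-group cross-validation described above might be organized as in the following sketch. Several points are assumptions: PCA whitening is used because the source does not name a whitening method, the whitening statistics are fitted on the four training groups only, and the data are synthetic. The helper names `fit_whitener` and `five_fold_indices` are hypothetical.

```python
import numpy as np


def fit_whitener(X, eps=1e-12):
    """Return (mean, W) such that (X - mean) @ W has identity covariance."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # PCA whitening: rotate onto the eigenvectors, then rescale each axis.
    W = eigvecs / np.sqrt(eigvals + eps)
    return mean, W


def five_fold_indices(n_samples, seed=0):
    """Shuffle the sample indices and split them into 5 disjoint groups."""
    order = np.random.default_rng(seed).permutation(n_samples)
    return np.array_split(order, 5)


# Synthetic correlated data standing in for [age, score, voice features].
mixing = np.array([[2.0, 1.0, 0.0, 0.0],
                   [0.0, 1.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0, 0.5],
                   [0.0, 0.0, 0.0, 1.0]])
X = np.random.default_rng(1).normal(size=(100, 4)) @ mixing
folds = five_fold_indices(len(X))

for k, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
    mean, W = fit_whitener(X[train_idx])   # fitted on the 4 training groups
    X_train = (X[train_idx] - mean) @ W
    X_test = (X[test_idx] - mean) @ W      # reuses the training statistics
    # A discriminator would be trained on X_train and scored on X_test here.
```

Averaging the five per-group scores would then give the cross-validated correct answer rate.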

(2) Preliminary study: discrimination using only the correct answer score / discrimination using only age / discrimination using only voice feature amounts
Prior to evaluating the automatic discrimination between healthy persons and persons with mild cognitive impairment according to the present invention, three correct answer rates were evaluated: that of automatic discrimination using only age, that of automatic discrimination using only the correct answer score for each HDS-R question, and that of automatic discrimination using only the voice feature amounts extracted from the voice data recorded when answering each HDS-R question. Table 3 summarizes the correct answer rates obtained.

However, because machine learning is inherently unsuited to one-dimensional data, the correct answer rate for each question in automatic discrimination using only the correct answer score, and the correct answer rate in automatic discrimination using only age, were obtained by setting the discrimination cutoff value so as to maximize the correct answer rate. Even if a correct answer rate were obtained by machine learning on the one-dimensional data, it is not expected to exceed the correct answer rates in the table.
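Choosing the cutoff value that maximizes the correct answer rate for a single variable can be sketched as an exhaustive search over the observed values. The decision rule below (a value at or below the cutoff is judged as impaired) suits a score where lower means worse; for age the comparison would be reversed. The function name `best_cutoff` and the toy scores and labels are hypothetical and are not taken from Table 3.

```python
def best_cutoff(values, labels):
    """Return (cutoff, accuracy) maximizing accuracy for the rule
    'predict impaired (1) when value <= cutoff'."""
    best = (None, 0.0)
    for cutoff in sorted(set(values)):
        predictions = [1 if v <= cutoff else 0 for v in values]
        correct = sum(p == y for p, y in zip(predictions, labels))
        accuracy = correct / len(labels)
        # Strict comparison keeps the smallest cutoff among ties.
        if accuracy > best[1]:
            best = (cutoff, accuracy)
    return best


# Toy scores (0 to 4) and labels (1 = impaired); illustrative values only.
scores = [4, 4, 3, 4, 2, 1, 3, 0, 2, 4]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
cutoff, accuracy = best_cutoff(scores, labels)
```

Because the cutoff is chosen on the same data on which it is scored, the resulting rate is an optimistic upper bound, which is consistent with the remark above that machine learning on the same one-dimensional data is not expected to exceed it.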

FIG. 6 shows the correct answer rates of automatic discrimination using only the voice feature amounts extracted from the voice data recorded when answering each HDS-R question, obtained with modcep as the voice feature amount set and the gradient boosting tree (GBC) as the discriminator, together with the correct answer rate for each question in automatic discrimination using only the correct answer score and the correct answer rate in automatic discrimination using only age.

As can be seen from Table 3 or FIG. 6, in automatic discrimination using only the correct answer score for each HDS-R question, the correct answer rate varied greatly depending on the question. The correct answer rate was highest for question 7, the question on delayed recall of three words, and next highest for question 2, the question on orientation to date and time; both were higher than the correct answer rate obtained using only age. The correct answer rate obtained using only the correct answer score for question 7 reached 0.9, which shows that this question is extremely well suited to automatic discrimination between healthy persons and persons with mild cognitive impairment. However, because a relatively long time is required from the start of this question until the answer is obtained, it is problematic in terms of the simplicity and speed of the evaluation.

As can also be seen from Table 3 or FIG. 6, in automatic discrimination using only the voice feature amounts extracted from the voice data recorded when answering each HDS-R question, differences in the type of voice feature amount set, the type of discriminator, and the type of question had a relatively small influence, and the correct answer rate was around 0.6. As can be seen from Table 3, most of the correct answer rates in automatic discrimination using only voice feature amounts exceed 0.5, so voice feature amounts can be said to be effective for discriminating between healthy persons and persons with mild cognitive impairment, but a satisfactory correct answer rate was not reached. For questions 2 to 9, as with the results in Non-Patent Document 1, the correct answer rate obtained using only voice feature amounts tended to be lower than that obtained using only the correct answer score.

(3) Discrimination using age, correct answer score, and voice feature amounts
As described above, the correct answer rate obtained using only the correct answer score for question 7 is extremely high, but the question is problematic in terms of the simplicity and speed of the evaluation. Automatic discrimination was therefore examined using the voice data recorded when answering question 2, which has the next highest correct answer rate in automatic discrimination using only the correct answer score and requires only a short time from the start of the question until the answer is obtained. The results obtained with modcep or emobase as the voice feature amount set are described below.

Table 4 shows the correct answer rate of automatic discrimination (the present invention) using the subject's age, the correct answer score for question 2, and the voice feature amounts extracted from the voice data recorded when answering question 2, in comparison with the correct answer rate of automatic discrimination using only the correct answer score for question 2, that of automatic discrimination using the correct answer score for question 2 and the voice feature amounts, and that of automatic discrimination using the correct answer score for question 2 and age.

Table 5 shows the results of a comparative experiment using question 7 instead of question 2.

As can be seen from Table 5, when the answer to question 7 was used, the correct answer rates of automatic discrimination using the correct answer score and voice feature amounts, using the correct answer score and age, and using age, the correct answer score, and voice feature amounts were all lower than the correct answer rate of automatic discrimination using only the correct answer score. In particular, when the support vector machine (SVM) was used as the discriminator, the correct answer rate of automatic discrimination using data including voice feature amounts fell markedly, and it fell even more markedly with emobase as the voice feature amount set than with modcep. It is generally known that a large amount of learning data is needed for stable machine learning. The number of healthy persons and persons with mild cognitive impairment shown in Table 2 is insufficient for stable machine learning with a support vector machine, and the effect of this shortage is considered to have appeared more prominently with emobase, which contains 988 types of voice feature amounts, than with modcep, which contains 20 types. If the number of healthy persons and persons with mild cognitive impairment increases, the support vector machine is expected to yield a correct answer rate similar to that obtained with the random forest or the gradient boosting tree.

However, as can be seen from Table 4, when the answer to question 2 was used with the random forest (RF) or the gradient boosting tree (GBC) as the discriminator, the correct answer rate of automatic discrimination using age, the correct answer score, and voice feature amounts improved to a level comparable to the correct answer rate (0.903) of automatic discrimination using only the correct answer score for question 7. When the support vector machine (SVM) was used as the discriminator, on the other hand, as in the case of question 7, the correct answer rate of automatic discrimination using data including voice feature amounts fell, partly because of the insufficient number of healthy persons and persons with mild cognitive impairment, and it fell even more markedly with emobase as the voice feature amount set than with modcep. If the number of healthy persons and persons with mild cognitive impairment increases, the support vector machine is expected to yield a correct answer rate similar to that obtained with the random forest or the gradient boosting tree.

emobase is a set of statistical voice feature amounts composed of 988 types of short-term voice feature amounts, and modcep is a set containing 20 types of long-term voice feature amounts that express the temporal variation of a time series of voice feature amounts. The two sets differ particularly in their temporal properties. Nevertheless, as can be seen from Table 4, automatic discrimination using the subject's age, the correct answer score for question 2, and the voice feature amounts extracted from the voice data recorded when answering question 2 is effective for automatic discrimination between healthy persons and persons with mild cognitive impairment, regardless of this difference in the properties of the voice feature amounts. In particular, when the gradient boosting tree (GBC) was used as the discriminator, the improvement in the correct answer rate of automatic discrimination using age, the correct answer score, and voice feature amounts was remarkable, as shown below.
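As an illustration of how the discriminator input might be assembled from age, correct answer score, and a voice feature amount set, the sketch below concatenates them into one vector. The dimensions 988 and 20 come from the text; the column order and the function name `build_input` are assumptions, and the zero arrays are placeholders for features that a real extractor would supply.

```python
import numpy as np

EMOBASE_DIM = 988  # number of emobase feature types, as stated in the text
MODCEP_DIM = 20    # number of modcep feature types, as stated in the text


def build_input(age, score, emobase=None, modcep=None):
    """Concatenate age, correct answer score, and the chosen voice features.

    The column order [age, score, emobase..., modcep...] is an assumption.
    """
    parts = [np.array([age, score], dtype=np.float64)]
    if emobase is not None:
        parts.append(np.asarray(emobase, dtype=np.float64))
    if modcep is not None:
        parts.append(np.asarray(modcep, dtype=np.float64))
    return np.concatenate(parts)


# Placeholder zeros stand in for real extracted voice feature amounts.
x_modcep = build_input(80, 3, modcep=np.zeros(MODCEP_DIM))
x_emobase = build_input(80, 3, emobase=np.zeros(EMOBASE_DIM))
x_both = build_input(80, 3, np.zeros(EMOBASE_DIM), np.zeros(MODCEP_DIM))
```

With the combined emobase+modcep set, the input has 1010 columns for a comparatively small number of participants, which is relevant to the remark above about the support vector machine needing more learning data.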

(4) Influence of the voice feature amount set
FIG. 7 shows, for the case where modcep was used as the voice feature amount set and the gradient boosting tree was used as the discriminator, the correct answer rate of automatic discrimination (the present invention) using the correct answer score for HDS-R question 2, the voice feature amounts extracted from the voice data recorded when answering, and age, in comparison with the correct answer rate of automatic discrimination using only the correct answer score, that of automatic discrimination using the correct answer score and voice feature amounts, and that of automatic discrimination using the correct answer score and age.

FIG. 8 shows, for the case where emobase was used as the voice feature amount set and the gradient boosting tree was used as the discriminator, the correct answer rate of automatic discrimination (the present invention) using the correct answer score for HDS-R question 2, the voice feature amounts extracted from the voice data recorded when answering, and age, in comparison with the correct answer rate of automatic discrimination using only the correct answer score, that of automatic discrimination using the correct answer score and voice feature amounts, and that of automatic discrimination using the correct answer score and age.

FIG. 9 shows, for the case where emobase+modcep was used as the voice feature amount set and the gradient boosting tree was used as the discriminator, the correct answer rate of automatic discrimination (the present invention) using the correct answer score for HDS-R question 2, the voice feature amounts extracted from the voice data recorded when answering, and age, in comparison with the correct answer rate of automatic discrimination using only the correct answer score, that of automatic discrimination using the correct answer score and voice feature amounts, and that of automatic discrimination using the correct answer score and age.

As can be seen from FIGS. 7 to 9, the improvement in the correct answer rate of automatic discrimination using the correct answer score and voice feature amounts, and of automatic discrimination using the correct answer score and age, was not remarkable compared with the correct answer rate of automatic discrimination using only the correct answer score; in FIG. 7, the correct answer rate of automatic discrimination using the correct answer score and voice feature amounts actually decreased. In contrast, the correct answer rate of automatic discrimination using age, the correct answer score, and voice feature amounts (the present invention) improved markedly. Moreover, this correct answer rate was almost the same whether either of the voice feature amount sets that differ in temporal properties (emobase, modcep) or the set of their combined vector (emobase+modcep) was used. It was therefore found that automatic discrimination (the present invention) using the correct answer score for HDS-R question 2, the voice feature amounts extracted from the voice data recorded when answering, and age is extremely effective for automatic discrimination between healthy persons and persons with mild cognitive impairment, and that a high correct answer rate is obtained independently of differences in the properties of the voice feature amounts.

(5) Field test
A system for estimating mild cognitive impairment was constructed using modcep as the voice feature amount set and a trained gradient boosting tree as the discriminator, and a field test was conducted to evaluate whether this system could correctly judge changes in cognitive function in new subjects. The subjects of this test were persons aged 75 or older with written consent who had been diagnosed with Alzheimer-type mild dementia or mild cognitive impairment (MCI) according to DSM-5 at a medical institution specializing in dementia, and healthy persons who participated voluntarily through local senior citizens' groups. The Clinical Dementia Rating (CDR) was administered to these subjects in advance, and on the basis of the CDR score they were classified into a CDR 0 group (healthy, 62 persons), a CDR 0.5 group (suspected dementia or mild cognitive impairment, 13 persons), and a CDR 1 group (mild dementia, 15 persons). The clinical validity of the system was evaluated by comparing the CDR classification with the judgments of the system. The CDR 1 group is included among the "persons with mild cognitive impairment" as defined in this specification.

In the field test, the system judged about 90% of the CDR 0 group as "good cognitive function" and about 90% of the CDR 1 group as "change in cognitive function present", showing a high correct answer rate in both groups. Although the system is a system for estimating mild cognitive impairment, it interestingly judged about 70% of the CDR 0.5 group as "change in cognitive function present", which indicates that the system is also effective for estimating the presence or absence of mild cognitive impairment (MCI).
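The comparison between the CDR classification and the system's judgments amounts to computing, for each CDR group, the fraction of subjects judged as "change in cognitive function present". A minimal sketch follows; the function name `flagged_rate_by_group` is hypothetical, and the records are toy values, not the field-test data.

```python
from collections import defaultdict


def flagged_rate_by_group(records):
    """records: iterable of (cdr_group, flagged) pairs, where flagged is True
    when the system reports a change in cognitive function.
    Returns {cdr_group: fraction of that group flagged}."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for group, is_flagged in records:
        totals[group] += 1
        flagged[group] += int(is_flagged)
    return {g: flagged[g] / totals[g] for g in totals}


# Toy records for illustration only; not the field-test results.
toy = [(0, False), (0, False), (0, True), (0, False),
       (0.5, True), (0.5, False),
       (1, True), (1, True)]
rates = flagged_rate_by_group(toy)
```

For the CDR 0 group the agreement with the CDR classification is one minus the flagged fraction, whereas for the CDR 1 group it is the flagged fraction itself.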

These results show that the system gave clinically valid judgments in an extremely short time of several tens of seconds.

The system for estimating mild cognitive impairment and the device for estimating mild cognitive impairment of the present invention can estimate the presence or absence of mild cognitive impairment in a subject simply, quickly, and with a high correct answer rate, and can therefore lead to the start of treatment at the stage at which mild cognitive impairment is observed.

1 Estimation system for mild cognitive impairment
10 Examination server
20 Subject terminal
30 Communication line
13 Discriminator
14 Data acquisition unit
15 Data analysis unit
19 Estimation result output unit
40 Estimation device for mild cognitive impairment
43 Discriminator
44 Data acquisition unit
45 Data analysis unit
49 Estimation result output unit
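The way the units listed above (data acquisition unit, data analysis unit, discriminator, estimation result output unit) pass data to one another could be sketched as below. Everything here is illustrative: the class and function names are hypothetical, the scoring of the date-and-time question as one point each for year, month, day, and day of the week is an assumption based on the usual HDS-R scoring, and the feature extractor and discriminator are trivial stand-ins rather than modcep or a trained gradient boosting tree.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SubjectInfo:
    """What the data acquisition unit obtains for one subject."""
    age: int
    answer: dict            # answer to the date-and-time question
    voice: Sequence[float]  # audio samples recorded while answering


def correct_answer_score(answer, truth):
    """One point per matching item (assumed: year, month, day, weekday)."""
    keys = ("year", "month", "day", "weekday")
    return sum(answer.get(k) == truth[k] for k in keys)


def estimate(info, truth, extract_features, discriminator):
    # Data analysis unit: score the answer and extract voice features.
    score = correct_answer_score(info.answer, truth)
    features = list(extract_features(info.voice))
    # Estimation result output unit: query the trained discriminator.
    return discriminator([info.age, score, *features])


# Hypothetical stand-ins so that the sketch runs end to end.
def toy_features(voice):
    return [sum(voice) / len(voice)]


def toy_discriminator(x):
    return x[1] < 3  # flags impairment when the score is below 3


truth = {"year": 2019, "month": 7, "day": 10, "weekday": "Wed"}
info = SubjectInfo(82, {"year": 2019, "month": 6, "day": 9, "weekday": "Wed"},
                   [0.1, -0.1])
result = estimate(info, truth, toy_features, toy_discriminator)
```

Passing the feature extractor and the discriminator in as arguments keeps the flow identical for the networked system (reference sign 1) and the stand-alone device (reference sign 40); only the way `SubjectInfo` is obtained would differ.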

Claims

An estimation system for mild cognitive impairment, comprising an examination server and a subject terminal connected via a communication line, for estimating the presence or absence of mild cognitive impairment in a subject, wherein
the examination server comprises:
a discriminator trained to output the presence or absence of mild cognitive impairment on the basis of learning data consisting of, for each of healthy persons and persons with mild cognitive impairment, the age, the correct answer score for a question concerning orientation to date and time, and one or more types of voice feature amounts extracted from voice data recorded when answering the question concerning orientation to date and time;
a data acquisition unit that receives subject information transmitted from the subject terminal, the subject information including age information on the subject, information on the answer to the question concerning orientation to date and time, and voice data recorded when answering the question concerning orientation to date and time;
a data analysis unit that acquires the correct answer score on the basis of the subject information and extracts the one or more types of voice feature amounts from the voice data; and
an estimation result output unit that inputs the age of the subject, the correct answer score, and the one or more types of voice feature amounts into the discriminator, and transmits to the subject terminal, as an estimation result, the presence or absence of mild cognitive impairment in the subject output from the discriminator.

The estimation system for mild cognitive impairment according to claim 1, wherein
the data acquisition unit receives, instead of the subject information, subject voice information consisting of voice data recorded when answering each of a question asking the age of the subject and the question concerning orientation to date and time, and
the data analysis unit, on the basis of the subject voice information, identifies the age of the subject from the voice data recorded when answering the question asking the age, and identifies the correct answer score and extracts the one or more types of voice feature amounts from the voice data recorded when answering the question concerning orientation to date and time.

The estimation system for mild cognitive impairment according to claim 1 or 2, wherein the discriminator is a random forest, a gradient boosting tree, or XG boosting.

An estimation device for mild cognitive impairment, for estimating the presence or absence of mild cognitive impairment in a subject, comprising:
a discriminator trained to output the presence or absence of mild cognitive impairment on the basis of learning data consisting of, for each of healthy persons and persons with mild cognitive impairment, the age, the correct answer score for a question concerning orientation to date and time, and one or more types of voice feature amounts extracted from voice data recorded when answering the question concerning orientation to date and time;
a data acquisition unit that acquires subject information including age information on the subject, information on the answer to the question concerning orientation to date and time, and voice data recorded when answering the question concerning orientation to date and time;
a data analysis unit that acquires the correct answer score on the basis of the subject information and extracts the one or more types of voice feature amounts from the voice data; and
an estimation result output unit that inputs the age of the subject, the correct answer score, and the one or more types of voice feature amounts into the discriminator, and outputs, as an estimation result, the presence or absence of mild cognitive impairment in the subject output from the discriminator.