JP2011255106A

JP2011255106A - Cognitive dysfunction danger computing device, cognitive dysfunction danger computing system, and program

Info

Publication number: JP2011255106A
Application number: JP2010134403A
Authority: JP
Inventors: Shohei Kato; 昇平加藤; Akiko Kobayashi; 朗子小林; Toshiaki Kojima; 敏昭小島; Hidenori Ito; 英則伊藤; Akira Honma; 昭本間
Original assignee: IFCOM Inc; Nagoya Institute of Technology NUC
Current assignee: IFCOM Inc; Nagoya Institute of Technology NUC
Priority date: 2010-06-11
Filing date: 2010-06-11
Publication date: 2011-12-22
Anticipated expiration: 2030-06-11
Also published as: JP4876207B2

Abstract

PROBLEM TO BE SOLVED: To calculate the risk of cognitive impairment with high accuracy.

SOLUTION: A feature selection section 22 selects, from multiple kinds of prosodic features, the combination of prosodic features that correlates most highly with HDS-R scores, based on a plurality of learning data each including the multiple kinds of prosodic features extracted from voice data and the HDS-R score obtained for the speaker of that voice data. A weighting determination section 24 determines a weight for each prosodic feature in the selected combination, based on the selected combination and the HDS-R score of each of the learning data. A feature extraction section 28 extracts the multiple kinds of prosodic features from input voice data. A risk calculation section 30 calculates the risk of cognitive impairment based on the selected combination of prosodic features among the extracted features and the weights determined by the weighting determination section 24.

Description

The present invention relates to a cognitive impairment risk calculation device, a cognitive impairment risk calculation system, and a program, and in particular to a device, system, and program that calculate the risk level of cognitive impairment based on speech data.

Conventionally, HDS-R (Revised Hasegawa Dementia Scale), MMSE (Mini-Mental State Examination), CDR (Clinical Dementia Rating), and similar instruments have been widely used for dementia screening, alongside neurophysiology-based tests such as fMRI, FDG-PET, and CSF biomarkers. These are administered mainly in medical institutions by physicians, clinical psychologists, or others who have received the necessary training.

A dementia diagnosis support system is also known that identifies the symptom level of a patient's dementia, generates questions and correct answers suited to that level, and judges the patient's answers as right or wrong by comparing them with the correct answers (Patent Document 1).

Patent Document 1: JP 2007-282992 A

However, the technique described in Patent Document 1 has a problem: when the diagnosis is repeated, the patient may memorize the correct answers to the questions, so that a correct judgment can no longer be made.

The present invention has been made to solve the above problem, and its object is to provide a cognitive impairment risk calculation device, a cognitive impairment risk calculation system, and a program capable of calculating the risk level of cognitive impairment with high accuracy.

To achieve the above object, a cognitive impairment risk calculation device according to a first invention comprises: feature selection means for selecting, from a plurality of types of prosodic features, the combination of prosodic features having the highest correlation with the risk level, based on a plurality of learning data each including the plurality of types of prosodic features extracted from speech data and the risk level of cognitive impairment determined for the speaker of that speech data; weighting determination means for determining a weight for each prosodic feature in the selected combination, based on the selected combination of prosodic features and the risk level in each of the plurality of learning data; feature extraction means for extracting the plurality of types of prosodic features from input speech data; and risk calculation means for calculating the risk level of cognitive impairment based on the selected combination of prosodic features among the prosodic features extracted by the feature extraction means and the weights determined by the weighting determination means.

A program according to a second invention causes a computer to function as: feature selection means for selecting, from a plurality of types of prosodic features, the combination of prosodic features having the highest correlation with the risk level, based on a plurality of learning data each including the plurality of types of prosodic features extracted from speech data and the risk level of cognitive impairment determined for the speaker of that speech data; weighting determination means for determining a weight for each prosodic feature in the selected combination, based on the selected combination of prosodic features and the risk level in each of the plurality of learning data; feature extraction means for extracting the plurality of types of prosodic features from input speech data; and risk calculation means for calculating the risk level of cognitive impairment based on the selected combination of prosodic features among the prosodic features extracted by the feature extraction means and the weights determined by the weighting determination means.

According to the first and second inventions, the feature selection means selects, from the plurality of types of prosodic features, the combination of prosodic features having the highest correlation with the risk level, based on a plurality of learning data each including the plurality of types of prosodic features extracted from speech data and the risk level of cognitive impairment determined for the speaker of that speech data. The weighting determination means determines a weight for each prosodic feature in the selected combination, based on the selected combination of prosodic features and the risk level in each of the plurality of learning data.

The feature extraction means then extracts the plurality of types of prosodic features from input speech data. The risk calculation means calculates the risk level of cognitive impairment based on the selected combination of prosodic features among the features extracted by the feature extraction means and the weights determined by the weighting determination means.

In this way, the risk level of cognitive impairment can be calculated with high accuracy based on the combination of prosodic features having the highest correlation with the risk level of cognitive impairment and the weight determined for each prosodic feature in that combination.
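As an illustration of how a selected feature combination and its weights might be combined into a risk level, the following minimal Python sketch assumes a linear weighted sum with an intercept. The text above does not fix the functional form, so the linear form, the function name `risk_score`, and all feature names and numeric values here are hypothetical.

```python
from typing import Mapping


def risk_score(features: Mapping[str, float],
               weights: Mapping[str, float],
               intercept: float = 0.0) -> float:
    """Weighted sum over the selected features only (assumed linear form).

    `features` may hold every extracted feature; only the names present in
    `weights` (the selected combination) contribute to the score.
    """
    return intercept + sum(w * features[name] for name, w in weights.items())


# Hypothetical values, for illustration only.
extracted = {"F8": 2.0, "F50": 1.5, "F129": 0.2}
selected_weights = {"F8": 0.5, "F129": -10.0}
score = risk_score(extracted, selected_weights, intercept=20.0)
```

Features that were extracted but not selected (here `F50`) are simply ignored, which mirrors the distinction the text draws between the extraction step and the selected combination.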

A cognitive impairment risk calculation device according to a third invention comprises: composite variable generation means for performing analysis processing on the plurality of types of prosodic features in a plurality of learning data, each including the plurality of types of prosodic features extracted from speech data and the risk level of cognitive impairment determined for the speaker of that speech data, and thereby generating a plurality of types of composite variables in which the plurality of types of prosodic features are combined; composite variable selection means for selecting, from the plurality of types of composite variables, the combination of composite variables having the highest correlation with the risk level, based on the plurality of learning data and the generated composite variables; weighting determination means for determining a weight for each composite variable in the selected combination, based on the combination of composite variables obtained for each of the plurality of learning data and the risk level in each of the plurality of learning data; feature extraction means for extracting the plurality of types of prosodic features from input speech data; and risk calculation means for calculating the risk level of cognitive impairment based on the combination of composite variables obtained from the prosodic features extracted by the feature extraction means and the weights determined by the weighting determination means.

A program according to a fourth invention causes a computer to function as: composite variable generation means for performing analysis processing on the plurality of types of prosodic features in a plurality of learning data, each including the plurality of types of prosodic features extracted from speech data and the risk level of cognitive impairment determined for the speaker of that speech data, and thereby generating a plurality of types of composite variables in which the plurality of types of prosodic features are combined; composite variable selection means for selecting, from the plurality of types of composite variables, the combination of composite variables having the highest correlation with the risk level, based on the plurality of learning data and the generated composite variables; weighting determination means for determining a weight for each composite variable in the selected combination, based on the combination of composite variables obtained for each of the plurality of learning data and the risk level in each of the plurality of learning data; feature extraction means for extracting the plurality of types of prosodic features from input speech data; and risk calculation means for calculating the risk level of cognitive impairment based on the combination of composite variables obtained from the prosodic features extracted by the feature extraction means and the weights determined by the weighting determination means.

According to the third and fourth inventions, the composite variable generation means performs analysis processing on the plurality of types of prosodic features in a plurality of learning data, each including the plurality of types of prosodic features extracted from speech data and the risk level of cognitive impairment determined for the speaker of that speech data, and generates a plurality of types of composite variables in which the prosodic features are combined. The composite variable selection means then selects, from the plurality of types of composite variables, the combination of composite variables having the highest correlation with the risk level, based on the plurality of learning data and the generated composite variables. The weighting determination means determines a weight for each composite variable in the selected combination, based on the combination of composite variables obtained for each of the plurality of learning data and the risk level in each of the plurality of learning data.

The feature extraction means then extracts the plurality of types of prosodic features from input speech data. The risk calculation means calculates the risk level of cognitive impairment based on the combination of composite variables obtained from the prosodic features extracted by the feature extraction means and the weights determined by the weighting determination means.

In this way, the risk level of cognitive impairment can be calculated with high accuracy based on the combination of composite variables of prosodic features having the highest correlation with the risk level of cognitive impairment and the weight determined for each composite variable in that combination.

The plurality of types of prosodic features may include at least one of: a feature relating to the frequency components of the speech, a feature relating to the formant structure of the speech, a feature relating to the loudness of the speech, a feature relating to speaking rate, and a feature relating to the reaction time before a question is answered.

The risk level of cognitive impairment in the learning data may be one obtained by administering the Hasegawa Dementia Scale to the speaker.

The feature extraction means may extract the plurality of types of prosodic features from speech data input as an answer to a question.

A cognitive impairment risk calculation system according to a fifth invention comprises a feature selection device and a risk calculation device. The feature selection device includes: feature selection means for selecting, from a plurality of types of prosodic features, the combination of prosodic features having the highest correlation with the risk level, based on a plurality of learning data each including the plurality of types of prosodic features extracted from speech data and the risk level of cognitive impairment determined for the speaker of that speech data; and weighting determination means for determining a weight for each prosodic feature in the selected combination, based on the selected combination of prosodic features and the risk level in each of the plurality of learning data. The risk calculation device includes: feature extraction means for extracting the plurality of types of prosodic features from input speech data; and risk calculation means for calculating the risk level of cognitive impairment based on the selected combination of prosodic features among the prosodic features extracted by the feature extraction means and the weights determined by the weighting determination means.

A cognitive impairment risk calculation system according to a sixth invention comprises a composite variable selection device and a risk calculation device. The composite variable selection device includes: composite variable generation means for performing principal component analysis on the plurality of types of prosodic features in a plurality of learning data, each including the plurality of types of prosodic features extracted from speech data and the risk level of cognitive impairment determined for the speaker of that speech data, and thereby generating a plurality of types of composite variables in which the plurality of types of prosodic features are combined; composite variable selection means for selecting, from the plurality of types of composite variables, the combination of composite variables having the highest correlation with the risk level, based on the plurality of learning data and the generated composite variables; and weighting determination means for determining a weight for each composite variable in the selected combination, based on the combination of composite variables obtained for each of the plurality of learning data and the risk level in each of the plurality of learning data. The risk calculation device includes: feature extraction means for extracting the plurality of types of prosodic features from input speech data; and risk calculation means for calculating the risk level of cognitive impairment based on the combination of composite variables obtained from the prosodic features extracted by the feature extraction means and the weights determined by the weighting determination means.
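The principal component analysis named for the sixth invention can be pictured with a small Python sketch. It is a simplification under stated assumptions: it recovers only the first principal component, and it does so by power iteration on the sample covariance matrix, a technique chosen here for brevity and not taken from the source. The function names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fit_first_component(rows: Sequence[Sequence[float]],
                        iters: int = 100) -> Tuple[List[float], List[float]]:
    """Return (feature means, unit direction of the first principal component)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalize.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate data: no variance to explain
        v = [x / norm for x in w]
    return means, v


def composite_variable(row: Sequence[float],
                       means: Sequence[float],
                       direction: Sequence[float]) -> float:
    """Project one centered feature vector onto the component direction."""
    return sum((x - m) * u for x, m, u in zip(row, means, direction))


# Toy data: two perfectly correlated features (hypothetical values).
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
means, direction = fit_first_component(data)
z = composite_variable([4.0, 8.0], means, direction)
```

Each composite variable is a single number per speaker that mixes several prosodic features. A fuller implementation would compute several components, and would likely standardize the features first because they are measured in different units; neither detail is specified in the text above.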

The program of the present invention can be provided stored on a recording medium.

As described above, the cognitive impairment risk calculation device, cognitive impairment risk calculation system, and program of the present invention have the effect that the risk level of cognitive impairment can be calculated with high accuracy, based either on the combination of prosodic features having the highest correlation with the risk level of cognitive impairment and the weight determined for each prosodic feature in that combination, or on the combination of composite variables of prosodic features having the highest correlation with the risk level and the weight determined for each composite variable in that combination.

A schematic diagram showing the configuration of the cognitive impairment risk calculation device according to the first embodiment of the present invention.
(A) A graph showing frequency components extracted from speech data, and (B) a graph showing formant frequencies.
(A) A graph showing short-time power extracted from speech data, and (B) a graph showing the amplitude of speech data.
A flowchart showing the learning processing routine of the computer according to the first embodiment of the present invention.
A flowchart showing the cognitive impairment risk calculation processing routine of the computer according to the first embodiment of the present invention.
A schematic diagram showing the configuration of the cognitive impairment risk calculation device according to the second embodiment of the present invention.
A diagram showing examples of the significant prosodic features or composite variables among those selected by the methods of the first and second embodiments.
A diagram showing the correlation between the calculated risk level and the HDS-R score for the methods of the first and second embodiments.
A scatter plot of the risk level calculated by the method of the first embodiment against the HDS-R score.
A scatter plot of the risk level calculated by the method of the second embodiment against the HDS-R score.
A schematic diagram showing the configuration of the cognitive impairment risk calculation system according to the third embodiment of the present invention.
A schematic diagram showing the configuration of the cognitive impairment risk calculation system according to the fourth embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings. The embodiments are described taking as an example the case where the present invention is applied to a cognitive impairment risk calculation device that takes as input the speech of a user who is the subject of the risk calculation and calculates the risk level of cognitive impairment.

As shown in FIG. 1, the cognitive impairment risk calculation device 10 according to the embodiment of the first invention includes a voice input unit 12 for inputting speech from the user, a speaker 13 that outputs sound, and a computer 14 that calculates the risk level of cognitive impairment based on the input speech data and displays it on a display device 16.

The voice input unit 12 consists of, for example, a microphone for inputting speech.

The computer 14 includes a CPU, ROM, RAM, and an HDD. The HDD stores the programs for the learning processing routine and the cognitive impairment risk calculation processing routine described later.

Functionally, the computer 14 is configured as follows. As shown in FIG. 1, the computer 14 includes a learning data storage unit 20, a feature selection unit 22, a weighting determination unit 24, a question playback unit 25, a voice acquisition unit 26, a feature extraction unit 28, and a risk calculation unit 30.

The learning data storage unit 20 stores in advance, as learning data, records each consisting of a plurality of types of speech prosodic features obtained from speech data and the risk level of cognitive impairment determined for the speaker of that speech data. Learning data for a large number of speakers is stored in advance.

In this embodiment, the learning data is obtained from speech data recorded while the Revised Hasegawa Dementia Scale (HDS-R) was administered as an oral question-and-answer test. For example, the spoken answers to two items, "orientation" and "backward digit recitation", were collected from the question-and-answer session. In addition, the opening phrase of an arbitrary utterance was collected from casual conversation on each of three themes: hometown, childhood games, and school days. The plurality of types of speech prosodic features obtained from this speech data, together with the HDS-R score obtained from the test, which represents the risk level of cognitive impairment, were collected as learning data and stored in the learning data storage unit 20.

As the plurality of types of speech prosodic features, for example, the 130 features listed below are used. For feature extraction, digital speech with 16-bit quantization and a sampling frequency of 44.1 kHz was used. For short-time analysis, the frame length was 23 ms and the frame period 11 ms, and a Hamming window (1024 points) was used as the window function.
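The short-time analysis settings above can be sketched in Python as follows. At 44.1 kHz, 1024 samples last about 23.2 ms, which is consistent with the stated 23 ms frame length. The hop of 485 samples comes from rounding 11 ms at 44.1 kHz and is an assumption, as are the function names and the choice to drop a trailing partial frame.

```python
import math
from typing import List, Sequence

SAMPLE_RATE = 44_100               # Hz, as stated in the text
FRAME_LEN = 1024                   # samples, about 23.2 ms at 44.1 kHz
HOP = round(0.011 * SAMPLE_RATE)   # 11 ms frame period, rounded to samples (assumption)


def hamming(n: int) -> List[float]:
    """Symmetric Hamming window of n points."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def windowed_frames(signal: Sequence[float],
                    frame_len: int = FRAME_LEN,
                    hop: int = HOP) -> List[List[float]]:
    """Slice the signal into overlapping frames and apply the window.

    A trailing partial frame is dropped (an assumption; padding is another option).
    """
    win = hamming(frame_len)
    return [[signal[start + i] * win[i] for i in range(frame_len)]
            for start in range(0, len(signal) - frame_len + 1, hop)]


# One second of a 200 Hz tone as a stand-in for recorded speech.
tone = [math.sin(2.0 * math.pi * 200.0 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
frames = windowed_frames(tone)
```

Per-frame quantities such as fundamental frequency, formants, or short-time power would then be computed on each windowed frame, producing the time series from which the features below are summarized.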

First, to reflect the pitch structure, which relates to the pitch of the voice, the following 53 types of speech prosodic features were used. They are obtained from the fundamental frequency and the n-th harmonic components, whose frequencies are n times the fundamental frequency. Here, the amplitude of the temporal variation of the fundamental frequency is defined as the width between the maximum and minimum values after discarding the upper 25% and lower 25% of the values in the fundamental-frequency data sequence of one sample.

F1-7. Amplitude of the temporal variation of the fundamental frequency during the first t seconds after voice onset (t = 0.05, 0.10, ..., 0.35)
F8. Frequency centroid (weighted mean of the frequencies, weighted by the power of each harmonic component)
F10-48. Ratio of the summed power of the harmonic components from the fundamental through the n-th order to the summed power of all harmonic components (n = 2, 3, ..., 40)
F49. Ratio of the summed power of the odd-order harmonic components (including the fundamental) to that of the even-order harmonic components
F50-53. Standard deviation, mean, maximum, and minimum of the fundamental frequency (see FIG. 2(A))
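Three of the definitions above can be made concrete with a short Python sketch: the trimmed amplitude of the fundamental frequency (F1-7), the frequency centroid (F8), and the cumulative harmonic power ratio (F10-48). The function names are hypothetical, and the rounding rule for "upper 25% and lower 25%" (integer division of the sample count by four) is an assumption, since the text does not state one.

```python
from typing import Sequence


def trimmed_amplitude(f0: Sequence[float]) -> float:
    """Max minus min after discarding the top and bottom quarter of the values."""
    ordered = sorted(f0)
    k = len(ordered) // 4          # assumed rounding rule for the 25% cut
    kept = ordered[k:len(ordered) - k]
    return kept[-1] - kept[0]


def frequency_centroid(freqs: Sequence[float], powers: Sequence[float]) -> float:
    """Power-weighted mean of the harmonic frequencies."""
    return sum(f * p for f, p in zip(freqs, powers)) / sum(powers)


def cumulative_harmonic_ratio(powers: Sequence[float], n: int) -> float:
    """Power of the fundamental through the n-th harmonic over total power.

    powers[0] is taken to be the fundamental (1st harmonic).
    """
    return sum(powers[:n]) / sum(powers)


# Hypothetical values for illustration.
amp = trimmed_amplitude([8, 1, 7, 2, 6, 3, 5, 4])
centroid = frequency_centroid([100.0, 200.0, 300.0], [1.0, 1.0, 2.0])
ratio = cumulative_harmonic_ratio([4.0, 2.0, 1.0, 1.0], 2)
```

For F1-7, `trimmed_amplitude` would be applied to the fundamental-frequency values falling in the first t seconds after onset, once for each of the seven values of t.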

In addition, to reflect the formant structure, which characterizes the speech, the following 56 types of speech prosodic features based on formant frequencies (see FIG. 2(B)) and formant bandwidths are used.

F54-57. Standard deviation of the n-th formant frequency (n = 1, ..., 4)
F58-61. Mean of the n-th formant frequency (n = 1, ..., 4)
F62-65. Maximum of the n-th formant frequency (n = 1, ..., 4)
F66-69. Minimum of the n-th formant frequency (n = 1, ..., 4)
F70-73. Median of the n-th formant frequency (n = 1, ..., 4)
F74-77. Difference between the maximum and minimum of the n-th formant frequency (n = 1, ..., 4)
F78-81. Slope of the linear approximation line of the n-th formant frequency (n = 1, ..., 4)
F82-85. Standard deviation of the n-th formant bandwidth (n = 1, ..., 4)
F86-89. Mean of the n-th formant bandwidth (n = 1, ..., 4)
F90-93. Maximum of the n-th formant bandwidth (n = 1, ..., 4)
F94-97. Minimum of the n-th formant bandwidth (n = 1, ..., 4)
F98-101. Median of the n-th formant bandwidth (n = 1, ..., 4)
F102-105. Difference between the maximum and minimum of the n-th formant bandwidth (n = 1, ..., 4)
F106-109. Slope of the linear approximation line of the n-th formant bandwidth (n = 1, ..., 4)
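The formant features above apply the same seven summary statistics to each per-frame track (formant frequency or bandwidth, n = 1 to 4). A minimal Python sketch of those statistics follows. The function name is hypothetical; the use of the population standard deviation and of the frame index as the x-axis for the line fit are assumptions, since the text specifies neither.

```python
import statistics
from typing import Dict, Sequence


def track_statistics(track: Sequence[float]) -> Dict[str, float]:
    """Summary statistics of one per-frame track (e.g. a formant frequency in Hz)."""
    n = len(track)
    mean_x = (n - 1) / 2.0                     # mean of frame indices 0..n-1
    mean_y = statistics.fmean(track)
    # Least-squares slope of value against frame index (units per frame).
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(track))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    return {
        "std": statistics.pstdev(track),       # population form, an assumption
        "mean": mean_y,
        "max": max(track),
        "min": min(track),
        "median": statistics.median(track),
        "range": max(track) - min(track),
        "slope": sxy / sxx,
    }


# Hypothetical first-formant track over four frames.
stats = track_statistics([500.0, 510.0, 520.0, 530.0])
```

Applying this to four formant frequency tracks and four bandwidth tracks yields 8 x 7 = 56 values, matching the count given for F54-109. A track needs at least two frames for the slope to be defined.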

Furthermore, to reflect the amplitude structure, which relates to the loudness of the voice, the following 19 types of speech prosodic features (F110-128), obtained from the short-time power and its envelope, are used.

F110. Slope of the straight line fitted to the power envelope by the linear least-squares method
F111-117. Median of the derivative of the power envelope over the t seconds immediately after speech onset (t = 0.05, 0.10, ..., 0.35)
F118-124. Ratio between the maximum power value and the power value t seconds after speech onset (t = 0.05, 0.10, ..., 0.35)
F125-128. Standard deviation, mean, maximum, and minimum of the short-time power (see FIG. 3(A))
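
Some of the power-based features can be sketched as below. This is an illustration under stated assumptions: the frame layout (non-overlapping frames of mean squared amplitude), the direction of the ratio (maximum divided by the value at t), and the names `short_time_power` and `power_features` are hypothetical and do not come from the description.

```python
import statistics

def short_time_power(samples, frame_len):
    """Mean squared amplitude of consecutive non-overlapping frames (assumed layout)."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def power_features(power, frame_sec, t):
    """Sketch of one F118-124 value and of F125-128 for an offset t in seconds.

    Assumes power[0] is the frame at speech onset and that the frame at
    offset t has non-zero power.
    """
    idx = min(int(round(t / frame_sec)), len(power) - 1)
    return {
        "max_to_t_ratio": max(power) / power[idx],
        "std": statistics.pstdev(power),
        "mean": statistics.fmean(power),
        "max": max(power),
        "min": min(power),
    }

# Made-up samples: four frames of two samples each, 0.05 s per frame.
power = short_time_power([1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 2.0, 2.0], frame_len=2)
feats = power_features(power, frame_sec=0.05, t=0.05)
```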

In addition, to reflect the temporal structure, two types of prosodic features are used, as shown in FIG. 3(B): the speaker's speaking rate and the reaction time before answering a question.

F129. Speech duration per mora
F130. Reaction time before replying
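
The two temporal features reduce to simple arithmetic on timestamps. The sketch below assumes that the end of the question, the start and end of the answer, and the mora count of the answer are already known; how they are obtained is not described here, and the function name `timing_features` is hypothetical.

```python
def timing_features(question_end, answer_start, answer_end, mora_count):
    """Return (F129, F130) from timestamps given in seconds.

    F129: speech duration per mora; F130: reaction time before replying.
    The mora count is assumed to be supplied by the caller.
    """
    duration_per_mora = (answer_end - answer_start) / mora_count
    reaction_time = answer_start - question_end
    return duration_per_mora, reaction_time

# Made-up timestamps, for illustration only.
f129, f130 = timing_features(2.0, 2.8, 4.3, mora_count=10)
```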

Next, the principle for selecting a combination of prosodic features is described.

The aim of the present embodiment is to discriminate between cognitive impairment (CI) and normal cognition (NL) by extracting and analyzing speech prosodic features from the user's voice data. However, if too many prosodic features are extracted from the voice data, some of them may be irrelevant to the discrimination of cognitive impairment, which can degrade both model construction and discrimination accuracy. Too many prosodic features also make the model overly complex and raise the computational cost.

Therefore, in the present embodiment, feature selection is performed on the 130 types of speech prosodic features described above. Feature selection methods include the variable specification method, which designates suitable variables based on scientific theory or prior knowledge; the brute-force method, which evaluates every combination of variables and selects the one that appears best; and the sequential selection method, which selects variables one after another according to a fixed rule. At present, no speech features with a strong causal relationship to cognitive impairment in the elderly have been identified, so no theory or prior knowledge is available to guide feature selection. Evaluating every combination of the extracted prosodic features is also computationally expensive. Feature selection is therefore performed with a widely used sequential selection method, for example the forward stepwise method (FSW). The forward stepwise method selects variables sequentially, adding to and removing from the combination so as to improve it. It is the same as the method described in the non-patent literature (Draper, N. and Smith, H.: Applied Regression Analysis (3rd edition), John Wiley & Sons (1998)), so a detailed description is omitted.

In the present embodiment, the feature quantity selection unit 22 uses the large number of learning data stored in the learning data storage unit 20 to search, according to the forward stepwise method, for the combination of speech prosodic features that has the highest correlation with the risk of cognitive impairment (HDS-R score), and adopts the combination found as the selected combination of speech prosodic features.
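
A stepwise search of this kind can be sketched as below. The sketch assumes that each candidate combination is scored by the AIC of an ordinary least-squares fit (the results reported later refer to an AIC criterion, but the exact scoring used by the embodiment is not given), and that one feature is added or dropped per step. The helper names `solve`, `aic`, and `stepwise_select` are hypothetical, and the data are synthetic.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def aic(rows, y, cols):
    """AIC of a least-squares fit of y on an intercept plus the columns `cols`."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((t - sum(b * v for b, v in zip(beta, d))) ** 2
              for d, t in zip(design, y))
    n = len(y)
    return n * math.log(max(rss, 1e-12) / n) + 2 * k

def stepwise_select(rows, y):
    """Add or drop one feature at a time while the AIC keeps decreasing."""
    selected, best = [], aic(rows, y, [])
    while True:
        moves = [selected + [c] for c in range(len(rows[0])) if c not in selected]
        moves += [[s for s in selected if s != c] for c in selected]
        scored = [(aic(rows, y, m), m) for m in moves]
        if not scored:
            return selected
        score, move = min(scored, key=lambda p: p[0])
        if score >= best:
            return selected
        selected, best = move, score

# Synthetic data: the target depends on features 0 and 2 only.
rows = [[i % 5, (i * 7) % 3, (i * 3) % 7] for i in range(40)]
y = [2 * r[0] - r[2] + 0.01 * ((i * 13) % 11 - 5) for i, r in enumerate(rows)]
chosen = stepwise_select(rows, y)
```

The sketch does not guard against collinear features, which would make the normal equations singular; a practical implementation would need to handle that case.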

The weight determination unit 24 constructs a learning model that takes the selected combination of prosodic features as input and outputs the risk of cognitive impairment. Based on the selected combination of prosodic features and the risk of cognitive impairment (HDS-R score) in the large number of learning data stored in the learning data storage unit 20, it learns a weight for each selected prosodic feature, for example by multiple regression analysis with the combination of prosodic features as explanatory variables and the HDS-R score as the target attribute. The learning model constructed here is referred to as the speech prosody-based cognitive impairment rating (SPCIR) model.
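
Learning the weights by multiple regression can be sketched as follows. The sketch solves the normal equations directly, which is one of several ways to fit a linear regression and is an assumption here; `solve` and `fit_weights` are hypothetical names, and the feature values and scores are synthetic.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_weights(rows, scores):
    """Return (intercept, weights) of the least-squares fit of scores on rows.

    Each row holds the selected feature values of one learning datum;
    `scores` holds the corresponding target values (here, HDS-R scores).
    """
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * s for d, s in zip(design, scores)) for i in range(k)]
    beta = solve(xtx, xty)
    return beta[0], beta[1:]

# Synthetic data generated from a known linear rule, so the fit can be checked.
rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 7.0]]
scores = [30.0 - 2.0 * a - 1.0 * b for a, b in rows]
intercept, weights = fit_weights(rows, scores)
```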

The question reproduction unit 25 plays back preset question data from the speaker 13. The question data consist of questions from everyday conversation, such as "Where are you from?"

When a question is played back from the speaker 13, the voice acquisition unit 26 acquires the user's voice data input through the voice input unit 12 as the answer to the question.

The feature quantity extraction unit 28 extracts the 130 types of speech prosodic features described above from the acquired voice data of the user.

The risk calculation unit 30 takes, from the speech prosodic features extracted by the feature quantity extraction unit 28, those corresponding to the combination of prosodic features selected by the feature quantity selection unit 22, inputs them to the learning model to which the weights determined by the weight determination unit 24 have been applied, and calculates the risk of cognitive impairment.
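
For a linear model obtained by multiple regression, this calculation amounts to a weighted sum over the selected features. The sketch below assumes that form; the function name `compute_risk`, the dictionary layout, and all numeric values are made up for illustration and are not values from the embodiment.

```python
def compute_risk(features, selected, intercept, weights):
    """Weighted sum over the selected features of one feature vector.

    `features` maps feature names (e.g. "F129") to extracted values;
    `selected` lists the names chosen during learning, in the same
    order as `weights`.
    """
    return intercept + sum(w * features[name] for name, w in zip(selected, weights))

# Illustrative values only: three extracted features, two of them selected.
features = {"F128": 0.02, "F129": 0.18, "F130": 1.2}
risk = compute_risk(features, ["F129", "F130"], 28.0, [-20.0, -2.5])
```

Because the model is trained against HDS-R scores, the output of such a function would be an estimate on the HDS-R scale.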

Next, the operation of the cognitive dysfunction risk calculation device 10 according to the present embodiment is described.

First, an operator inputs multiple types of speech prosodic features obtained from learning voice data, together with the corresponding HDS-R scores, to the computer 14 as learning data, and a large number of such learning data are stored in the learning data storage unit 20.

The computer 14 then executes the learning processing routine shown in FIG. 4.

First, in step 100, the learning data are read from the learning data storage unit 20. In step 102, using the learning data read in step 100, the combination of speech prosodic features with the highest correlation with the HDS-R score is searched for according to the forward stepwise method, and the combination found is adopted as the selected combination of speech prosodic features.

Then, in step 104, using the learning data read in step 100, the weight in the learning model for each of the speech prosodic features selected in step 102 is learned and determined.

In the next step 106, the combination of speech prosodic features selected in step 102 and the weights learned in step 104 are stored in a memory (not shown), and the learning processing routine ends.

Each time the learning data in the learning data storage unit 20 are updated, for example when learning data are added, the computer 14 executes the learning processing routine again.

When an instruction to calculate the risk is input through an operation unit (not shown), the computer 14 executes the cognitive dysfunction risk calculation processing routine shown in FIG. 5.

First, in step 118, the question data are played back from the speaker 13. In step 120, the voice data input through the voice input unit 12 as the answer to the question are acquired. In step 122, the 130 types of speech prosodic features are extracted from the acquired voice data.

Then, in step 124, the speech prosodic features corresponding to the combination selected in the learning processing routine are picked out from those extracted in step 122. In the next step 126, the combination of speech prosodic features picked out in step 124 is input to the learning model to which the weights learned in the learning processing routine have been applied, and the risk of cognitive impairment is calculated. In step 128, the calculation result is displayed on the display device 16, and the cognitive dysfunction risk calculation processing routine ends.

As described above, the cognitive dysfunction risk calculation device according to the first embodiment can calculate the risk of cognitive impairment accurately, based on the combination of speech prosodic features with the highest correlation with the HDS-R score and the weight learned for each feature in that combination.

In addition, unlike the conventional assessment method using the Revised Hasegawa's Dementia Scale (HDS-R), there is no need to consider the linguistic content, such as whether the answer to a question is correct or whether the pronunciation is correct, so the risk of cognitive impairment can be assessed objectively. Moreover, the risk of cognitive impairment can be calculated accurately even when it is calculated repeatedly.

Next, a second embodiment is described. Parts with the same configuration as in the first embodiment are given the same reference numerals, and their description is omitted.

The second embodiment differs from the first embodiment mainly in that the risk of cognitive impairment is calculated using composite variables obtained from the multiple types of prosodic features by principal component analysis.

As shown in FIG. 6, the computer 214 of the cognitive dysfunction risk calculation device 210 according to the second embodiment includes a learning data storage unit 20, a variable synthesis unit 221, a composite variable selection unit 222, a weight determination unit 224, a question reproduction unit 25, a voice acquisition unit 26, a feature quantity extraction unit 28, a composite variable calculation unit 229, and a risk calculation unit 230.

The variable synthesis unit 221 performs variable synthesis by principal component analysis (PCA) on, for example, the 130 types of speech prosodic features of the learning data stored in the learning data storage unit 20, and generates, for example, 130 principal components (composite variables). The variable synthesis unit 221 also converts the 130 types of speech prosodic features of each learning datum into the 130 composite variables.
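
Variable synthesis by PCA, and the later conversion of a feature vector into composite variables, can be sketched as below. The sketch finds principal components by power iteration with deflation on the covariance matrix, which is an assumed choice made to keep the example dependency-free; a library eigen-solver would normally be used instead. The names `pca` and `transform` are hypothetical, and no feature scaling is applied because the description does not mention any.

```python
import math

def pca(rows, k, iters=200):
    """Return (means, components) for the top-k principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(k):
        # Arbitrary non-zero start vector, normalized to unit length.
        v = [1.0 / (j + 1) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in any direction
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflate so that the next pass finds the next component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return means, components

def transform(row, means, components):
    """Project one feature vector onto the stored components."""
    centered = [x - m for x, m in zip(row, means)]
    return [sum(a * b for a, b in zip(centered, comp)) for comp in components]

# Made-up two-feature data lying on the line y = 2x.
rows = [[-2.0, -4.0], [-1.0, -2.0], [0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
means, components = pca(rows, k=1)
composite = transform([1.0, 2.0], means, components)
```

Storing `means` and `components` after learning lets the same projection be applied to feature vectors extracted from new voice data.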

The composite variable selection unit 222 uses the composite variables and the risk of cognitive impairment (HDS-R score) of the large number of learning data stored in the learning data storage unit 20 to search, according to the forward stepwise method, for the combination of composite variables with the highest correlation with the risk of cognitive impairment, and adopts the combination found as the selected combination of composite variables.

The weight determination unit 224 constructs a learning model that takes the selected combination of composite variables as input and outputs the risk of cognitive impairment. Based on the selected combination of composite variables and the risk of cognitive impairment (HDS-R score) in the large number of learning data stored in the learning data storage unit 20, it learns the weight for each composite variable in the learning model, for example by multiple regression analysis with the combination of composite variables as explanatory variables and the HDS-R score as the target attribute.

The composite variable calculation unit 229 calculates the 130 composite variables generated by the variable synthesis unit 221 from the 130 types of speech prosodic features extracted by the feature quantity extraction unit 28.

The risk calculation unit 230 takes, from the composite variables calculated by the composite variable calculation unit 229, those corresponding to the combination selected by the composite variable selection unit 222, inputs them to the learning model whose weights were determined by the weight determination unit 224, and calculates the risk of cognitive impairment.

Next, the learning processing routine according to the second embodiment is described.

First, the learning data are read from the learning data storage unit 20, principal component analysis is performed on the speech prosodic features of the learning data to generate the composite variables, and the speech prosodic features of each learning datum are converted into composite variables. Then, using the learning data, the combination of composite variables with the highest correlation with the HDS-R score is searched for according to the forward stepwise method, and the combination found is adopted as the selected combination of composite variables.

Then, using the learning data, the weight in the learning model for each of the selected composite variables is learned and determined. The selected combination of composite variables and the learned weights are then stored in a memory (not shown), and the learning processing routine ends.

Next, the cognitive dysfunction risk calculation processing routine according to the second embodiment is described.

First, the question data are played back from the speaker 13, the voice data input through the voice input unit 12 as the answer to the question are acquired, and the 130 types of speech prosodic features are extracted from the acquired voice data. The 130 composite variables generated in the learning processing routine are then calculated from the 130 extracted speech prosodic features.

Then, the composite variables corresponding to the combination selected in the learning processing routine are picked out from the calculated composite variables. Next, the picked-out combination of composite variables is input to the learning model to which the weights learned in the learning processing routine have been applied, the risk of cognitive impairment is calculated, the calculation result is displayed on the display device 16, and the cognitive dysfunction risk calculation processing routine ends.

Next, the results of calculating the risk of cognitive impairment with the methods of the first and second embodiments are described.

A total of 319 voice recordings were collected from 115 elderly people (aged 38 to 99; 32 men and 83 women), the risk of cognitive impairment was calculated, and its correlation with the HDS-R score was determined. In FIGS. 7 to 10, the result obtained with the calculation method described in the first embodiment is denoted SPCIR_{FSW-AIC}, and the result obtained with the calculation method described in the second embodiment is denoted SPCIR_{PCA-FSW-AIC}.

As shown in FIGS. 7 and 8, the method of the first embodiment selected 19 prosodic features, of which F129, F128, F118, F130, F57, F8, F101, F59, F110, F72, F69, and F73 were judged to be significant. This shows that feature selection using the AIC criterion selects significant prosodic features.

The correlation with the HDS-R score was R = 0.67, and the adjusted coefficient of determination (reliability) was R^2 = 0.41. As shown in FIG. 9, the calculated risk of cognitive impairment was found to have a relatively significant correlation with the HDS-R score.
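
The relation between the two figures can be checked with the standard formula for the adjusted coefficient of determination. The check below assumes that the sample size is the 319 recordings and that the number of predictors is the 19 selected features; the description does not state these inputs explicitly, and R = 0.67 is itself a rounded value.

```python
def adjusted_r2(r2, n, p):
    """Adjusted coefficient of determination for n samples and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Assumed inputs: R = 0.67, n = 319 recordings, p = 19 selected features.
value = adjusted_r2(0.67 ** 2, n=319, p=19)  # about 0.41
```

Under these assumptions the result rounds to 0.41, which is consistent with the reported value.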

With the method of the second embodiment, 55 principal components (composite variables), each an appropriate synthesis of all 130 types of features, were selected, and many of them were judged to be significant. Not only principal components with high eigenvalues (for example PC2, PC7, and PC4) but also principal components with low eigenvalues, such as PC77, PC115, and PC103, were shown to be useful for estimating the HDS-R score.

The correlation with the HDS-R score was R = 0.77, which, as shown in FIG. 10, suggests a strong correlation between the calculated risk of cognitive impairment and the HDS-R score. The adjusted coefficient of determination was R^2 = 0.50, a meaningful result for the applicability of prosody-based analysis of elderly speech to screening for cognitive impairment.

As described above, the cognitive dysfunction risk calculation device according to the second embodiment can calculate the risk of cognitive impairment accurately, based on the combination of composite variables of the speech prosodic features with the highest correlation with the HDS-R score and the weight learned for each composite variable in that combination.
In the embodiment above, the case where principal component analysis is performed to generate the composite variables of the speech prosodic features has been described as an example, but the invention is not limited to this. For example, factor analysis may be performed to generate the composite variables of the speech prosodic features.

Next, a third embodiment is described. Parts with the same configuration as in the first embodiment are given the same reference numerals, and their description is omitted.

The third embodiment differs from the first embodiment mainly in that a device that selects prosodic features and learns weights from learning data and a device that calculates the risk of cognitive impairment are connected via a network.

As shown in FIG. 11, the cognitive dysfunction risk calculation system 310 according to the third embodiment includes a feature quantity learning device 311, which selects prosodic features and learns weights from learning data, and a plurality of cognitive dysfunction risk calculation devices 312, each of which calculates the risk of cognitive impairment from input voice data. The feature quantity learning device 311 and the cognitive dysfunction risk calculation devices 312 are connected by a network 313 such as the Internet.

The feature quantity learning device 311 is, for example, a computer server and is functionally configured as follows. As shown in FIG. 11, it includes a learning data storage unit 20, a feature quantity selection unit 22, a weight determination unit 24, and a parameter transmission unit 324.

The parameter transmission unit 324 transmits parameters indicating the combination of prosodic features selected by the feature quantity selection unit 22 and the learning-model weights determined by the weight determination unit 24 to the cognitive dysfunction risk calculation devices 312 via the network 313.

Each cognitive dysfunction risk calculation device 312 includes a voice input unit 12, a speaker 13, a computer 314, and a display device 16.

The computer 314 includes a parameter reception unit 325, a question reproduction unit 25, a voice acquisition unit 26, a feature quantity extraction unit 28, and a risk calculation unit 30.

The parameter reception unit 325 receives the parameters transmitted from the feature quantity learning device 311 and stores them in a memory (not shown).

The risk calculation unit 30 takes, from the speech prosodic features extracted by the feature quantity extraction unit 28, those corresponding to the combination of prosodic features indicated by the parameters stored in the memory, inputs them to the learning model to which the weights indicated by those parameters have been applied, and calculates the risk of cognitive impairment.
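
One way to picture the parameters exchanged between the learning device and the calculation devices is as a small serialized record. The sketch below uses JSON with the keys `selected`, `intercept`, and `weights`; this layout, the function names, and all numeric values are hypothetical, since the description does not define a parameter format or transport.

```python
import json

def encode_parameters(selected, intercept, weights):
    """Serialize the learned model for transmission (hypothetical layout)."""
    return json.dumps(
        {"selected": selected, "intercept": intercept, "weights": weights}
    )

def risk_from_parameters(payload, features):
    """Decode received parameters and compute the risk for one feature vector."""
    params = json.loads(payload)
    return params["intercept"] + sum(
        w * features[name]
        for name, w in zip(params["selected"], params["weights"])
    )

# Learning side: package the result of learning (illustrative values only).
payload = encode_parameters(["F129", "F130"], 28.0, [-20.0, -2.5])

# Calculation side: apply the stored parameters to extracted features.
risk = risk_from_parameters(payload, {"F129": 0.25, "F130": 2.0})
```

Keeping the selected feature names together with the weights in one record means a calculation device never applies new weights to an outdated feature combination.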

Next, the operation of the cognitive dysfunction risk calculation system 310 according to the present embodiment is described.

The operator inputs multiple types of speech prosodic features obtained from learning voice data, together with the corresponding HDS-R scores, to the feature quantity learning device 311 as learning data. Each time learning data are added to the learning data storage unit 20, the learning processing routine described in the first embodiment is executed. The feature quantity learning device 311 then transmits parameters indicating the selected combination of speech prosodic features and the learned weights to the cognitive dysfunction risk calculation devices 312 via the network 313.

Each cognitive dysfunction risk calculation device 312 receives these parameters and stores them in a memory (not shown).

When an instruction to calculate the risk is input to a cognitive dysfunction risk calculation device 312, the computer 314 executes the cognitive dysfunction risk calculation processing routine described in the first embodiment.

In this way, the feature quantity learning device learns the combination of speech prosodic features with the highest correlation with the HDS-R score and the weight for each feature in that combination, and distributes the learning results to the plurality of cognitive dysfunction risk calculation devices via the network, so that each cognitive dysfunction risk calculation device can calculate the risk of cognitive impairment using the latest learning results.

The above embodiment may also be applied to the configuration of the second embodiment. In that case, the cognitive dysfunction risk calculation system includes a composite variable learning device, which selects composite variables and learns weights from learning data, and a plurality of cognitive dysfunction risk calculation devices, connected by the network 313. The composite variable learning device includes a learning data storage unit 20, a variable synthesis unit 221, a composite variable selection unit 222, a weight determination unit 224, and a parameter transmission unit 324. The computer of each cognitive dysfunction risk calculation device includes a parameter reception unit 325, a question reproduction unit 25, a voice acquisition unit 26, a feature quantity extraction unit 28, a composite variable calculation unit 229, and a risk calculation unit 30.

Next, a fourth embodiment is described. Parts with the same configuration as in the first and third embodiments are given the same reference numerals, and their description is omitted.

The fourth embodiment differs from the third embodiment mainly in that input/output terminals and a cognitive dysfunction risk calculation device are connected via a network.

As shown in FIG. 12, the cognitive dysfunction risk calculation system 410 according to the fourth embodiment includes a plurality of input/output terminals 412 and a cognitive dysfunction risk calculation device 414, which are connected by the network 313.

Each input/output terminal 412 includes a voice input unit 12, a speaker 13, a display device 16, a question reproduction unit 25, a voice acquisition unit 26, and a communication unit 425. When voice data are input by the user through the voice input unit 12 as the answer to a question played back from the speaker 13, the communication unit 425 transmits the voice data to the cognitive dysfunction risk calculation device 414 via the network 313.

When the cognitive dysfunction risk calculation device 414 receives the voice data through its communication unit 424, it calculates the risk of cognitive impairment based on the voice data and transmits the calculation result through the communication unit 424 to the input/output terminal 412 via the network 313.

When the input/output terminal 412 receives the calculation result through the communication unit 425, it displays the calculated risk of cognitive impairment on the display device 16.

The other configuration and operation of the cognitive dysfunction risk calculation device 414 are the same as in the first embodiment, so their description is omitted.

Thus, by connecting a plurality of input/output terminals and the cognitive dysfunction risk calculation device via a network and exchanging the voice data and the calculation results, the risk level of cognitive dysfunction can be determined at each input/output terminal.
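The exchange between an input/output terminal and the risk calculation device described above can be sketched roughly as follows. This is only an illustrative in-process simulation: the JSON message layout, the function names `make_risk_server`, `terminal_request`, and `toy_features`, and the toy weights are all assumptions introduced here, and no real network transport or real prosodic feature extraction is shown.

```python
import json

def make_risk_server(extract_features, weights, bias):
    """Build a handler standing in for device 414: request bytes in, response bytes out."""
    def handle(request: bytes) -> bytes:
        msg = json.loads(request.decode("utf-8"))
        feats = extract_features(msg["samples"])
        # Weighted sum of features, as a placeholder for the learned model.
        risk = bias + sum(w * f for w, f in zip(weights, feats))
        reply = {"terminal_id": msg["terminal_id"], "risk": risk}
        return json.dumps(reply).encode("utf-8")
    return handle

def terminal_request(terminal_id, samples, send):
    """Stand-in for a terminal 412: send voice data, return the risk level to display."""
    request = json.dumps({"terminal_id": terminal_id, "samples": samples})
    response = json.loads(send(request.encode("utf-8")).decode("utf-8"))
    return response["risk"]

def toy_features(samples):
    """Hypothetical placeholder features: mean absolute amplitude and sample count."""
    n = len(samples)
    return [sum(abs(s) for s in samples) / n, float(n)]

# `send` is a direct function call here; in practice it would go over the network.
server = make_risk_server(toy_features, weights=[2.0, 0.5], bias=1.0)
risk_a = terminal_request("terminal-A", [0.5, -0.5, 1.0, -1.0], server)
print(risk_a)
```

Because each terminal only passes a `send` callable, several terminals can share one device in this sketch, which mirrors the many-terminals-to-one-device arrangement of the fourth embodiment.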

The present embodiment may also be applied to the configuration of the second embodiment described above. In this case, the cognitive dysfunction risk calculation system may include the input/output terminals 412 and a device having the same functions as the cognitive dysfunction risk calculation device 210, connected via the network 313.

In the first to fourth embodiments described above, the case of using 130 types of speech prosodic feature quantities has been described as an example, but the invention is not limited to this. Other types of speech prosodic feature quantities may be used, and fewer or more types may be used.

The case of searching for the optimal combination of speech prosodic feature quantities or synthetic variables using the forward stepwise method (variable increase/decrease method) has been described as an example, but the invention is not limited to this, and other sequential selection methods may be used. For example, the variable increase method (forward selection) or the variable decrease method (backward elimination) may be used. Simultaneous selection methods such as the EM algorithm, a genetic algorithm (GA), or particle swarm optimization (PSO) may also be used to search for the optimal combination of speech prosodic feature quantities or synthetic variables.
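A stepwise search of the kind mentioned above can be sketched as follows. This is a minimal illustration, not the procedure of the embodiments: the use of adjusted R-squared of a least-squares fit as the selection criterion, the function names `adjusted_r2` and `stepwise_select`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def adjusted_r2(X, y, cols):
    """Adjusted R^2 of a least-squares fit of y on the chosen columns plus an intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
    return 1.0 - (1.0 - r2) * (n - 1) / (n - len(cols) - 1)

def stepwise_select(X, y, max_features=None):
    """Alternate add and remove steps while the criterion strictly improves."""
    n, p = X.shape
    limit = min(p, n - 2)
    if max_features is not None:
        limit = min(limit, max_features)
    selected, best, improved = [], -np.inf, True
    while improved:
        improved = False
        # Increase step: add the single variable that raises the criterion most.
        candidates = [c for c in range(p) if c not in selected]
        if candidates and len(selected) < limit:
            score, c = max((adjusted_r2(X, y, selected + [c]), c) for c in candidates)
            if score > best + 1e-12:
                selected.append(c)
                best, improved = score, True
        # Decrease step: drop a variable if doing so raises the criterion.
        if len(selected) > 1:
            score, c = max(
                (adjusted_r2(X, y, [s for s in selected if s != c]), c)
                for c in selected
            )
            if score > best + 1e-12:
                selected.remove(c)
                best, improved = score, True
    return selected, best

# Synthetic data: only columns 1 and 5 actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = 3.0 * X[:, 1] - 2.0 * X[:, 5] + 0.1 * rng.normal(size=60)
selected, score = stepwise_select(X, y)
print(sorted(selected), round(score, 3))
```

Restricting the loop to the increase step alone gives the variable increase method, and starting from all variables with only the decrease step gives the variable decrease method.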

The case of learning, by multiple regression analysis, the weight of each speech prosodic feature quantity or each synthetic variable used in the learning model has been described as an example, but the invention is not limited to this. For example, the weights may be learned using a method such as ridge regression, support vector regression (SV regression), or kernel regression.

In the above embodiments, the case of calculating, as the risk level of cognitive dysfunction, a numerical value corresponding to the score (0 to 30) of a cognitive function test such as the Hasegawa score has been described as an example, but the invention is not limited to this. For example, a risk level of cognitive dysfunction expressed in three classes, normal (NL), suspected dementia (MCI), and dementia (AD), may be obtained. In this case, the risk level may be calculated using a learning model such as a Bayesian network, canonical discriminant analysis, linear discriminant analysis, a neural network, the naive Bayes method, or a support vector machine (SVM), and the weight of each speech prosodic feature quantity or each synthetic variable used in that learning model may be determined by learning.
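As an illustration of one of the alternative weight-learning methods named above, ridge regression can be sketched in closed form as follows. The function names `fit_ridge` and `predict_score`, the regularization strength, the synthetic data, and the clipping of predictions to the 0 to 30 score range are assumptions made for this example, not details taken from the embodiments.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Ridge weights w = (Xc^T Xc + lam*I)^-1 Xc^T yc on centered data; intercept unpenalized."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    p = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

def predict_score(X, w, b):
    """Weighted sum of features, clipped to the 0-30 test score range (an assumption)."""
    return np.clip(X @ w + b, 0.0, 30.0)

# Synthetic learning data: three selected features and a score around 20.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = 20.0 + X @ np.array([2.0, -1.0, 0.5])

w, b = fit_ridge(X, y, lam=1.0)
print(w, b)
print(predict_score(X[:3], w, b))
```

With `lam=0.0` this reduces to ordinary multiple regression, and larger values of `lam` shrink the weights toward zero, which can help when many correlated prosodic features are used.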

The case of reproducing the question from the speaker has been described as an example, but the invention is not limited to this, and the question may instead be displayed on the display device.

The case of calculating the risk level of cognitive dysfunction based on voice data input as an answer to a question has also been described as an example, but the invention is not limited to this, and the risk level of cognitive dysfunction may be calculated based on voice data input through voice monitoring.

10, 210, 312, 414 Cognitive dysfunction risk calculation device
12 Voice input unit
14, 214, 314 Computer
20 Learning data storage unit
22 Feature quantity selection unit
24, 224 Weight determination unit
25 Question reproduction unit
28 Feature quantity extraction unit
30, 230 Risk calculation unit
221 Variable synthesis unit
222 Synthetic variable selection unit
229 Synthetic variable calculation unit
310, 410 Cognitive dysfunction risk calculation system
311 Feature quantity learning device

Claims

A cognitive dysfunction risk calculation device comprising:
feature quantity selection means for selecting, from a plurality of types of prosodic feature quantities, the combination of prosodic feature quantities having the highest correlation with a risk level of cognitive dysfunction, based on a plurality of learning data each including the plurality of types of prosodic feature quantities extracted from voice data and the risk level of cognitive dysfunction obtained for the speaker of the voice data;
weight determination means for determining a weight for each prosodic feature quantity in the selected combination, based on the selected combination of prosodic feature quantities and the risk level in each of the plurality of learning data;
feature quantity extraction means for extracting the plurality of types of prosodic feature quantities from input voice data; and
risk calculation means for calculating the risk level of cognitive dysfunction based on the selected combination of prosodic feature quantities among the prosodic feature quantities extracted by the feature quantity extraction means and on the weights determined by the weight determination means.

A cognitive dysfunction risk calculation device comprising:
synthetic variable generation means for generating a plurality of types of synthetic variables, each synthesized from a plurality of types of prosodic feature quantities, by performing analysis processing on the plurality of types of prosodic feature quantities of a plurality of learning data each including the plurality of types of prosodic feature quantities extracted from voice data and a risk level of cognitive dysfunction obtained for the speaker of the voice data;
synthetic variable selection means for selecting, from the plurality of types of synthetic variables, the combination of synthetic variables having the highest correlation with the risk level, based on the plurality of learning data and the generated plurality of types of synthetic variables;
weight determination means for determining a weight for each synthetic variable in the selected combination, based on the combination of synthetic variables obtained for each of the plurality of learning data and the risk level in each of the plurality of learning data;
feature quantity extraction means for extracting the plurality of types of prosodic feature quantities from input voice data; and
risk calculation means for calculating the risk level of cognitive dysfunction based on the combination of synthetic variables obtained from the prosodic feature quantities extracted by the feature quantity extraction means and on the weights determined by the weight determination means.

The cognitive dysfunction risk calculation device according to claim 1 or 2, wherein the plurality of types of prosodic feature quantities include at least one of a feature quantity relating to frequency components of speech, a feature quantity relating to the formant structure of speech, a feature quantity relating to speech volume, a feature quantity relating to speech rate, and a feature quantity relating to the response time taken to answer a question.

The cognitive dysfunction risk calculation device according to any one of claims 1 to 3, wherein the risk level of cognitive dysfunction in the learning data is determined for the speaker by the Hasegawa simple intelligence evaluation scale.

The cognitive dysfunction risk calculation device according to any one of claims 1 to 4, wherein the feature quantity extraction means extracts the plurality of types of prosodic feature quantities from voice data input as an answer to a question.

A cognitive dysfunction risk calculation system comprising:
a feature quantity selection device including feature quantity selection means for selecting, from a plurality of types of prosodic feature quantities, the combination of prosodic feature quantities having the highest correlation with a risk level of cognitive dysfunction, based on a plurality of learning data each including the plurality of types of prosodic feature quantities extracted from voice data and the risk level of cognitive dysfunction obtained for the speaker of the voice data, and weight determination means for determining a weight for each prosodic feature quantity in the selected combination, based on the selected combination of prosodic feature quantities and the risk level in each of the plurality of learning data; and
a risk calculation device including feature quantity extraction means for extracting the plurality of types of prosodic feature quantities from input voice data, and risk calculation means for calculating the risk level of cognitive dysfunction based on the selected combination of prosodic feature quantities among the prosodic feature quantities extracted by the feature quantity extraction means and on the weights determined by the weight determination means.

A cognitive dysfunction risk calculation system comprising:
a synthetic variable selection device including synthetic variable generation means for generating a plurality of types of synthetic variables, each synthesized from a plurality of types of prosodic feature quantities, by performing analysis processing on the plurality of types of prosodic feature quantities of a plurality of learning data each including the plurality of types of prosodic feature quantities extracted from voice data and a risk level of cognitive dysfunction obtained for the speaker of the voice data, synthetic variable selection means for selecting, from the plurality of types of synthetic variables, the combination of synthetic variables having the highest correlation with the risk level, based on the plurality of learning data and the generated plurality of types of synthetic variables, and weight determination means for determining a weight for each synthetic variable in the selected combination, based on the combination of synthetic variables obtained for each of the plurality of learning data and the risk level in each of the plurality of learning data; and
a risk calculation device including feature quantity extraction means for extracting the plurality of types of prosodic feature quantities from input voice data, and risk calculation means for calculating the risk level of cognitive dysfunction based on the combination of synthetic variables obtained from the prosodic feature quantities extracted by the feature quantity extraction means and on the weights determined by the weight determination means.

A program for causing a computer to function as:
feature quantity selection means for selecting, from a plurality of types of prosodic feature quantities, the combination of prosodic feature quantities having the highest correlation with a risk level of cognitive dysfunction, based on a plurality of learning data each including the plurality of types of prosodic feature quantities extracted from voice data and the risk level of cognitive dysfunction obtained for the speaker of the voice data;
weight determination means for determining a weight for each prosodic feature quantity in the selected combination, based on the selected combination of prosodic feature quantities and the risk level in each of the plurality of learning data;
feature quantity extraction means for extracting the plurality of types of prosodic feature quantities from input voice data; and
risk calculation means for calculating the risk level of cognitive dysfunction based on the selected combination of prosodic feature quantities among the prosodic feature quantities extracted by the feature quantity extraction means and on the weights determined by the weight determination means.

A program for causing a computer to function as:
synthetic variable generation means for generating a plurality of types of synthetic variables, each synthesized from a plurality of types of prosodic feature quantities, by performing analysis processing on the plurality of types of prosodic feature quantities of a plurality of learning data each including the plurality of types of prosodic feature quantities extracted from voice data and a risk level of cognitive dysfunction obtained for the speaker of the voice data;
synthetic variable selection means for selecting, from the plurality of types of synthetic variables, the combination of synthetic variables having the highest correlation with the risk level, based on the plurality of learning data and the generated plurality of types of synthetic variables;
weight determination means for determining a weight for each synthetic variable in the selected combination, based on the combination of synthetic variables obtained for each of the plurality of learning data and the risk level in each of the plurality of learning data;
feature quantity extraction means for extracting the plurality of types of prosodic feature quantities from input voice data; and
risk calculation means for calculating the risk level of cognitive dysfunction based on the combination of synthetic variables obtained from the prosodic feature quantities extracted by the feature quantity extraction means and on the weights determined by the weight determination means.