JP6731631B2 - Cognitive function evaluation device, program

Info

Publication number: JP6731631B2
Application number: JP2016036269A
Authority: JP
Inventors: 満春細川; 耕作北田
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2016-02-26
Filing date: 2016-02-26
Publication date: 2020-07-29
Anticipated expiration: 2036-02-26
Also published as: JP2017148431A

Description

The present invention relates to a cognitive function evaluation device and a program. More particularly, the present invention relates to a cognitive function evaluation device that evaluates the cognitive function of a subject from the subject's voice. The present invention also relates to a program that causes a computer to function as a cognitive function evaluation device, and to a program that implements a cognitive function evaluation method on a computer.

Conventionally, a technique has been proposed in which prosodic feature amounts are extracted from voice data and the risk of cognitive impairment of the speaker is calculated from those prosodic feature amounts (see, for example, Patent Document 1). Patent Document 1 describes a technique for calculating the risk of cognitive impairment based on a combination of multiple types of prosodic feature amounts extracted from voice data and a weighting applied to each of the combined prosodic feature amounts. In Patent Document 1, the prosodic feature amounts include at least one of: a feature amount relating to the frequency components of the voice, a feature amount relating to the formant structure of the voice, a feature amount relating to the loudness of the voice, a feature amount relating to the speech rate, and a feature amount relating to the reaction time before a question is answered. For each of these feature amounts, a single value is obtained from the collected voice.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2011-255106 (JP 2011-255106 A)

In Patent Document 1, multiple types of prosodic feature amounts are obtained from the collected voice, but each type of prosodic feature amount is represented by a single value. A relatively large number of types of prosodic feature amounts is therefore required.

An object of the present invention is to provide a cognitive function evaluation device that makes it easy to extract a feature amount for evaluating cognitive function. A further object of the present invention is to provide a program that causes a computer to function as a cognitive function evaluation device, and a program that implements a cognitive function evaluation method on a computer.

The cognitive function evaluation device according to the present invention includes an acquisition unit, a feature extraction unit, and an estimation unit. The acquisition unit acquires voice data of a subject in an utterance section. The feature extraction unit extracts, from the voice data of the utterance section, a feature amount that reflects the cognitive function of the subject. The estimation unit estimates the cognitive function of the subject based on changes in the feature amount. The feature amount is represented by time-series data of a single type of variable that reflects prosodic information in a predetermined extraction period of the voice data.

A program according to the present invention is a program for causing a computer to function as the cognitive function evaluation device.

Another program according to the present invention is a program for implementing a cognitive function evaluation method on a computer.

The configuration of the present invention has the advantage that the feature amount for evaluating cognitive function is easy to extract.

FIG. 1 is a block diagram showing a configuration example of the cognitive function evaluation device.
FIG. 2 is a waveform diagram illustrating an utterance section in the configuration example of the cognitive function evaluation device.
FIG. 3 is a graph showing changes in the feature amount in the configuration example of the cognitive function evaluation device.
FIG. 4 is a graph showing changes in another feature amount in the configuration example of the cognitive function evaluation device.
FIG. 5 is a block diagram showing another configuration example of the cognitive function evaluation device.
FIG. 6 is a block diagram showing a mode of use of the cognitive function evaluation device.

The cognitive function evaluation device described below is configured to evaluate cognitive function based on a person's utterances. Cognitive function is evaluated using at least one of the pitch frequency and the formants contained in the person's voice. A person's cognitive function can therefore be evaluated without understanding the content of the utterance.

As shown in FIG. 1, the cognitive function evaluation device 10 includes an acquisition unit 11 that acquires voice data of a subject, and a feature extraction unit 12 that extracts a feature amount from the voice data acquired by the acquisition unit 11. The cognitive function evaluation device 10 further includes an estimation unit 13 that estimates the cognitive function of the subject based on changes in the feature amount. The subject is assumed to be mainly an elderly person, but may also be a person suspected of having early-onset Alzheimer's disease. The elderly persons envisaged are mainly those who need to be watched over, such as elderly persons living in welfare facilities, those who use day service centers, those who live alone, or those who live in serviced housing for the elderly.

The acquisition unit 11 either performs analog-to-digital conversion, converting the subject's voice signal input from a microphone into voice data in digital form, or receives voice data of the subject that has already been recorded in digital form. In other words, the acquisition unit 11 may be configured either to convert an analog signal into a digital signal or to receive a digital signal. The former configuration is assumed here. The microphone may be provided integrally with the cognitive function evaluation device 10.

When a person speaks, there are generally periods in which voice is produced continuously and periods in which the voice pauses. The feature extraction unit 12 identifies a pause in the voice from the sound pressure and the elapsed time. Specifically, the feature extraction unit 12 detects, based on the sound pressure, a state in which no voice is being produced, and when this state continues for a predetermined determination time, it determines that the period is a silent section in which the voice is paused. The determination time is set to, for example, about 300 ms to 500 ms, and can be shortened or extended to suit the subject. The feature extraction unit 12 also detects, based on the sound pressure, a state in which voice is being produced, and when this state continues without an intervening silent section, it determines that the period is an utterance section in which voice is produced continuously. It is desirable to set a time limit on the utterance section. The time limit is set to, for example, about 15 s to 30 s.
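
The segmentation described above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: it assumes the sound pressure has already been reduced to one level value per fixed-length frame, and the function name, the 10 ms frame length, and the level threshold are hypothetical choices that do not appear in the text. Only the 300 ms determination time and the 30 s time limit are taken from the ranges given above.

```python
from typing import List, Tuple


def find_utterance_sections(
    frame_levels: List[float],
    frame_ms: float = 10.0,          # assumed frame length (not from the text)
    level_threshold: float = 0.01,   # assumed "no voice" sound-pressure threshold
    silence_ms: float = 300.0,       # determination time (text: about 300-500 ms)
    max_utterance_s: float = 30.0,   # time limit (text: about 15-30 s)
) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame_exclusive) pairs for utterance sections."""
    min_silent_frames = int(round(silence_ms / frame_ms))
    max_frames = int(round(max_utterance_s * 1000.0 / frame_ms))
    sections: List[Tuple[int, int]] = []
    start = None       # first frame of the current utterance section, if any
    last_voiced = -1   # most recent frame in which voice was detected
    for i, level in enumerate(frame_levels):
        if level >= level_threshold:
            if start is None:
                start = i
            last_voiced = i
        if start is None:
            continue
        quiet_run = i - last_voiced              # consecutive quiet frames so far
        too_long = (i + 1 - start) >= max_frames
        if quiet_run >= min_silent_frames or too_long:
            # A silent section was confirmed, or the time limit was reached.
            sections.append((start, last_voiced + 1))
            start = None
    if start is not None:
        sections.append((start, last_voiced + 1))
    return sections
```

Pauses shorter than the determination time stay inside one utterance section, which matches the description that an utterance section continues as long as no silent section intervenes.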

As shown in FIG. 2, for one utterance section Ts in the voice data, the feature extraction unit 12 uses as the feature amount the time-series data of a variable Va obtained for each predetermined extraction period Tx. The variable Va is obtained from at least one of the pitch frequency and the formants. To obtain the pitch frequency and the formants, the feature extraction unit 12 analyzes the spectrum of the voice data by performing a short-time Fourier transform with a short window function, or a wavelet transform. When the variable Va is obtained from formants, the three formants from the first to the third may be used, but it is also possible to use two or fewer formants, or four or more.

As described above, the feature amount obtained by the feature extraction unit 12 is represented by time-series data of a single type of variable Va that reflects the prosodic information of each extraction period Tx in the utterance section Ts. That is, the extraction period Tx is set so that the utterance section Ts contains multiple extraction periods Tx and so that the variable Va obtained for each extraction period Tx reflects the prosodic information of the voice data. The extraction period Tx is set to, for example, about 0.5 s to 2 s. Here, the extraction period Tx is set to 1 s.

The inventors have found that, by appropriately setting the length of the extraction period Tx and appropriately choosing the representative value of the pitch frequency or formant in the extraction period Tx, the voice of a normal subject can be distinguished from the voice of a subject whose cognitive function has declined. Specifically, they found that, within an utterance section Ts, the representative value of the pitch frequency or formant in each extraction period Tx changes relatively little in the voice of a normal subject and changes relatively greatly in the voice of a subject whose cognitive function has declined.

Consider the case in which the feature amount is obtained from the pitch frequency. In this case, the representative value of the pitch frequency in the extraction period Tx is selected from the mean, median, mode, maximum, minimum, and the like. Preferably, the representative value is selected from the mean, median, and mode of the pitch frequency in the extraction period Tx. The representative value is further normalized so that it is expressed as a numerical value within a predetermined range.
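
As a rough illustration of taking one representative value per extraction period Tx, the sketch below assumes that a pitch tracker (not shown and not specified in the text) has already produced one pitch estimate per frame, with `None` for frames in which no pitch was found. The function name and the frame-based layout are hypothetical.

```python
import statistics
from typing import List, Optional


def representative_pitch_per_period(
    pitch_hz: List[Optional[float]],
    frames_per_period: int,
    method: str = "median",
) -> List[Optional[float]]:
    """Return one representative pitch value per extraction period Tx.

    `pitch_hz` holds one pitch estimate per frame (None where no pitch was
    found). A trailing partial period is also reported.
    """
    pick = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }[method]
    representatives: List[Optional[float]] = []
    for start in range(0, len(pitch_hz), frames_per_period):
        period = pitch_hz[start:start + frames_per_period]
        voiced = [f for f in period if f is not None]
        # A period without any pitch estimate has no representative value.
        representatives.append(pick(voiced) if voiced else None)
    return representatives
```

With a 1 s extraction period and, say, 10 ms frames, `frames_per_period` would be 100; the frame length is an assumption, since the text only fixes Tx.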

For example, the case in which the pitch frequency is 500 Hz is defined as "1", and when the representative value in the extraction period Tx is n times 500 Hz, the feature extraction unit 12 sets "n" as the variable Va for that extraction period Tx. In this scheme, when the representative value in the extraction period Tx is 1/n of 500 Hz, the feature extraction unit 12 sets "-n" as the variable Va for that extraction period Tx. This definition of the variable Va is only an example. Alternatively, the case in which the pitch frequency is 300 Hz may be defined as "0", and when the representative value in the extraction period Tx is 300 Hz × 2^n, the feature extraction unit 12 may set "n" as the variable Va for that extraction period Tx. In either case, the feature extraction unit 12 calculates, as the variable Va, a value obtained by normalizing the representative value of the pitch frequency in the extraction period Tx.
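
The second normalization mentioned above (300 Hz maps to 0, and 300 Hz × 2^n maps to n) amounts to taking a base-2 logarithm of the ratio to the reference frequency. A minimal sketch, with a hypothetical function name and with non-integer results allowed for frequencies that are not exact powers of two times the reference:

```python
import math


def normalize_pitch(representative_hz: float, reference_hz: float = 300.0) -> float:
    """Map a representative pitch to Va so that reference_hz * 2**n gives n."""
    if representative_hz <= 0 or reference_hz <= 0:
        raise ValueError("frequencies must be positive")
    return math.log2(representative_hz / reference_hz)
```

For instance, a representative value of 600 Hz gives Va = 1 and 150 Hz gives Va = -1, that is, one octave above and below the reference.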

The feature extraction unit 12 can obtain the variable Va from multiple formants, and can also obtain the variable Va from the pitch frequency together with formants. In these cases, multiple representative values are obtained for each extraction period Tx, so the feature extraction unit 12 combines the representative values and then sets the normalized combined value as the variable Va. When combining multiple representative values, it is desirable to weight each representative value accordingly.
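
The text does not say how the representative values are combined. One plausible reading, shown here purely as an assumption, is a weighted average of the representative values (for example, the pitch frequency and the first three formants) before normalization. The function name and the choice of a weighted average are hypothetical.

```python
from typing import Sequence


def combine_representatives(values: Sequence[float], weights: Sequence[float]) -> float:
    """Combine several representative values into one by a weighted average.

    The weighted average is an assumed combination rule; the source only
    states that weighting is desirable.
    """
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equally long")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight
```

The combined value would then be normalized in the same way as a single representative value.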

Because the feature extraction unit 12 obtains the variable Va for each extraction period Tx in the utterance section Ts, multiple values of the variable Va are obtained in the utterance section Ts. The time-series data of this variable Va is the feature amount. Because the feature amount is obtained from at least one of the pitch frequency and the formants, it reflects the prosodic information of the voice data in each extraction period Tx. Because the variable Va is condensed into a single type of information, the feature amount is expressed as time-series data of a single type of information. Moreover, because the feature amount is normalized, it is expressed as numerical values within a predetermined range regardless of the actual voice data. The variable Va is normalized so that, for example, its range is the closed interval [-10, 10]. To express the variable Va as an integer, the feature extraction unit 12 may also round the variable Va obtained as described above to an integer value.
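
Restricting Va to the closed interval [-10, 10] and rounding it to an integer could look like the sketch below. Clipping out-of-range values to the interval bounds is an assumption, since the text only states the range, and the function name is hypothetical.

```python
def to_bounded_integer(va: float, low: int = -10, high: int = 10) -> int:
    """Clip Va to [low, high] and round it to the nearest integer.

    Python's round() resolves exact halves to the nearest even integer.
    """
    clipped = max(low, min(high, va))
    return int(round(clipped))
```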

As noted, it has been found that the feature amount obtained by the feature extraction unit 12 changes relatively little when cognitive function is normal and changes more when cognitive function declines. The estimation unit 13 therefore estimates cognitive function based on changes in the feature amount. Suppose, for example, that the feature amount shown in FIG. 3 is obtained in one utterance section Ts. In the example shown in FIG. 3, the variable Va varies between 0 and 9, and even when the periods in which the variable Va is 0 are excluded, the variable Va varies by about 4. In other words, the change in the feature amount in this example can be said to be relatively large.

As one method of evaluating the change in the feature amount, the estimation unit 13 uses the difference between the maximum and minimum values of the feature amount in the utterance section Ts (the range). In the example shown in FIG. 3, the maximum value is 9 and the minimum value is 0, so the range is 9. When the range is used to evaluate the feature amount, the estimation unit 13 sets a threshold for the range and evaluates cognitive function as having declined when the range exceeds the threshold. It is also possible to classify cognitive function into multiple stages based on the size of the range.
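
The range-based evaluation can be sketched as follows. The function names and the stage boundaries are hypothetical, and the series used in any call would be the Va values of one utterance section Ts.

```python
from typing import Sequence


def feature_range(va_series: Sequence[float]) -> float:
    """Difference between the maximum and minimum Va in one utterance section."""
    return max(va_series) - min(va_series)


def is_decline_suspected(va_series: Sequence[float], threshold: float) -> bool:
    """True when the range exceeds the threshold."""
    return feature_range(va_series) > threshold


def classify_by_range(va_series: Sequence[float], boundaries: Sequence[float]) -> int:
    """Return a stage index: the number of boundaries that the range exceeds."""
    r = feature_range(va_series)
    return sum(1 for b in boundaries if r > b)
```

A series whose maximum is 9 and minimum is 0 has a range of 9, so it would be flagged with a threshold of 8 but not with a threshold of 10.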

The method by which the estimation unit 13 evaluates the feature amount is not limited to the above example. For example, the estimation unit 13 may evaluate whether the feature amount between two extraction periods Tx in which the variable Va is 0 within the utterance section Ts is unimodal, and evaluate cognitive function as having declined when it is not unimodal. In other words, cognitive function is evaluated as having declined when a local minimum occurs between two extraction periods Tx in which the variable Va is 0. In this case, even when the feature amount is unimodal, the estimation unit 13 may evaluate cognitive function as having declined when the range exceeds a predetermined threshold.
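
A minimal sketch of the unimodality check is given below. It treats a run of non-zero Va values bounded by zeros on both sides as unimodal when the values never rise again after they have started to fall; this concrete definition, the handling of flat stretches, and the function names are assumptions made for illustration.

```python
from typing import List, Sequence


def segments_between_zeros(va_series: Sequence[float]) -> List[List[float]]:
    """Collect runs of non-zero values that have a zero on both sides."""
    segments: List[List[float]] = []
    current = None  # None until the first zero has been seen
    for v in va_series:
        if v == 0:
            if current:               # a non-empty run is closed by this zero
                segments.append(current)
            current = []              # this zero opens a new candidate run
        elif current is not None:
            current.append(v)
    return segments


def is_unimodal(segment: Sequence[float]) -> bool:
    """True if the values never rise again once they have started to fall."""
    falling = False
    for prev, cur in zip(segment, segment[1:]):
        if cur < prev:
            falling = True
        elif cur > prev and falling:
            return False              # a local minimum was passed
    return True


def decline_suspected_by_shape(va_series: Sequence[float]) -> bool:
    """True when any run between two zeros is not unimodal."""
    return any(not is_unimodal(s) for s in segments_between_zeros(va_series))
```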

Because the estimation unit 13 evaluates cognitive function based on changes in the feature amount, emphasizing those changes makes the evaluation easier. The feature amount may therefore be expressed as time-series data in which each data point is the sum of a predetermined number of adjacent values in the time-series data of the variable Va obtained for each extraction period Tx in the utterance section Ts. For example, when the time-series data of the variable Va is written as V(1), V(2), ..., V(i), ..., time-series data D(1), D(2), ..., D(i), ... is obtained with D(2) = V(2) + V(1), ..., D(i) = V(i) + V(i-1), .... Here, i denotes the i-th extraction period Tx in the utterance section Ts.
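
The summation of adjacent values can be sketched as a sliding sum. The text gives D(i) = V(i) + V(i-1) from i = 2 onward and does not define D(1); the sketch below assumes that values before the start of the series count as 0, so that D(1) = V(1). That assumption, the function name, and the `window` parameter are not from the source.

```python
from typing import List, Sequence


def emphasize(va_series: Sequence[float], window: int = 2) -> List[float]:
    """Return D where D(i) is the sum of the last `window` values up to V(i).

    With window=2 this is D(i) = V(i) + V(i-1). Values before the start of
    the series are treated as 0 (an assumption; the source leaves D(1) open).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    result: List[float] = []
    for i in range(len(va_series)):
        first = max(0, i - window + 1)
        result.append(sum(va_series[first:i + 1]))
    return result
```

Because neighbouring values are added, peaks in the original series become taller, which widens the range that the estimation unit compares against its threshold.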

When the time-series data D(1), D(2), ..., D(i), ... is obtained from the feature amount shown in FIG. 3, the new feature amount shown in FIG. 4 is obtained. As a comparison of FIG. 3 and FIG. 4 shows, changes are more pronounced in the feature amount of FIG. 4 than in that of FIG. 3, so the change in the feature amount is easier to evaluate. For example, the range of the feature amount shown in FIG. 3 is 9, whereas the range of the feature amount shown in FIG. 4 is 15. In addition, compared with the feature amount of FIG. 3, the feature amount of FIG. 4 has a steeper gradient between the two extraction periods Tx in which the variable Va is 0, which makes the change more apparent. If the estimation unit 13 has set the threshold to 10, then when a feature amount such as that of FIG. 4 is obtained, the estimation unit 13 evaluates the subject's cognitive function as having declined.

In the configuration example described above, whether cognitive function is normal is determined from the voice data of a single utterance section Ts. Alternatively, the estimation unit 13 may determine whether the subject's cognitive function is normal by comparing the feature amounts obtained from the voice data of multiple utterance sections Ts. In that case, as shown in FIG. 5, the cognitive function evaluation device 10 includes a storage unit 14 that stores the feature amounts of multiple utterance sections Ts. As a rule, it is desirable for the cognitive function evaluation device 10 to acquire voice data every day. The frequency with which the cognitive function evaluation device 10 acquires voice data may, of course, be about once a week, or several times a day.

As described above, the storage unit 14 stores multiple feature amounts, each obtained from the voice data of one of multiple utterance sections Ts. The estimation unit 13 evaluates the degree of similarity among the feature amounts stored in the storage unit 14 and evaluates the subject's cognitive function based on that degree of similarity. In this case, it is desirable that a feature amount obtained when the subject's cognitive function was normal be stored in the storage unit 14. If such a feature amount is stored in the storage unit 14, the estimation unit 13 can estimate changes in the subject's cognitive function by evaluating the similarity between the feature amount obtained for each utterance section Ts and the feature amount from the normal state.

To evaluate similarity, the estimation unit 13 uses, for example, the difference between the mean values of the variable Va in the feature amounts, and determines that cognitive function may have declined when the difference exceeds a predetermined threshold. When the similarity evaluation raises a suspicion of cognitive decline, the change in the feature amount may then be used to evaluate whether cognitive function has actually declined.
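
The two-step evaluation described above can be sketched as follows: first compare the mean of Va with that of a stored normal-state feature amount, and only when the difference exceeds a threshold apply the range check. The function names, the returned labels, and the use of the range as the second step are hypothetical illustrations.

```python
import statistics
from typing import Sequence


def mean_difference(baseline: Sequence[float], current: Sequence[float]) -> float:
    """Absolute difference between the mean Va of two feature amounts."""
    return abs(statistics.mean(current) - statistics.mean(baseline))


def two_stage_evaluation(
    baseline: Sequence[float],
    current: Sequence[float],
    mean_threshold: float,
    range_threshold: float,
) -> str:
    """Screen by similarity to the baseline, then confirm by the range."""
    if mean_difference(baseline, current) <= mean_threshold:
        return "normal"
    if max(current) - min(current) > range_threshold:
        return "decline"
    return "suspected"
```

Here `baseline` stands for a feature amount stored while the subject's cognitive function was normal, and `current` for the feature amount of a newly acquired utterance section.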

The cognitive function evaluation device 10 described above includes a processor that operates according to a program. That is, the cognitive function evaluation device 10 includes a computer as its main hardware element. The processor is selected from, for example, an MPU (Micro-Processing Unit), which requires separate memory, and a microcontroller, which includes memory in a single device. The cognitive function evaluation device 10 can be implemented in a terminal device managed by the subject. However, as shown in FIGS. 1 and 5, it is desirable that the terminal device 21 managed by the subject be used as a device for voice input and that the cognitive function evaluation device 10 be implemented in a computer server 100 that communicates with the terminal device 21. The computer server 100 may consist of a single computer, or of multiple computers that communicate over a computer network and cooperate so as to appear to the user as a single computer. The computer server 100 may also be built as a cloud computing system.

The program may be provided stored in a ROM (Read Only Memory) within the memory, or on a computer-readable recording medium such as an optical disc, an external storage device, or a memory card. The program may also be provided over a telecommunication line such as the Internet. A program provided on a recording medium or over a telecommunication line is stored in rewritable non-volatile memory.

The terminal device 21 managed by the subject is selected from a smartphone, a tablet terminal, a wearable computer, a personal computer, and the like. In the following, the subject is assumed to carry a smartphone. The cognitive function evaluation device 10 is implemented in a computer server 100 that communicates with the smartphone over a telecommunication line such as the Internet. It is also assumed that an application program (a so-called "app") is run on the smartphone in order to use the cognitive function evaluation device 10.

In this configuration example, when the subject inputs voice on a smartphone on which the app is running, the voice data is passed to the cognitive function evaluation device 10 implemented in the computer server 100. The voice input to the smartphone is preferably natural conversational speech rather than speech produced by reading a specific text aloud for the purpose of evaluating cognitive function. It is therefore desirable to use, as the voice input to the smartphone, speech from a telephone call or speech used when operating a web service by spoken natural language.

The result of evaluating the subject's cognitive function may be reported to the subject if the subject's cognitive function is normal. However, if the subject's cognitive function has declined to a degree regarded as mild cognitive impairment, it is desirable to notify a third party who is able to respond to the decline. Here, when cognitive function has declined to a degree regarded as mild cognitive impairment, the cognitive function is regarded as being within the range of cognitive impairment. In the following, the degree at which cognitive function corresponds to mild cognitive impairment is referred to as the "sign level".

For this reason, as shown in FIGS. 1 and 5, it is desirable that the cognitive function evaluation device 10 include a notification unit 15 that notifies another device 22, managed by a third party, when the estimation unit 13 evaluates the subject's cognitive function as having declined to the sign level. The other device 22 is, for example, a terminal device managed by the subject's family, or a terminal device managed by the subject's regular doctor, a caregiver, or the like. This terminal device is selected from, for example, a smartphone, a tablet terminal, and a personal computer.

Thus, when the cognitive function evaluation device 10 evaluates the subject's cognitive function and the cognitive function is estimated to have declined to the sign level, a third party can be notified through the notification unit 15. Notifying a third party that the subject's cognitive function has declined to the sign level leads to early diagnosis of whether the subject has cognitive impairment, or to early treatment of cognitive impairment.

The configuration example described above presupposes that the subject uses the terminal device 21, but a subject whose cognitive function has declined may be unable to operate the terminal device 21. The cognitive function evaluation device 10 may therefore acquire voice data from a communication robot that has a dialogue function. In general, a communication robot has a function of simulating dialogue with a person or a function of mediating dialogue with other people. By working together with a communication robot, the cognitive function evaluation device 10 can therefore acquire the subject's voice data.

FIG. 6 shows this configuration example. The basic configuration is the same as in the configuration examples shown in FIGS. 1 and 5, except that a communication robot 23 is used in place of the terminal device 21. The communication robot 23 includes a microphone 231 and a speaker 232, and is configured to change its appearance according to the content of the conversation. Changing its appearance means changing its form, movement, light, color, and the like. The communication robot 23 also includes a control device 230 that enables it to hold a dialogue with the subject on its own. The communication robot 23 further includes a communication unit 233 that communicates with the computer server 100 over a telecommunication line such as the Internet. That is, the communication robot 23 holds a dialogue with the subject using not only the information acquired by the control device 230 but also information acquired from the computer server 100 through the communication unit 233.

Because the communication robot 23 carries on a dialogue with the subject in this way, the computer server 100 can acquire the voice data of the subject. Therefore, if the cognitive function evaluation device 10 is implemented on the computer server 100, the cognitive function evaluation device 10 can evaluate the cognitive function of the subject based on the voice data acquired through the communication robot 23.

Note that the communication robot 23 does not necessarily correspond one-to-one with a subject. When the cognitive function of a subject is evaluated using voice data received from the communication robot 23, the subject who spoke must therefore be identified. For this reason, the cognitive function evaluation device 10 shown in FIG. 6 includes an authentication unit 16 that identifies the subject. The authentication unit 16 is assumed to identify the subject from voiceprint information extracted from the voice data. However, when the communication robot 23 includes a camera that photographs the subject and the cognitive function evaluation device 10 acquires the image data captured by the camera, the authentication unit 16 may instead perform face authentication based on the image data.

The cognitive function evaluation device 10 of the configuration examples described above includes an acquisition unit 11, a feature extraction unit 12, and an estimation unit 13. The acquisition unit 11 acquires voice data of the subject in an utterance section Ts. The feature extraction unit 12 extracts, from the voice data in the utterance section Ts, a feature amount that reflects the cognitive function of the subject. The estimation unit 13 estimates the cognitive function of the subject based on changes in the feature amount. The feature amount is represented by time-series data of a single type of variable that reflects prosody information in a predetermined extraction period Tx of the voice data.

With this configuration, a single type of variable reflecting prosody information is extracted for each extraction period Tx within the utterance section Ts, and the time-series data of this variable is used as the feature amount, so the feature amount is easy to extract. In addition, because the feature amount is represented by a single type of variable Va, the cognitive function can be evaluated easily.
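The extraction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names, the autocorrelation pitch estimator, and the 75 Hz to 400 Hz search band are hypothetical choices that do not appear in this description. The sketch splits one utterance section Ts into extraction periods Tx and produces one pitch value per Tx as the time-series data.

```python
import math

def split_into_periods(samples, sample_rate, tx_seconds):
    """Split one utterance section Ts into consecutive extraction periods Tx."""
    size = round(sample_rate * tx_seconds)
    if size <= 0:
        raise ValueError("tx_seconds is too short for this sample rate")
    # A trailing partial period is dropped so that every Tx has the same length.
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]

def pitch_by_autocorrelation(frame, sample_rate, f_min=75.0, f_max=400.0):
    """Estimate the pitch frequency (Hz) of one Tx; return None for a silent frame.

    The search band f_min..f_max is an assumed range for speech, not a value
    taken from the description.
    """
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(len(frame) - 1, int(sample_rate / f_min))
    best_lag, best_score = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None

def feature_series(samples, sample_rate, tx_seconds):
    """Return one pitch value per extraction period Tx (the time-series data)."""
    periods = split_into_periods(samples, sample_rate, tx_seconds)
    return [pitch_by_autocorrelation(p, sample_rate) for p in periods]
```

For example, `feature_series(samples, 8000, 0.05)` would yield one value per 50 ms period of an 8 kHz recording. Any other single prosodic variable could be substituted for the pitch estimator without changing the structure.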

The estimation unit 13 desirably evaluates the cognitive function of the subject based on at least one of the difference between the maximum value and the minimum value of the feature amount and the temporal change of the variable Va in the feature amount.

With this configuration, the estimation unit 13 can evaluate the cognitive function of the subject from the feature amount without performing complicated calculations.
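The two indicators named above can be computed with very little code, as the following Python sketch suggests. The function names are hypothetical, and summarizing the temporal change by a mean absolute step is an assumption made for illustration; this description does not state how the temporal change is summarized or which thresholds are applied.

```python
def value_range(feature):
    """Difference between the maximum and minimum values of the feature amount."""
    return max(feature) - min(feature)

def temporal_changes(feature):
    """Change of the variable Va between consecutive extraction periods Tx."""
    return [later - earlier for earlier, later in zip(feature, feature[1:])]

def mean_absolute_change(feature):
    """One possible summary of the temporal change (an assumed choice)."""
    changes = temporal_changes(feature)
    return sum(abs(c) for c in changes) / len(changes) if changes else 0.0
```

With a feature amount of `[0.25, 0.5, 0.75, 0.5]`, `value_range` gives 0.5 and `temporal_changes` gives `[0.25, 0.25, -0.25]`.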

The variable Va desirably represents, as a numerical value within a predetermined range, the pitch frequency extracted from the voice data for each extraction period Tx. Alternatively, the variable Va may represent, as a numerical value within a predetermined range, a value obtained by combining a predetermined plurality of formants extracted from the voice data for each extraction period Tx. The variable Va may also represent, as a numerical value within a predetermined range, a value obtained by combining the pitch frequency extracted from the voice data for each extraction period Tx with a predetermined plurality of formants extracted from the voice data for each extraction period Tx.

That is, the variable Va is obtained from at least one of the pitch frequency and the formants obtained from the voice data. The variable Va is therefore information that reflects the prosody of the voice data, and it is nevertheless expressed as a single type of variable. Furthermore, because the variable Va is normalized so as to be expressed as a numerical value within a predetermined range, the influence of individual differences is suppressed and an objective evaluation is possible.
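The normalization and combination described above might look like the following Python sketch. Everything specific in it is an assumption for illustration: the target range of 0 to 1, the frequency bounds for the pitch and for the two formants, and the use of a plain average as the combining rule are not taken from this passage, which does not say how the values are combined.

```python
def normalize(value, low, high):
    """Map a value into the range 0..1, clipping values outside low..high."""
    if high <= low:
        raise ValueError("high must be greater than low")
    clipped = min(max(value, low), high)
    return (clipped - low) / (high - low)

def combined_variable(pitch_hz, formants_hz,
                      pitch_range=(75.0, 400.0),
                      formant_ranges=((200.0, 1000.0), (500.0, 3000.0))):
    """Combine the pitch frequency and formants of one Tx into a single value Va.

    The ranges and the averaging rule are hypothetical placeholders.
    """
    if len(formants_hz) != len(formant_ranges):
        raise ValueError("one range is required per formant")
    parts = [normalize(pitch_hz, *pitch_range)]
    parts += [normalize(f, lo, hi) for f, (lo, hi) in zip(formants_hz, formant_ranges)]
    # A plain average keeps the result inside the same 0..1 range.
    return sum(parts) / len(parts)
```

Because every component is mapped into the same range before being combined, the result stays within that range regardless of the speaker, which is the property the text relies on to suppress individual differences.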

The feature amount is desirably represented by time-series data in which each data point is the sum of a predetermined number of adjacent values of the variable Va in the time-series data of the variable Va.

For example, when the feature amount is time-series data formed by adding each pair of adjacent values of the variable Va in the time-series data of the variable Va and arranging the sums along the time axis, changes in the variable Va are emphasized. The estimation unit 13 can therefore evaluate changes in the feature amount more easily, which makes it possible to improve the accuracy of the cognitive function evaluation.
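The adjacent-value addition can be sketched in Python as below. The function name is hypothetical, and the sketch assumes overlapping windows that advance one extraction period at a time; the text does not state whether the windows overlap.

```python
def adjacent_sums(values, count=2):
    """Sum each run of `count` adjacent values of Va along the time axis.

    Windows overlap and advance by one position (an assumed reading).
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return [sum(values[i:i + count]) for i in range(len(values) - count + 1)]
```

For the series `[1, 2, 4, 7]`, `adjacent_sums` with the default `count=2` returns `[3, 6, 11]`.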

The cognitive function evaluation device 10 desirably also includes a storage unit 14 that stores the feature amounts of a plurality of utterance sections Ts. In this case, the estimation unit 13 desirably evaluates the degree of similarity among the feature amounts of the plurality of utterance sections Ts stored in the storage unit 14 and estimates the cognitive function of the subject according to the degree of similarity.

With this configuration, if the earliest of the plurality of utterance sections Ts is taken to be an utterance section Ts recorded in the normal state, it can be estimated that the cognitive function of the subject has declined when the degree of similarity among the feature amounts of the plurality of utterance sections Ts decreases. Because each subject is compared against that subject's own earlier speech, this configuration makes it possible to evaluate the cognitive function of the subject while suppressing the influence of individual differences in voice.
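The comparison against an early, normal-state utterance section can be sketched in Python as follows. Cosine similarity, truncation to the shorter series, and the threshold of 0.9 are assumptions chosen for illustration; this passage does not specify the similarity measure or any threshold, and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two feature series, truncated to the shorter one."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def similarity_to_baseline(stored_features):
    """Compare each later utterance section with the first (assumed normal) one."""
    baseline = stored_features[0]
    return [cosine_similarity(baseline, f) for f in stored_features[1:]]

def has_declined(stored_features, threshold=0.9):
    """True when the latest section is less similar to the baseline than threshold.

    The threshold value is a hypothetical placeholder.
    """
    similarities = similarity_to_baseline(stored_features)
    return bool(similarities) and similarities[-1] < threshold
```

Here `stored_features` plays the role of the contents of the storage unit 14: a list with one feature series per utterance section Ts, oldest first.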

The cognitive function evaluation device 10 desirably further includes a notification unit 15 that notifies the other device 22 when the estimation unit 13 estimates that the cognitive function of the subject is within the range of cognitive impairment.

With this configuration, the other device 22 is notified when the cognitive function has declined to a level estimated to be within the range of cognitive impairment, so a third party can encourage the subject to undergo an examination or treatment.
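The notification step could be sketched in Python as below. The numeric score, the convention that a lower score means lower cognitive function, the threshold, and the `send` callback standing in for delivery to the other device 22 are all hypothetical; the text says only that a notification is issued when the estimate falls within the range of cognitive impairment.

```python
from typing import Callable

def notify_if_impaired(score: float,
                       impairment_threshold: float,
                       send: Callable[[str], None]) -> bool:
    """Call `send` when the estimated score falls in the impairment range.

    Assumes a lower score means lower cognitive function. Returns True when
    a notification was sent.
    """
    if score < impairment_threshold:
        send(f"Estimated cognitive function {score:.2f} is below "
             f"{impairment_threshold:.2f}; an examination is recommended.")
        return True
    return False
```

Passing the delivery mechanism as a callback keeps the estimate separate from the transport, so the same check works whether the other device 22 is reached by e-mail, a push message, or some other channel.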

The cognitive function evaluation method in this configuration example includes a step of extracting a feature amount for each predetermined extraction period Tx from the voice data of the subject in an utterance section Ts, and a step of estimating the cognitive function of the subject based on changes in the feature amount. The feature amount is represented by time-series data of a single type of variable that reflects prosody information in the predetermined extraction period Tx of the voice data.

With this method, because the feature amount is represented by a single type of variable Va, the cognitive function can be evaluated easily.

The program in this configuration example is a program for causing a computer to function as the cognitive function evaluation device 10. Alternatively, the program in this configuration example is a program for implementing the cognitive function evaluation method on a computer.

The embodiment described above is one example of the present invention. The present invention is therefore not limited to this embodiment, and various modifications other than this embodiment can of course be made according to the design and the like, as long as they do not depart from the technical idea of the present invention.

10 Cognitive function evaluation device
11 Acquisition unit
12 Feature extraction unit
13 Estimation unit
14 Storage unit
15 Notification unit
22 Other device
Ts Utterance section
Tx Extraction period
Va Variable

Claims

A cognitive function evaluation device comprising: an acquisition unit that acquires voice data of a subject in an utterance section;
a feature extraction unit that extracts, from the voice data in the utterance section, a feature amount that reflects the cognitive function of the subject; and
an estimation unit that estimates the cognitive function of the subject based on a change in the feature amount,
wherein the feature amount is represented by time-series data of a single type of variable that reflects prosody information in a predetermined extraction period of the voice data.

The cognitive function evaluation device according to claim 1, wherein the estimation unit evaluates the cognitive function of the subject based on at least one of a difference between a maximum value and a minimum value of the feature amount and a temporal change of the variable in the feature amount.

The cognitive function evaluation device according to claim 1 or 2, wherein the variable represents a pitch frequency extracted from the voice data for each extraction period by a numerical value in a predetermined range.

The cognitive function evaluation apparatus according to claim 1, wherein the variable represents a value obtained by combining a plurality of predetermined formants extracted from the voice data for each extraction period, with a numerical value in a predetermined range.

The cognitive function evaluation device according to claim 1 or 2, wherein the variable represents, as a numerical value within a predetermined range, a value obtained by combining a pitch frequency extracted from the voice data for each extraction period with a predetermined plurality of formants extracted from the voice data for each extraction period.

The feature amount is represented by time-series data in which each data point is the sum of a predetermined number of adjacent values of the variable in the time-series data of the variable. The cognitive function evaluation device described in 1.

There are a plurality of the utterance sections,
the device further comprises a storage unit that stores the feature amounts of the plurality of utterance sections, and
the estimation unit
evaluates the degree of similarity among the feature amounts of the plurality of utterance sections stored in the storage unit and estimates the cognitive function of the subject according to the degree of similarity. The cognitive function evaluation device according to item 1.

The cognitive function evaluation device according to claim 1, further comprising a notification unit that notifies another device when the estimation unit estimates that the cognitive function of the subject is within the range of cognitive impairment.

A program for causing a computer to function as a cognitive function evaluation device,
the cognitive function evaluation device comprising:
a feature extraction unit that extracts a feature amount for each predetermined extraction period from voice data of a subject in an utterance section; and
an estimation unit that estimates the cognitive function of the subject based on a change in the feature amount,
wherein the feature amount is represented by time-series data of a single type of variable that reflects prosody information in the predetermined extraction period of the voice data.

A program for implementing a cognitive function evaluation method on a computer,
the cognitive function evaluation method comprising:
a step of extracting a feature amount for each predetermined extraction period from voice data of a subject in an utterance section; and
a step of estimating the cognitive function of the subject based on a change in the feature amount,
wherein the feature amount is represented by time-series data of a single type of variable that reflects prosody information in the predetermined extraction period of the voice data.