JPH1195795A

JPH1195795A - Voice quality evaluating method and recording medium

Info

Publication number: JPH1195795A
Application number: JP9250913A
Authority: JP
Inventors: Kiyoaki Aikawa; 清明相川; Satoshi Takahashi; 敏高橋
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1997-09-16
Filing date: 1997-09-16
Publication date: 1999-04-09

Abstract

PROBLEM TO BE SOLVED: To improve reliability by giving a high score to correctly uttered speech and a low score to incorrectly uttered speech. SOLUTION: Using training speech, a correct absolute log likelihood distribution $P^{G}_{P1}(x)$, defined as the distribution of likelihood values between phoneme segments and the probabilistic model of the correct phoneme, and a non-correct absolute log likelihood distribution $P^{N}_{P1P2}(x)$ are obtained in advance. The speech to be evaluated is divided into segments and the correct phonemes are assigned to them (S1). Next, the likelihood $S^{G}_{P1} = x$ of each segment against the model of its correct phoneme P1 is obtained (S2). Then x is substituted into the previously obtained $P^{G}_{P1}(x)$ and each $P^{N}_{P1P2}(x)$, and the sum of the differences is taken as the accumulated absolute score $SC_{P1}(x)$ (S3).

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a method, and a recording medium, for evaluating the quality of uttered speech using a computer.

[0002]

2. Description of the Related Art

One conventional speech quality evaluation method is based on a spectral distance measure (see, for example, Reference 1: Kenzo Ito, Nobuhiko Kitawaki, Kazuhiko Kakehi, "Study of objective evaluation measures for digital waveform coding of speech," IEICE, Vol. J66-A, No. 3, pp. 273 (1983)). For example, when evaluating speech that has passed through a communication system or a coding method, the quality of the speech can be evaluated by comparing the spectrum of the original speech at the input end with the spectrum of the speech at the output end. This method requires, as the reference speech, the original speech as it was before distortion. It therefore cannot be applied to evaluating the speech of a person practicing the pronunciation of a foreign language. In pronunciation training, the reference speech would be the learner's own voice after the learner has acquired pronunciation equivalent to that of a native speaker, and this cannot be obtained during training. The voice of an arbitrary native speaker could be used as the reference instead, but because speech carries speaker-specific spectral characteristics, it is impossible to tell whether a spectral difference is due to immature pronunciation or to the difference between speakers, so a correct evaluation cannot be made. To solve this problem, there are speech quality evaluation methods that use a probabilistic model such as a hidden Markov model (HMM). Because an HMM is obtained by training on the speech of many speakers, the characteristics of various speakers can be included in the model. If a large amount of training data is prepared, the influence of speaker differences can be reduced, and only differences in the manner of utterance are reflected in the evaluation. For example, to apply the HMM-based method to speech evaluation during foreign-language pronunciation training, the speech of many native speakers is collected and HMMs are created from it. These serve as the reference, and evaluation is performed by comparing the learner's speech against them.

[0003] Details of HMMs are given, for example, in Reference 2 (Seiichi Nakagawa, "Speech Recognition by Probabilistic Models," IEICE). FIG. 7A shows an example of a three-state continuous mixture HMM. Such a model is created for each speech unit (phoneme, syllable, word, etc.). Each of the states S1 to S3 is assigned a statistical distribution, D1 to D3, of speech feature parameters. For example, if this is a phoneme model, the first state represents the statistical distribution of the features near the beginning of the phoneme, the second state near its center, and the third state near its end.

[0004] The feature distributions D1 to D3 of the states are often expressed using a plurality of continuous probability distributions (hereinafter referred to as a continuous mixture distribution) so that complicated distribution shapes can be represented. Various continuous probability distributions are possible, but the normal distribution is used most often. Each normal distribution is usually a multidimensional uncorrelated normal distribution with the same number of dimensions as the feature vector. FIG. 7B shows an example of a continuous mixture distribution. In this figure the number of normal distributions is three, N1 to N3. The output probability $b_i(o_t)$ of state i of a continuous mixture HMM for the input feature vector $o_t = (o_{t1}, o_{t2}, \ldots, o_{tP})$ at time t (P is the total number of dimensions) is calculated as

$$b_i(o_t) = \sum_{m=1}^{M} w_{im}\,\phi_{im}(o_t) \qquad (1)$$

where $w_{im}$ is the weight coefficient for the m-th multidimensional normal distribution of state i. The probability density of the m-th multidimensional normal distribution is given by
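As an illustration of equation (1), the following is a minimal Python sketch of the output probability of one HMM state, assuming uncorrelated (diagonal-covariance) normal components as described in the text. The function names `diag_gaussian_pdf` and `output_probability` and the list-based parameter layout are hypothetical choices made for this example and do not come from the patent.

```python
import math
from typing import Sequence


def diag_gaussian_pdf(o: Sequence[float],
                      mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """Density of a multidimensional uncorrelated normal distribution.

    `var` holds the diagonal elements (variances) of the covariance matrix.
    """
    log_p = 0.0
    for x, mu, s2 in zip(o, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * s2) + (x - mu) ** 2 / s2)
    return math.exp(log_p)


def output_probability(o: Sequence[float],
                       weights: Sequence[float],
                       means: Sequence[Sequence[float]],
                       variances: Sequence[Sequence[float]]) -> float:
    """Equation (1): b_i(o_t) = sum over m of w_im * phi_im(o_t), for one state i."""
    return sum(w * diag_gaussian_pdf(o, mu, var)
               for w, mu, var in zip(weights, means, variances))


# Usage: a one-dimensional state with two identical standard-normal components.
b = output_probability([0.0], [0.5, 0.5], [[0.0], [0.0]], [[1.0], [1.0]])
```

In practice the sum is usually evaluated in the log domain to avoid underflow for high-dimensional features; the direct form is kept here only to mirror equation (1).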

[0005]

(Equation 1)
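The equation image is not reproduced in this text. Assuming it is the standard density of a P-dimensional normal distribution, which is what the surrounding symbol definitions (mean vector, covariance matrix, transpose) suggest, it would read roughly as follows. The equation number (2) is inferred from the neighboring equations (1) and (3), and the exact typography of the original may differ.

```latex
\phi_{im}(o_t) =
  \frac{1}{(2\pi)^{P/2}\,\lvert \Sigma_{im} \rvert^{1/2}}
  \exp\!\left( -\frac{1}{2}\,(o_t - \mu_{im})^{T}\,\Sigma_{im}^{-1}\,(o_t - \mu_{im}) \right)
  \qquad (2)
```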

[0006] Here, $\mu_{im}$ is the mean vector of the m-th multidimensional normal distribution of state i, and $\Sigma_{im}$ is its covariance matrix. T denotes the transpose of a matrix. If the covariance matrix of each normal distribution has only diagonal elements, the logarithm of $\phi_{im}(o_t)$ can be expressed as

[0007]

(Equation 2)
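This equation image is also missing from the text. Assuming a diagonal covariance matrix with variances $\sigma_{imp}^{2}$, as the surrounding paragraphs state, taking the logarithm of the normal density gives the form below. The number (3) is taken from the later references to "equation (3)"; the arrangement of terms in the original may differ.

```latex
\log \phi_{im}(o_t) =
  -\frac{P}{2}\log 2\pi
  -\frac{1}{2}\sum_{p=1}^{P}\log \sigma_{imp}^{2}
  -\frac{1}{2}\sum_{p=1}^{P}\frac{(o_{tp}-\mu_{imp})^{2}}{\sigma_{imp}^{2}}
  \qquad (3)
```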

[0008] Here, $\mu_{imp}$ is the p-th component of the mean vector of the m-th multidimensional normal distribution of state i, and $\sigma_{imp}^{2}$ is the p-th diagonal element (variance) of the covariance matrix of that distribution. In the conventional HMM-based speech evaluation method, the log likelihood value of equation (3) was used as the score. Because an HMM contains the speech characteristics of many speakers in the form of probability distributions, it allows a more appropriate evaluation than a method that uses the speech of a single speaker as the reference. However, the conventional method, which uses only the log likelihood value calculated by equation (3), sometimes yields a low log likelihood value even for correctly uttered speech, and is therefore inappropriate as an evaluation value.

[0009]

SUMMARY OF THE INVENTION

An object of the present invention is to provide a highly reliable speech evaluation method that, in speech quality evaluation using a probabilistic model, gives a high score to speech in which vowels, consonants and the like are uttered correctly, and a low score to incorrectly uttered speech.

[0010]

MEANS FOR SOLVING THE PROBLEMS

In the present invention, an absolute score and a relative score for the speech under evaluation are calculated using probabilistic models of the speech features of basic speech units, such as phonemes (vowels, consonants and so on), and quality is evaluated using either or both of these scores. First, the procedure for obtaining the absolute score is described for the case where phonemes are used as the basic speech unit. During training, a large number of training speech samples are decomposed into phonemes such as vowels and consonants to obtain phoneme segments. For each segment, the likelihood against the probabilistic model of the correct phoneme (the correct linguistic category) to which it belongs is calculated, for example as a log likelihood value (referred to as the correct absolute log likelihood). Next, a distribution that stores how the correct absolute log likelihood values are spread, that is, the distribution of the likelihood for the correct category (the correct absolute log likelihood distribution), is obtained. In addition, for each segment, the likelihood against the probabilistic model of an incorrect phoneme (an incorrect linguistic category) different from what was uttered is calculated (for example a log likelihood value, referred to as the non-correct absolute log likelihood). For the non-correct absolute log likelihood values as well, a distribution that stores how they are spread, that is, the distribution of the likelihood for an incorrect category (the non-correct absolute log likelihood distribution), is obtained. During evaluation, the speech to be evaluated and the phoneme sequence that should have been uttered are input, and the speech is decomposed into phoneme segments. For each phoneme segment, the correct absolute log likelihood is calculated. Two values are then computed from it: the likelihood obtained from the correct absolute log likelihood and the correct absolute log likelihood distribution, that is, a value indicating how plausible the calculated likelihood is as a correct answer (likelihood 1), and the likelihood obtained from the correct absolute log likelihood and the non-correct absolute log likelihood distribution, that is, a value indicating how plausible the calculated likelihood is as a non-correct answer (likelihood 2). The absolute score is obtained from these two likelihoods (likelihood 1 and likelihood 2).

[0011] Next, the procedure for obtaining the relative score is described. During training, using a large number of training speech samples, the difference (the correct relative log likelihood) is calculated between the likelihood of each speech sample against the probabilistic model of the correct linguistic category to which it belongs (for example, the correct absolute log likelihood) and its likelihood against the probabilistic model of an incorrect phoneme (an incorrect linguistic category) (for example, the non-correct absolute log likelihood). Next, a distribution that stores how the correct relative log likelihood values are spread, that is, the distribution of the difference between the likelihood for the correct category and the likelihood for an incorrect category (the correct relative log likelihood distribution), is obtained. In addition, for each speech segment, the difference between log likelihood values against incorrect phoneme models (non-correct absolute log likelihoods) is calculated (the non-correct relative log likelihood). For the non-correct relative log likelihood values as well, a distribution that stores how they are spread, that is, the distribution of the differences between the likelihoods for incorrect categories (the non-correct relative log likelihood distribution), is obtained. During evaluation, the speech to be evaluated and the phoneme sequence that should have been uttered are input, and the speech is decomposed into phoneme segments. For each phoneme segment, the correct relative log likelihood is calculated. Two values are then computed from it: the likelihood obtained from the correct relative log likelihood and the previously stored correct relative log likelihood distribution, that is, a value indicating how plausible the calculated value is as a difference between the correct category and a non-correct category (likelihood 3), and the likelihood obtained from the correct relative log likelihood and the previously stored non-correct relative log likelihood distribution, that is, a value indicating how plausible the calculated value is as a difference between non-correct categories (likelihood 4). The relative score is obtained from these two likelihoods (likelihood 3 and likelihood 4).

[0012] The principal feature of the invention is that the final evaluation score is the absolute score, the relative score, or a score calculated from both. The conventional method uses as its score the correct absolute log likelihood value of the evaluated speech against the probabilistic model of the correct linguistic category to which it belongs. The method of the present invention therefore differs from the conventional method in three respects: it takes into account log likelihood values against the probabilistic models of incorrect linguistic categories; it takes into account how the log likelihood values are distributed; and it takes into account relative likelihood values as well as absolute likelihood values.

[0013] When HMMs are used as the probabilistic models, the model parameters are easy to estimate and the estimation accuracy is high. For the correct absolute log likelihood distribution, the non-correct absolute log likelihood distribution, the correct relative log likelihood distribution and the non-correct relative log likelihood distribution, using a mathematically tractable normal distribution or sigmoid function makes the distribution parameters easy to obtain.

[0014]

OPERATION

The absolute log likelihood value against the probabilistic model of the correct phoneme category, as used in the conventional evaluation method, indicates how similar the speech is to the training speech. This absolute measure is, of course, important in evaluating speech. However, phoneme categories such as vowels and consonants include groups of similar categories (for example, the phonemes /b/, /d/ and /g/), so it is important to evaluate the speech knowing whether it is close to another category (use of the non-correct absolute log likelihood distribution). The embodiment of the invention uses an absolute score that considers both the correct absolute log likelihood distribution and the non-correct absolute log likelihood distribution. For example, given a speech sample of the phoneme /b/, even if its log likelihood value against the probabilistic model of /b/ is high, it is hard to assert that the sample is a correct /b/ if its log likelihood value against the probabilistic model of /d/ is similarly high. The absolute score of the invention yields an evaluation value that takes such cases into account.

[0015] Conversely, the absolute log likelihood value may be low even though the utterance is correct. This is because the speech features used in speech recognition (for example, the cepstrum) are suited to discriminating speech (which is determined only by the relative ordering of the scores among phonemes) but are not always suited to absolute evaluation. Even when the absolute log likelihood value against the probabilistic model of the correct phoneme category is low, the absolute log likelihood values against the models of other categories are often lower still. If the speech can be said to be overwhelmingly closest to the correct phoneme category despite a low absolute log likelihood value, it may be regarded as a correct utterance. In the embodiment of the invention, considering the difference between the absolute log likelihood value against the model of the correct phoneme category and the absolute log likelihood value against the model of an incorrect phoneme category (the relative score) yields an evaluation value that takes such cases into account.

[0016] The overall score is determined from the absolute score, the relative score, or both. When both are considered, the absolute quality evaluation described above and the quality evaluation relative to other categories are both available, which makes possible the highly reliable speech quality evaluation that is the object of the invention.

[0017]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

An embodiment in which the invention is applied to foreign-language pronunciation training is described. HMMs are used here as the probabilistic models. First, the training step (the system preparation stage) is described. Speaker-independent HMMs are trained for each phoneme from a database of speech uttered by native speakers. For example, if the target language is Japanese, HMMs for all phonemes that occur in Japanese are trained using speech uttered by Japanese speakers. To account for the acoustic influence of the preceding and following phonemes, context-dependent phoneme HMMs are created. FIG. 1 is a flowchart for obtaining the correct absolute log likelihood distribution and the non-correct absolute log likelihood distribution. A training speech sample and the phoneme sequence corresponding to its utterance are given. The phoneme HMMs are concatenated according to the phoneme sequence and matched against the speech sample to obtain the speech segment corresponding to each phoneme (S1). The log likelihood value between the HMM of the correct phoneme and the phoneme segment is obtained (S2). The log likelihood value is normalized by the segment length L, for example as

$$S_{no} = S_{s} / L \qquad (4)$$

where $S_{s}$ is the log likelihood value obtained from the phoneme segment and $S_{no}$ is the normalized log likelihood value. This normalized likelihood is referred to here as the correct absolute log likelihood.
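The length normalization of equation (4) can be sketched in Python as below. The sketch assumes that the segment length L is counted in feature frames, which the text does not state explicitly; the function name `normalize_log_likelihood` is hypothetical.

```python
def normalize_log_likelihood(segment_log_likelihood: float, num_frames: int) -> float:
    """Equation (4): S_no = S_s / L.

    `num_frames` plays the role of the segment length L (assumed here to be
    a frame count). Normalizing makes segments of different duration comparable.
    """
    if num_frames <= 0:
        raise ValueError("segment length must be positive")
    return segment_log_likelihood / num_frames


# Usage: a 40-frame segment whose accumulated log likelihood is -120.0.
s_no = normalize_log_likelihood(-120.0, 40)
```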

[0018] The correct absolute log likelihood $S^{G}_{P1}$ is the absolute log likelihood value, normalized by segment length, obtained when the correct phoneme model (phon1) is applied. For each phoneme segment, the log likelihood value against the HMM of an incorrect phoneme is also calculated, and is likewise normalized by segment length. This is referred to here as the non-correct absolute log likelihood.

[0019] The non-correct absolute log likelihood $S^{N}_{P1P2}$ is the absolute log likelihood value, normalized by segment length, obtained when a competing phoneme model (phon2) is applied to a speech segment of the phoneme (phon1). After the correct absolute log likelihood and the non-correct absolute log likelihood have been calculated for a phoneme (phon1) in the training data, plotting the following cumulative distribution functions (S4, S5) gives curves such as those shown in FIG. 2A.

[0020]

$$f(x) = \mathrm{Prob}\{S^{G}_{P1} < x\} \qquad (5)$$

$$g(x) = \mathrm{Prob}\{S^{N}_{P1P2} < x\} \qquad (6)$$

These cumulative frequency distributions are approximated by sigmoid functions, which are mathematically easy to handle (S6, S7).

$$f(x) = \frac{1}{1 + \exp(-\alpha_1 (x - \beta_1))} \qquad (7)$$

$$g(x) = \frac{1}{1 + \exp(-\alpha_2 (x - \beta_2))} \qquad (8)$$

Here, $\alpha$ and $\beta$ are determined so that the fit is best within the range of values from 0.1 to 0.9 of the sigmoid curve. The probability density distributions of these functions are given mathematically as follows.
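The text does not specify how $\alpha$ and $\beta$ are fitted. One possible approach, shown here purely as an assumption, is to build the empirical cumulative distribution of the training likelihoods, keep only the points whose cumulative value lies between 0.1 and 0.9, and fit a straight line to their logit, since $\ln\{f/(1-f)\} = \alpha(x-\beta)$ for the sigmoid of equations (7) and (8). The function names and the plotting position $(k+0.5)/n$ are hypothetical choices for this sketch.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float, alpha: float, beta: float) -> float:
    """Sigmoid of equations (7) and (8)."""
    return 1.0 / (1.0 + math.exp(-alpha * (x - beta)))


def fit_sigmoid_to_cdf(samples: Sequence[float],
                       lo: float = 0.1,
                       hi: float = 0.9) -> Tuple[float, float]:
    """Fit (alpha, beta) to the empirical CDF of `samples`.

    Assumed method (not from the patent): least-squares line through the
    logit of the empirical CDF, restricted to CDF values in [lo, hi].
    """
    xs = sorted(samples)
    n = len(xs)
    points: List[Tuple[float, float]] = []
    for k, x in enumerate(xs):
        cdf = (k + 0.5) / n  # empirical cumulative frequency at x
        if lo <= cdf <= hi:
            points.append((x, math.log(cdf / (1.0 - cdf))))
    if len(points) < 2:
        raise ValueError("not enough points in the fitting range")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0.0:
        raise ValueError("samples in the fitting range are all identical")
    alpha = sxy / sxx                 # slope of the logit line
    intercept = mean_y - alpha * mean_x
    beta = -intercept / alpha         # logit crosses zero at x = beta
    return alpha, beta
```

A nonlinear least-squares fit on the sigmoid itself would be an equally plausible reading of the text; the logit line is used here only because it needs no external library.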

[0021]

(Equation 3)

[0022]

(Equation 4)
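The two equation images are not reproduced in this text. Since they are described as the probability densities of the sigmoid functions (7) and (8), they would presumably be the derivatives of those functions, as written below. The symbols $P^{G}_{P1}$ and $P^{N}_{P1P2}$ are taken from equation (14), and the numbers (9) and (10) from later references in the text; the layout of the original may differ.

```latex
P^{G}_{P1}(x) = \frac{d f(x)}{dx}
  = \frac{\alpha_1 \exp(-\alpha_1 (x - \beta_1))}
         {\{1 + \exp(-\alpha_1 (x - \beta_1))\}^{2}} \qquad (9)

P^{N}_{P1P2}(x) = \frac{d g(x)}{dx}
  = \frac{\alpha_2 \exp(-\alpha_2 (x - \beta_2))}
         {\{1 + \exp(-\alpha_2 (x - \beta_2))\}^{2}} \qquad (10)
```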

[0023] FIG. 2B shows examples of the distributions given by the above probability density functions. The horizontal axis is the likelihood. These are the correct absolute log likelihood distribution and the non-correct absolute log likelihood distribution. FIG. 3 shows the cumulative distribution functions obtained when the correct absolute log likelihood is calculated for a phoneme (phon1) and the non-correct absolute log likelihood is calculated for each competing phoneme (phon2; here, 25 phonemes in total). Accordingly, there is one function of equation (9) for each phoneme, and as many functions of equation (10) for each phoneme as there are competing phonemes.

[0024] FIG. 4 is a flowchart for obtaining the correct relative log likelihood distribution and the non-correct relative log likelihood distribution. As in the case shown in FIG. 1, each training speech sample is decomposed into phoneme segments, and each phoneme segment is assigned the corresponding correct phoneme of the correct phoneme sequence (S1). For each of these segments, the correct absolute log likelihood $S^{G}_{P1}$ is obtained (S2), and the non-correct absolute log likelihood $S^{N}_{P1P2}$ is obtained (S3). The difference between the correct absolute log likelihood and the non-correct absolute log likelihood is taken as the correct relative log likelihood and is calculated as follows (S4).

[0025]

$$\Delta^{G}_{P1P2} = S^{G}_{P1} - S^{N}_{P1P2} \qquad (11)$$

The non-correct relative log likelihood is calculated as follows (S5).

$$\Delta^{N}_{P1P2} = (S^{N}_{P1})' - S^{N}_{P1P2} \qquad (12)$$

$$(S^{N}_{P1})' = \frac{1}{n-1} \sum S^{N}_{P1P2} \qquad (13)$$

Here, $\Sigma$ is the sum of $S^{N}_{P1P2}$ over every phoneme (phon2) that does not match the phoneme (phon1), and n is the total number of phoneme models. For the correct relative log likelihood and the non-correct relative log likelihood, as with the absolute score, the respective cumulative distribution functions are plotted (S6, S7), sigmoid functions are fitted (S8, S9), and the probability density functions $q^{G}_{P1P2}(x)$ and $q^{N}_{P1P2}(x)$ are obtained (S10, S11). These are the correct relative log likelihood distribution and the non-correct relative log likelihood distribution. This completes the training step.
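Equations (11) to (13) can be illustrated with a short Python sketch for a single segment of phoneme phon1. The dictionary layout (competing phoneme label mapped to its non-correct absolute log likelihood) and the function names are hypothetical; the number of competitors, n - 1, is taken to be the number of entries in that dictionary.

```python
from typing import Dict


def correct_relative(s_correct: float,
                     s_noncorrect: Dict[str, float]) -> Dict[str, float]:
    """Equation (11): Delta^G_P1P2 = S^G_P1 - S^N_P1P2 for each competitor phon2."""
    return {p2: s_correct - s for p2, s in s_noncorrect.items()}


def noncorrect_relative(s_noncorrect: Dict[str, float]) -> Dict[str, float]:
    """Equations (12) and (13).

    (S^N_P1)' is the mean of S^N_P1P2 over the n - 1 competing phonemes, and
    Delta^N_P1P2 is that mean minus the individual S^N_P1P2.
    """
    if not s_noncorrect:
        raise ValueError("at least one competing phoneme is required")
    mean_noncorrect = sum(s_noncorrect.values()) / len(s_noncorrect)
    return {p2: mean_noncorrect - s for p2, s in s_noncorrect.items()}


# Usage: a /b/ segment scored against its own model and against /d/ and /g/.
delta_g = correct_relative(-2.0, {"d": -5.0, "g": -7.0})
delta_n = noncorrect_relative({"d": -5.0, "g": -7.0})
```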

[0026] Next, the evaluation step, which produces an evaluation value when a speech segment to be evaluated is given, is described. FIG. 5 is a flowchart for obtaining the overall score. The speech to be evaluated and the phoneme sequence of the target utterance that the speaker intended to produce are given. The phoneme HMMs are concatenated according to the phoneme sequence, and the input speech is segmented into phonemes (S1). The correct absolute log likelihood $S^{G}_{P1} = x$ for the correct phoneme (phon1) is calculated (S2), and likelihood values are calculated according to equations (9) and (10). The region where the two distributions in FIG. 2B overlap is an error region in which misjudgments occur. A score function that reflects this overlap could be as follows.

[0027]

$$D_{P1P2}(x) = P^{G}_{P1}(x) - P^{N}_{P1P2}(x) \qquad (14)$$

Here, x is the correct absolute log likelihood. When $D_{P1P2}(x)$ is positive the utterance is regarded as correct, and when it is negative the utterance is regarded as incorrect. The final absolute score $SC_{P1}(x)$ is calculated as follows (S3).

$$SC_{P1}(x) = \sum D_{P1P2}(x) \qquad (15)$$

Here, $\Sigma$ is the sum of $D_{P1P2}(x)$ over all phon2 that do not match phon1.
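A minimal Python sketch of equations (14) and (15) follows. It assumes that the densities are the derivatives of the fitted sigmoids, with each distribution described by a hypothetical `(alpha, beta)` pair; the function names and the dictionary of competitor parameters are illustrative choices, not part of the patent.

```python
import math
from typing import Dict, Tuple

Params = Tuple[float, float]  # (alpha, beta) of a fitted sigmoid


def sigmoid_density(x: float, alpha: float, beta: float) -> float:
    """Derivative of 1 / (1 + exp(-alpha * (x - beta))).

    The density is symmetric around beta, so exp(-|t|) is used to avoid
    overflow for arguments far from beta.
    """
    e = math.exp(-abs(alpha * (x - beta)))
    return alpha * e / (1.0 + e) ** 2


def absolute_score(x: float,
                   correct: Params,
                   noncorrect: Dict[str, Params]) -> float:
    """Equations (14) and (15): sum over phon2 of P^G_P1(x) - P^N_P1P2(x).

    `x` is the correct absolute log likelihood of the segment under evaluation.
    """
    p_correct = sigmoid_density(x, *correct)
    return sum(p_correct - sigmoid_density(x, alpha, beta)
               for alpha, beta in noncorrect.values())


# Usage: x sits at the mode of the correct distribution and far from the
# competitor distribution, so the score is positive.
sc = absolute_score(0.0, (1.0, 0.0), {"d": (1.0, -10.0)})
```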

[0028] Next, for each phoneme segment, the non-correct absolute log likelihood $S^{N}_{P1P2}$ for each non-correct phoneme is obtained (S4). From the correct absolute log likelihood and the non-correct absolute log likelihood, the correct relative log likelihood is calculated according to equation (11) (S5). As with the absolute score, the final relative score $\Delta SC_{P1}(x)$ is calculated from the correct relative log likelihood distribution and the non-correct relative log likelihood distribution as follows (S6).

[0029]

$$\Delta_{P1P2}(x) = q^{G}_{P1P2}(x) - q^{N}_{P1P2}(x) \qquad (16)$$

$$\Delta SC_{P1}(x) = \sum \Delta_{P1P2}(x) \qquad (17)$$

Here, $\Sigma$ is the sum of $\Delta_{P1P2}(x)$ over phon2 other than phon1, and x is the correct relative log likelihood. The overall score $SC_{to}$ is obtained from the absolute score $SC$ and the relative score $\Delta SC$. For example, a linear combination of the two may be used (S7).
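Equations (16) and (17) can be sketched in Python as follows. The sketch assumes that, for each competing phoneme phon2, x is the correct relative log likelihood of equation (11) computed against that competitor, and that the two densities are supplied as callables; the function name and dictionary layout are hypothetical.

```python
from typing import Callable, Dict

Density = Callable[[float], float]


def relative_score(delta_correct: Dict[str, float],
                   q_correct: Dict[str, Density],
                   q_noncorrect: Dict[str, Density]) -> float:
    """Equations (16) and (17): sum over phon2 of q^G_P1P2(x) - q^N_P1P2(x).

    `delta_correct[p2]` is the correct relative log likelihood against
    competitor p2 (an assumption about how x is chosen for each term).
    """
    total = 0.0
    for p2, x in delta_correct.items():
        total += q_correct[p2](x) - q_noncorrect[p2](x)
    return total


# Usage with constant stand-in densities, just to show the bookkeeping.
dsc = relative_score(
    {"d": 3.0, "g": 5.0},
    {"d": lambda x: 0.5, "g": lambda x: 0.25},
    {"d": lambda x: 0.125, "g": lambda x: 0.25},
)
```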

[0030]

$$SC_{to} = \lambda \cdot SC + (1 - \lambda) \cdot \Delta SC \qquad (18)$$

Here, $\lambda$ is the combination coefficient. When $\lambda$ is 1, only the absolute score is reflected in the overall score; when $\lambda$ is 0, only the relative score is reflected. The combination of the two scores is not limited to a linear function and may be a nonlinear function. In the embodiment above, evaluation was performed for each phoneme. To evaluate at the level of words or sentences, a weighted average of the overall scores of the individual phonemes, for example, could be used.
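The linear combination of equation (18) and the word- or sentence-level weighted average mentioned in the text can be sketched as below. Following equation (18) as written, the coefficient multiplies the absolute score. The function names and the default of equal weights are assumptions made for this example.

```python
from typing import Optional, Sequence


def overall_score(sc: float, delta_sc: float, lam: float) -> float:
    """Equation (18): SC_to = lam * SC + (1 - lam) * Delta SC."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam is expected to lie in [0, 1]")
    return lam * sc + (1.0 - lam) * delta_sc


def utterance_score(phoneme_scores: Sequence[float],
                    weights: Optional[Sequence[float]] = None) -> float:
    """Weighted average of per-phoneme overall scores (equal weights by default)."""
    if not phoneme_scores:
        raise ValueError("no phoneme scores given")
    if weights is None:
        weights = [1.0] * len(phoneme_scores)
    if len(weights) != len(phoneme_scores):
        raise ValueError("weights and scores must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0.0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, phoneme_scores)) / total_weight


# Usage: combine the two scores of one phoneme, then average over a word.
sc_to = overall_score(2.0, 4.0, 0.25)
word = utterance_score([1.0, 3.0], weights=[1.0, 3.0])
```

Weighting by segment duration would be one natural choice for `weights`, although the text leaves the weighting open.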

[0031] FIG. 6A shows an example of the results obtained when HMMs were built using the mel-cepstrum as the speech feature and the evaluation method of the invention was applied. The figure shows the results of evaluating a number of speech samples of the phoneme /b/ as the phoneme /b/; each point is the result for one sample. This experiment represents the case in which the utterance is correct. The horizontal axis is the absolute score and the vertical axis is the relative score. FIG. 6B shows the results of evaluating a number of speech samples of phonemes other than /b/ as the phoneme /b/. This experiment represents the case in which the utterance is incorrect or inappropriate. These results show that for correct utterances (samples), both the absolute score and the relative score are large in most cases, but there are also many samples for which only the relative score is large. For incorrect utterances (samples), both the absolute score and the relative score are small in many cases, but the absolute score is not necessarily small. This confirms the benefit of introducing the relative score.

【００３２】[0032]

If a linear function is used to obtain the overall score, then, as can be seen from FIGS. 6A and 6B, all points lying on a straight line in the two-dimensional space formed by the absolute score and the relative score are assigned the same score. If a non-linear function is used, points lying on a curve can be assigned the same score, so the evaluation value can follow the distribution of the samples more closely.
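The difference between straight-line and curved equal-score contours can be illustrated with a small Python sketch. The text does not specify any particular non-linear function, so the `radial_score` function below (distance from the origin of the score plane) is a purely hypothetical stand-in chosen only because its contours are circles; the sample points are likewise invented.

```python
import math


def linear_score(sc_abs, sc_rel, lam=0.5):
    """Linear combination: equal-score contours are straight lines."""
    return lam * sc_abs + (1.0 - lam) * sc_rel


def radial_score(sc_abs, sc_rel):
    """Hypothetical non-linear combination: equal-score contours are circles."""
    return math.hypot(sc_abs, sc_rel)


# Points on the straight line (absolute + relative = 4) share one linear score.
line_points = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
linear = [linear_score(a, r) for a, r in line_points]

# Points on a circle of radius 5 share one radial score,
# although their linear scores differ.
circle_points = [(3.0, 4.0), (4.0, 3.0), (5.0, 0.0)]
radial = [radial_score(a, r) for a, r in circle_points]
linear_on_circle = [linear_score(a, r) for a, r in circle_points]

print(linear, radial, linear_on_circle)
```

In practice the shape of the contour would be chosen to match the observed sample distributions; the circle here only demonstrates the geometric point.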

【００３３】[0033]

In the above description, probability models with the phoneme as the unit were used, but other basic speech units, such as syllables or words, may be used instead. Furthermore, although absolute log likelihoods were used, it is also possible to obtain the correct likelihood with respect to the probability model of the category of the correct speech unit, the correct likelihood distribution describing how that correct likelihood is spread, the non-correct likelihoods with respect to the probability models of the other categories, and the non-correct likelihood distributions describing how those non-correct likelihoods are spread.

【００３４】[0034]

[Effects of the Invention] As described above, when acoustic models such as HMMs built for speech recognition are used, the absolute score can be poor even for a correct utterance. Even in that case, however, the relative score is often high. Therefore, using both the absolute score and the relative score enables more accurate evaluation of utterance quality.

【００３５】[0035]

The present invention can be used, for example, for pronunciation training in a foreign language.

[Brief description of the drawings]

FIG. 1 is a flowchart showing the procedure for generating the correct absolute log likelihood distribution and the non-correct absolute log likelihood distributions from learning speech samples.

FIG. 2A is a diagram showing an example in which the cumulative frequency distributions of the correct absolute log likelihood and the non-correct absolute log likelihood are each approximated by a sigmoid function, and FIG. 2B is a diagram showing examples of the corresponding probability density functions.

FIG. 3 is a diagram illustrating examples of systematic distribution functions of the correct absolute log likelihood and each non-correct absolute log likelihood.

FIG. 4 is a flowchart showing the procedure for generating the correct relative log likelihood distribution and the non-correct relative log likelihood distributions from learning speech samples.

FIG. 5 is a flowchart showing an example of the quality evaluation procedure according to the method of the present invention.

FIG. 6 is a diagram showing an embodiment of the present invention.

FIG. 7A is a diagram showing an example of a three-state HMM, and FIG. 7B is a diagram showing an example of how the probability distribution of an HMM is represented.

Claims

[Claims]

1. A voice quality evaluation method that uses a plurality of linguistic categories of basic units such as phonemes, syllables, and words, each represented by a probability model relating to acoustic features, together with a distribution of likelihoods for the correct category and distributions of likelihoods for incorrect categories, both calculated in advance from the probability models using learning speech, the method comprising: a step of inputting a voice sample to be evaluated and the sequence of basic units that should be uttered for that voice; a step of decomposing the voice sample into basic units; a step of applying the probability model of the correct category to each decomposed basic unit to obtain a likelihood; a step of obtaining, from the obtained likelihood, the distribution of likelihoods for the correct category, and the distributions of likelihoods for the incorrect categories, a value indicating the probability that the likelihood is a correct answer and values indicating the respective probabilities that it is a non-correct answer; and a step of calculating, from these probability values, an absolute score for evaluating quality.

2. The voice quality evaluation method according to claim 1, wherein: the distribution of likelihoods for the correct category is a correct absolute log likelihood distribution, obtained by a step of applying the learning speech to the probability model of the correct category to calculate correct absolute log likelihoods and a step of obtaining a correct absolute log likelihood distribution expressing how the values of the correct absolute log likelihood are spread; the distribution of likelihoods for each incorrect category is a non-correct absolute log likelihood distribution, obtained by a step of applying the learning speech to the probability model of each incorrect category to calculate non-correct absolute log likelihoods and a step of obtaining a non-correct absolute log likelihood distribution expressing how the values of each non-correct absolute log likelihood are spread; the likelihood obtained by applying the probability model of the correct category is a correct absolute log likelihood; and the value indicating the probability of being a correct answer and the values indicating the probability of being a non-correct answer are a correct absolute log likelihood probability density value and non-correct absolute log likelihood probability density values, respectively.

3. The voice quality evaluation method according to claim 2, wherein the step of calculating the absolute score is a step of summing, over the non-correct categories, the difference between the correct absolute log likelihood probability density value and the non-correct absolute log likelihood probability density value.

4. A voice quality evaluation method that uses a plurality of linguistic categories of basic units such as phonemes, syllables, and words, each represented by a probability model relating to acoustic features, together with a distribution of the difference between the likelihood for the correct category and the likelihood for an incorrect category (referred to as the correct relative distribution) and a distribution of the difference between the likelihoods for incorrect categories (referred to as the non-correct relative distribution), both calculated in advance from the probability models using learning speech, the method comprising: a step of inputting a voice sample to be evaluated and the sequence of basic units that should be uttered for that voice; a step of decomposing the voice sample into basic units; a step of applying the probability model of the correct category to each decomposed basic unit to calculate a correct likelihood; a step of applying the probability model of each incorrect category to the decomposed basic unit to calculate non-correct likelihoods; a step of obtaining the difference between the correct likelihood and each non-correct likelihood as a relative likelihood; a step of calculating, from the relative likelihood, the correct relative distribution, and the non-correct relative distribution, a value indicating the probability that the relative likelihood is a difference between the correct category and an incorrect category (referred to as the correct relative likelihood value) and a value indicating the probability that the relative likelihood is a difference between incorrect categories (referred to as the non-correct relative likelihood value); and a step of calculating, from these two values, a relative score for evaluating quality.

5. The voice quality evaluation method according to claim 4, wherein: the correct relative distribution is obtained by a step of applying the learning speech to the probability model of the correct category to calculate correct absolute log likelihoods, a step of applying the learning speech to the probability model of each incorrect category to calculate non-correct absolute log likelihoods, a step of obtaining the difference between the correct absolute log likelihood and each non-correct absolute log likelihood as a correct relative log likelihood, and a step of obtaining a distribution expressing how the values of the correct relative log likelihood are spread; the non-correct relative distribution is obtained by a step of obtaining the difference between the non-correct absolute log likelihoods as a non-correct relative log likelihood and a step of obtaining a distribution expressing how the values of the non-correct relative log likelihood are spread; the correct likelihood is a correct absolute log likelihood, the non-correct likelihood is a non-correct absolute log likelihood, and the relative likelihood is the difference between the correct absolute log likelihood and the non-correct absolute log likelihood; and the correct relative likelihood value and the non-correct relative likelihood value are a correct relative log likelihood probability density value and a non-correct relative log likelihood probability density value, respectively.

6. The voice quality evaluation method according to claim 5, wherein the step of calculating the relative score is a step of summing, over the combinations of incorrect categories, the difference between the correct relative log likelihood probability density value and the corresponding non-correct relative log likelihood probability density value.

7. A voice quality evaluation method characterized in that an overall score, obtained by linearly combining the absolute score obtained by the method according to any one of claims 1 to 3 with the corresponding relative score obtained by the method according to any one of claims 4 to 6, is used as the evaluation result.

8. The voice quality evaluation method according to any one of claims 2, 3, 5, and 6, wherein each of the log likelihood distributions is obtained by a step of obtaining a cumulative distribution function of the corresponding log likelihood values, approximated by a sigmoid function, and a step of calculating the probability distribution function from that sigmoid function.

9. The voice quality evaluation method according to claim 2, wherein each log likelihood distribution is obtained by calculating the mean and variance of the corresponding log likelihoods and fitting a normal distribution to them.

10. A computer-readable recording medium on which a program for evaluating voice quality is recorded, the program comprising: a step of inputting a voice sample to be evaluated and the sequence of basic speech units that should be uttered for that voice; a step of decomposing the voice sample into basic speech units; a step of applying the probability model of the correct category to each decomposed basic speech unit to calculate a correct absolute log likelihood; a step of calculating a correct absolute log likelihood probability density value from the correct absolute log likelihood and a correct absolute log likelihood distribution obtained in advance; a step of calculating non-correct absolute log likelihood probability density values from the correct absolute log likelihood and non-correct absolute log likelihood distributions obtained in advance; and a step of calculating an absolute score representing the voice quality evaluation from the correct absolute log likelihood probability density value and the non-correct absolute log likelihood probability density values.

11. A computer-readable recording medium on which a program for evaluating voice quality is recorded, the program comprising: a step of inputting a voice sample to be evaluated and the sequence of basic speech units that should be uttered for that voice; a step of decomposing the voice sample into basic speech units; a step of applying the probability model of the correct category to each decomposed basic speech unit to calculate a correct absolute log likelihood; a step of applying the probability model of each incorrect category to the decomposed basic speech unit to calculate non-correct absolute log likelihoods; a step of calculating the difference between the correct absolute log likelihood and each non-correct absolute log likelihood as a correct relative log likelihood; a step of calculating a correct relative log likelihood probability density value from the correct relative log likelihood and a correct relative log likelihood distribution obtained in advance; a step of calculating a non-correct relative log likelihood probability density value from the correct relative log likelihood and a non-correct relative log likelihood distribution obtained in advance; and a step of calculating a relative score representing the quality evaluation from the correct relative log likelihood probability density value and the non-correct relative log likelihood probability density value.

12. The recording medium according to claim 11, wherein the program further comprises: a step of calculating a correct absolute log likelihood probability density value from the correct absolute log likelihood, calculated by applying the probability model of the correct category to the decomposed basic speech unit, and a correct absolute log likelihood distribution obtained in advance; a step of calculating non-correct absolute log likelihood probability density values from the correct absolute log likelihood and non-correct absolute log likelihood distributions obtained in advance; a step of calculating an absolute score representing the quality evaluation from the correct absolute log likelihood probability density value and the non-correct absolute log likelihood probability density values; and a step of obtaining a quality score as a linear combination of the absolute score and the relative score.

13. A computer-readable recording medium on which data used for evaluating voice quality is recorded, the data being calculated by applying basic speech units of learning speech to probability models and comprising: a correct absolute log likelihood distribution representing the distribution of likelihoods for the correct category; non-correct absolute log likelihood distributions representing the distribution of likelihoods for incorrect categories; a correct relative log likelihood distribution representing the distribution of the difference between the likelihood for the correct category and the likelihood for an incorrect category; and a non-correct relative log likelihood distribution representing the distribution of the difference between the likelihoods for incorrect categories.