JP2000099071A - Speech recognition device and method - Google Patents

Speech recognition device and method

Info

Publication number
JP2000099071A
JP2000099071A
Authority
JP
Japan
Prior art keywords
segment
model
speech recognition
label
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP26416298A
Other languages
Japanese (ja)
Other versions
JP3583930B2 (en)
Inventor
Shoichi Matsunaga
昭一 松永
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP26416298A priority Critical patent/JP3583930B2/en
Publication of JP2000099071A publication Critical patent/JP2000099071A/en
Application granted granted Critical
Publication of JP3583930B2 publication Critical patent/JP3583930B2/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Abstract

PROBLEM TO BE SOLVED: To improve recognition performance by considering the relationship between the acoustic features of a segment and those of the segments immediately before and after it. SOLUTION: Training speech is transformed into acoustic feature parameters. Trajectories of the parameters are then obtained over any one of: an interval 1 that includes a recognition-target phoneme Wi and the tail of the phoneme Wi-1 immediately before it; an interval 2 that includes the phoneme Wi and the head of the phoneme Wi+1 immediately after it; and an interval 3 that includes the tail of Wi-1, the phoneme Wi, and the head of Wi+1. Appearance probabilities P(Bi-1, Ai|Wi), P(Ai, Bi+1|Wi) and P(Bi-1, Ai, Bi+1|Wi) are then obtained for the phoneme Wi to generate models. During speech recognition, input speech is transformed into feature parameters, the trajectories of the parameters over the intervals corresponding to the models are obtained for each phoneme, and the likelihood between those trajectories and each model is computed.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]
[Technical Field of the Invention] The present invention relates to a speech recognition apparatus and method that uses a segment model to recognize speech based on the trajectories of acoustic feature parameters.

[0002]
[Prior Art] Conventionally, the basic units of recognition in speech recognition have been phoneme units, subword units, word units, and the like (hereinafter referred to as units), and hidden Markov models (HMMs) are widely used as acoustic models for these units (see, for example, Seiichi Nakagawa, "Speech Recognition by Probabilistic Models", IEICE, July 1988). In speech recognition, speech is parameterized at fixed time intervals (each interval is called a frame here). Methods based on the HMM model speech and compute the likelihoods of recognition candidates under the assumption that the parameter values of adjacent frames are independent. In reality, however, owing to the constraints of the human vocal mechanism, the feature parameters of speech cannot be regarded as independent across adjacent frames. To remedy this, segment models that assume continuity of the parameter values within a unit have been proposed (see, for example, M. Ostendorf et al., "From HMMs to segment models: A unified view of stochastic modeling for speech recognition", IEEE Transactions on Speech and Audio Processing, vol. 4, no. 5, pp. 360-378, September 1996).

[0003]
[Problems to be Solved by the Invention] In the conventional HMM, parameter values are assumed to be independent, so the continuity of parameter trajectories cannot be handled adequately. Segment models to date have captured the continuity of the parameters within a unit, but they have not dealt with continuity with the parameter values outside the unit (between adjacent units), so recognition performance has remained insufficient. The object of the present invention is to solve these problems by taking into account the continuity of parameter values not only within a segment (unit) but also with the adjacent segments (units), and to provide a speech recognition apparatus and method equipped with a scheme that models this efficiently.

[0004]
[Means for Solving the Problems] According to the present invention, in a speech recognition apparatus that analyzes input speech into acoustic feature parameters and performs recognition based on information about the trajectories of those feature parameters using a segment model, a parameter trajectory is obtained over an interval that includes the tail of the segment immediately preceding the segment to be recognized, or the head of the segment immediately following it, or both the tail of the immediately preceding segment and the head of the immediately following segment. That is, the feature parameters of the transition portions into the adjacent segments are combined with the feature parameters of the segment interval containing the segment to be recognized, and speech is recognized using segment likelihoods based on the trajectory information. In other words, the probability that a feature-parameter trajectory including these transition portions appears given the segment's label information is obtained in advance as a model, and the likelihood between this model and the feature-parameter trajectory of the input speech signal is computed.

[0005] According to the invention of claim 2, in the invention of claim 1, the likelihood of a segment is computed taking into account the label information of the segments before and after that segment as well.

[0006]
[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 shows the ranges over which the trajectories of the feature parameters, which are the essential part of this invention, are obtained. In FIG. 1, the label of the i-th segment to be recognized (specifically, a phoneme, subword, or word) is wi, the label of the preceding segment is wi-1, and the label of the following segment is wi+1. The trajectories of the feature parameters obtained frame by frame within the segments labeled wi, wi-1, and wi+1 are denoted Ai, Ai-1, and Ai+1, respectively. In the present invention, using the whole of the preceding and following segments would not only increase the processing load but also degrade the accuracy of trajectory estimation; therefore, only the transition portions of the adjacent segments are considered, namely the tail portion Bi-1 of the immediately preceding segment and the head portion Bi+1 of the immediately following segment. Specifically, when the segments are phonemes, a phoneme is usually about 50 to 100 milliseconds long, whereas the transition portions Bi-1 and Bi+1 are about 10 to 50 milliseconds.
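To make the interval definitions concrete, the following sketch (hypothetical Python, not part of the patent; the 10 ms frame period matches the experiments below, and the 30 ms transition width is one choice within the stated 10 to 50 ms range) shows how the trajectories Ai, Bi-1, and Bi+1 might be cut out of a labeled utterance:

    import numpy as np

    FRAME_MS = 10   # one feature vector every 10 ms (as in the experiments below)
    TRANS_MS = 30   # assumed transition width, within the stated 10-50 ms range

    def slice_intervals(features, start_f, end_f):
        """Given per-frame feature vectors and the frame boundaries of the
        i-th segment wi, return the trajectories over intervals 1, 2, and 3
        of FIG. 1.  features: (T, D) array; start_f/end_f: first frame and
        one-past-last frame of wi."""
        k = TRANS_MS // FRAME_MS                       # frames per transition
        A_i  = features[start_f:end_f]                 # Ai: inside wi
        B_im = features[max(0, start_f - k):start_f]   # Bi-1: tail of w(i-1)
        B_ip = features[end_f:end_f + k]               # Bi+1: head of w(i+1)
        interval1 = np.vstack([B_im, A_i])             # interval 1
        interval2 = np.vstack([A_i, B_ip])             # interval 2
        interval3 = np.vstack([B_im, A_i, B_ip])       # interval 3
        return interval1, interval2, interval3

    # Toy usage: 100 frames of 13-dimensional cepstra; wi spans frames 40-47.
    feats = np.random.randn(100, 13)
    iv1, iv2, iv3 = slice_intervals(feats, 40, 48)
    print(iv1.shape, iv2.shape, iv3.shape)             # (11, 13) (11, 13) (14, 13)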

[0007] When the parameter trajectory is obtained over the interval that includes the tail of the segment immediately preceding the segment to be recognized, this is interval 1 in FIG. 1. The probability that the trajectory appears, that is, the probability that the parameter trajectories Bi-1 and Ai occur given the label wi, is expressed as P(Bi-1, Ai | wi), or, normalized by the appearance probability of the preceding segment, as P(Bi-1, Ai | wi) / P(Bi-1 | wi). When the trajectory is obtained over the interval that includes the head of the immediately following segment, this is interval 2, and the probability that the trajectory appears is P(Ai, Bi+1 | wi), or, normalized by the appearance probability of the following segment, P(Ai, Bi+1 | wi) / P(Bi+1 | wi). When the trajectory is obtained over the interval that includes both the tail of the immediately preceding segment and the head of the immediately following segment, this is interval 3, and the probability that the trajectory appears is expressed as P(Bi-1, Ai, Bi+1 | wi).

[0008] On the other hand, for the context-dependent (for example, phoneme-environment-dependent) acoustic segment model of claim 2, when the parameter trajectory is obtained over the interval that includes the tail of the immediately preceding segment (interval 1), the probability that the trajectory appears is P(Bi-1, Ai | wi-1, wi, wi+1), or, normalized by the appearance probability of the preceding segment, P(Bi-1, Ai | wi-1, wi, wi+1) / P(Bi-1 | wi-1, wi, wi+1). When the trajectory is obtained over the interval that includes the head of the immediately following segment (interval 2), the probability is P(Ai, Bi+1 | wi-1, wi, wi+1), or, normalized by the appearance probability of the following segment, P(Ai, Bi+1 | wi-1, wi, wi+1) / P(Bi+1 | wi-1, wi, wi+1). When the trajectory is obtained over the interval that includes both the tail of the immediately preceding segment and the head of the immediately following segment (interval 3), the probability is expressed as P(Bi-1, Ai, Bi+1 | wi-1, wi, wi+1).
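In an implementation, this context dependence amounts to keying the stored models by the label triple (wi-1, wi, wi+1) instead of by wi alone. The back-off order in the following sketch is our assumption, since the patent states only that the full context, or just the preceding or following label, may be considered (see paragraph [0009] below):

    def lookup_model(models, w_prev, w, w_next):
        """Return the most specific model available for segment label w.
        Back-off order (assumed): triphone -> left biphone -> right
        biphone -> context-independent monophone."""
        for key in ((w_prev, w, w_next), (w_prev, w), (w, w_next), w):
            if key in models:
                return models[key]
        raise KeyError("no model for segment label %r" % (w,))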

[0009] As the context-dependent acoustic segment model, only the label information of the segment to be recognized and the label information of the segment immediately before or immediately after it may be considered. FIG. 2 is a block diagram of the creation of the acoustic segment models used in this embodiment. Input training speech data is converted into feature parameters such as cepstra by the feature extraction unit 12, and the trajectory calculation unit 13 estimates the trajectory of each parameter according to the trajectory estimation interval described above. Using the set of these trajectories together with the label data of the input training speech (a description of the utterance content), the model creation unit 14 creates the acoustic segment models and stores them in the memory 15.
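A toy counterpart of FIG. 2, again hypothetical and reusing slice_intervals and resample from the earlier sketches: trajectories are accumulated per label (here only for interval 1, corresponding to P(Bi-1, Ai | wi)) and a mean trajectory with a diagonal variance is fitted on the fixed grid, the resulting dictionary standing in for the memory 15.

    from collections import defaultdict

    def train_segment_models(utterances, n_points=5):
        """utterances: list of (features, segments) pairs, where segments is
        a list of (label, start_frame, end_frame) taken from the label data
        describing the utterance content."""
        acc = defaultdict(list)
        for feats, segments in utterances:
            for label, s, e in segments:
                iv1, _, _ = slice_intervals(feats, s, e)
                acc[label].append(resample(iv1, n_points))
        models = {}
        for label, trajs in acc.items():
            X = np.stack(trajs)                            # (N, n_points, D)
            models[label] = {"mean": X.mean(axis=0),
                             "var": X.var(axis=0) + 1e-6}  # variance floor
        return models                                      # stands in for memory 15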

[0010] FIG. 3 is a block diagram of the speech recognition system of this embodiment. Speech input from the input terminal 21 is converted into feature parameters such as cepstra by the feature extraction unit 22, and the trajectory calculation unit 23 estimates the trajectory of each parameter according to the trajectory estimation interval described above. Using the acoustic segment models in the memory 24 that correspond to these estimation intervals, the likelihood of each recognition candidate generated with the word dictionary 25 and the grammar description 26 is computed, and the candidate with the highest likelihood is output as the recognition result.
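Correspondingly, a minimal stand-in for FIG. 3 (hypothetical; candidate generation from the word dictionary 25 and grammar 26, and the search for segment boundaries, are outside its scope) scores each candidate segmentation with the models and returns the most likely one:

    def recognize(feats, candidates, models):
        """candidates: hypothesized lists of (label, start_frame, end_frame),
        as would be generated from the word dictionary and grammar.  The
        candidate with the highest summed interval-1 log likelihood wins."""
        def score(segments):
            return sum(interval_log_prob(slice_intervals(feats, s, e)[0],
                                         models[label])
                       for label, s, e in segments)
        return max(candidates, key=score)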

[0011] As described above, the present invention provides a method of creating an acoustic segment model that takes into account the relationship with the preceding and following segments, and of performing recognition using that model.

[0012]
[Effects of the Invention] As described in detail above, according to the present invention, in a technique that recognizes speech on the basis of the trajectories of acoustic segments, modeling that takes into account the relationships among the acoustic features of the preceding and following segments yields recognition performance superior to that of acoustic models typified by the conventional HMM.

[0013] An experimental example is described below. Fifteen men and fifteen women were used for training, and five men and five women for testing. A vector of 13 mel-warped cepstral coefficients was computed every 10 milliseconds over a 25-millisecond window of speech. In one experiment, so-called delta and acceleration coefficients were added to these static coefficients. To compensate for speaker variation, after the words were parameterized a mean vector was determined and subtracted from the parameter vector of each frame. In this experiment all models were context dependent (triphones), each model had 3 mixtures, the HMM had 3 states, and the HMM and the segment model had the same number of parameters; the HMM used its inherent exponential duration model, while the segment model used a Gaussian duration model. The segment model considered only the last 30 milliseconds of the immediately preceding segment; this value was chosen to cover the entire transition region while avoiding the use of acoustic data far from the segment. With phoneme HMMs the error rate was 15.47% for the static parameters and 13.57% for the static + Δ + ΔΔ parameters; with the polynomial segment model the error rates were 11.53% and 10.18%, respectively; and with the model of this invention they were 10.05% and 9.31%, respectively. Using the segment model thus reduces the error rate by about 25% relative to the HMM, and the present invention improves the error rate by a further 9 to 13%, which shows the superiority of this invention.
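Reading these figures as relative reductions in error rate, they can be checked directly: (15.47 - 11.53) / 15.47 ≈ 0.25 and (13.57 - 10.18) / 13.57 ≈ 0.25 for the segment model over the HMM, and (11.53 - 10.05) / 11.53 ≈ 0.13 and (10.18 - 9.31) / 10.18 ≈ 0.09 for this invention's model over the polynomial segment model, consistent with the stated 25% and 9 to 13% improvements.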

[Brief Description of the Drawings]

FIG. 1 is a diagram showing the ranges over which the trajectories of the feature parameters are obtained in the acoustic model used in the present invention.

FIG. 2 is a block diagram showing the process of generating the acoustic models used in the present invention.

FIG. 3 is a block diagram showing the functional configuration of a speech recognition apparatus according to an embodiment of the present invention.

Claims (4)

[Claims]
1. A speech recognition apparatus that analyzes an input speech signal into acoustic feature parameters and performs recognition by comparing the trajectories of those parameters with a model for each segment, a segment being a phoneme, a subword, or a word, the apparatus comprising:
a memory that stores, for each segment label, a segment model representing the trajectory of the feature parameters over at least one of a first interval that includes the tail of the segment immediately preceding the segment in question, a second interval that includes the head of the segment immediately following it, and a third interval that includes both the tail of the immediately preceding segment and the head of the immediately following segment;
means for computing the acoustic feature parameters of the input speech signal;
means for computing, from the computed acoustic parameters, the trajectory over each of the intervals corresponding to the segment models in the memory;
means for obtaining the likelihood of the computed trajectory with respect to each segment model in the memory; and
means for obtaining recognition candidates using the likelihoods.
2. The speech recognition apparatus according to claim 1, wherein each segment model stored in the memory also takes into account, in addition to its own label, at least one of the label of the immediately preceding segment and the label of the immediately following segment, and wherein the means for computing the likelihood does so taking into account the label information of at least one of the segments immediately before and immediately after the segment in question.
3. A speech recognition method that analyzes the acoustic feature parameters of an input speech signal and performs speech recognition for each segment, a segment being a phoneme, a subword, or a word, based on the trajectories of those parameters, the method comprising:
creating in advance, from training speech and for each segment label, a segment model representing the trajectory of the feature parameters over at least one of a first interval that includes the tail of the segment immediately preceding the segment in question, a second interval that includes the head of the segment immediately following it, and a third interval that includes both the tail of the immediately preceding segment and the head of the immediately following segment, and storing the models in a memory;
at recognition time, computing the acoustic feature parameters of the input speech signal;
computing, from the computed acoustic parameters, the trajectory over each of the intervals corresponding to the segment models in the memory;
obtaining the likelihood of the computed trajectory with respect to each segment model in the memory; and
performing speech recognition using the likelihoods.
4. The speech recognition method according to claim 3, wherein each segment model is created taking into account, in addition to the label of the model itself, at least one of the label of the immediately preceding segment and the label of the immediately following segment, and wherein, in computing the likelihood, the label information of the segments immediately before and immediately after the segment in question is taken into account in accordance with the model.
JP26416298A 1998-09-18 1998-09-18 Speech recognition apparatus and method Expired - Fee Related JP3583930B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP26416298A JP3583930B2 (en) 1998-09-18 1998-09-18 Speech recognition apparatus and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP26416298A JP3583930B2 (en) 1998-09-18 1998-09-18 Speech recognition apparatus and method

Publications (2)

Publication Number Publication Date
JP2000099071A true JP2000099071A (en) 2000-04-07
JP3583930B2 JP3583930B2 (en) 2004-11-04

Family

ID=17399328

Family Applications (1)

Application Number Title Priority Date Filing Date
JP26416298A Expired - Fee Related JP3583930B2 (en) 1998-09-18 1998-09-18 Speech recognition apparatus and method

Country Status (1)

Country Link
JP (1) JP3583930B2 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004090867A1 (en) * 2003-04-09 2004-10-21 Toyota Jidosha Kabushiki Kaisha Change information recognition device and change information recognition method
US7302086B2 (en) 2003-04-09 2007-11-27 Toyota Jidosha Kabushiki Kaisha Change information recognition apparatus and change information recognition method
US7508959B2 (en) 2003-04-09 2009-03-24 Toyota Jidosha Kabushiki Kaisha Change information recognition apparatus and change information recognition method

Also Published As

Publication number Publication date
JP3583930B2 (en) 2004-11-04

Similar Documents

Publication Publication Date Title
JP3049259B2 (en) Voice recognition method
US5787396A (en) Speech recognition method
US5865626A (en) Multi-dialect speech recognition method and apparatus
US6317711B1 (en) Speech segment detection and word recognition
JP2006038895A (en) Device and method for speech processing, program, and recording medium
US7653541B2 (en) Speech processing device and method, and program for recognition of out-of-vocabulary words in continuous speech
JP2002215187A (en) Speech recognition method and device for the same
JPH10254475A (en) Speech recognition method
JP2002358097A (en) Voice recognition device
JP3583930B2 (en) Speech recognition apparatus and method
JP2003044078A (en) Voice recognizing device using uttering speed normalization analysis
JP2003271185A (en) Device and method for preparing information for voice recognition, device and method for recognizing voice, information preparation program for voice recognition, recording medium recorded with the program, voice recognition program and recording medium recorded with the program
JP3532248B2 (en) Speech recognition device using learning speech pattern model
JP3368989B2 (en) Voice recognition method
JPH10143190A (en) Speech recognition device
JP3316352B2 (en) Voice recognition method
JPH08314490A (en) Word spotting type method and device for recognizing voice
JPH09160586A (en) Learning method for hidden markov model
JP2875179B2 (en) Speaker adaptation device and speech recognition device
JP2975540B2 (en) Free speech recognition device
JP2986703B2 (en) Voice recognition device
JPH07230295A (en) Speaker adaptive system
JP3105708B2 (en) Voice recognition device
JPH096387A (en) Voice recognition device
JPH06266389A (en) Phoneme labeling device

Legal Events

Date Code Title Description
A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20031224

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20040219

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Effective date: 20040706

Free format text: JAPANESE INTERMEDIATE CODE: A01

A61 First payment of annual fees (during grant procedure)

Effective date: 20040730

Free format text: JAPANESE INTERMEDIATE CODE: A61

FPAY Renewal fee payment (prs date is renewal date of database)

Free format text: PAYMENT UNTIL: 20080806

Year of fee payment: 4

FPAY Renewal fee payment (prs date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090806

Year of fee payment: 5

FPAY Renewal fee payment (prs date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100806

Year of fee payment: 6

FPAY Renewal fee payment (prs date is renewal date of database)

Free format text: PAYMENT UNTIL: 20110806

Year of fee payment: 7

FPAY Renewal fee payment (prs date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120806

Year of fee payment: 8

FPAY Renewal fee payment (prs date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130806

Year of fee payment: 9

LAPS Cancellation because of no payment of annual fees