JP3457578B2 - Speech recognition apparatus and method using speech synthesis

Info

Publication number: JP3457578B2
Application number: JP18030899A
Authority: JP
Inventors: 靖子加藤
Original assignee: NEC Electronics Corp
Current assignee: NEC Electronics Corp
Priority date: 1999-06-25
Filing date: 1999-06-25
Publication date: 2003-10-20
Anticipated expiration: 2019-06-25
Also published as: JP2001013983A

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a speech recognition apparatus and a speech recognition method that use speech synthesis.

[0002]

[Prior Art] A conventional speech recognition apparatus based on phoneme-level matching will be described with reference to FIG. 6.

[0003] In FIG. 6, the "recognition dictionary creation unit" extracts the information needed for recognition processing from the "registered word character strings" to be recognized and creates the "recognition word dictionary".

[0004] The "input speech analysis unit" extracts a "feature pattern" from the speech that the speaker inputs through a microphone. The "recognition matching unit" performs recognition by phoneme-level matching using the "recognition word dictionary" created above and the "standard patterns", and outputs recognition result candidates as the "recognition result".

[0005] However, the conventional speech recognition apparatus based on phoneme-level matching has the following problem. When acoustically similar words are registered, they share many of their constituent phonemes. As a result, the Markov model patterns used in matching become similar to each other, and the two words are difficult to distinguish at recognition time.

[0006] Consequently, the rate of misrecognition increases, or the speaker must be asked to repeat the utterance in order to raise the confidence of the recognition result.

[0007] Japanese Patent Laid-Open No. 9-6387 describes one example of a technique that improves on this conventional approach.

[0008] This prior art aims to provide a speech recognition apparatus with an excellent ability to discriminate between similar words. As shown in FIG. 7, a "word speech segmentation unit" cuts out word speech from the speech input through a microphone serving as the speech input means, and a "feature extraction unit" extracts feature data.

[0009] The "state number estimation unit" estimates, from the feature data, the number of states for the word speech when it is modeled by a Markov model. The "similar word determination unit" determines whether a word similar to the word speech about to be registered has already been registered.

[0010] The "state number addition unit" increases the estimated number of states, and the "learning unit" fits the feature data to the word model to obtain the Markov model parameters.

[0011] The "matching determination unit" computes the likelihood for each word model, determines the recognition candidates, and outputs the recognition result from the "determination result output unit". During matching, the "matching determination unit" uses a "speech dictionary file" consisting of the learned Markov model parameters.

[0012]

[Problems to be Solved by the Invention] However, the improved speech recognition apparatus described above has the following problems.

[0013] When similar words are registered, the Markov model parameters must be learned from uttered speech, so registering recognition target words is not easy.

[0014] Moreover, because registration by utterance is required, utterances from a large number of speakers must be collected in order to improve the recognition rate for unspecified speakers.

[0015] An object of the present invention is to provide a speech recognition apparatus and method that widen the difference between similar words and can thereby reduce recognition errors when similar words are registered as recognition targets.

[0016]

[Means for Solving the Problems] The speech recognition apparatus of the present invention comprises: a similar word dictionary unit that stores words whose constituent phonemes are similar; a recognition word dictionary unit that stores the recognition words used in normal recognition processing; a synthesized speech feature pattern unit that stores the feature pattern of each synthesized speech formed from the data of the similar word dictionary unit; speech input means; an input speech analysis unit that forms a feature pattern of the input speech input through the speech input means; a recognition matching unit that outputs recognition result candidates from the feature pattern of the input speech and the data of the recognition word dictionary; determination means that determines whether a similar word is present among the recognition result candidates; a first similar word matching unit that, when a similar word is determined to be present, matches the feature pattern of the input speech against each similar word stored in the similar word dictionary unit and outputs a first similar word recognition result; a second similar word matching unit that, when a similar word is determined to be present, matches the feature pattern of the input speech against the data of the synthesized speech feature patterns and outputs a second similar word recognition result for each similar word; and a similar word recognition result comparison unit that compares the first and second similar word recognition results and outputs a word with high confidence as the recognition result.

[0017] The speech recognition method of the present invention comprises: a step of storing words whose constituent phonemes are similar in a similar word dictionary unit; a step of storing the recognition words used in normal recognition processing in a recognition word dictionary unit; a step of storing, in a synthesized speech feature pattern unit, the feature pattern of each synthesized speech formed from the data of the similar word dictionary unit; a speech input step; an input speech analysis step of forming a feature pattern of the input speech input in the speech input step; a recognition matching step of outputting recognition result candidates from the feature pattern of the input speech and the data of the recognition word dictionary; a determination step of determining whether a similar word is present among the recognition result candidates; a first similar word matching step of, when a similar word is determined to be present, matching the feature pattern of the input speech against each similar word stored in the similar word dictionary unit and outputting a first similar word recognition result; a second similar word matching step of, when a similar word is determined to be present, matching the feature pattern of the input speech against the data of the synthesized speech feature patterns and outputting a second similar word recognition result for each similar word; and a similar word recognition result comparison step of comparing the first and second similar word recognition results and outputting a word with high confidence as the recognition result.

[0018]

[Embodiments of the Invention] Next, a first embodiment of the present invention will be described with reference to the drawings.

[0019] In FIG. 1, the "recognition dictionary creation unit" extracts, from the registered word character strings input as recognition targets, the information needed for recognition processing and creates recognition word dictionary 2. At the same time, it extracts words whose constituent phonemes are similar and creates similar word dictionary 1.

[0020] The "warning output unit" presents the extracted similar words to the speaker to encourage careful utterance.

[0021] The "input speech analysis unit" extracts a feature pattern from the speech input through the microphone.

[0022] The "recognition matching unit 3" performs matching based on the standard patterns learned in advance for each phoneme, the recognition word dictionary, and the feature pattern described above, and outputs the words with the highest similarity to the input speech as recognition result candidates.

[0023] The "recognition result candidate determination unit" determines whether to perform the additional recognition processing that obtains more detailed information. That processing is carried out when the output recognition result candidates include similar words and the difference in recognition likelihood between those candidates is small. If the recognition result candidates include no similar words, or if the difference in recognition likelihood between the candidates is large, that is, if the first candidate is very likely to be the uttered word, the subsequent recognition processing is skipped and the first candidate is output as the recognition result.

[0024] The "synthesized speech output unit" outputs a synthesized speech waveform from the input text of each similar word.

[0025] The "synthesized speech analysis unit" extracts a feature pattern from the synthesized speech waveform.

[0026] The "similar word recognition matching unit 1" performs matching against the similar words based on the feature pattern of the input speech and outputs similar word recognition result 1.

[0027] The "similar word recognition matching unit 2" performs matching against the similar words based on the feature patterns extracted from the synthesized speech and outputs similar word recognition result 2.

[0028] The "similar word recognition result comparison unit" determines and outputs the final recognition result based on similar word recognition result 1 and similar word recognition result 2.

[0029] In this way, the present invention determines the recognition result by combining the recognition result output by recognition processing that takes the synthesized speech waveform as input with the recognition result output by recognition processing that targets the similar words extracted from the recognition target words. It can therefore improve recognition performance for recognition target words whose constituent phonemes are similar.

[0030] Referring to FIG. 1, a speech recognition apparatus according to one embodiment of the present invention is shown. In the figure, this embodiment includes a recognition dictionary creation unit, a warning output unit, an input speech analysis unit, a recognition matching unit, a recognition result candidate determination unit, a synthesized speech output unit, a synthesized speech analysis unit, similar word recognition matching unit 1, similar word recognition matching unit 2, and a similar word recognition result comparison unit.

[0031] The configuration will be described in detail with reference to FIG. 1.

[0032] The operation of the recognition dictionary creation unit of FIG. 1 will be described using the flowchart shown in FIG. 2.

[0033] First, each input registered word character string is converted into the phoneme string that constitutes the word (STEP10).

[0034] Next, based on the converted phoneme strings, words whose constituent phonemes are similar are detected among the registered words (STEP11). Whether two words are similar can be decided, for example, by applying a threshold to a value that expresses the proportion of the whole word covered by the constituent phoneme strings the words have in common.
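The threshold test of STEP11 can be illustrated with a minimal Python sketch. The patent does not fix a similarity measure, so the use of `difflib.SequenceMatcher.ratio()` (twice the number of matching phonemes divided by the combined length of both phoneme strings), the threshold of 0.7, and the romanized example lexicon are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def phoneme_match_ratio(a, b):
    """Share of matching phonemes relative to the combined length of both words.

    Assumption: SequenceMatcher stands in for the unspecified measure in STEP11.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def find_similar_pairs(lexicon, threshold=0.7):
    """Return pairs of words whose phoneme strings match at or above the threshold."""
    pairs = []
    for (w1, p1), (w2, p2) in combinations(sorted(lexicon.items()), 2):
        if phoneme_match_ratio(p1, p2) >= threshold:
            pairs.append((w1, w2))
    return pairs


# Hypothetical registered words with their phoneme strings (output of STEP10).
lexicon = {
    "kato": ["k", "a", "t", "o"],
    "sato": ["s", "a", "t", "o"],
    "tanaka": ["t", "a", "n", "a", "k", "a"],
}
similar_pairs = find_similar_pairs(lexicon)
```

With these inputs only "kato" and "sato" are flagged, since they share three of four phonemes in the same order. In practice the threshold would have to be tuned on the target vocabulary.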

[0035] Next, it is determined whether any similar words were detected in STEP11 (STEP12). If so, processing proceeds to STEP13; if not, it proceeds to STEP14.

[0036] In STEP13, the similar word dictionary is created from the detected similar words. In addition to the phoneme string that constitutes each word, the information stored in the similar word dictionary includes, for example, a number of Gaussian mixture components for each phoneme of the similar word that is increased beyond the value used in normal recognition processing, and the written form of the similar word (from which the accent information for its utterance can be determined).

[0037] In STEP14, the recognition word dictionary used in normal recognition processing is created. For example, the phoneme string information that constitutes each word is stored as the recognition word dictionary.
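As a rough illustration of the two dictionaries produced in STEP13 and STEP14, the following Python sketch stores the fields named above. The class name, field names, and the mixture counts 4 and 8 are hypothetical; the patent states only that the mixture count for similar words is larger than the normal value.

```python
from dataclasses import dataclass

NORMAL_MIXTURES = 4    # hypothetical mixture count used in normal recognition
EXPANDED_MIXTURES = 8  # hypothetical expanded count for similar words


@dataclass
class SimilarWordEntry:
    notation: str               # written form, used to derive accent information
    phonemes: list              # phoneme string constituting the word
    mixtures_per_phoneme: dict  # expanded Gaussian mixture count per phoneme


def build_dictionaries(phoneme_strings, similar_words):
    """Create the recognition word dictionary (STEP14) and similar word dictionary (STEP13)."""
    recognition_dict = dict(phoneme_strings)
    similar_dict = {}
    for word in similar_words:
        phonemes = phoneme_strings[word]
        similar_dict[word] = SimilarWordEntry(
            notation=word,
            phonemes=phonemes,
            mixtures_per_phoneme={p: EXPANDED_MIXTURES for p in phonemes},
        )
    return recognition_dict, similar_dict


recognition_dict, similar_dict = build_dictionaries(
    {"kato": ["k", "a", "t", "o"], "sato": ["s", "a", "t", "o"], "tanaka": ["t", "a", "n", "a", "k", "a"]},
    ["kato", "sato"],
)
```

Every registered word appears in the recognition word dictionary, while only the words flagged as similar receive an entry with the expanded mixture count.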

[0038] Next, the operation from inputting speech through the microphone of FIG. 1 to determining the recognition result candidates will be described using the flowchart shown in FIG. 3.

[0039] First, it is determined whether similar words were detected among the recognition target words (STEP20). If so, processing proceeds to STEP21; if not, it proceeds to STEP22.

[0040] In STEP21, the warning output unit presents a list of the similar words to the speaker to encourage careful utterance.

[0041] In STEP22, the speech input through the microphone is analyzed and a feature pattern is output.

[0042] Here, the feature pattern can be obtained by performing the mel-cepstrum analysis described in "Speech Recognition" (「音声認識」) by Satoshi Imai (今井聖), Kyoritsu Shuppan Co., Ltd. (hereinafter, Reference 1).

[0043] Next, matching is performed between the obtained feature pattern and the standard patterns (STEP23). For example, the distance between the feature pattern and the standard patterns can be calculated by the DP matching method and the HMM-based method described in Reference 1, and the cumulative distance for each recognition target word can be computed.
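The cumulative distance computed in STEP23 can be sketched with a basic DP matching (dynamic time warping) routine in Python. This is a textbook formulation, not the specific procedure of Reference 1: the Euclidean frame distance, the symmetric step pattern, and the absence of path constraints are assumptions.

```python
import math


def frame_distance(x, y):
    """Euclidean distance between two feature vectors (assumed local distance)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dp_matching_distance(pattern, template):
    """Cumulative distance of the best time alignment between two frame sequences."""
    n, m = len(pattern), len(template)
    inf = float("inf")
    # cost[i][j] is the best cumulative distance aligning the first i frames
    # of the pattern with the first j frames of the template.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(pattern[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# One-dimensional toy frames: the second sequence repeats one frame.
stretched = dp_matching_distance([[0.0], [1.0], [1.0], [2.0]], [[0.0], [1.0], [2.0]])
```

Because the alignment may repeat frames, a slower utterance of the same pattern still yields a cumulative distance of zero in this toy case, which is the property that makes DP matching tolerant of speaking-rate differences.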

[0044] From the recognition target words, the words with a high likelihood obtained in STEP23 are output as recognition result candidates (STEP24).

[0045] Next, it is determined whether the output recognition result candidates include any word that was detected as a similar word (STEP25). If so, processing proceeds to STEP26; if not, it proceeds to STEP31.

[0046] In STEP26, the confidence of the recognition result candidates is determined from the likelihoods of the similar words included in them. The criterion can be, for example, the difference in likelihood between the similar word candidates. The threshold for this likelihood difference can be determined in advance, for example, by evaluating the system.

[0047] Next, based on the confidence of the recognition result candidates, it is determined whether to perform the similar word recognition processing (STEP27). If it is to be performed, processing proceeds to STEP28; if not, it proceeds to STEP31.
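The decision made across STEP25 to STEP27 can be sketched in Python as follows. The function name, the representation of candidates as (word, likelihood) pairs, and the rule of comparing the two best similar-word candidates are assumptions; the patent only suggests a threshold on the likelihood difference between similar word candidates.

```python
def needs_similar_word_recognition(candidates, similar_words, min_gap):
    """Decide whether the detailed similar word recognition should run.

    candidates: list of (word, likelihood) pairs from STEP24.
    similar_words: set of words held in the similar word dictionary.
    min_gap: likelihood difference below which the result is treated as
             uncertain (assumed to be tuned beforehand by system evaluation).
    """
    scores = sorted(
        (score for word, score in candidates if word in similar_words),
        reverse=True,
    )
    if len(scores) < 2:
        # Assumption: with fewer than two similar words among the candidates
        # there is nothing to confuse, so the first candidate is accepted.
        return False
    return (scores[0] - scores[1]) < min_gap


candidates = [("kato", -10.0), ("sato", -10.5), ("tanaka", -30.0)]
run_detail = needs_similar_word_recognition(candidates, {"kato", "sato"}, min_gap=2.0)
```

Here the two similar words differ by only 0.5 in likelihood, so the detailed processing of STEP28 and STEP29 would be triggered. A larger gap, or a candidate list without competing similar words, would lead straight to STEP31.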

[0048] In STEP28, similar word recognition processing 1 is performed. The details of this processing are described separately with reference to FIG. 4.

[0049] In STEP29, similar word recognition processing 2 is performed. The details of this processing are described separately with reference to FIG. 5.

[0050] In STEP30, the two similar word recognition results obtained in STEP28 and STEP29 are compared, and the word judged to have the higher confidence is output as the final recognition result.
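The patent does not specify how the two results are compared in STEP30. The Python sketch below shows one possible rule under stated assumptions: result 1 is taken to be a likelihood per similar word (higher is better), result 2 a distance per similar word (lower is better), and the word with the best combined rank is chosen, with ties resolved in favor of result 1. The function name and the rank-sum rule are hypothetical.

```python
def compare_similar_word_results(likelihoods, distances):
    """Pick a final word from similar word recognition results 1 and 2.

    likelihoods: word -> likelihood from processing 1 (higher is better).
    distances:   word -> distance to synthesized speech from processing 2
                 (lower is better). Both must cover the same words.
    """
    by_likelihood = sorted(likelihoods, key=lambda w: -likelihoods[w])
    by_distance = sorted(distances, key=lambda w: distances[w])
    rank_sum = {w: by_likelihood.index(w) + by_distance.index(w) for w in likelihoods}
    # min() returns the first minimum, so ties fall to the higher-likelihood word.
    return min(by_likelihood, key=lambda w: rank_sum[w])


final_word = compare_similar_word_results(
    {"kato": -10.0, "sato": -10.5, "gato": -11.0},
    {"kato": 5.0, "sato": 1.0, "gato": 3.0},
)
```

In this example "sato" is second by likelihood but clearly closest to its synthesized speech, so it wins on combined rank. Other combination rules, such as weighted score fusion, would fit the description equally well.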

[0051] Next, FIG. 4 shows a flowchart of similar word recognition matching processing 1 in one embodiment of the present invention.

[0052] First, the feature pattern obtained in STEP22 of FIG. 3 is taken as input and matched against the similar words (STEP100). At this point, a more reliable result can be obtained by performing the matching with the expanded number of Gaussian mixture components for the constituent phonemes of the similar words, which was added in STEP13 of FIG. 2.

[0053] Next, the recognition result for each similar word is output as similar word recognition result 1 (STEP101).

[0054] Next, FIG. 5 shows a flowchart of similar word recognition matching processing 2 in one embodiment of the present invention.

[0055] First, the similar word character strings are input to speech synthesis processing, and synthesized speech is output for each of them (STEP200). One way to output synthesized speech from a word character string is, for example, the method described in "Speech Processing and DSP" (「音声処理とＤＳＰ」) by Yasuhiko Arai (新居康彦) and Masami Osaki (大崎正巳), Keigaku Shuppan (hereinafter, Reference 2), in which the text is parsed and the speech units of the resulting phonemes are then edited together.

[0056] Next, the output synthesized speech is analyzed and a feature pattern is output for each (STEP201). The analysis here may be the same as that in STEP22 of FIG. 3.

[0057] Next, matching is performed between the feature patterns of the synthesized speech obtained in STEP201 and the feature pattern of the input speech obtained in STEP22 of FIG. 3, and the distance values between them are obtained (STEP202).

[0058] Next, the distance values between the synthesized speech and the uttered speech obtained in STEP202 for each similar word are output as similar word recognition result 2 (STEP203).
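The flow of STEP200 to STEP203 can be summarized in a short Python sketch. The synthesizer, analyzer, and distance function are passed in as callables because the patent leaves them to References 1 and 2; the toy stand-ins below (character codes as a "waveform", an identity analyzer, and a summed absolute difference) are purely hypothetical and exist only to make the sketch runnable.

```python
def similar_word_recognition_2(input_features, similar_words, synthesize, analyze, distance):
    """Return word -> distance between the input speech and that word's synthesized speech."""
    result = {}
    for word in similar_words:
        waveform = synthesize(word)               # STEP200: text to synthesized speech
        synth_features = analyze(waveform)        # STEP201: same analysis as STEP22
        result[word] = distance(input_features, synth_features)  # STEP202
    return result                                 # STEP203: similar word recognition result 2


# Toy stand-ins, NOT real speech processing.
def toy_synthesize(word):
    return [float(ord(c)) for c in word]


def toy_analyze(waveform):
    return list(waveform)


def toy_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


spoken = toy_analyze(toy_synthesize("kato"))  # pretend the speaker said "kato"
result_2 = similar_word_recognition_2(
    spoken, ["kato", "sato"], toy_synthesize, toy_analyze, toy_distance
)
```

In a real system the distance callable would be a DP matching or HMM-based score over mel-cepstrum frames, and the synthesized feature patterns could be computed once at registration time rather than on every utterance.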

[0059] In another embodiment of the present invention, the basic configuration is as described above, but the "similar word recognition matching unit 1" and the "similar word recognition matching unit 2" may be combined into a single "similar word recognition matching unit".

[0060] In this case, when similar words have been detected, the "input speech analysis unit" extracts, in addition to the feature pattern, an extended feature pattern with an increased number of parameters. The number of parameters can be increased, for example, by raising the number of dimensions of the mel cepstrum extracted in STEP22 of FIG. 3. The "synthesized speech analysis unit" likewise extracts an extended feature pattern for the input synthesized speech waveform. The "similar word recognition matching unit" matches the extended feature pattern of the input speech against the extended feature pattern of the synthesized speech waveform to obtain distance values, and the "similar word recognition result comparison unit" determines the recognition result from the resulting similarities. As with normal matching, the matching here may be performed by the DP matching method and the HMM-based method described in Reference 1.

[0061] In yet another embodiment, the basic configuration is as described above, but among the information stored in the similar word dictionary, the means for obtaining the accent information of a similar word may be a combination of the word's kana reading and accent information, instead of the written form of the word used in the embodiment described above. In this case, the synthesized speech output unit can omit the processing that analyzes the written form and converts it into the pronunciation information corresponding to the speech waveform to be output.

[0062]

[Effects of the Invention] As described above, the present invention provides the following effects.

[0063] The first effect is that words with similar phoneme strings are extracted from the recognition target words and matched using more detailed feature patterns, which widens the difference between similar words and reduces recognition errors for utterances of those words when similar words are registered as recognition targets.

[0064] The second effect is that, by using speech synthesis, which can output speech in a fixed pattern, uniquely determined feature patterns can be created automatically. In particular, in a system equipped with both a speech recognition function and a speech synthesis function, no new training is needed when the feature parameters are extended, and no storage area is needed for the extended feature parameters.

[0065] The third effect is that, by explicitly showing the speaker the words with similar phoneme strings extracted in advance, the speaker can be encouraged to speak carefully.

[Brief description of drawings]

FIG. 1 is a system diagram showing a first embodiment of the present invention.

FIG. 2 is a diagram showing the flow of creating the recognition word dictionary shown in FIG. 1.

FIG. 3 is a diagram showing the flow up to output of the recognition result shown in FIG. 1.

FIG. 4 is a diagram showing the flow of the similar word recognition processing shown in FIG. 1.

FIG. 5 is a diagram showing the flow of similar word recognition processing 2 shown in FIG. 1.

FIG. 6 is a diagram showing a first conventional example.

FIG. 7 is a diagram showing a second conventional example.

[Explanation of symbols]

1 Similar word dictionary
2 Recognition word dictionary
3 Recognition matching unit
4 Similar word recognition matching unit 1
5 Similar word recognition matching unit 2
6 Similar word recognition result comparison unit

Continuation of front page: (58) Fields searched (Int. Cl.7, DB name): G10L 15/06, G10L 15/22

Claims

(57) [Claims]

1. A speech recognition apparatus using speech synthesis, comprising: a similar word dictionary unit that stores words whose constituent phonemes are similar; a recognition word dictionary unit that stores the recognition words used in normal recognition processing; a synthesized speech feature pattern unit that stores the feature pattern of each synthesized speech formed from the data of the similar word dictionary unit; speech input means; an input speech analysis unit that forms a feature pattern of the input speech input through the speech input means; a recognition matching unit that outputs recognition result candidates from the feature pattern of the input speech and the data of the recognition word dictionary; determination means that determines whether a similar word is present among the recognition result candidates; a first similar word matching unit that, when a similar word is determined to be present, matches the feature pattern of the input speech against each similar word stored in the similar word dictionary unit and outputs a first similar word recognition result; a second similar word matching unit that, when a similar word is determined to be present, matches the feature pattern of the input speech against the data of the synthesized speech feature patterns and outputs a second similar word recognition result for each similar word; and a similar word recognition result comparison unit that compares the first similar word recognition result and the second similar word recognition result and outputs a word with high confidence as the recognition result.

2. The speech recognition apparatus using speech synthesis according to claim 1, further comprising a warning output unit that issues a caution when a similar word is detected.

3. The above speech recognition apparatus using speech synthesis, wherein the similar word dictionary unit converts input registered word character strings into the phoneme strings constituting each word and stores words whose constituent phonemes are similar as similar words.

4. A speech recognition method comprising: a step of storing words whose constituent phonemes are similar in a similar word dictionary unit; a step of storing the recognition words used in normal recognition processing in a recognition word dictionary unit; a step of storing, in a synthesized speech feature pattern unit, the feature pattern of each synthesized speech formed from the data of the similar word dictionary unit; a speech input step; an input speech analysis step of forming a feature pattern of the input speech input in the speech input step; a recognition matching step of outputting recognition result candidates from the feature pattern of the input speech and the data of the recognition word dictionary; a determination step of determining whether a similar word is present among the recognition result candidates; a first similar word matching step of, when a similar word is determined to be present, matching the feature pattern of the input speech against each similar word stored in the similar word dictionary unit and outputting a first similar word recognition result; a second similar word matching step of, when a similar word is determined to be present, matching the feature pattern of the input speech against the data of the synthesized speech feature patterns and outputting a second similar word recognition result for each similar word; and a similar word recognition result comparison step of comparing the first similar word recognition result and the second similar word recognition result and outputting a word with high confidence as the recognition result.

5. The speech recognition method according to claim 4, further comprising a warning output step of issuing a caution when a similar word is detected.

6. The speech recognition method, wherein the similar word dictionary unit converts input registered word character strings into the phoneme strings constituting each word and stores words whose constituent phonemes are similar as similar words.