JPH06308994A

JPH06308994A - Japanese language voice recognizing method

Info

Publication number: JPH06308994A
Application number: JP5099694A
Authority: JP
Inventors: Tomokazu Yamada; 智一山田; Shoichi Matsunaga; 昭一松永; Kiyohiro Kano; 清宏鹿野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1993-04-26
Filing date: 1993-04-26
Publication date: 1994-11-04

Abstract

PURPOSE: To provide a Japanese speech recognition method that shortens processing time and improves conversion performance by statistically eliminating grammatically incorrect candidates and preventing the generation of many candidates formed by simple combinations of characters. CONSTITUTION: As the statistical language model, a model based on character occurrence order information 101 and part-of-speech occurrence order information 102, both generated from a learning text database 100, is used; as the standard patterns, syllable standard patterns 201 and syllable standard patterns 202 for character readings, from a learning speech database 200, are used.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a Japanese speech recognition method. Of the two families of phoneme recognition methods, those based on pattern matching and those based on feature extraction, it relates in particular to a method that uses a hidden Markov model, which makes state transitions probabilistically and outputs a symbol with a certain probability at each transition (see, for example, Seiichi Nakagawa, "Speech Recognition by Probabilistic Models," The Institute of Electronics, Information and Communication Engineers (1988)), together with a statistical language model (see, for example, L. R. Bahl et al., "A Statistical Approach to Continuous Speech Recognition," IEEE Trans. on PAMI (1983)).

[0002]

[Prior Art] FIG. 4 is a diagram for explaining a conventional Japanese speech recognition system.

[0003] The system shown in the figure comprises: a learning text database 10 holding statistical language model information 11 on the occurrence order of kana and kanji; a statistical language model memory 50 that stores in advance the statistical language model information 11 of the learning text database 10; a learning speech database 20 holding hidden Markov model phoneme standard pattern information 21; a phoneme sequence memory 60, corresponding to character readings, that stores in advance the phoneme standard pattern information 21 of the learning speech database; a phoneme candidate selection unit 30 that selects phoneme candidates from the immediately preceding phonemes of the input speech that have already been recognized; and a recognition unit 40 that matches the standard model of the learning text database 10 against the input speech and, using the phoneme standard pattern information of the learning speech database 20, recognizes and outputs the kana/kanji character candidate with the highest overall likelihood.

[0004] Next, the operation of the conventional system is described. In the method proposed so far for converting input speech into a kana/kanji sequence by speech recognition using a hidden Markov model and a statistical language model, a statistical language model 11 on the occurrence order of kana and kanji is created in advance from the learning text database 10 and stored in the statistical language model memory 50, and hidden Markov model phoneme standard patterns 21 are created in advance from the learning speech database 20 and stored in the phoneme sequence memory 60 corresponding to character readings. For the input speech, the phoneme candidate selection unit 30 uses the statistical language model 11 and the phoneme sequence information corresponding to character readings to select, from the several immediately preceding phonemes already recognized, a plurality of phoneme candidates with a high probability of occurring next. For each selected phoneme candidate, the recognition unit 40 matches its phoneme standard pattern against the input speech. The kana/kanji character candidate with the highest overall likelihood, which combines the occurrence likelihood from the statistical language model with the similarity likelihood to the hidden Markov model standard pattern, is output as the recognition result.

[0005]

[Problems to Be Solved by the Invention] However, in the conventional conversion method described above, the statistical language model handles only the surface form of kana and kanji characters, and the only information provided for converting a phoneme sequence into kana/kanji characters is the phoneme sequence corresponding to the reading of each character. As a result, a large number of candidates formed by simple combinations of characters are generated.

[0006] The present invention has been made in view of the above and aims to solve this problem. Its object is to provide a Japanese speech recognition method that, by statistically selecting grammatically correct candidates, statistically eliminates grammatically incorrect candidates and prevents the generation of many candidates formed by simple combinations of characters, thereby shortening processing time and improving conversion performance.

[0007]

[Means for Solving the Problems] FIG. 1 is a diagram for explaining the principle of the present invention.

[0008] The present invention is a Japanese speech recognition method in which input speech is converted into a time series of feature parameters; a plurality of speech recognition candidates are selected for that time series using a statistical language model 101 on occurrence order created from a learning text database 100; each candidate is evaluated by matching the hidden Markov model standard patterns of a learning speech database 200 against the feature parameter time series of the input speech; and a candidate with a high overall likelihood, which combines the matching result with the occurrence likelihood, is taken as the recognition result. In this method, a statistical language model based on character occurrence order information 101 and part-of-speech occurrence order information 102, both created from the learning text database 100, is used as the statistical language model.

[0009] The present invention is also a Japanese speech recognition method in which input speech is converted into a time series of feature parameters; a plurality of speech recognition candidates are selected for that time series using a statistical language model on occurrence order created from the learning text database 100; each candidate is evaluated by matching the hidden Markov model standard patterns of the learning speech database 200 against the feature parameter time series of the input speech; and a candidate with a high overall likelihood, which combines the matching result with the occurrence likelihood, is taken as the recognition result. In this method, the syllable standard patterns 201 of the learning speech database 200 and the syllable standard patterns 202 for character readings are used as the standard patterns.

[0010]

[Operation] The present invention considers the occurrence order of parts of speech in addition to the occurrence order of kana and kanji as information from the learning text database, and considers character reading standard pattern information in addition to the hidden Markov model phoneme standard pattern information (syllable standard patterns) as information from the learning speech database. Compared with candidates generated by simple combinations of characters, grammatically correct candidates are therefore selected statistically and the set of generated candidates is narrowed. This shortens the processing time needed to convert input speech into a kana/kanji sequence and improves conversion performance.

[0011]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings.

[0012] FIG. 1 is a block diagram of an embodiment of the present invention.

[0013] The system shown in the figure comprises a speech signal input terminal 1 that receives a speech signal, a feature extraction unit 2 that extracts feature parameters, a recognition unit 3 that selects character candidates and computes their similarity likelihood with the feature parameters, a standard pattern memory 4 that stores syllable-chain standard patterns, a character/part-of-speech statistical language model memory 5 that stores the character/part-of-speech statistical language model, and a recognition result output unit 6 that outputs the recognition result from the recognition unit 3.

[0014] First, the two memories are prepared in advance. The standard pattern memory 4 is built from the hidden Markov model speech standard pattern information 201 and the character reading standard pattern information 202 of the learning speech database 200, and the character/part-of-speech statistical language model memory 5 is built from the character occurrence order information 101 and the part-of-speech occurrence order information 102 of the learning text database 100.

[0015] Speech input from the input terminal 1 is converted into a digital signal by the feature extraction unit 2, subjected to LPC cepstrum analysis, and converted into feature parameters for each frame (for example, every 10 milliseconds). These feature parameters are, for example, LPC cepstrum coefficients.
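
As a rough illustration of the feature extraction in paragraph [0015], the following Python sketch splits a sampled signal into 10 ms frames and converts each frame into LPC cepstrum coefficients using the autocorrelation method. The sampling rate, the analysis order, the absence of windowing, and all function names are assumptions made for illustration; the patent does not specify them.

```python
def autocorr(frame, order):
    """Autocorrelation r[0..order] of one frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t - k] for t in range(k, n))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] for x[n] ~ sum_k a[k] * x[n-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: leave remaining coefficients at zero
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def lpc_to_cepstrum(a):
    """LPC cepstrum c[1..p] via c_n = a_n + sum_{k<n} (k/n) * c_k * a_{n-k}."""
    p = len(a)
    c = [0.0] * p
    for n in range(1, p + 1):
        acc = a[n - 1]
        for k in range(1, n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c


def extract_features(samples, sample_rate=8000, frame_ms=10, order=10):
    """One LPC cepstrum vector per non-overlapping frame (assumed parameters)."""
    frame_len = sample_rate * frame_ms // 1000
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        r = autocorr(frame, order)
        feats.append(lpc_to_cepstrum(levinson_durbin(r, order)))
    return feats
```

A practical front end would normally apply pre-emphasis and a window function to each frame; these are omitted here to keep the sketch short.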

[0016] From the learning speech database 200, standard patterns such as hidden Markov model syllable standard patterns and syllable-chain standard patterns for kanji readings are created in advance in the same format as the feature vectors and stored in the standard pattern memory 4. Likewise, a statistical language model on the occurrence order of characters (kana and kanji) and parts of speech is created from the learning text database 100 and stored in the character/part-of-speech statistical language model memory 5.

[0017] For each of the plurality of character candidates selected using the character/part-of-speech statistical language model stored in the character/part-of-speech statistical language model memory 5, the recognition unit 3 reads from the standard pattern memory 4 the standard pattern representing the reading of that candidate and computes its similarity likelihood with the feature parameters of the input speech.

[0018] The recognition result output unit 6 outputs the recognition result of the recognition unit 3 to a display or the like.

[0019] FIG. 3 is a diagram for explaining the recognition processing of an embodiment of the present invention. It shows the case in which the recognition unit 3 selects candidate characters for the i-th recognition step and outputs a recognition result from them. For example, to recognize the i-th character of the input speech, conditional probabilities on the occurrence order of kana/kanji and parts of speech, taken from the character/part-of-speech statistical language model, are used. Based on the recognition results for the (i-2)-th and (i-1)-th characters, the recognized parts of speech of the (j-2)-th and (j-1)-th words, and the part of speech of the j-th word hypothesized to appear next, a plurality of characters with a high predicted likelihood of appearing i-th are selected as candidate characters k₁ to kₙ.
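
The candidate selection in paragraph [0019] can be sketched as a lookup in a table of context-conditioned counts. The Python class below is a minimal illustration that estimates the conditional probability by relative frequency and returns the n most likely next characters; the class name, method names, and the relative-frequency estimate are assumptions for illustration, not the patent's stated implementation.

```python
from collections import Counter, defaultdict


class CharPosModel:
    """Relative-frequency estimate of
    P(c_i | c_{i-2}, c_{i-1}, s_{j-2}, s_{j-1}, s_j)."""

    def __init__(self):
        # (previous two characters, three parts of speech) -> counts of next character
        self.counts = defaultdict(Counter)

    def observe(self, prev_chars, pos_context, char):
        # prev_chars = (c_{i-2}, c_{i-1}); pos_context = (s_{j-2}, s_{j-1}, s_j)
        self.counts[(tuple(prev_chars), tuple(pos_context))][char] += 1

    def prob(self, prev_chars, pos_context, char):
        ctx = self.counts.get((tuple(prev_chars), tuple(pos_context)))
        if not ctx:
            return 0.0  # unseen context: no estimate available
        return ctx[char] / sum(ctx.values())

    def top_candidates(self, prev_chars, pos_context, n):
        """Up to n (character, probability) pairs, most likely first."""
        ctx = self.counts.get((tuple(prev_chars), tuple(pos_context)), Counter())
        total = sum(ctx.values())
        return [(ch, cnt / total) for ch, cnt in ctx.most_common(n)]
```

In use, `observe` would be called once per character position of a part-of-speech-tagged training text, and `top_candidates` would supply the candidate characters k₁ to kₙ for the next recognition step.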

[0020] Here, the character predicted to appear i-th is part of the j-th word. For the j-th word, every existing part of speech may be hypothesized, or the hypothesized parts of speech may be restricted according to the character predicted to appear i-th. Letting c_{i-2}, c_{i-1}, c_i denote the characters appearing at positions i-2, i-1, i, and s_{j-2}, s_{j-1}, s_j denote the parts of speech of the words appearing at positions j-2, j-1, j, the conditional probability described above is given by

[0021]

[Equation 1]

If sufficient text and part-of-speech sequences cannot be prepared to obtain this data,

[0022]

[Equation 2]

or a similar expression can be used to obtain a similar effect. In addition, c_{i-2}, c_{i-1}, c_i appearing at positions i-2, i-1, i need not be plain characters; character codes annotated in advance with reading information may be used instead.
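
The formulas for Equations 1 and 2 are not reproduced in this text. From the symbols defined in paragraph [0020], the conditional probability presumably conditions the i-th character on the two preceding characters and the three parts of speech, as in the first expression below. The second expression is only an assumed example of a shorter-context estimate that could be substituted when training data is sparse; it is not confirmed as the patent's Equation 2.

```latex
% Presumed form of the conditional probability, from the symbols in [0020]
P\left(c_i \mid c_{i-2},\, c_{i-1},\, s_{j-2},\, s_{j-1},\, s_j\right)

% Assumed example of a shorter-context substitute for sparse data (illustrative only)
P\left(c_i \mid c_{i-1},\, s_{j-1},\, s_j\right)
```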

[0023] Next, for each of the selected candidate characters, separate candidates are created according to the number of hypothesized parts of speech.

[0024] For example, as shown in FIG. 2, when parts of speech h₁, h₂, and h₃ are hypothesized for the character k₁, three candidates are created from the combinations. For each of these candidates, the standard pattern for the reading of the character is read from the standard pattern memory 4 and its likelihood against the i-th input speech is computed. The overall likelihood is the sum of this likelihood and the likelihood that the candidate character occurs i-th according to the character/part-of-speech statistical language model read from the character/part-of-speech statistical language model memory 5. For the candidate with the highest overall likelihood, for example (k₁, h₂), the character k₁ is output to the recognition result output unit 6 as the i-th recognized character.
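
The scoring step in paragraph [0024] can be illustrated with a short Python sketch: each candidate character is paired with each hypothesized part of speech, and the pair with the highest sum of acoustic and language-model likelihood is chosen. The sketch assumes both likelihoods are handled in the log domain so that they can be added; the two callbacks and all names are hypothetical stand-ins for the lookups in the standard pattern memory 4 and the language model memory 5.

```python
import math


def expand_candidates(chars, pos_tags):
    """One hypothesis per (character, hypothesized part of speech) pair."""
    return [(ch, pos) for ch in chars for pos in pos_tags]


def best_candidate(candidates, acoustic_loglik, lm_prob):
    """Return (total_loglik, char, pos) for the highest-scoring hypothesis.

    acoustic_loglik(char): log-likelihood of the i-th speech segment against the
        standard pattern for the character's reading (hypothetical callback).
    lm_prob(char, pos): occurrence probability from the character/part-of-speech
        language model (hypothetical callback).
    """
    scored = []
    for ch, pos in candidates:
        p = lm_prob(ch, pos)
        lm_loglik = math.log(p) if p > 0.0 else float("-inf")
        scored.append((acoustic_loglik(ch) + lm_loglik, ch, pos))
    return max(scored)
```

With one character k₁ and three parts of speech h₁ to h₃, `expand_candidates` yields three hypotheses, all sharing the same acoustic score, so the language model alone decides which part of speech wins.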

[0025] The candidate with the highest overall likelihood when one character is output does not necessarily remain the best once the next character is output. Therefore, the top B candidates by overall likelihood are retained (the beam width is said to be B) and carried over to the next step. Only the top B are kept because retaining all candidates would increase the required memory and, with it, the processing time, which makes that approach impractical. Whenever the candidate with the highest overall likelihood changes, the candidate characters output to the recognition result output unit 6 are updated accordingly.
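
The pruning described in paragraph [0025] is a beam search. The Python sketch below keeps the top B hypotheses after each character and shows why the current best hypothesis can change from one step to the next. The `extend` callback, which would combine the language-model and acoustic scores for each possible next character, and the tuple representation of hypotheses are assumptions for illustration.

```python
import heapq


def beam_step(beam, extend, beam_width):
    """Advance every surviving hypothesis by one character and keep the top B.

    beam: list of (total_loglik, chars), where chars is a tuple of output characters.
    extend(chars): iterable of (next_char, loglik_increment) (hypothetical callback).
    """
    expanded = [
        (score + inc, chars + (ch,))
        for score, chars in beam
        for ch, inc in extend(chars)
    ]
    return heapq.nlargest(beam_width, expanded)


def decode(num_chars, extend, beam_width):
    """Run num_chars steps; the result is sorted best-first."""
    beam = [(0.0, ())]
    for _ in range(num_chars):
        beam = beam_step(beam, extend, beam_width)
    return beam  # beam[0] is the current best and may change from step to step
```

With a beam width of 1 the search degenerates to a greedy choice, so a hypothesis that starts slightly worse but continues much better is lost; a wider beam trades memory and time for the chance to recover it.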

[0026] Through this selection of kana/kanji candidates, the hypothesizing of their parts of speech, the matching of these against the standard patterns, and the resulting overall likelihood, the recognized characters are output as a kana/kanji sequence in descending order of likelihood.

[0027] If the input speech contains a kanji that is not in the learning text database, that kanji cannot be recognized. In this case, the unrecognizable character (kanji) may be output as a blank in the resulting kana/kanji sequence. Alternatively, a statistical language model on the occurrence order of phonemes or kana and hidden Markov model phoneme or syllable standard patterns may be provided, so that kanji not in the database are output as a phoneme sequence or a kana sequence.
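
The two fallback options in paragraph [0027] can be sketched as a small output routine in Python. The representation of a segment as a pair of recognized character (or `None`) and kana reading, the function name, and the use of a full-width space for the blank are all assumptions made for illustration.

```python
def render_output(segments, fallback="blank"):
    """Join recognized segments into an output string.

    segments: list of (character_or_None, kana_reading); None marks a segment
        whose kanji could not be recognized (hypothetical representation).
    fallback: "blank" emits a full-width space, "kana" emits the kana reading.
    """
    out = []
    for char, kana in segments:
        if char is not None:
            out.append(char)
        elif fallback == "kana":
            out.append(kana)
        else:
            out.append("\u3000")  # full-width space as the blank
    return "".join(out)
```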

[0028] The feature extraction unit 2, the recognition unit 3, and the recognition result output unit 6 can each be implemented on a dedicated microprocessor or on a shared one.

[0029]

[Effects of the Invention] As described above, according to the present invention, when a mixed kanji-kana sequence is output directly from input speech using a statistical language model on the occurrence order of kana and kanji together with hidden Markov model syllable standard patterns and syllable-chain standard patterns for kanji readings, the statistical language model is created in advance using information on the occurrence order of parts of speech as well as characters. This makes it possible to statistically eliminate grammatically incorrect candidates, shorten processing time, and improve conversion performance.

[Brief description of drawings]

FIG. 1 is a diagram for explaining the principle of the present invention.

FIG. 2 is a block diagram of an embodiment of the present invention.

FIG. 3 is a diagram for explaining the recognition processing of an embodiment of the present invention.

FIG. 4 is a diagram for explaining a conventional Japanese speech recognition system.

[Explanation of symbols]

1: speech signal input terminal
2: feature extraction unit
3: recognition unit
4: standard pattern memory
5: character/part-of-speech statistical language model
6: recognition result output unit
10: learning text database
11: statistical language model information on the occurrence order of kana and kanji
20: learning speech database
21: hidden Markov model phoneme standard pattern information
30: phoneme candidate selection unit
40: recognition unit
50: statistical language model memory
60: phoneme sequence memory corresponding to character readings
100: learning text database
101: character occurrence order information
102: part-of-speech occurrence order information
200: learning speech database
201: hidden Markov model syllable standard pattern information
202: character reading standard pattern information
300: statistical language model
400: syllable standard pattern

Claims

[Claims]

1. A Japanese speech recognition method in which input speech is converted into a time series of feature parameters; a plurality of speech recognition candidates are selected for the feature parameter time series of the input speech using a statistical language model on occurrence order created from a learning text database; for each of these speech recognition candidates, a hidden Markov model standard pattern is matched against the feature parameter time series of the input speech; and a candidate with a high overall likelihood, combining the occurrence likelihood and the similarity likelihood, is taken as the recognition result, the method being characterized in that a statistical language model on the occurrence order of characters and the occurrence order of parts of speech, created from the learning text database, is used as the statistical language model.

2. A Japanese speech recognition method in which input speech is converted into a time series of feature parameters; a plurality of speech recognition candidates are selected for the feature parameter time series of the input speech using a statistical language model on occurrence order created from a learning text database; for each of these speech recognition candidates, a hidden Markov model standard pattern is matched against the feature parameter time series of the input speech; and a candidate with a high overall likelihood, combining the occurrence likelihood and the similarity likelihood, is taken as the recognition result, the method being characterized in that syllable standard patterns and syllable standard patterns for character readings are used as the standard patterns.