JPH09244687A - Speech recognizing method - Google Patents

Speech recognizing method

Info

Publication number
JPH09244687A
Authority
JP
Japan
Prior art keywords
environment
network
node
standard pattern
phoneme
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP8049898A
Other languages
Japanese (ja)
Inventor
Tomokazu Yamada
智一 山田
Shigeki Sagayama
茂樹 嵯峨山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1996-03-07
Filing date
1996-03-07
Publication date
1997-09-19
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP8049898A
Publication of JPH09244687A

Abstract

PROBLEM TO BE SOLVED: To carry out highly accurate speech recognition that makes good use of environment-dependent phoneme standard patterns. SOLUTION: A network is generated using an LR parser and a context-free grammar whose terminal symbols are environment-independent phonemes, while standard patterns are determined, and a One-Pass search is advanced while the input speech is matched against HMM (hidden Markov model) standard patterns. Environment-dependent phoneme standard patterns are used as the HMM standard patterns. When the network is extended and the LR parser predicts a phoneme P3 from the tip node 3, the network is traced back from the new node 4 over the preceding nodes to obtain the phoneme string P1, P2, P3, and the corresponding triphone model is set as the environment-dependent phoneme standard pattern on the arc between nodes 3 and 4.

Description

Detailed Description of the Invention

[0001]

TECHNICAL FIELD: The present invention relates to a method for continuously recognizing input speech using hidden Markov models (for example, Seiichi Nakagawa, "Speech Recognition by Probabilistic Models," edited by the Institute of Electronics, Information and Communication Engineers (1988)), the generalized LR parsing algorithm (for example, M. Tomita: "An Efficient Context-Free Parsing Algorithm for Natural Languages and its Applications", Carnegie-Mellon University (1985)), and the One-Pass search algorithm (for example, J. Bridle et al.: "An Algorithm For Connected Word Recognition," ICASSP82, pp. 899-902 (1982)).

[0002]

PRIOR ART: In conventional speech recognition, one known method uses an LR parser (grammar analyzer) to search for recognition candidates that conform to a predetermined grammar, and performs recognition while matching the standard patterns of those candidates against the input speech. Another known method is the One-Pass search. It uses a predetermined network, matches standard patterns against the input speech for only part of that network, discards low-scoring candidates so that the search advances with a minimal increase in processing, and finally takes the highest-scoring candidate as the recognition result.

[0003] Furthermore, speech recognition methods that integrate hidden Markov models, the generalized LR parser, and the One-Pass algorithm are known. One such integrated method creates sub-networks with the LR parser from a context-free grammar over environment-independent phonemes, takes environment-dependent phonemes into account at the junctions between those sub-networks, and uses environment-dependent phoneme standard patterns (for example, Katsunobu Ito et al.: "Continuous Speech Recognition Using an Extended LR Parsing Method," IEICE Technical Report, SP90-74, pp. 49-56). Another dynamically creates a single network with the LR parser from a context-free grammar over environment-independent phonemes and uses environment-independent phoneme standard patterns (for example, Kenji Kita et al.: "One-Pass Continuous Speech Recognition Algorithm Controlled by an LR Parser," IEICE Technical Report, SP94-13 (1994-05), pp. 55-62).

[0004] In terms of efficiency, it is better to process with a single network rather than with sub-networks, and in terms of recognition accuracy, it is better to use environment-dependent phoneme standard patterns. However, no existing method allowed both to be used at the same time.

[0005]

PROBLEM TO BE SOLVED BY THE INVENTION: An object of the present invention is to provide a speech recognition method with a high recognition rate that, in integrating the hidden Markov model, the generalized LR parser, and the One-Pass algorithm, processes efficiently using a single network while also making use of environment-dependent phoneme standard patterns.

[0006]

MEANS FOR SOLVING THE PROBLEM: According to the present invention, environment-dependent phoneme standard patterns are used. For a node newly generated on the network, the network is traced back by the number of phonemes that the phoneme environment takes into account. This identifies the sequence of environment-independent phonemes predicted by the LR parser, and from that sequence the environment-dependent phoneme standard pattern to be set on the arc connected to that node is determined.

[0007]

EMBODIMENTS OF THE INVENTION: FIG. 1 shows the functional configuration of a recognition device to which an embodiment of the present invention is applied. Speech input from input terminal 1 is converted into a digital signal in feature extraction unit 2, subjected to LPC cepstrum analysis, and converted into feature parameters for each frame (for example, every 10 milliseconds). The feature parameters are, for example, LPC cepstrum coefficients.

[0008] From a training speech database, environment-dependent phoneme standard patterns of hidden Markov models are created in the same format as the above feature vectors and stored in standard pattern memory 4. A table for the LR parser is created from a context-free grammar that defines the scope of what can be recognized and whose terminal symbols are environment-independent phonemes (phonemes that do not depend on their phonetic environment), and is stored in LR table 5.

[0009] FIG. 2 shows a flowchart of the processing in recognition unit 3. The feature parameters of one frame of input speech are read (S1); the network is grown, either every frame or every few frames (S2); the input speech feature parameters are matched against the standard patterns (S3); and the active nodes for which processing will continue in the next step are determined, after which processing returns to step S1 (S4). This is repeated until no input speech remains. Even before the input speech is exhausted, processing may be terminated under some condition based on the scores obtained by pattern matching.
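The loop of FIG. 2 can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the implementation described in the patent: the function names `recognize`, `toy_grow`, and `toy_match`, the beam width, the scalar "templates", and the rule that each toy phoneme lasts exactly one frame are all hypothetical.

```python
def recognize(frames, grow, match, beam_width):
    """Frame-synchronous loop following steps S1 to S4 of FIG. 2."""
    # hypothesis -> accumulated score; at the start only the initial node exists
    active = {(): 0.0}
    for frame in frames:                      # S1: read one frame of features
        grown = grow(active)                  # S2: grow the network
        scored = match(frame, grown)          # S3: match against standard patterns
        ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
        active = dict(ranked[:beam_width])    # S4: keep the high-scoring nodes active
    return max(active, key=active.get)


# Hypothetical stand-ins: each phoneme lasts one frame and has a scalar template.
TEMPLATES = {"a": 1.0, "b": 5.0}


def toy_grow(active):
    # Extend every active hypothesis by each phoneme (a real system asks the LR parser).
    return {hyp + (p,): score for hyp, score in active.items() for p in TEMPLATES}


def toy_match(frame, grown):
    # Score = previous score minus distance between the frame and the phoneme template.
    return {hyp: score - abs(frame - TEMPLATES[hyp[-1]]) for hyp, score in grown.items()}


result = recognize([1.1, 4.8, 0.9], toy_grow, toy_match, beam_width=2)
print(result)
```

In a real recognizer the matching step would evaluate HMM state likelihoods, and the early-termination condition mentioned above would be an extra check inside the loop.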

[0010] The network consists of nodes and arcs, and initially contains only a single initial node. Each node corresponds to hypotheses generated by the LR parser: one node is assigned to all hypotheses whose stack contents are identical. Consequently, hypotheses for which the LR parser's subsequent processing would be the same are merged into a single node (their arcs connect to the same node). Such a node is called a merge node here. Based on the matching results at each step, the nodes with high scores become the active nodes for which processing continues in the next step. The other nodes may be kept in the network as they are, or discarded to make effective use of memory. In this way, the search proceeds by One-Pass search while the network is extended using the LR parser and the context-free grammar whose terminal symbols are environment-independent phonemes.
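The rule that hypotheses with identical stack contents share one node can be sketched as a lookup table keyed on the stack. The class name `Network`, the method `extend`, and the LR state numbers in the example are hypothetical and chosen only for illustration; a real system would obtain the new stack from the LR parser.

```python
class Network:
    """Nodes are identified by LR stack contents; arcs carry the predicted phoneme."""

    def __init__(self, initial_stack):
        self.node_of_stack = {tuple(initial_stack): 0}  # stack contents -> node id
        self.arcs = []                                  # (from_node, to_node, phoneme)

    def extend(self, from_node, phoneme, new_stack):
        """Add an arc; reuse the node if this stack has been seen (merge node)."""
        key = tuple(new_stack)
        is_new = key not in self.node_of_stack
        if is_new:
            self.node_of_stack[key] = len(self.node_of_stack)
        to_node = self.node_of_stack[key]
        self.arcs.append((from_node, to_node, phoneme))
        return to_node, is_new


net = Network([0])
n1, _ = net.extend(0, "k", [0, 3])
n2, _ = net.extend(0, "g", [0, 4])
first = net.extend(n1, "a", [0, 7])   # creates a new node
second = net.extend(n2, "a", [0, 7])  # same stack, so the existing node is reused
print(first, second)
```

Two different paths that leave the parser in the same stack state end at the same node, which is what keeps the number of hypotheses from growing with every alternative pronunciation or phrase.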

[0011] Network generation (growth), which is the characteristic part of the present invention, is described with reference to FIG. 3. Because a context-free grammar can produce a network of unbounded size, it may be impossible to build the whole network in advance. The network generation unit therefore limits growth, that is, creating new nodes from each active node with the LR parser and extending arcs, to a finite number of phonemes ahead. Here, the case of extending one phoneme at a time is described. In the figure, the symbols shown above the dotted lines are the environment-independent phonemes predicted between nodes by the LR parser. The solid lines between nodes are the arcs of the network, on which, in the present invention, environment-dependent phoneme standard patterns are set (to keep the figure readable, the arcs between nodes 1-2, 2-3, 5-6, and 6-3 are omitted). The explanation uses the so-called triphone model, which takes into account the environment to the left (L) and right (R) of the central phoneme (C) and is written L-C-R.

[0012] First, the simple case of FIG. 3A is described. Consider growing the network at node 3, starting from a network in which nodes 1 to 3 have already been generated. Suppose the LR parser predicts the environment-independent phoneme P3 from node 3, and the stack changes to the state represented by a new node 4. Tracing back from node 4 to the node three before it yields the three-phoneme sequence P1, P2, P3. Based on this, the environment-dependent phoneme standard pattern P1-P2-P3 is set on the arc between nodes 3 and 4.
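For the chain of FIG. 3A, the trace-back can be sketched as below. The dictionary layout `predicted` (node mapped to its previous node and the phoneme predicted between them) and the function name `trace_back` are assumptions made for this illustration. Returning a shorter sequence near the start of the network is also an assumption, since the text does not say how that case is handled.

```python
def trace_back(predicted, node, n_context):
    """Collect up to n_context predicted phonemes ending at `node`, oldest first."""
    phonemes = []
    while len(phonemes) < n_context and node in predicted:
        node, phoneme = predicted[node]  # step to the previous node
        phonemes.append(phoneme)
    return list(reversed(phonemes))


# FIG. 3A: 1 -(P1)-> 2 -(P2)-> 3 -(P3)-> 4, where node 4 is newly created.
predicted = {2: (1, "P1"), 3: (2, "P2"), 4: (3, "P3")}

label = "-".join(trace_back(predicted, 4, n_context=3))
print(label)  # pattern to set on the arc between nodes 3 and 4
```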

[0013] The case in which node 3 is a merge node is described with FIG. 3B. In this case, tracing back from node 4 to the nodes three before it shows that two arcs are needed between nodes 3 and 4: one set to P1-P2-P3 and one set to P4-P5-P3. To ensure acoustic consistency, either node 3 is split into two, or a constraint is imposed so that data is passed from the arc coming from node 2 to the P1-P2-P3 arc, and from the arc coming from node 6 to the P4-P5-P3 arc.
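The merge-node case of FIG. 3B can be sketched by enumerating every path that leads into the tip node. The data layout `incoming` (node mapped to a list of previous-node and phoneme pairs) and the names `walk` and `arcs_for_new_node` are hypothetical. The sketch pairs each pattern with the node its path entered from, which is one way to express the constraint on which incoming arc feeds which new arc; it does not implement node splitting.

```python
def walk(incoming, node, depth):
    """All phoneme sequences of length `depth` that end at `node`."""
    if depth == 0:
        return [[]]
    return [seq + [phoneme]
            for prev, phoneme in incoming.get(node, [])
            for seq in walk(incoming, prev, depth - 1)]


def arcs_for_new_node(incoming, tip, predicted_phoneme, n_context=3):
    """For each arc entering `tip`, give (source node, pattern for the new arc)."""
    result = []
    for prev, phoneme in incoming[tip]:
        for seq in walk(incoming, prev, n_context - 2):
            result.append((prev, "-".join(seq + [phoneme, predicted_phoneme])))
    return result


# FIG. 3B: 1 -(P1)-> 2 -(P2)-> 3 and 5 -(P4)-> 6 -(P5)-> 3; P3 is predicted from node 3.
incoming = {2: [(1, "P1")], 3: [(2, "P2"), (6, "P5")], 6: [(5, "P4")]}

arcs = arcs_for_new_node(incoming, tip=3, predicted_phoneme="P3")
print(arcs)
```

Each returned pair says that scores arriving at node 3 from the given source node should be handed only to the arc carrying the paired pattern.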

[0014] To reduce the amount of processing, a method that disregards this consistency is also conceivable; in that case the result obtained contains some error. Ultimately, the hypotheses accepted by the LR parser form a single merge node. When all steps are finished, the recognition result is obtained by tracing back along the arcs that propagated the highest score to this merge node. Environment dependence of phonemes is not limited to the triphone model, but the number of phonemes in the environment-dependent model of the standard patterns must equal the number of phonemes traced back through the network when the network is extended.
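Reading out the result from the final merge node can be sketched as follows. The structure `arcs_in` (node mapped to a list of score, previous node, and phoneme triples), the node name "accept", and the scores are invented for illustration. The sketch also assumes each node is reached only once, so it ignores the per-frame bookkeeping a real time-synchronous search would need.

```python
def recognition_result(arcs_in, final_node):
    """Follow, at every node, the incoming arc that delivered the highest score."""
    phonemes = []
    node = final_node
    while arcs_in.get(node):
        _score, node, phoneme = max(arcs_in[node], key=lambda arc: arc[0])
        phonemes.append(phoneme)
    return phonemes[::-1]  # collected backwards, so reverse into spoken order


# Hypothetical scores: two accepted hypotheses end in the same merge node.
arcs_in = {
    "accept": [(-4.0, 3, "o"), (-6.5, 5, "u")],
    3: [(-2.0, 2, "t")],
    2: [(-1.0, 1, "a")],
    5: [(-3.0, 1, "i")],
}

best = recognition_result(arcs_in, "accept")
print(best)
```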

[0015]

EFFECT OF THE INVENTION: If an LR table whose terminal symbols are environment-dependent phonemes were used, there would be few merge nodes. The network would grow with grammatically identical hypotheses remaining split apart, producing many paths, that is, many candidates, and increasing the amount of computation. In the present invention, this scattering of merge nodes is avoided by using an LR table whose terminal symbols are environment-independent phonemes. In addition, the environment-dependent phoneme standard pattern is determined by tracing back the required number of arcs and nodes from a merge node and is set on the arc of the newly created node. High-accuracy recognition using environment-dependent phoneme standard patterns is therefore possible while a network with high merging efficiency is created dynamically.

Brief Description of the Drawings

FIG. 1 is a block diagram showing an example of the functional configuration of a recognition device to which a method according to an embodiment of the present invention is applied.

FIG. 2 is a flowchart of the recognition processing unit in an embodiment of the present invention.

FIG. 3 is a diagram illustrating how an environment-dependent phoneme standard pattern is set on an arc of the network in an embodiment of the present invention.

Claims (1)

1. A speech recognition method in which input speech is converted into a time series of feature parameters,
a network is generated using an LR parser and a context-free grammar whose terminal symbols are environment-independent phonemes, while phoneme standard patterns are determined, and
a search is advanced by One-Pass search while the feature parameter time series of the input speech is matched against the standard patterns of hidden Markov models, a candidate with high likelihood being taken as the recognition result,
the method characterized in that environment-dependent phoneme standard patterns are used as the standard patterns of the hidden Markov models, and
when the network is extended, an environment-dependent phoneme standard pattern is determined by tracing the network back from its tip by the number of phonemes that the environment dependence takes into account, and is set on an arc of the network.
JP8049898A 1996-03-07 1996-03-07 Speech recognizing method Pending JPH09244687A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP8049898A JPH09244687A (en) 1996-03-07 1996-03-07 Speech recognizing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP8049898A JPH09244687A (en) 1996-03-07 1996-03-07 Speech recognizing method

Publications (1)

Publication Number Publication Date
JPH09244687A true JPH09244687A (en) 1997-09-19

Family

ID=12843845

Family Applications (1)

Application Number Title Priority Date Filing Date
JP8049898A Pending JPH09244687A (en) 1996-03-07 1996-03-07 Speech recognizing method

Country Status (1)

Country Link
JP (1) JPH09244687A (en)

Legal Events

Date Code Title Description
A621 Written request for application examination

Effective date: 20041029

Free format text: JAPANESE INTERMEDIATE CODE: A621

A977 Report on retrieval

Effective date: 20070621

Free format text: JAPANESE INTERMEDIATE CODE: A971007

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Effective date: 20070703

Free format text: JAPANESE INTERMEDIATE CODE: A01

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20070704

FPAY Renewal fee payment (prs date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100713

Year of fee payment: 3

R150 Certificate of patent (=grant) or registration of utility model

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (prs date is renewal date of database)

Free format text: PAYMENT UNTIL: 20110713

Year of fee payment: 4

FPAY Renewal fee payment (prs date is renewal date of database)

Year of fee payment: 5

Free format text: PAYMENT UNTIL: 20120713

FPAY Renewal fee payment (prs date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120713

Year of fee payment: 5

FPAY Renewal fee payment (prs date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130713

Year of fee payment: 6

LAPS Cancellation because of no payment of annual fees