JPH08221091A - Voice recognition device - Google Patents

Voice recognition device

Info

Publication number
JPH08221091A
JPH08221091A
Authority
JP
Japan
Prior art keywords
word
probability
occurrence probability
score
storage unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP7029432A
Other languages
Japanese (ja)
Other versions
JP3304665B2 (en)
Inventor
Mitsuru Endo
充 遠藤
Tatsuro Ito
達朗 伊藤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd filed Critical Matsushita Electric Industrial Co Ltd
Priority to JP02943295A priority Critical patent/JP3304665B2/en
Publication of JPH08221091A publication Critical patent/JPH08221091A/en
Application granted granted Critical
Publication of JP3304665B2 publication Critical patent/JP3304665B2/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Abstract

PURPOSE: To provide a speech recognition device that can exploit the constraint of the whole preceding context while requiring little storage, by calculating the language score, based on the probability that the current word occurs given the already recognized preceding word sequence, from all of the preceding words, using a conditional probability obtained by computation from the occurrence probability, the concatenated co-occurrence probability, and the non-concatenated co-occurrence probability.

CONSTITUTION: A language score calculation part 5 computes the language score from the already recognized word sequence held in a word sequence preservation part 4, the word occurrence probabilities stored in an occurrence probability storage part 7, the concatenated co-occurrence probabilities stored in a concatenated co-occurrence probability storage part 8, and the non-concatenated co-occurrence probabilities stored in a non-concatenated co-occurrence probability storage part 9. A recognition part 3 receives the language score from the language score calculation part 5, computes a total score from this language score and the acoustic score it calculates itself, and passes word sequences with high total scores to the word sequence preservation part 4. After the speech input ends, the word sequence preservation part 4 outputs the preserved word sequence as the recognition result.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech recognition device that recognizes speech in which words are uttered continuously.

[0002]

2. Description of the Related Art

In recent years, attempts have been made to improve speech recognition performance by using statistical language models. An example of such a device is described in Murakami et al., "Sentence speech recognition using word trigrams and its extension to spontaneous speech recognition," Technical Report of the Institute of Electronics, Information and Communication Engineers, SP93-127 (1994). Statistical language models include the bigram, which uses probabilities over two consecutive words; the trigram, which uses probabilities over three consecutive words; and the tetragram, which uses probabilities over four consecutive words. In left-to-right processing along the time axis of the input, a bigram acts as a constraint from the one preceding word, a trigram as a constraint from the two preceding words, and a tetragram as a constraint from the three preceding words. When a bigram is used as the statistical language model, the constraint is too loose, and the recognition results contain many non-sentences that do not hold as Japanese. To strengthen the constraint, more global constraints such as the trigram, the tetragram, and so on may be used. In terms of required storage, on the other hand, a bigram requires storage on the order of the square of the vocabulary size of the recognition target, a trigram the cube, and a tetragram the fourth power. As a compromise between the strength of the constraint and the required storage, the trigram is what is currently used as the statistical language model.
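
As a rough sketch of this storage argument (assuming a dense table with one probability entry per word n-tuple; the vocabulary size of 256 is taken from the embodiment's comparison later in this document):

```python
# Dense n-gram storage grows as V**n: one entry per word n-tuple.
# Illustrative only; V and the model orders are example values.

def ngram_table_size(vocab_size: int, order: int) -> int:
    """Number of probability entries in a dense n-gram table."""
    return vocab_size ** order

V = 256  # vocabulary size used in the embodiment's comparison
for order, name in [(2, "bigram"), (3, "trigram"), (4, "tetragram")]:
    print(f"{name}: {ngram_table_size(V, order):,} entries")
# bigram: 65,536 entries
# trigram: 16,777,216 entries
# tetragram: 4,294,967,296 entries
```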

[0003] A conventional speech recognition apparatus that uses a trigram as its statistical language model is described below.

[0004] FIG. 3 shows the schematic structure of a conventional speech recognition apparatus. Reference numeral 1 is a feature parameter calculation unit that converts input speech into a time series of feature parameters; 2 is a word standard pattern storage unit that stores word standard patterns created in advance from training data; 3 is a recognition unit that calculates a total score for a word sequence from the acoustic score, which is the similarity between the feature parameters and a word standard pattern, and the language score from the language score calculation unit; 4 is a word sequence preservation unit that preserves word sequences with high total scores; 5 is a language score calculation unit that calculates a score based on the probability that the current word occurs next after the already recognized preceding word sequence; and 6 is a trigram storage unit that stores the probability that a word occurs immediately after the two preceding words.

[0005] The operation of this speech recognition apparatus is briefly described below. The feature parameter calculation unit 1 converts the input speech into feature parameters (for example, LPC cepstra). The recognition unit 3 performs matching (for example, DP matching) between the feature parameters received from the feature parameter calculation unit 1 and the word standard patterns stored in the word standard pattern storage unit 2, and calculates the acoustic score. Meanwhile, the language score calculation unit 5 calculates the language score from the already recognized word sequences preserved in the word sequence preservation unit 4 and the trigrams stored in the trigram storage unit 6. The recognition unit 3 receives the language score from the language score calculation unit 5, calculates a total score from the acoustic score and the language score, and passes word sequences with high total scores to the word sequence preservation unit 4. After the speech input ends, the word sequence preservation unit 4 outputs the preserved word sequence as the recognition result.
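
A minimal sketch (not the patent's implementation) of the score combination just described; `trigram` stands in for the contents of the trigram storage unit 6, and the flooring of unseen probabilities is an added assumption:

```python
import math

def trigram_language_score(words: list[str], trigram: dict) -> float:
    """Sum of log P(w_i | w_{i-2}, w_{i-1}) over the word sequence."""
    score = 0.0
    for i, w in enumerate(words):
        context = tuple(words[max(0, i - 2):i])  # up to two preceding words
        score += math.log(trigram.get((context, w), 1e-10))  # floor unseen
    return score

def total_score(acoustic: float, words: list[str], trigram: dict) -> float:
    """Combine the acoustic score (e.g. from DP matching) with the
    language score; here simply their sum."""
    return acoustic + trigram_language_score(words, trigram)
```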

[0006] The method by which the language score calculation unit 5 calculates the language score is now explained in more detail.

[0007] When the word sequence uttered as input is w1, w2, w3, w4, w5, w6, the language score for this word sequence is given by

[0008]

[Equation 1]

$$P(w_1, w_2, w_3, w_4, w_5, w_6)$$

[0009] taken, for example, as its logarithmic value. Here, P(w1,w2,w3,w4,w5,w6) is the occurrence probability of the word sequence w1,w2,w3,w4,w5,w6, and is the product of the occurrence probability P(w1) of the first word; the conditional probability P(w2|w1) that the second word occurs given that the first word has occurred; the conditional probability P(w3|w1,w2) that the third word occurs given that the sequence of the first and second words has occurred; the conditional probability P(w4|w1,w2,w3) that the fourth word occurs given that the sequence of the first through third words has occurred; the conditional probability P(w5|w1,w2,w3,w4) that the fifth word occurs given that the sequence of the first through fourth words has occurred; and the conditional probability P(w6|w1,w2,w3,w4,w5) that the sixth word occurs given that the sequence of the first through fifth words has occurred. When a trigram is used, this is approximated by the product of P(w1); P(w2|w1); P(w3|w1,w2); the conditional probability P(w4|w2,w3) that the fourth word occurs given that the sequence of the second and third words has occurred; the conditional probability P(w5|w3,w4) that the fifth word occurs given that the sequence of the third and fourth words has occurred; and the conditional probability P(w6|w4,w5) that the sixth word occurs given that the sequence of the fourth and fifth words has occurred. According to this formula, at an intermediate stage of recognition, the probability that the current word occurs immediately after the already recognized word sequence (in the trigram case, the last two of the preceding words), for example P(w4|w2,w3) for the fourth word as in FIG. 4(a), or P(w5|w3,w4) for the fifth word as in FIG. 4(b), acts as the linguistic constraint.
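
Collected into a single expression, the trigram approximation of [0009] factorizes the sequence probability as

$$P(w_1,\dots,w_6) \approx P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1,w_2)\,P(w_4 \mid w_2,w_3)\,P(w_5 \mid w_3,w_4)\,P(w_6 \mid w_4,w_5)$$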

[0010]

Problems to be Solved by the Invention

However, in the conventional configuration described above, words more than two positions before the current word cannot be used as a constraint on the current word, so global constraints cannot be exploited. Furthermore, if one attempts to solve this by using a tetragram, pentagram, and so on instead of the trigram, storage on the order of the fourth power, fifth power, and so on of the vocabulary size of the recognition target is required, which cannot be realized for a practical vocabulary size.

[0011] The present invention solves these conventional problems, and its object is to provide a speech recognition device that can exploit global constraints while requiring little storage.

[0012]

Means for Solving the Problems

To achieve this object, the speech recognition device of the present invention has a language score calculation unit characterized in that, when calculating the language score based on the probability that the current word occurs next after the already recognized preceding word sequence, it uses all of the preceding words and uses a conditional probability obtained by computation from the occurrence probability, the concatenated co-occurrence probability, and the non-concatenated co-occurrence probability.

[0013]

Operation

With this configuration, the influence of all preceding words is reflected in the recognition score of the current word, so global constraints can be exploited; and the necessary information can be held in storage on the order of the square of the vocabulary size of the recognition target, so a speech recognition device with little storage can be realized.

[0014]

Embodiment

An embodiment of the present invention is described below with reference to the drawings.

[0015] FIG. 1 is a block diagram showing the schematic configuration of an embodiment of the speech recognition device according to the present invention. In FIG. 1, 1 is a feature parameter calculation unit that converts input speech into a time series of feature parameters; 2 is a word standard pattern storage unit that stores word standard patterns created in advance from training data; 3 is a recognition unit that calculates a total score for a word sequence from the acoustic score, which is the similarity between the feature parameters and a word standard pattern, and the language score from the language score calculation unit; 4 is a word sequence preservation unit that preserves word sequences with high total scores; 5 is a language score calculation unit characterized in that, when calculating the language score based on the probability that the current word occurs next after the already recognized preceding word sequence, it uses all of the preceding words and uses a conditional probability obtained by computation from the occurrence probability, the concatenated co-occurrence probability, and the non-concatenated co-occurrence probability; 7 is an occurrence probability storage unit that stores the probability that a given word occurs; 8 is a concatenated co-occurrence probability storage unit that stores the probability that the word occurs immediately after another given word; and 9 is a non-concatenated co-occurrence probability storage unit that stores the probability that the word occurs two or more positions after another given word.

[0016] The operation of the speech recognition device configured as above is briefly described. The feature parameter calculation unit 1 converts the input speech into feature parameters (for example, LPC cepstra). The recognition unit 3 performs matching (for example, DP matching) between the feature parameters received from the feature parameter calculation unit 1 and the word standard patterns stored in the word standard pattern storage unit 2, and calculates the acoustic score. Meanwhile, the language score calculation unit 5 calculates the language score from the already recognized word sequences preserved in the word sequence preservation unit 4, the word occurrence probabilities stored in the occurrence probability storage unit 7, the concatenated co-occurrence probabilities stored in the concatenated co-occurrence probability storage unit 8, and the non-concatenated co-occurrence probabilities stored in the non-concatenated co-occurrence probability storage unit 9. The recognition unit 3 receives the language score from the language score calculation unit 5, calculates a total score from the acoustic score and the language score (for example, their sum or a linear weighted sum), and passes word sequences with high total scores to the word sequence preservation unit 4. After the speech input ends, the word sequence preservation unit 4 outputs the preserved word sequence as the recognition result.

[0017] The method by which the language score calculation unit 5 of the speech recognition device operating as above calculates the language score is now explained in more detail.

[0018] When the word sequence uttered as input is w1, w2, w3, w4, w5, w6, the language score for this word sequence is given by

[0019]

[Equation 2]

$$P(w_1, w_2, w_3, w_4, w_5, w_6)$$

[0020] taken, for example, as its logarithmic value. Here, P(w1,w2,w3,w4,w5,w6) is the occurrence probability of the word sequence w1,w2,w3,w4,w5,w6, and is the product of the occurrence probability P(w1) of the first word and the conditional probabilities P(w2|w1), P(w3|w1,w2), P(w4|w1,w2,w3), P(w5|w1,w2,w3,w4), and P(w6|w1,w2,w3,w4,w5) that each word occurs given that the sequence of all preceding words has occurred. Here, it is assumed that the influence of the already recognized preceding word sequence on the current word acts independently from each of the preceding words, and further, the influence of each word of the preceding sequence on the current word is expressed as two kinds of influence: the influence from the immediately preceding word, represented by the concatenated co-occurrence probability Pcon(wi|wj), and the influence from words two or more positions before, represented by the non-concatenated co-occurrence probability Psep(wi|wj). Then, by Bayes' theorem, the probability can be approximated as follows.

[0021]

[Equation 3]

$$P(w_1,\dots,w_6) \approx \prod_{i=1}^{6} \frac{P_{con}(w_i \mid w_{i-1}) \prod_{j=1}^{i-2} P_{sep}(w_i \mid w_j)}{P(w_i)^{\,i-2}}$$

(with the factor for i = 1 read as P(w_1) and the factor for i = 2 as P_{con}(w_2 | w_1))

[0022] Here, Pcon(wi|wj) is the probability that the word wi occurs immediately after the word wj, Psep(wi|wj) is the probability that the word wi occurs two or more positions after the word wj, and P(wi) is the probability that the word wi occurs.
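
A minimal sketch (not the patent's code) of this language score in log form; `p_occ`, `p_con`, and `p_sep` stand in for the occurrence probability storage unit 7, the concatenated co-occurrence probability storage unit 8, and the non-concatenated co-occurrence probability storage unit 9, and smoothing of unseen events is omitted:

```python
import math

def language_score(history: list[str], w: str,
                   p_con: dict, p_sep: dict, p_occ: dict) -> float:
    """log P(w | history) under the approximation of this embodiment:
    Pcon with the immediately preceding word, Psep with every earlier
    word, divided by P(w) to the power (number of earlier words)."""
    if not history:
        return math.log(p_occ[w])                     # first word: P(w1)
    score = math.log(p_con[(history[-1], w)])         # Pcon(wi | w_{i-1})
    for earlier in history[:-1]:                      # all words before that
        score += math.log(p_sep[(earlier, w)])        # Psep(wi | wj)
    score -= (len(history) - 1) * math.log(p_occ[w])  # / P(wi)^(i-2)
    return score
```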

[0023] The difference from the conventional example is as follows. At an intermediate stage of recognition, when obtaining the language score based on the probability that the word wi occurs immediately after the already recognized word sequence, given that that sequence has been uttered, the conventional example using trigrams approximates this probability by

[0024]

[Equation 4]

$$P(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-2}, w_{i-1})$$

[0025] whereas, according to the present embodiment, it is approximated by

[0026]

[Equation 5]

$$P(w_i \mid w_1, \dots, w_{i-1}) \approx \frac{P_{con}(w_i \mid w_{i-1}) \prod_{j=1}^{i-2} P_{sep}(w_i \mid w_j)}{P(w_i)^{\,i-2}}$$

[0027] For example, in this embodiment, as in FIG. 2(a), the language score for the fourth word is the logarithm of the product of the non-concatenated co-occurrence probability Psep(w4|w1) that the fourth word occurs after the first word, the non-concatenated co-occurrence probability Psep(w4|w2) that the fourth word occurs after the second word, and the concatenated co-occurrence probability Pcon(w4|w3) that the fourth word occurs immediately after the third word, divided by the square of the occurrence probability P(w4) of the fourth word. Similarly, as in FIG. 2(b), the language score for the fifth word is the logarithm of the product of the non-concatenated co-occurrence probabilities Psep(w5|w1), Psep(w5|w2), and Psep(w5|w3) that the fifth word occurs after the first, second, and third words, respectively, and the concatenated co-occurrence probability Pcon(w5|w4) that the fifth word occurs immediately after the fourth word, divided by the cube of the occurrence probability P(w5) of the fifth word.

[0028] Table 1 compares the number of sentences (a sentence and a word sequence are synonymous here) that become recognition targets for the speech recognition device of this embodiment and for the conventional device.

[0029]

[Table 1: counts of word sequences accepted as recognition targets, grouped by the number of words per sequence, for the conventional example and for the present embodiment]

[0030] Table 1 was produced from data of 162 example sentences collected from conversations at an airport, such as "I have nothing to declare." and "Where is the All Nippon Airways counter?". From these data, trigrams were computed for the conventional example, and the concatenated co-occurrence probabilities, non-concatenated co-occurrence probabilities, and occurrence probabilities were computed for this embodiment; then all word sequences whose occurrence probability under the respective recognition method is nonzero were enumerated and tallied by the number of words they contain.

[0031] The number of sentences re-synthesized from the information extracted from the example sentences (the trigrams, the concatenated co-occurrence probabilities, and so on) should be small; the closer it is to the number of example sentences, the better. This is because the fewer the recognition targets, the smaller the possibility of misrecognition, which is advantageous for recognition.

[0032] As is clear from Table 1, the number of sentences that become recognition targets is closer to the number of example sentences in this embodiment, so the model acts strongly as a constraint limiting the recognition targets. Looking at the increase in the number of sentences, which represents the over-generation of recognition-target sentences, the 51 sentences of this embodiment are less than half of the 119 sentences of the conventional example.

[0033] Examining the content of the over-generated sentences shows the following. In the conventional example, because the constraint is local, even a sentence that does not hold globally can become a recognition target as long as it satisfies the local constraints. For example, sentences such as "Please carry this baggage to the All Nippon Airways counter to the taxi stand.", "Where does the currency exchange leave from?", and "What time does it start in New York?" (all ungrammatical in the original Japanese) are clearly wrong as Japanese, and 55 such non-sentences (sentences that do not hold as Japanese) are included in the recognition targets. In contrast, among the recognition targets of this embodiment, almost all sentences other than the example sentences are sentences in which some word sequence appears to have been omitted (for example, "How about New York?"); the only completely meaningless non-sentences were the two examples "Where can the airplane ticket be done?" and "Where can the ticket be done?". The reason there are so few non-sentences in this embodiment is that only words that co-occur with all of the preceding words can be connected to the preceding word sequence. Consequently, every word pair extracted from a recognition-target sentence is a co-occurring word pair, so the words in the sentence can be expected to form a semantically consistent group; and every pair of consecutive words extracted from the sentence is a two-word sequence that can actually exist, so the sentence can be expected to be syntactically correct.
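
The acceptance condition this paragraph describes can be sketched as follows (an assumption about how the enumeration behind Table 1 could be reproduced, not code from the patent): a word sequence has nonzero occurrence probability exactly when every adjacent pair is a seen concatenated co-occurrence and every non-adjacent pair a seen non-concatenated co-occurrence:

```python
def is_recognizable(seq: list[str], p_con: dict, p_sep: dict) -> bool:
    """True if the sequence has nonzero probability under this
    embodiment's model: each word must co-occur with ALL of its
    predecessors, adjacently or non-adjacently as appropriate."""
    for i in range(1, len(seq)):
        if p_con.get((seq[i - 1], seq[i]), 0.0) == 0.0:
            return False  # unseen adjacent pair
        for j in range(i - 1):
            if p_sep.get((seq[j], seq[i]), 0.0) == 0.0:
                return False  # unseen non-adjacent pair
    return True
```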

[0034] As for storage, since the vocabulary of the recognition target contains 256 words, the conventional example requires storage for 256^3 = 16,777,216 entries, whereas this embodiment needs only 256^2 × 2 + 256 = 131,328 entries, less than 1/100 of the conventional example.
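
Spelling out the arithmetic (the 131,328 figure decomposes into two 256 × 256 co-occurrence tables plus one 256-entry occurrence table, matching the storage units 7, 8, and 9):

```python
V = 256                         # vocabulary size of the recognition target
conventional = V ** 3           # trigram table: 16,777,216 entries
proposed = V * V * 2 + V        # Pcon table + Psep table + unigram table
print(conventional, proposed)   # 16777216 131328
print(proposed / conventional)  # ~0.0078, i.e. less than 1/100
```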

[0035] As described above, according to this embodiment, by providing a language score calculation unit characterized in that, when calculating the language score based on the probability that the current word occurs next after the already recognized preceding word sequence, it uses all of the preceding words and uses a conditional probability obtained by computation from the occurrence probability, the concatenated co-occurrence probability, and the non-concatenated co-occurrence probability, an excellent speech recognition device can be realized that can exploit global constraints and requires little storage.

[0036]

Effects of the Invention

As described above, the present invention provides a language score calculation unit characterized in that, when calculating the language score based on the probability that the current word occurs next after the already recognized preceding word sequence, it uses all of the preceding words and uses a conditional probability obtained by computation from the occurrence probability, the concatenated co-occurrence probability, and the non-concatenated co-occurrence probability. It can thereby realize an excellent speech recognition device that can exploit global constraints and requires little storage.

Brief Description of the Drawings

FIG. 1 is a schematic block diagram showing a speech recognition device according to an embodiment of the present invention.

FIG. 2 (a) is a conceptual diagram of an operation example when the embodiment recognizes the fourth word; (b) is a conceptual diagram of an operation example when the embodiment recognizes the fifth word.

FIG. 3 is a schematic block diagram showing a conventional speech recognition device.

FIG. 4 (a) is a conceptual diagram of an operation example when a conventional speech recognition device recognizes the fourth word; (b) is a conceptual diagram of an operation example when a conventional speech recognition device recognizes the fifth word.

Explanation of Symbols

1 Feature parameter calculation unit
2 Word standard pattern storage unit
3 Recognition unit
4 Word sequence preservation unit
5 Language score calculation unit
6 Trigram storage unit
7 Occurrence probability storage unit
8 Concatenated co-occurrence probability storage unit
9 Non-concatenated co-occurrence probability storage unit

Claims (1)

[Claims]

[Claim 1] A speech recognition device comprising: a feature parameter calculation unit for converting input speech into a time series of feature parameters representing features of the speech; a word standard pattern storage unit for storing word standard patterns created in advance; an occurrence probability storage unit for storing an occurrence probability, which is the probability that a first word occurs; a concatenated co-occurrence probability storage unit for storing a concatenated co-occurrence probability, which is the probability that the first word occurs immediately after a second word; a non-concatenated co-occurrence probability storage unit for storing a non-concatenated co-occurrence probability, which is the probability that the first word occurs two or more positions after a third word; a language score calculation unit for calculating a language score representing the probability that the first word occurs next after an already recognized preceding word sequence, by computation from the occurrence probability, the concatenated co-occurrence probability, and the non-concatenated co-occurrence probability using the preceding word sequence; a recognition unit for calculating a total score representing a likelihood for a word sequence from the language score and an acoustic score, which is the similarity between the feature parameters and a word standard pattern; and a result preservation unit for preserving word sequences with a high total score.
JP02943295A 1995-02-17 1995-02-17 Voice recognition device Expired - Fee Related JP3304665B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP02943295A JP3304665B2 (en) 1995-02-17 1995-02-17 Voice recognition device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP02943295A JP3304665B2 (en) 1995-02-17 1995-02-17 Voice recognition device

Publications (2)

Publication Number Publication Date
JPH08221091A true JPH08221091A (en) 1996-08-30
JP3304665B2 JP3304665B2 (en) 2002-07-22

Family

ID=12275984

Family Applications (1)

Application Number Title Priority Date Filing Date
JP02943295A Expired - Fee Related JP3304665B2 (en) 1995-02-17 1995-02-17 Voice recognition device

Country Status (1)

Country Link
JP (1) JP3304665B2 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001048737A2 (en) * 1999-12-23 2001-07-05 Intel Corporation Speech recognizer with a lexical tree based n-gram language model
WO2001048737A3 (en) * 1999-12-23 2002-11-14 Intel Corp Speech recognizer with a lexical tree based n-gram language model
US7031923B1 (en) 2000-03-06 2006-04-18 International Business Machines Corporation Verbal utterance rejection using a labeller with grammatical constraints
JP2002041080A (en) * 2000-07-11 2002-02-08 Internatl Business Mach Corp <Ibm> Vocabulary prediction method, voice recognition method, vocabulary prediction equipment, voice recognition equipment, computer system, memory medium and program trasmitting equipment
JP2010540976A (en) * 2007-10-04 2010-12-24 株式会社東芝 Method and apparatus for automatic speech recognition
JP2010266947A (en) * 2009-05-12 2010-11-25 Ntt Data Corp Device, method, and program for extracting candidate word
JP2011169960A (en) * 2010-02-16 2011-09-01 Nec Corp Apparatus for estimation of speech content, language model forming device, and method and program used therefor
JP2014077865A (en) * 2012-10-10 2014-05-01 Nippon Hoso Kyokai <Nhk> Speech recognition device, error correction model learning method and program

Also Published As

Publication number Publication date
JP3304665B2 (en) 2002-07-22

Similar Documents

Publication Publication Date Title
EP1922653B1 (en) Word clustering for input data
US6973427B2 (en) Method for adding phonetic descriptions to a speech recognition lexicon
US5878390A (en) Speech recognition apparatus equipped with means for removing erroneous candidate of speech recognition
US6178401B1 (en) Method for reducing search complexity in a speech recognition system
US5835888A (en) Statistical language model for inflected languages
KR102375115B1 (en) Phoneme-Based Contextualization for Cross-Language Speech Recognition in End-to-End Models
JP3426176B2 (en) Speech recognition device, method, computer system and storage medium
JPH0772840B2 (en) Speech model configuration method, speech recognition method, speech recognition device, and speech model training method
US20210193117A1 (en) Syllable based automatic speech recognition
US10714080B2 (en) WFST decoding system, speech recognition system including the same and method for storing WFST data
Drexler et al. Combining end-to-end and adversarial training for low-resource speech recognition
Ahmed et al. End-to-end lexicon free arabic speech recognition using recurrent neural networks
JP4499389B2 (en) Method and apparatus for generating decision tree questions for speech processing
CN115132196A (en) Voice instruction recognition method and device, electronic equipment and storage medium
JPH08221091A (en) Voice recognition device
JP4600706B2 (en) Voice recognition apparatus, voice recognition method, and recording medium
JP2938865B1 (en) Voice recognition device
JP2006343405A (en) Speech-understanding device, speech-understanding method, method for preparing word/semantic expression merge database, its program and storage medium
JP3240691B2 (en) Voice recognition method
JP3513284B2 (en) Voice recognition method and apparatus
WO2024086265A1 (en) Context-aware end-to-end asr fusion of context, acoustic and text representations
JP2975540B2 (en) Free speech recognition device
JP3121530B2 (en) Voice recognition device
JPS59185400A (en) Monosyllable sound recognition system
CN115798462A (en) Voice recognition method and device, electronic equipment and chip

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090510

Year of fee payment: 7

LAPS Cancellation because of no payment of annual fees