JPH08221091A - Voice recognition device - Google Patents

Voice recognition device

Info

Publication number
JPH08221091A
JPH08221091A
Authority
JP
Japan
Prior art keywords
word
probability
occurrence probability
score
storage unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP7029432A
Other languages
Japanese (ja)
Other versions
JP3304665B2 (en)
Inventor
Mitsuru Endo
充 遠藤
Tatsuro Ito
達朗 伊藤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd filed Critical Matsushita Electric Industrial Co Ltd
Priority to JP02943295A priority Critical patent/JP3304665B2/en
Publication of JPH08221091A publication Critical patent/JPH08221091A/en
Application granted granted Critical
Publication of JP3304665B2 publication Critical patent/JP3304665B2/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Abstract

PURPOSE: To provide a speech recognition device that can exploit the constraint of the whole preceding context while requiring little storage, by calculating the language score, based on the probability that the current word occurs given the already recognized preceding word sequence, from all of the preceding words, using a conditional probability obtained by computation from the occurrence probability, the concatenated co-occurrence probability, and the non-concatenated co-occurrence probability.

CONSTITUTION: A language score calculation part 5 computes the language score from the already recognized word sequence held in a word sequence preservation part 4, the word occurrence probabilities stored in an occurrence probability storage part 7, the concatenated co-occurrence probabilities stored in a concatenated co-occurrence probability storage part 8, and the non-concatenated co-occurrence probabilities stored in a non-concatenated co-occurrence probability storage part 9. A recognition part 3 receives the language score from the language score calculation part 5, computes a total score from this language score and the acoustic score it calculates itself, and passes word sequences with high total scores to the word sequence preservation part 4. After the speech input ends, the word sequence preservation part 4 outputs the preserved word sequence as the recognition result.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech recognition device that recognizes speech in which words are uttered continuously.

[0002]

2. Description of the Related Art

In recent years, attempts have been made to improve speech recognition performance by using statistical language models. An example of such a device is described in Murakami et al., "Sentence speech recognition using word trigrams and its extension to spontaneous speech recognition," Technical Report of the Institute of Electronics, Information and Communication Engineers, SP93-127 (1994). Statistical language models include the bigram, which uses probabilities over two consecutive words; the trigram, which uses probabilities over three consecutive words; and the tetragram, which uses probabilities over four consecutive words. In left-to-right processing along the time axis of the input, a bigram acts as a constraint from the one preceding word, a trigram as a constraint from the two preceding words, and a tetragram as a constraint from the three preceding words. When a bigram is used as the statistical language model, the constraint is too loose, and the recognition results contain many non-sentences that do not hold as Japanese. To strengthen the constraint, more global constraints such as the trigram, the tetragram, and so on may be used. In terms of required storage, on the other hand, a bigram requires storage on the order of the square of the vocabulary size of the recognition target, a trigram the cube, and a tetragram the fourth power. As a compromise between the strength of the constraint and the required storage, the trigram is what is currently used as the statistical language model.
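
As a rough sketch of this storage argument (assuming a dense table with one probability entry per word n-tuple; the vocabulary size of 256 is taken from the embodiment's comparison later in this document):

```python
# Dense n-gram storage grows as V**n: one entry per word n-tuple.
# Illustrative only; V and the model orders are example values.

def ngram_table_size(vocab_size: int, order: int) -> int:
    """Number of probability entries in a dense n-gram table."""
    return vocab_size ** order

V = 256  # vocabulary size used in the embodiment's comparison
for order, name in [(2, "bigram"), (3, "trigram"), (4, "tetragram")]:
    print(f"{name}: {ngram_table_size(V, order):,} entries")
# bigram: 65,536 entries
# trigram: 16,777,216 entries
# tetragram: 4,294,967,296 entries
```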

[0003] A conventional speech recognition apparatus that uses a trigram as its statistical language model is described below.

[0004] FIG. 3 shows the schematic structure of a conventional speech recognition apparatus. Reference numeral 1 is a feature parameter calculation unit that converts input speech into a time series of feature parameters; 2 is a word standard pattern storage unit that stores word standard patterns created in advance from training data; 3 is a recognition unit that calculates a total score for a word sequence from the acoustic score, which is the similarity between the feature parameters and a word standard pattern, and the language score from the language score calculation unit; 4 is a word sequence preservation unit that preserves word sequences with high total scores; 5 is a language score calculation unit that calculates a score based on the probability that the current word occurs next after the already recognized preceding word sequence; and 6 is a trigram storage unit that stores the probability that a word occurs immediately after the two preceding words.

[0005] The operation of this speech recognition apparatus is briefly described below. The feature parameter calculation unit 1 converts the input speech into feature parameters (for example, LPC cepstra). The recognition unit 3 performs matching (for example, DP matching) between the feature parameters received from the feature parameter calculation unit 1 and the word standard patterns stored in the word standard pattern storage unit 2, and calculates the acoustic score. Meanwhile, the language score calculation unit 5 calculates the language score from the already recognized word sequences preserved in the word sequence preservation unit 4 and the trigrams stored in the trigram storage unit 6. The recognition unit 3 receives the language score from the language score calculation unit 5, calculates a total score from the acoustic score and the language score, and passes word sequences with high total scores to the word sequence preservation unit 4. After the speech input ends, the word sequence preservation unit 4 outputs the preserved word sequence as the recognition result.
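
A minimal sketch (not the patent's implementation) of the score combination just described; `trigram` stands in for the contents of the trigram storage unit 6, and the flooring of unseen probabilities is an added assumption:

```python
import math

def trigram_language_score(words: list[str], trigram: dict) -> float:
    """Sum of log P(w_i | w_{i-2}, w_{i-1}) over the word sequence."""
    score = 0.0
    for i, w in enumerate(words):
        context = tuple(words[max(0, i - 2):i])  # up to two preceding words
        score += math.log(trigram.get((context, w), 1e-10))  # floor unseen
    return score

def total_score(acoustic: float, words: list[str], trigram: dict) -> float:
    """Combine the acoustic score (e.g. from DP matching) with the
    language score; here simply their sum."""
    return acoustic + trigram_language_score(words, trigram)
```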

[0006] The method by which the language score calculation unit 5 calculates the language score is now explained in more detail.

[0007] When the word sequence uttered as input is w1, w2, w3, w4, w5, w6, the language score for this word sequence is given by

[0008]

[Equation 1]

$$P(w_1, w_2, w_3, w_4, w_5, w_6)$$

[0009] taken, for example, as its logarithmic value. Here, P(w1,w2,w3,w4,w5,w6) is the occurrence probability of the word sequence w1,w2,w3,w4,w5,w6, and is the product of the occurrence probability P(w1) of the first word; the conditional probability P(w2|w1) that the second word occurs given that the first word has occurred; the conditional probability P(w3|w1,w2) that the third word occurs given that the sequence of the first and second words has occurred; the conditional probability P(w4|w1,w2,w3) that the fourth word occurs given that the sequence of the first through third words has occurred; the conditional probability P(w5|w1,w2,w3,w4) that the fifth word occurs given that the sequence of the first through fourth words has occurred; and the conditional probability P(w6|w1,w2,w3,w4,w5) that the sixth word occurs given that the sequence of the first through fifth words has occurred. When a trigram is used, this is approximated by the product of P(w1); P(w2|w1); P(w3|w1,w2); the conditional probability P(w4|w2,w3) that the fourth word occurs given that the sequence of the second and third words has occurred; the conditional probability P(w5|w3,w4) that the fifth word occurs given that the sequence of the third and fourth words has occurred; and the conditional probability P(w6|w4,w5) that the sixth word occurs given that the sequence of the fourth and fifth words has occurred. According to this formula, at an intermediate stage of recognition, the probability that the current word occurs immediately after the already recognized word sequence (in the trigram case, the last two of the preceding words), for example P(w4|w2,w3) for the fourth word as in FIG. 4(a), or P(w5|w3,w4) for the fifth word as in FIG. 4(b), acts as the linguistic constraint.
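
Collected into a single expression, the trigram approximation of [0009] factorizes the sequence probability as

$$P(w_1,\dots,w_6) \approx P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1,w_2)\,P(w_4 \mid w_2,w_3)\,P(w_5 \mid w_3,w_4)\,P(w_6 \mid w_4,w_5)$$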

[0010]

Problems to be Solved by the Invention

However, in the conventional configuration described above, words more than two positions before the current word cannot be used as a constraint on the current word, so global constraints cannot be exploited. Furthermore, if one attempts to solve this by using a tetragram, pentagram, and so on instead of the trigram, storage on the order of the fourth power, fifth power, and so on of the vocabulary size of the recognition target is required, which cannot be realized for a practical vocabulary size.

[0011] The present invention solves these conventional problems, and its object is to provide a speech recognition device that can exploit global constraints while requiring little storage.

[0012]

Means for Solving the Problems

To achieve this object, the speech recognition device of the present invention has a language score calculation unit characterized in that, when calculating the language score based on the probability that the current word occurs next after the already recognized preceding word sequence, it uses all of the preceding words and uses a conditional probability obtained by computation from the occurrence probability, the concatenated co-occurrence probability, and the non-concatenated co-occurrence probability.

[0013]

Operation

With this configuration, the influence of all preceding words is reflected in the recognition score of the current word, so global constraints can be exploited; and the necessary information can be held in storage on the order of the square of the vocabulary size of the recognition target, so a speech recognition device with little storage can be realized.

[0014]

Embodiment

An embodiment of the present invention is described below with reference to the drawings.

[0015] FIG. 1 is a block diagram showing the schematic configuration of an embodiment of the speech recognition device according to the present invention. In FIG. 1, 1 is a feature parameter calculation unit that converts input speech into a time series of feature parameters; 2 is a word standard pattern storage unit that stores word standard patterns created in advance from training data; 3 is a recognition unit that calculates a total score for a word sequence from the acoustic score, which is the similarity between the feature parameters and a word standard pattern, and the language score from the language score calculation unit; 4 is a word sequence preservation unit that preserves word sequences with high total scores; 5 is a language score calculation unit characterized in that, when calculating the language score based on the probability that the current word occurs next after the already recognized preceding word sequence, it uses all of the preceding words and uses a conditional probability obtained by computation from the occurrence probability, the concatenated co-occurrence probability, and the non-concatenated co-occurrence probability; 7 is an occurrence probability storage unit that stores the probability that a given word occurs; 8 is a concatenated co-occurrence probability storage unit that stores the probability that the word occurs immediately after another given word; and 9 is a non-concatenated co-occurrence probability storage unit that stores the probability that the word occurs two or more positions after another given word.

[0016] The operation of the speech recognition device configured as above is briefly described. The feature parameter calculation unit 1 converts the input speech into feature parameters (for example, LPC cepstra). The recognition unit 3 performs matching (for example, DP matching) between the feature parameters received from the feature parameter calculation unit 1 and the word standard patterns stored in the word standard pattern storage unit 2, and calculates the acoustic score. Meanwhile, the language score calculation unit 5 calculates the language score from the already recognized word sequences preserved in the word sequence preservation unit 4, the word occurrence probabilities stored in the occurrence probability storage unit 7, the concatenated co-occurrence probabilities stored in the concatenated co-occurrence probability storage unit 8, and the non-concatenated co-occurrence probabilities stored in the non-concatenated co-occurrence probability storage unit 9. The recognition unit 3 receives the language score from the language score calculation unit 5, calculates a total score from the acoustic score and the language score (for example, their sum or a linear weighted sum), and passes word sequences with high total scores to the word sequence preservation unit 4. After the speech input ends, the word sequence preservation unit 4 outputs the preserved word sequence as the recognition result.

[0017] The method by which the language score calculation unit 5 of the speech recognition device operating as above calculates the language score is now explained in more detail.

[0018] When the word sequence uttered as input is w1, w2, w3, w4, w5, w6, the language score for this word sequence is given by

[0019]

[Equation 2]

$$P(w_1, w_2, w_3, w_4, w_5, w_6)$$

[0020] taken, for example, as its logarithmic value. Here, P(w1,w2,w3,w4,w5,w6) is the occurrence probability of the word sequence w1,w2,w3,w4,w5,w6, and is the product of the occurrence probability P(w1) of the first word and the conditional probabilities P(w2|w1), P(w3|w1,w2), P(w4|w1,w2,w3), P(w5|w1,w2,w3,w4), and P(w6|w1,w2,w3,w4,w5) that each word occurs given that the sequence of all preceding words has occurred. Here, it is assumed that the influence of the already recognized preceding word sequence on the current word acts independently from each of the preceding words, and further, the influence of each word of the preceding sequence on the current word is expressed as two kinds of influence: the influence from the immediately preceding word, represented by the concatenated co-occurrence probability Pcon(wi|wj), and the influence from words two or more positions before, represented by the non-concatenated co-occurrence probability Psep(wi|wj). Then, by Bayes' theorem, the probability can be approximated as follows.

[0021]

[Equation 3]

$$P(w_1,\dots,w_6) \approx \prod_{i=1}^{6} \frac{P_{con}(w_i \mid w_{i-1}) \prod_{j=1}^{i-2} P_{sep}(w_i \mid w_j)}{P(w_i)^{\,i-2}}$$

(with the factor for i = 1 read as P(w_1) and the factor for i = 2 as P_{con}(w_2 | w_1))

[0022] Here, Pcon(wi|wj) is the probability that the word wi occurs immediately after the word wj, Psep(wi|wj) is the probability that the word wi occurs two or more positions after the word wj, and P(wi) is the probability that the word wi occurs.
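
A minimal sketch (not the patent's code) of this language score in log form; `p_occ`, `p_con`, and `p_sep` stand in for the occurrence probability storage unit 7, the concatenated co-occurrence probability storage unit 8, and the non-concatenated co-occurrence probability storage unit 9, and smoothing of unseen events is omitted:

```python
import math

def language_score(history: list[str], w: str,
                   p_con: dict, p_sep: dict, p_occ: dict) -> float:
    """log P(w | history) under the approximation of this embodiment:
    Pcon with the immediately preceding word, Psep with every earlier
    word, divided by P(w) to the power (number of earlier words)."""
    if not history:
        return math.log(p_occ[w])                     # first word: P(w1)
    score = math.log(p_con[(history[-1], w)])         # Pcon(wi | w_{i-1})
    for earlier in history[:-1]:                      # all words before that
        score += math.log(p_sep[(earlier, w)])        # Psep(wi | wj)
    score -= (len(history) - 1) * math.log(p_occ[w])  # / P(wi)^(i-2)
    return score
```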

[0023] The difference from the conventional example is as follows. At an intermediate stage of recognition, when obtaining the language score based on the probability that the word wi occurs immediately after the already recognized word sequence, given that that sequence has been uttered, the conventional example using trigrams approximates this probability by

[0024]

[Equation 4]

$$P(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-2}, w_{i-1})$$

[0025] whereas, according to the present embodiment, it is approximated by

[0026]

[Equation 5]

$$P(w_i \mid w_1, \dots, w_{i-1}) \approx \frac{P_{con}(w_i \mid w_{i-1}) \prod_{j=1}^{i-2} P_{sep}(w_i \mid w_j)}{P(w_i)^{\,i-2}}$$

[0027] For example, in this embodiment, as in FIG. 2(a), the language score for the fourth word is the logarithm of the product of the non-concatenated co-occurrence probability Psep(w4|w1) that the fourth word occurs after the first word, the non-concatenated co-occurrence probability Psep(w4|w2) that the fourth word occurs after the second word, and the concatenated co-occurrence probability Pcon(w4|w3) that the fourth word occurs immediately after the third word, divided by the square of the occurrence probability P(w4) of the fourth word. Similarly, as in FIG. 2(b), the language score for the fifth word is the logarithm of the product of the non-concatenated co-occurrence probabilities Psep(w5|w1), Psep(w5|w2), and Psep(w5|w3) that the fifth word occurs after the first, second, and third words, respectively, and the concatenated co-occurrence probability Pcon(w5|w4) that the fifth word occurs immediately after the fourth word, divided by the cube of the occurrence probability P(w5) of the fifth word.

[0028] Table 1 compares the number of sentences (a sentence and a word sequence are synonymous here) that become recognition targets for the speech recognition device of this embodiment and for the conventional device.

[0029]

[Table 1: counts of word sequences accepted as recognition targets, grouped by the number of words per sequence, for the conventional example and for the present embodiment]

[0030] Table 1 was produced from data of 162 example sentences collected from conversations at an airport, such as "I have nothing to declare." and "Where is the All Nippon Airways counter?". From these data, trigrams were computed for the conventional example, and the concatenated co-occurrence probabilities, non-concatenated co-occurrence probabilities, and occurrence probabilities were computed for this embodiment; then all word sequences whose occurrence probability under the respective recognition method is nonzero were enumerated and tallied by the number of words they contain.

[0031] The number of sentences re-synthesized from the information extracted from the example sentences (the trigrams, the concatenated co-occurrence probabilities, and so on) should be small; the closer it is to the number of example sentences, the better. This is because the fewer the recognition targets, the smaller the possibility of misrecognition, which is advantageous for recognition.

[0032] As is clear from Table 1, the number of sentences that become recognition targets is closer to the number of example sentences in this embodiment, so the model acts strongly as a constraint limiting the recognition targets. Looking at the increase in the number of sentences, which represents the over-generation of recognition-target sentences, the 51 sentences of this embodiment are less than half of the 119 sentences of the conventional example.

[0033] Examining the content of the over-generated sentences shows the following. In the conventional example, because the constraint is local, even a sentence that does not hold globally can become a recognition target as long as it satisfies the local constraints. For example, sentences such as "Please carry this baggage to the All Nippon Airways counter to the taxi stand.", "Where does the currency exchange leave from?", and "What time does it start in New York?" (all ungrammatical in the original Japanese) are clearly wrong as Japanese, and 55 such non-sentences (sentences that do not hold as Japanese) are included in the recognition targets. In contrast, among the recognition targets of this embodiment, almost all sentences other than the example sentences are sentences in which some word sequence appears to have been omitted (for example, "How about New York?"); the only completely meaningless non-sentences were the two examples "Where can the airplane ticket be done?" and "Where can the ticket be done?". The reason there are so few non-sentences in this embodiment is that only words that co-occur with all of the preceding words can be connected to the preceding word sequence. Consequently, every word pair extracted from a recognition-target sentence is a co-occurring word pair, so the words in the sentence can be expected to form a semantically consistent group; and every pair of consecutive words extracted from the sentence is a two-word sequence that can actually exist, so the sentence can be expected to be syntactically correct.
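
The acceptance condition this paragraph describes can be sketched as follows (an assumption about how the enumeration behind Table 1 could be reproduced, not code from the patent): a word sequence has nonzero occurrence probability exactly when every adjacent pair is a seen concatenated co-occurrence and every non-adjacent pair a seen non-concatenated co-occurrence:

```python
def is_recognizable(seq: list[str], p_con: dict, p_sep: dict) -> bool:
    """True if the sequence has nonzero probability under this
    embodiment's model: each word must co-occur with ALL of its
    predecessors, adjacently or non-adjacently as appropriate."""
    for i in range(1, len(seq)):
        if p_con.get((seq[i - 1], seq[i]), 0.0) == 0.0:
            return False  # unseen adjacent pair
        for j in range(i - 1):
            if p_sep.get((seq[j], seq[i]), 0.0) == 0.0:
                return False  # unseen non-adjacent pair
    return True
```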

[0034] As for storage, since the vocabulary of the recognition target contains 256 words, the conventional example requires storage for 256^3 = 16,777,216 entries, whereas this embodiment needs only 256^2 × 2 + 256 = 131,328 entries, less than 1/100 of the conventional example.
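
Spelling out the arithmetic (the 131,328 figure decomposes into two 256 × 256 co-occurrence tables plus one 256-entry occurrence table, matching the storage units 7, 8, and 9):

```python
V = 256                         # vocabulary size of the recognition target
conventional = V ** 3           # trigram table: 16,777,216 entries
proposed = V * V * 2 + V        # Pcon table + Psep table + unigram table
print(conventional, proposed)   # 16777216 131328
print(proposed / conventional)  # ~0.0078, i.e. less than 1/100
```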

[0035] As described above, according to this embodiment, by providing a language score calculation unit characterized in that, when calculating the language score based on the probability that the current word occurs next after the already recognized preceding word sequence, it uses all of the preceding words and uses a conditional probability obtained by computation from the occurrence probability, the concatenated co-occurrence probability, and the non-concatenated co-occurrence probability, an excellent speech recognition device can be realized that can exploit global constraints and requires little storage.

[0036]

Effects of the Invention

As described above, the present invention provides a language score calculation unit characterized in that, when calculating the language score based on the probability that the current word occurs next after the already recognized preceding word sequence, it uses all of the preceding words and uses a conditional probability obtained by computation from the occurrence probability, the concatenated co-occurrence probability, and the non-concatenated co-occurrence probability. It can thereby realize an excellent speech recognition device that can exploit global constraints and requires little storage.

Brief Description of the Drawings

FIG. 1 is a schematic block diagram showing a speech recognition device according to an embodiment of the present invention.

FIG. 2 (a) is a conceptual diagram of an operation example when the embodiment recognizes the fourth word; (b) is a conceptual diagram of an operation example when the embodiment recognizes the fifth word.

FIG. 3 is a schematic block diagram showing a conventional speech recognition device.

FIG. 4 (a) is a conceptual diagram of an operation example when a conventional speech recognition device recognizes the fourth word; (b) is a conceptual diagram of an operation example when a conventional speech recognition device recognizes the fifth word.

Explanation of Symbols

1 Feature parameter calculation unit
2 Word standard pattern storage unit
3 Recognition unit
4 Word sequence preservation unit
5 Language score calculation unit
6 Trigram storage unit
7 Occurrence probability storage unit
8 Concatenated co-occurrence probability storage unit
9 Non-concatenated co-occurrence probability storage unit

Claims (1)

[Claims]

[Claim 1] A speech recognition device comprising: a feature parameter calculation unit for converting input speech into a time series of feature parameters representing features of the speech; a word standard pattern storage unit for storing word standard patterns created in advance; an occurrence probability storage unit for storing an occurrence probability, which is the probability that a first word occurs; a concatenated co-occurrence probability storage unit for storing a concatenated co-occurrence probability, which is the probability that the first word occurs immediately after a second word; a non-concatenated co-occurrence probability storage unit for storing a non-concatenated co-occurrence probability, which is the probability that the first word occurs two or more positions after a third word; a language score calculation unit for calculating a language score representing the probability that the first word occurs next after an already recognized preceding word sequence, by computation from the occurrence probability, the concatenated co-occurrence probability, and the non-concatenated co-occurrence probability using the preceding word sequence; a recognition unit for calculating a total score representing a likelihood for a word sequence from the language score and an acoustic score, which is the similarity between the feature parameters and a word standard pattern; and a result preservation unit for preserving word sequences with a high total score.
JP02943295A 1995-02-17 1995-02-17 Voice recognition device Expired - Fee Related JP3304665B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP02943295A JP3304665B2 (en) 1995-02-17 1995-02-17 Voice recognition device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP02943295A JP3304665B2 (en) 1995-02-17 1995-02-17 Voice recognition device

Publications (2)

Publication Number Publication Date
JPH08221091A true JPH08221091A (en) 1996-08-30
JP3304665B2 JP3304665B2 (en) 2002-07-22

Family

ID=12275984

Family Applications (1)

Application Number Title Priority Date Filing Date
JP02943295A Expired - Fee Related JP3304665B2 (en) 1995-02-17 1995-02-17 Voice recognition device

Country Status (1)

Country Link
JP (1) JP3304665B2 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001048737A2 (en) * 1999-12-23 2001-07-05 Intel Corporation Speech recognizer with a lexical tree based n-gram language model
WO2001048737A3 (en) * 1999-12-23 2002-11-14 Intel Corp Speech recognizer with a lexical tree based n-gram language model
US7031923B1 (en) 2000-03-06 2006-04-18 International Business Machines Corporation Verbal utterance rejection using a labeller with grammatical constraints
JP2002041080A (en) * 2000-07-11 2002-02-08 Internatl Business Mach Corp <Ibm> Vocabulary prediction method, voice recognition method, vocabulary prediction equipment, voice recognition equipment, computer system, memory medium and program trasmitting equipment
JP2010540976A (en) * 2007-10-04 2010-12-24 株式会社東芝 Method and apparatus for automatic speech recognition
JP2010266947A (en) * 2009-05-12 2010-11-25 Ntt Data Corp Device, method, and program for extracting candidate word
JP2011169960A (en) * 2010-02-16 2011-09-01 Nec Corp Apparatus for estimation of speech content, language model forming device, and method and program used therefor
JP2014077865A (en) * 2012-10-10 2014-05-01 Nippon Hoso Kyokai <Nhk> Speech recognition device, error correction model learning method and program

Also Published As

Publication number Publication date
JP3304665B2 (en) 2002-07-22

Similar Documents

Publication Publication Date Title
EP1922653B1 (en) Word clustering for input data
US6973427B2 (en) Method for adding phonetic descriptions to a speech recognition lexicon
US5878390A (en) Speech recognition apparatus equipped with means for removing erroneous candidate of speech recognition
US6178401B1 (en) Method for reducing search complexity in a speech recognition system
US5835888A (en) Statistical language model for inflected languages
KR102375115B1 (en) Phoneme-Based Contextualization for Cross-Language Speech Recognition in End-to-End Models
JP3426176B2 (en) Speech recognition device, method, computer system and storage medium
JPH0772840B2 (en) Speech model configuration method, speech recognition method, speech recognition device, and speech model training method
US20210193117A1 (en) Syllable based automatic speech recognition
US10714080B2 (en) WFST decoding system, speech recognition system including the same and method for storing WFST data
Drexler et al. Combining end-to-end and adversarial training for low-resource speech recognition
Ahmed et al. End-to-end lexicon free arabic speech recognition using recurrent neural networks
JP4499389B2 (en) Method and apparatus for generating decision tree questions for speech processing
CN115132196A (en) Voice instruction recognition method and device, electronic equipment and storage medium
JPH08221091A (en) Voice recognition device
JP4600706B2 (en) Voice recognition apparatus, voice recognition method, and recording medium
JP2938865B1 (en) Voice recognition device
JP2006343405A (en) Speech-understanding device, speech-understanding method, method for preparing word/semantic expression merge database, its program and storage medium
JP3240691B2 (en) Voice recognition method
JP3513284B2 (en) Voice recognition method and apparatus
WO2024086265A1 (en) Context-aware end-to-end asr fusion of context, acoustic and text representations
JP2975540B2 (en) Free speech recognition device
JP3121530B2 (en) Voice recognition device
JPS59185400A (en) Monosyllable sound recognition system
CN115798462A (en) Voice recognition method and device, electronic equipment and chip

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090510

Year of fee payment: 7

LAPS Cancellation because of no payment of annual fees