JPS60179799A

JPS60179799A - Voice recognition equipment

Info

Publication number: JPS60179799A
Application number: JP59036447A
Authority: JP
Inventors: 文雄前原
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1984-02-27
Filing date: 1984-02-27
Publication date: 1985-09-13

Abstract

(57) [Abstract] This publication consists of application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

Field of Industrial Application

The present invention relates to a speech recognition device.

Prior Art and Its Problems

Conventionally, a speech recognition device analyzes the input speech signal to obtain an n-dimensional feature vector sequence $(a_1, a_2, \ldots, a_I)$ and compares it with P standard pattern vector sequences $(b^{1}_{1}, b^{1}_{2}, \ldots, b^{1}_{M}), \ldots, (b^{P}_{1}, b^{P}_{2}, \ldots, b^{P}_{M})$ registered in advance in the device as a dictionary. The standard pattern with the smallest distance, or with the greatest similarity, is taken as the recognition result.

When the input vector sequence $(a_1, a_2, \ldots, a_I)$ is compared with one of the standard pattern vector sequences, for example $(b^{j}_{1}, b^{j}_{2}, \ldots, b^{j}_{M})$ ($j = 1, \ldots, P$), most devices compute the city-block distance or the Euclidean distance between an element vector $a_i$ of the input and an element vector $b^{j}_{m}$ of the standard pattern and use it as a measure of likelihood. From these local distances they then compute the total distance between the two vector sequences by a technique such as dynamic programming or linear time stretching.

The city-block and Euclidean distances are given by the following formulas. With

$$a_i = (a_{i,1}, a_{i,2}, \ldots, a_{i,n}), \qquad b^{j}_{m} = (b^{j}_{m,1}, b^{j}_{m,2}, \ldots, b^{j}_{m,n}),$$

$$d_{i,m} = \sum_{r=1}^{n} \left| a_{i,r} - b^{j}_{m,r} \right| \quad \text{(city-block distance)},$$

$$d_{i,m} = \sum_{r=1}^{n} \left( a_{i,r} - b^{j}_{m,r} \right)^{2} \quad \text{(Euclidean distance, in squared form)}.$$

In the description below, a distance measure of this kind is used as the example measure of likelihood.
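A minimal Python sketch of the two local distances above, together with one simple dynamic-programming alignment that accumulates them into a total distance between an input sequence and a standard pattern. The names `city_block`, `euclidean_sq` and `dp_total_distance` are hypothetical. The step pattern (unweighted left, down and diagonal moves) is also an assumption, because the text does not specify which DP recursion is used.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def city_block(a: Vector, b: Vector) -> float:
    """City-block distance: sum over r of |a_r - b_r|."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean_sq(a: Vector, b: Vector) -> float:
    """Euclidean distance in the squared form above: sum over r of (a_r - b_r)**2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dp_total_distance(
    A: Sequence[Vector],
    B: Sequence[Vector],
    local: Callable[[Vector, Vector], float] = city_block,
) -> float:
    """Total distance between input sequence A and pattern B by a basic DP alignment.

    D[i][m] is the smallest accumulated local distance over monotonic alignments
    of A[:i] with B[:m]. Each cell extends its left, lower or diagonal neighbour
    (an assumed, unweighted step pattern).
    """
    inf = float("inf")
    rows, cols = len(A), len(B)
    D: List[List[float]] = [[inf] * (cols + 1) for _ in range(rows + 1)]
    D[0][0] = 0.0
    for i in range(1, rows + 1):
        for m in range(1, cols + 1):
            step = min(D[i - 1][m], D[i][m - 1], D[i - 1][m - 1])
            D[i][m] = local(A[i - 1], B[m - 1]) + step
    return D[rows][cols]


# The input [0, 1] aligns perfectly with the time-stretched pattern [0, 0, 1].
print(dp_total_distance([(0.0,), (1.0,)], [(0.0,), (0.0,), (1.0,)]))  # 0.0
```

Passing `local=euclidean_sq` switches the local measure without changing the alignment.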

One class of speech recognition devices based on this principle is intended for input to a Japanese word processor. Instead of typing kana keys, the user utters the text divided into syllables, for example 「ア」 (a), 「カ」 (ka), 「イ」 (i), 「ハ」 (ha), 「ス」 (su). Two approaches are used: single-syllable recognition, which recognizes isolated syllables, and methods that divide continuous speech into syllables and recognize it syllable by syllable.

When such a syllable-based recognizer is used as the input to a Japanese word processor (hereinafter "word processor"), some systems correct syllable recognition errors with the resources the word processor already has for kana-kanji conversion. These resources include a word dictionary, a sentence dictionary, or a word dictionary combined with language processing functions.

In such a system, the syllable recognition unit compares the input parameter vector sequence with the P standard patterns registered in advance as syllable patterns. Instead of selecting only the one standard pattern that gives the minimum distance, it selects the J patterns with the smallest distances (J a positive integer). If the input speech consists of I syllables (I a positive integer), this yields an I × J syllable candidate matrix, together with the distance $d_{i,j}$ of each candidate from its standard pattern ($1 \le i \le I$, $1 \le j \le J$). For each of the $J^I$ sentences that can be formed from these candidates, the cumulative distance is computed as

$$S_r = \sum_{i=1}^{I} d_{i,\,r(i)}$$

where $r(i)$ is a function of $i$ that selects one of the candidates $1 \le j \le J$ at each stage $i$. The N sentences with the smallest cumulative distances are then selected and looked up in a pre-stored sentence dictionary; the dictionary used for ordinary kana-kanji conversion can be reused. Any matching sentence is taken as the sentence recognition result. In this case the cumulative distance must be computed for all $J^I$ combinations. The following concrete example illustrates this operation.
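The exhaustive conventional procedure can be sketched as follows. The sketch assumes the candidate distances are held as rows `d[i][j]` (0-based; row i is the syllable, column j is the candidate rank), and the function name `exhaustive_n_best` is hypothetical. Every path r is enumerated, so the work grows as J^I.

```python
import heapq
from itertools import product
from typing import List, Sequence, Tuple


def exhaustive_n_best(
    d: Sequence[Sequence[float]], N: int
) -> List[Tuple[float, Tuple[int, ...]]]:
    """Return the N smallest cumulative distances S_r together with their paths r.

    A path r is a tuple (r(1), ..., r(I)) of 0-based candidate indices, one per
    syllable. All J**I paths are enumerated and summed.
    """
    paths = product(*(range(len(row)) for row in d))
    scored = ((sum(d[i][j] for i, j in enumerate(r)), r) for r in paths)
    return heapq.nsmallest(N, scored)


# Two syllables with two candidates each: 2**2 = 4 paths are scored.
print(exhaustive_n_best([[1, 5], [2, 3]], 2))  # [(3, (0, 0)), (4, (0, 1))]
```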

Fig. 1 shows an example with I = 5 and J = 4. Four syllable candidates are output for each input syllable, and the sentence contains five syllables. The correct utterance is 「あたらしい」 (atarashii, "new").

For the first syllable (i = 1), the candidates (あ, か, ば, た) were obtained with distances 13, 52, 63 and 71 (the lower row in Fig. 1 gives the distances). With I = 5 and J = 4 the sentences that can be generated are 「あだたちい」, 「あただちび」, ..., 「たささいし」, a total of $J^I = 4^5 = 1024$.

For error correction, the N sentences (N a positive integer) with the smallest cumulative distances among these 1024 are compared with a pre-stored sentence dictionary, as described above.

In the example of Fig. 1 the candidates and their cumulative distances are: 1st 「あたたちい」 82, 2nd 「あたたしい」 86, 3rd 「あたらちい」 87, 4th 「あたらしい」 91. If 「あたらしい」 is the only one of these in the dictionary, the error can be corrected. Sentences like the first to third candidates clearly do not occur in real text, so error correction is possible.

However, this method must compute the cumulative distance for all $J^I$ combinations. The amount of computation is therefore enormous, and processing takes a long time.

Object of the Invention

In view of the above drawback, an object of the present invention is to reduce the number of computations needed when errors are corrected by searching a sentence dictionary using a plurality of phoneme or syllable recognition candidates.

Constitution of the Invention

To achieve this object, the present invention provides two storage means:

- Cumulative distance storage means: at each stage of the input syllables, it selects and stores the N smallest cumulative distances up to the preceding stage.
- Transition storage means: it stores the corresponding transition states.

At each stage, the total distances between the N stored cumulative distances and each candidate syllable of the current stage are computed, and the N smallest are selected and stored anew in the cumulative distance storage means. At the same time, the selected candidate and the rank of the cumulative distance it extended are stored in the transition storage means as transition information. After the speech input ends, the transitions in the transition storage means are traced backward to obtain a plurality of sentence recognition candidates, which are then matched against a sentence dictionary.
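A minimal Python sketch of this N-best cumulative-distance search follows. It assumes a 0-based candidate matrix `d[i][j]` (row i is the syllable, column j is the candidate), and all names are hypothetical. Each stage forms len(S) × J sums, keeps the N smallest (the role of the cumulative distance storage means), and records for each survivor the pair (candidate j, rank of the extended cumulative distance) (the role of the transition storage means). A backward pass over those pairs then yields the sentence candidates. Ending the trace by storing the stage-1 candidate index with no predecessor is an assumption of this sketch.

```python
import heapq
from typing import List, Optional, Sequence, Tuple


def n_best_sentences(
    d: Sequence[Sequence[float]], N: int
) -> List[Tuple[float, List[int]]]:
    """Return up to N (cumulative distance, candidate path) pairs, best first."""
    # Stage 1: the N smallest d[0][j] become the initial cumulative distances.
    first = heapq.nsmallest(N, ((dist, j) for j, dist in enumerate(d[0])))
    S = [dist for dist, _ in first]
    # trans[i][n] = (j, n_prev): candidate j at stage i extended hypothesis n_prev.
    trans: List[List[Tuple[int, Optional[int]]]] = [[(j, None) for _, j in first]]

    for row in d[1:]:
        # Every stored cumulative distance plus every candidate distance of this stage.
        sums = ((S[n] + dist, j, n) for n in range(len(S)) for j, dist in enumerate(row))
        best = heapq.nsmallest(N, sums)
        S = [total for total, _, _ in best]
        trans.append([(j, n) for _, j, n in best])

    # Backward trace: follow the n_prev links from the last stage to the first.
    results: List[Tuple[float, List[int]]] = []
    for n_last in range(len(S)):
        path = [0] * len(d)
        n = n_last
        for i in range(len(d) - 1, -1, -1):
            j, n_prev = trans[i][n]
            path[i] = j
            if n_prev is not None:
                n = n_prev
        results.append((S[n_last], path))
    return results


print(n_best_sentences([[1, 5], [2, 3]], 2))  # [(3, [0, 0]), (4, [0, 1])]
```

With additive distances, keeping the N best prefixes at every stage loses none of the N best complete paths. If a prefix falls outside the N best, N retained prefixes are at least as good, and each can be extended by the same remaining syllables. For that reason the costs returned here match those of exhaustive enumeration.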

Description of the Embodiment

An embodiment of the present invention is described below with reference to the drawings.

Fig. 2 is a block diagram of a speech recognition device according to an embodiment of the present invention. In the figure, 1 is a parameter analysis section that analyzes the input speech and successively converts it into an n-dimensional parameter vector sequence $(a_1, a_2, \ldots, a_I)$. It consists of a filter bank, an A/D converter, a linear predictive coefficient analyzer, and similar components.
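The parameter analysis section is only named here. One hedged illustration of its linear-predictive option is the sketch below, which cuts an already digitized signal into frames and turns each frame into a vector of predictor coefficients using the autocorrelation method and the Levinson-Durbin recursion. The frame length, hop size, order, the absence of windowing, and all function names are assumptions. The filter bank and A/D converter are not modeled.

```python
from typing import List, Sequence


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """r[k] = sum over t of x[t] * x[t + k], for k = 0..max_lag."""
    return [sum(x[t] * x[t + k] for t in range(len(x) - k)) for k in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> List[float]:
    """Predictor coefficients a_1..a_p such that x[t] ~ sum_k a_k * x[t - k]."""
    a: List[float] = []
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: pad with zeros
            return a + [0.0] * (order - len(a))
        acc = r[i] - sum(a[k - 1] * r[i - k] for k in range(1, i))
        k_i = acc / err  # reflection coefficient
        a = [a[k - 1] - k_i * a[i - k - 1] for k in range(1, i)] + [k_i]
        err *= 1.0 - k_i * k_i
    return a


def lpc_features(
    signal: Sequence[float], frame_len: int, hop: int, order: int
) -> List[List[float]]:
    """One order-dimensional parameter vector per frame (no window applied)."""
    return [
        levinson_durbin(autocorrelation(signal[s:s + frame_len], order), order)
        for s in range(0, len(signal) - frame_len + 1, hop)
    ]


# An AR(1)-like autocorrelation (1, 0.5, 0.25) gives a first-order predictor.
print(levinson_durbin([1.0, 0.5, 0.25], 2))  # [0.5, 0.0]
```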

2 is a standard pattern storage unit that stores speech analyzed in advance as syllable standard patterns $(b^{1}_{1}, b^{1}_{2}, \ldots, b^{1}_{M}), \ldots, (b^{P}_{1}, b^{P}_{2}, \ldots, b^{P}_{M})$.

3 is a comparison unit that computes the distances between the input parameter vector sequence $(a_1, a_2, \ldots, a_I)$ and the standard patterns stored in the standard pattern storage unit 2.

4 is a determination unit that selects the J smallest (J a positive integer) of the distances obtained by the comparison unit 3. These distances are denoted $d_{i,j}$. The subscript $i$ ($1 \le i \le I$) indicates the i-th syllable uttered, and the subscript $j$ ($1 \le j \le J$) indicates the j-th smallest distance.
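The determination step for one syllable can be sketched as below. The sketch assumes that the comparison unit has already produced one distance per stored syllable pattern, held in a dictionary keyed by syllable label. The function name and the sample distances are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def top_j_candidates(distances: Dict[str, float], J: int) -> List[Tuple[str, float]]:
    """Return the J (syllable, distance) pairs with the smallest distances, best first.

    Position j - 1 in the returned list holds candidate j, so the list is one
    row d_{i,1}, ..., d_{i,J} of the I x J candidate matrix.
    """
    return heapq.nsmallest(J, distances.items(), key=lambda item: item[1])


row = top_j_candidates({"あ": 13, "か": 52, "ば": 63, "た": 71, "さ": 90}, 4)
print(row)  # [('あ', 13), ('か', 52), ('ば', 63), ('た', 71)]
```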

5 is a cumulative distance storage unit. Before the i-th syllable is processed, it stores the cumulative distances $S_{i-1,n}$ of the (i − 1) syllables uttered so far ($1 \le n \le N$); in the end, N sentences are passed to dictionary matching as sentence recognition candidates.

6 is a cumulative distance calculation unit. For all N × J combinations of the cumulative distances $S_{i-1,n}$ stored in the cumulative distance storage unit 5 and the distances $d_{i,j}$ obtained by the determination unit 4, it computes the sum

$$S^{j}_{i,n} = S_{i-1,n} + d_{i,j} \qquad (1)$$

where $1 \le i \le I$, $1 \le n \le N$ and $1 \le j \le J$.

7 is a selection unit. From the N × J cumulative distances $S^{j}_{i,n}$ computed by the cumulative distance calculation unit 6, it selects N in ascending order of distance and stores them in the cumulative distance storage unit 5 as the cumulative distances $S_{i,n}$ ($1 \le n \le N$) at the i-th syllable.

8 is a transition storage unit. For each n of the cumulative distances $S_{i,n}$, it stores as transition information the vector $r_{i,n} = (j_i, n_i)$, whose elements are the combination (j, n) that produced that cumulative distance. Here $(j_i, n_i)$ is one of the combinations $1 \le j_i \le J$, $1 \le n_i \le N$.

9 is a transition tracing unit that traces the transition information $r_{i,n} = (j_i, n_i)$ stored in the transition storage unit 8 backward and outputs sentence recognition candidates up to the N-th candidate. 10 is a sentence dictionary in which all sentences to be used are stored.

11 is a dictionary matching unit. It compares the N sentence candidates obtained by the transition tracing unit 9 one after another with the sentences in the sentence dictionary 10 and outputs a matching sentence as the final recognition result.

The operation of the device configured as above is explained next using the concrete example shown in Fig. 1.

Assume that the five-syllable sentence 「あたらしい」 is uttered. After parameter analysis in the parameter analysis section 1 and comparison with the standard patterns in the comparison unit 3, the determination unit 4 successively outputs the matrix shown in Fig. 1. The following explains how the four (N = 4) sentences with the smallest cumulative distances are selected. The distances given by the determination unit are shown by the numbers in parentheses in Fig. 1.

Fig. 3 illustrates the processing at each stage of the input syllables according to the present invention. The arrangement of syllables in the whole of matrix 35 in the figure is the same as in Fig. 1.

First, at i = 1, the syllable 「あ」 is input. The determination unit 4 then outputs four recognition candidates, 「あ」, 「か」, 「ば」 and 「た」, with distances $(d_{1,1}, d_{1,2}, d_{1,3}, d_{1,4}) = (13, 52, 63, 71)$.

At stage i = 1, the four smallest distances $d_{1,j}$ ($1 \le j \le 4$) are stored as initial values in the cumulative distance storage unit 5 (Fig. 2); these become the cumulative distances of the first stage, $S_{1,n} = d_{1,n}$ ($1 \le n \le N$). As transition information, $r_{1,n} = (1, n)$, that is, $r_{1,1} = (1,1)$, $r_{1,2} = (1,2)$, $r_{1,3} = (1,3)$ and $r_{1,4} = (1,4)$, is stored in the transition storage unit 8. In Fig. 3, items 21, 23, 25, 27 and 29 show the contents of the cumulative distance storage unit at stages i = 1 to 5. Items 22, 24, 26, 28 and 30 show the transition information held in the transition storage unit 8 (Fig. 2).

At stage i = 2, the syllable 「た」 is uttered, and the determination unit 4 (Fig. 2) outputs the candidates (た, か, だ, さ) with distances $(d_{2,1}, d_{2,2}, d_{2,3}, d_{2,4}) = (19, 32, 53, 62)$. The cumulative distance calculation unit 6 (Fig. 2) then evaluates Eq. (1) for every combination of $(S_{1,1}, S_{1,2}, S_{1,3}, S_{1,4})$ and $(d_{2,1}, d_{2,2}, d_{2,3}, d_{2,4})$ to obtain the cumulative distances $S^{j}_{2,n}$; the results are shown at 31, 32, 33 and 34 in Fig. 3. The selection unit 7 (Fig. 2) selects the four smallest of these 16 values and stores them in the cumulative distance storage unit 5 (Fig. 2) as the stage-2 cumulative distances. In the example of Fig. 3, the selection (23 in Fig. 3) is

$$(S_{2,1}, S_{2,2}, S_{2,3}, S_{2,4}) = (S^{1}_{2,1}, S^{2}_{2,1}, S^{3}_{2,1}, S^{1}_{2,2}) = (32, 45, 66, 71).$$

At the same time, the subscripts $(j, n)$ of these four combinations are stored in the transition storage unit 8 (Fig. 2) as the transition information $r_{2,n}$: $r_{2,1} = (1,1)$, $r_{2,2} = (2,1)$, $r_{2,3} = (3,1)$, $r_{2,4} = (1,2)$.
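The stage-2 selection above can be reproduced in a few lines of Python. The sketch uses this example's stage-1 and stage-2 distances and the text's 1-based (j, n) subscripts. Using `heapq.nsmallest` to stand in for the selection unit 7 is an illustrative choice, not a detail from the text.

```python
import heapq

S1 = [13, 52, 63, 71]   # S_{1,n} = d_{1,n} for あ, か, ば, た
d2 = [19, 32, 53, 62]   # d_{2,j} for た, か, だ, さ

# All N x J sums S^j_{2,n} = S_{1,n} + d_{2,j} from Eq. (1), tagged with 1-based (j, n).
sums = [(S1[n - 1] + d2[j - 1], j, n) for n in range(1, 5) for j in range(1, 5)]

best = heapq.nsmallest(4, sums)       # keep the N = 4 smallest
S2 = [s for s, _, _ in best]          # new cumulative distances S_{2,n}
r2 = [(j, n) for _, j, n in best]     # transition information r_{2,n}
print(S2)  # [32, 45, 66, 71]
print(r2)  # [(1, 1), (2, 1), (3, 1), (1, 2)]
```

The fourth survivor extends the second stage-1 hypothesis (か, 52) rather than the first. This is why the transition must record n as well as j.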

The same operation as at stage i = 2 is performed for i = 3, 4 and 5. As a result, the cumulative distances $S_{i,n}$ and the transition information $r_{i,n}$ shown at 21 to 30 in Fig. 3 are obtained in the cumulative distance storage unit 5 and the transition storage unit 8 (Fig. 2).

The transition tracing unit 9 determines N = 4 sentence candidates from the transition information $r_{i,n}$ stored in the transition storage unit 8. The backward trace of the n-th candidate at the i-th stage works as follows. If the transition information is $r_{i,n} = (j_i, n_i)$, the syllable at matrix element $(i, j_i)$ of matrix 35 in Fig. 3 is taken as the syllable recognition result, and $r_{i-1,n_i}$ is taken as the transition information for stage (i − 1). This works because the transition $r_{i,n}$ was determined from $S_{i-1,n_i}$.

For example, the third candidate (n = 3) is determined by backward tracing as follows:

- Stage i = 5: the transition information is $r_{5,3} = (1, 3)$, so the syllable is 「い」, from matrix position $(i, j_i) = (5, 1)$. Because $n_5 = 3$, the transition information for stage $i - 1 = 4$ is $r_{4,3} = (1, 2)$.
- Stage 4: the syllable is 「ち」, from matrix position $(4, 1)$. Because $n_4 = 2$, $r_{3,2}$ is traced at stage 3.

This operation is continued down to stage i = 1, which yields 「あたらちい」 as a sentence candidate.
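The backward trace can be sketched as below using the text's 1-based convention $r_{i,n} = (j_i, n_i)$. The matrix and transition table are a small hypothetical example, not the data of Fig. 3. Storing each stage-1 entry as the candidate index with no predecessor is an assumption of this sketch, used to end the trace.

```python
from typing import List, Optional, Sequence, Tuple

# r_{i,n} = (j_i, n_i), both 1-based; n_i is None at stage 1 (no predecessor).
Transition = Tuple[int, Optional[int]]


def trace_back(
    matrix: Sequence[Sequence[str]],
    transitions: Sequence[Sequence[Transition]],
    n: int,
) -> str:
    """Recover the n-th sentence candidate (1-based) by following r_{i,n} backward."""
    syllables: List[str] = []
    for i in range(len(transitions), 0, -1):        # i = I, I-1, ..., 1
        j_i, n_i = transitions[i - 1][n - 1]         # r_{i,n} = (j_i, n_i)
        syllables.append(matrix[i - 1][j_i - 1])     # syllable at matrix element (i, j_i)
        if n_i is not None:
            n = n_i                                  # continue with r_{i-1, n_i}
    return "".join(reversed(syllables))


# Hypothetical 3-syllable, N = 2 example.
matrix = [["あ", "か"], ["た", "だ"], ["ら", "さ"]]
transitions = [
    [(1, None), (2, None)],   # stage 1: hypothesis n is candidate n
    [(1, 1), (2, 1)],         # stage 2
    [(1, 1), (1, 2)],         # stage 3
]
print([trace_back(matrix, transitions, n) for n in (1, 2)])  # ['あたら', 'あだら']
```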

Performing the same operation for every n from 1 to 4 gives the following candidates:

- 1st: 「あたたちい」
- 2nd: 「あたたしい」
- 3rd: 「あたらちい」
- 4th: 「あたらしい」

The dictionary matching unit 11 compares these four candidates with the standard sentences in the sentence dictionary 10 and takes a matching one as the sentence recognition result. Sentences like the first to third candidates do not appear in an ordinary dictionary, so the fourth candidate is clearly the one selected.

With this operation, the error can be corrected by selecting a plurality of candidates, even if syllable recognition produces the erroneous first candidate 「あたたちい」.

The dictionary matching unit 11 can use methods such as binary search, hash functions or DP matching. All of these are well known, so their description is omitted.

A word dictionary may be used instead of the sentence dictionary 10. It is also possible to combine a word dictionary that carries grammatical attribute information with grammatical analysis means that uses part-of-speech and conjugation information as those attributes to determine how adjacent words connect. Known kana-kanji conversion techniques can serve as the grammatical analysis means, so their description is omitted.
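The two dictionary lookups named above can be sketched as below: a hash-based set and binary search over a sorted list with the standard `bisect` module. Returning the first match in candidate order (best cumulative distance first) is an assumption. The dictionary contents and function names are hypothetical.

```python
import bisect
from typing import Iterable, List, Optional, Set


def match_by_hash(candidates: Iterable[str], dictionary: Set[str]) -> Optional[str]:
    """Return the first candidate found in a hashed dictionary, or None."""
    for sentence in candidates:
        if sentence in dictionary:  # average O(1) hash lookup
            return sentence
    return None


def match_by_binary_search(candidates: Iterable[str], sorted_dictionary: List[str]) -> Optional[str]:
    """Same check against a sorted list using binary search, O(log size) per lookup."""
    for sentence in candidates:
        k = bisect.bisect_left(sorted_dictionary, sentence)
        if k < len(sorted_dictionary) and sorted_dictionary[k] == sentence:
            return sentence
    return None


candidates = ["あたたちい", "あたたしい", "あたらちい", "あたらしい"]
dictionary = {"あたらしい", "はな", "さくら"}  # hypothetical dictionary contents
print(match_by_hash(candidates, dictionary))                   # あたらしい
print(match_by_binary_search(candidates, sorted(dictionary)))  # あたらしい
```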

As the above description shows, this operation needs N × J × I additions for the cumulative distance calculation, far fewer than the $J^I$ of the conventional example. For I = 5, J = 4 and N = 4, N × J × I = 80, whereas $J^I = 1024$, a difference of more than tenfold.
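The comparison can be written out as a short worked calculation. Like the text, it counts N · J sums at each of the I stages:

$$
N \cdot J \cdot I = 4 \cdot 4 \cdot 5 = 80, \qquad J^{I} = 4^{5} = 1024, \qquad \frac{J^{I}}{N J I} = \frac{1024}{80} = 12.8
$$

More generally, the cost of the stage-by-stage selection grows linearly with the number of syllables I, whereas exhaustive enumeration grows exponentially.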

As described above, this embodiment provides the following units:

- the cumulative distance calculation unit 6, which calculates the cumulative distances at each stage of the input phonemes;
- the selection unit 7, which selects N of them;
- the cumulative distance storage unit 5, which stores the result for the next stage's cumulative distance calculation;
- the transition storage unit 8, which stores the transition information;
- the transition tracing unit 9, which traces the transition information backward after the speech input ends.

With these units, sentence candidates are determined from a given syllable matrix by a cumulative distance calculation requiring N × J × I additions and are then matched against the sentence dictionary. This realizes efficient correction of phoneme recognition errors.

In this embodiment, the output can also be combined with a kana-kanji conversion function so that the result is output as mixed kanji-kana text. The embodiment can also be implemented as a program running on a computer.

Although this embodiment uses distance as the measure for selecting phoneme candidates, the same approach applies when a similarity, or a value expressing likelihood such as a probability, is used instead.
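The text does not say how probabilities would be plugged in. One common way, shown here as an assumption, is to convert each candidate probability p to the distance −log p. Minimizing a sum of these distances is equivalent to maximizing the product of probabilities, so the same smallest-sum selection applies unchanged. The sketch checks the equivalence by brute force on a tiny hypothetical matrix, and all probabilities must be strictly positive.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple


def probs_to_distances(p: Sequence[Sequence[float]]) -> List[List[float]]:
    """Convert candidate probabilities (all > 0) to distances d = -log(p)."""
    return [[-math.log(x) for x in row] for row in p]


def best_path_by_distance(d: Sequence[Sequence[float]]) -> Tuple[int, ...]:
    """Exhaustive minimum-sum path (adequate for a tiny demonstration)."""
    paths = product(*(range(len(row)) for row in d))
    return min(paths, key=lambda r: sum(d[i][j] for i, j in enumerate(r)))


def best_path_by_probability(p: Sequence[Sequence[float]]) -> Tuple[int, ...]:
    """Exhaustive maximum-product path."""
    paths = product(*(range(len(row)) for row in p))
    return max(paths, key=lambda r: math.prod(p[i][j] for i, j in enumerate(r)))


p = [[0.6, 0.4], [0.3, 0.7]]  # hypothetical candidate probabilities
print(best_path_by_probability(p), best_path_by_distance(probs_to_distances(p)))  # (0, 1) (0, 1)
```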

Effects of the Invention

As described above, the speech recognition device of the present invention works from a plurality of syllable recognition candidates and their distances. At each syllable stage it:

- selects a plurality of hypotheses with small cumulative distances;
- stores their cumulative distances for the next stage's calculation;
- stores their transition information.

After the last syllable is input, the device traces the transition information backward to determine a plurality of sentence candidates and matches them against the sentence dictionary. Compared with selecting candidates by exhaustive search over all syllable candidates, this greatly reduces the amount of computation and improves recognition performance, which gives the invention great industrial value.

[Brief Description of the Drawings]

Fig. 1 is a recognition diagram showing an example of syllable recognition results according to the present invention. Fig. 2 is a block diagram of a speech recognition device according to an embodiment of the present invention. Fig. 3 is an explanatory diagram of the operation of the embodiment.

1: parameter analysis section; 2: standard pattern storage unit; 3: comparison unit; 4: determination unit; 5: cumulative distance storage unit; 6: cumulative distance calculation unit; 7: selection unit; 8: transition storage unit; 9: transition tracing unit; 10: sentence dictionary; 11: dictionary matching unit.

Claims

[Claims]

(1) A speech recognition device comprising: phoneme identification means for identifying the phonemes of input speech and outputting, for each phoneme, a plurality of identification candidates and their likelihood values; cumulative likelihood storage means for storing the top plurality of cumulative likelihood values of the phonemes that have appeared so far, for use in calculating the cumulative likelihood values when the next phoneme appears; and transition storage means for storing, for use in selecting sentence candidates, the transitions that led to the selection of the phoneme candidates having said top plurality of cumulative likelihood values.

(2) A speech recognition device comprising:
phoneme identification means for identifying the phonemes of input speech and outputting, for each phoneme, J identification candidates (J a positive integer) and their distances;
cumulative distance storage means for storing the N smallest (N a positive integer) cumulative distances of the phonemes that appeared before the current phoneme;
transition storage means for storing the transitions that led to the selection of the N smallest cumulative distances;
cumulative distance calculation means for calculating sums for the combinations of the output of the cumulative distance storage means and the J distances given as the output of the phoneme identification means;
selection means for selecting N of these sums in ascending order of distance, storing them in the cumulative distance storage means, and storing in the transition storage means which phoneme candidate was selected and the rank of the cumulative distance it extended; and
transition tracing means for tracing the transition information stored in the transition storage means and outputting N identification result candidates.