JPS62253199A

JPS62253199A - Continuous voice recognition equipment

Info

Publication number: JPS62253199A
Application number: JP61093222A
Authority: JP
Inventors: 教幸藤本
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1986-04-24
Filing date: 1986-04-24
Publication date: 1987-11-04
Also published as: JPH0458637B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] [Summary] In a continuous speech recognition device that recognizes continuously uttered words or syllables, speech-level processing determines the distance between each subinterval pattern of an input continuous speech pattern and the standard speech. In sentence-level processing, when the input continuous speech pattern is segmented, it is segmented only within a speech length range preset according to the number of segments, and the continuous speech is then recognized. This prevents unnatural segmentation, improves the recognition rate, and reduces the amount of processing.

[Industrial application field]

The present invention relates to a continuous speech recognition device that recognizes continuously uttered speech such as words and syllables, and in particular to a continuous speech recognition device improved so as to raise the recognition rate by preventing unnatural segmentation of the input continuous speech pattern.

[Conventional technology]

Because of their good recognition rate, continuous word speech recognition methods that match an input word speech pattern against pre-registered standard word patterns have conventionally been used to recognize continuously uttered words. In particular, the two-level DP method (two-level dynamic programming matching) and similar methods are widely used.

FIG. 4 is a block diagram showing the basic configuration of a conventional continuous word speech recognition method based on the two-level DP method.

In FIG. 4, when continuous word speech is input to the speech analysis section 210, the speech analysis section 210 extracts parameters representing the features of the continuous word speech, performs interval detection and so on, creates a continuous word speech pattern, and inputs it to the word-level DP processing section 220.

The input continuous word speech pattern (denoted A) is expressed as time-series data of feature vectors by the following equation (1).

$$A = a(1),\ a(2),\ \ldots,\ a(i),\ \ldots,\ a(I) \tag{1}$$

Here, $a(i)$ is a vector quantity representing the parameters that indicate the spectral features of the input word speech pattern A in frame $i$. $I$ is the terminal frame of the input word speech pattern A and also represents its word length (number of frames). A frame is a speech segment cut out by multiplying the signal by a window function.

The standard word pattern dictionary 230, on the other hand, stores the standard pattern of each word to be recognized. Like the input word speech pattern A, the standard pattern of word $n$ (denoted $B^n$) is expressed as time-series data of feature vectors by the following equation (2).

$$B^n = b^n(1),\ b^n(2),\ \ldots,\ b^n(j),\ \ldots,\ b^n(J_n) \tag{2}$$

Here, $b^n(j)$ is a vector quantity representing the parameters that indicate the spectral features of the standard word pattern in frame $j$. $J_n$ is the terminal frame of the standard word pattern $B^n$ and also represents the word length (number of frames) of the standard pattern of word $n$.

The word-level DP processing section 220 cuts out every possible subinterval pattern of the input continuous word speech pattern A and, assuming that each subinterval pattern corresponds to one word speech pattern, performs DP matching against each standard word pattern in the standard word pattern dictionary 230 to calculate the respective distances.

For one subinterval $(l, m)$, the distances between its subinterval pattern and each standard word pattern are determined. The optimal word, the one with the minimum distance, is taken as the recognized word $N(l, m)$, and the distance $D(l, m)$ to this word $N(l, m)$ is taken as the distance between that subinterval pattern and a word.

For every subinterval pattern in the input continuous word speech pattern, that is, for each subinterval pattern formed by varying $l$ and $m$ over all combinations satisfying $0 \le l < m \le I$, the distance between the subinterval pattern and a word and the corresponding optimal word are obtained and stored in a table (not shown).
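The word-level step can be sketched in Python as below. This is a minimal illustration, not the patent's implementation: the Euclidean local distance, the unconstrained DP step pattern, and the helper names `dtw_distance` and `word_level_table` are assumptions. A subinterval $(l, m)$ is taken to cover frames $l+1$ to $m$, which is `a[l:m]` with 0-based indexing.

```python
import math


def dtw_distance(x, y):
    """DP-matching distance between two feature-vector sequences.

    Assumptions: Euclidean local distance, steps (i-1, j), (i, j-1), (i-1, j-1),
    no slope constraint and no length normalisation.
    """
    n, m = len(x), len(y)
    g = [[math.inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(x[i - 1], y[j - 1])  # local distance of two frames
            g[i][j] = d + min(g[i - 1][j], g[i][j - 1], g[i - 1][j - 1])
    return g[n][m]


def word_level_table(a, dictionary, l_min=1, l_max=None):
    """Build D[(l, m)] and N[(l, m)] for every subinterval frames l+1..m of a."""
    I = len(a)
    l_max = I if l_max is None else l_max
    D, N = {}, {}
    for m in range(1, I + 1):
        for l in range(max(0, m - l_max), m - l_min + 1):
            segment = a[l:m]
            scores = {w: dtw_distance(segment, ref) for w, ref in dictionary.items()}
            best = min(scores, key=scores.get)  # optimal word N(l, m)
            D[(l, m)], N[(l, m)] = scores[best], best
    return D, N
```

With two hypothetical one-dimensional "words", the table finds each word exactly where it occurs in the input:

```python
dictionary = {"one": [(0.0,), (0.0,)], "two": [(5.0,), (5.0,)]}
a = [(0.0,), (0.0,), (0.0,), (5.0,), (5.0,)]
D, N = word_level_table(a, dictionary)
print(N[(0, 3)], N[(3, 5)])  # one two
```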

The sentence-level processing section 240 computes the distance of the word sequence corresponding to each sequence of subinterval patterns, formed by segmenting the input continuous word pattern in an arbitrary way, as the sum of the distances between each subinterval pattern and its word. The optimal word sequence corresponding to the segmentation that minimizes this sum is taken as the recognized continuous words.

Now, if the number of words in the input continuous word speech (hereinafter sometimes called the number of digits) is known to be $k$, the distance of the optimal word sequence, denoted $T_k(I)$, is obtained by solving recurrence formula (3) by the DP method.

Here, $T_x(m)$ is the minimum distance of the word sequence when $x$ words are assumed in the partial section from the start of the input continuous word speech pattern A to frame $m$.

$T_k(I)$ is obtained by evaluating recurrence formula (3) by the DP method for $m$ from 1 to $I$ (the number of frames of the input continuous word speech pattern A) and for $x$ from 1 to $k$.

If the number of words (digits) in the input continuous word speech is unknown, the number of digits $k$ in $T_k(I)$ obtained from recurrence formula (3) is treated as a variable, and the number of digits $k_o$ given by the following equation (4) is taken as the number of digits of the input continuous words.

$$k_o = \mathop{\mathrm{arg\,min}}_{k}\, T_k(I) \tag{4}$$

Here, arg min is an operator that selects the $k$ satisfying the minimization condition inside the parentheses.

The distance of the optimal word sequence with this number of digits $k_o$ is $T_{k_o}(I)$.

The optimal word sequence corresponding to the minimum distance $T_k(I)$ (or $T_{k_o}(I)$) obtained as above is output as the recognized words.

When recurrence formula (3) is solved, the search range of $l$, which determines the word boundaries, is limited to a fixed range, as shown in the following equation (5).

$$\begin{cases} l = 0 & (x = 1) \\ m - l_{\max} \le l \le m - l_{\min} & (x > 1) \end{cases} \tag{5}$$

Here, $l_{\min}$ is the minimum number of frames allowed for one word in the input continuous word speech pattern, and $l_{\max}$ is the maximum number of frames allowed for one word. $l_{\min}$ and $l_{\max}$ may be given in advance, or may be determined automatically as a result of the path restrictions in the first-stage DP processing; in either case, once determined, they take fixed values regardless of the number of digits.

Recurrence formula (3) can therefore be expressed in more detail as formula (3)′, in which the search over $l$ is restricted to the range of equation (5).

FIG. 5 shows the sentence-level DP processing method when the number of frames of the input continuous word speech pattern is $I = 180$, with $l_{\max} = 200$ and $l_{\min} = 50$.

Since $l_{\max} = 200$ and $l_{\min} = 50$, three values are possible for the number of digits $x$ of the input continuous word speech pattern A: 1, 2, and 3.

FIG. 5(a) shows the sentence-level DP processing for $x = 1$, FIG. 5(b) for $x = 2$, and FIG. 5(c) for $x = 3$.

In FIG. 5(a), since $x = 1$, we have $m = I$ (= 180) and $l = 0$, and the case $T_1(m) = D(0, m)$ of recurrence formula (3) or (3)′, evaluated at $m = I$, applies.

There is therefore no search range for $l$.

In FIG. 5(b), since $x = 2$, $m = I$ (= 180), and the maximum of the search range for the first-digit boundary, $l_{1\max} = m - l_{\min}$, is 130 (= 180 - 50). The minimum of the first-digit search range, $l_{1\min}$, equals $l_{\min}$, that is, 50.

In FIG. 5(c), since $x = 3$, $m = I$ (= 180). The maximum of the search range for the second-digit boundary, $l_{2\max} = m - l_{\min}$, is 130 (= 180 - 50), and the maximum for the first-digit boundary, $l_{1\max} = l_{2\max} - l_{\min}$, is 80 (= 130 - 50). The minimum for the first digit, $l_{1\min}$, is the value of $l_{\min}$, 50, and the minimum for the second digit, $l_{2\min} = l_{1\min} + l_{\min}$, is 100.

[Problem that the invention seeks to solve]

In the sentence-level processing of the conventional continuous word speech recognition method, when the distance of the optimal word sequence is obtained by recurrence formula (3)′, the search range of $l$, which determines the word boundaries, uses the fixed $l_{\min}$ and $l_{\max}$ regardless of the number of digits: recurrence formula (3)′ is solved all at once with $m - l_{\max} \le l \le m - l_{\min}$, and the $x$ giving the minimum value of $T_x(I)$ is taken as the number of words (digits) of the input continuous words.

As a result, as shown in FIG. 5(b) and 5(c), unnatural segmentations of the unknown input continuous word speech pattern can occur. For example, in FIG. 5(b), segmentation at $l_{1\max}$ means that the first word was pronounced extremely long and the second word extremely short. Conversely, segmentation at $l_{1\min}$ means that the first word was pronounced extremely short and the second word extremely long. When people actually speak naturally, however, this does not normally happen.

Because the conventional method permits such unnatural segmentations, that is, unnatural DP paths, the recognition rate decreases and the amount of computation increases accordingly.

An object of the present invention is to provide a continuous speech recognition device that prevents such unnatural segmentation from occurring, thereby improving the recognition rate and reducing the amount of computation.

Note that words are made up of syllables, and a syllable is usually formed by combining one vowel and one consonant. Japanese has about 100 types of syllables.

In the present invention, continuous speech includes continuous monosyllabic speech as well as continuous word speech.

[Means for solving problems]

When several speech units such as words or syllables are uttered in succession to convey information or intentions, it does not normally happen that one unit alone is uttered extremely slowly or, conversely, extremely briefly.

The present invention focuses on this fact. In the sentence-level processing performed during continuous speech recognition, it varies the search range for word boundaries according to the number of digits, thereby preventing unnatural segmentations, that is, unnatural DP paths, which improves the recognition rate and reduces the amount of processing.

The means adopted by the present invention to solve the above problems of the conventional continuous word speech recognition method are described below with reference to FIG. 1.

FIG. 1 is a block diagram showing the basic configuration of the present invention.

In FIG. 1, 110 is a speech-level processing means. It cuts out a plurality of subinterval patterns from the input continuous speech pattern created from the input continuous speech and, for each subinterval pattern, determines the distance to each pre-registered standard speech pattern; from the minimum of these distances it determines the distance between that subinterval pattern and the standard speech.

120 is a speech length range setting means. When the input continuous speech pattern is segmented into a plurality of subinterval patterns, it sets the range of speech lengths of each subinterval pattern according to the number of segments.

130 is a sentence-level processing means. When segmenting the input continuous speech pattern into a plurality of subinterval patterns, it segments only within the speech length range set by the speech length range setting means 120 for that number of segments. The distance of the speech sequence corresponding to each subinterval pattern sequence formed in this way is obtained as the sum of the distances between each subinterval pattern and the standard speech, and the optimal speech sequence corresponding to the segmentation that minimizes this sum is taken as the recognized continuous speech.

[Operation]

When an input continuous speech pattern created from input continuous speech is supplied, the speech-level processing means 110 cuts out a plurality of subinterval patterns from it and, for each subinterval pattern, determines the distance to each pre-registered standard speech pattern; from the minimum of these distances it calculates the distance between that subinterval pattern and the standard speech.

In this way, assuming that each extracted subinterval pattern corresponds to one standard speech pattern, the distance between that subinterval pattern and the standard speech is obtained.

The sentence-level processing means 130 segments the input continuous speech pattern to create sequences of subinterval patterns. In doing so, it segments only within the speech length range set by the speech length range setting means 120 for the number of segments.

The distance of the speech sequence corresponding to each subinterval pattern sequence formed by this segmentation is obtained as the sum of the distances between each subinterval pattern in that sequence and the standard speech. The speech sequence corresponding to the subinterval pattern sequence of the segmentation that minimizes this sum, that is, the optimal speech sequence, is taken as the recognized continuous speech.

As already mentioned, continuous speech includes continuous syllable speech as well as continuous word speech.

As described above, limiting the segmentation range according to the number of segments when segmenting the input continuous speech pattern prevents unnatural segmentation, improving the recognition rate and reducing the amount of computation.

[Embodiment]

An embodiment of the present invention will be described with reference to FIGS. 2 and 3.

FIG. 2 is a block diagram showing the configuration of an embodiment of the present invention, and FIG. 3 is an explanatory diagram of the sentence-level DP processing method of the embodiment.

(A) Configuration of the embodiment

In FIG. 2, the speech-level processing means 110, the speech length range setting means 120, and the sentence-level processing means 130 are as described for FIG. 1.

140 is a microphone, into which continuous speech uttered by a speaker (not shown) or speech for registration is input.

150 is a parameter extraction section that extracts parameters representing the features of the continuous speech or registration speech input from the microphone 140.

160 is an interval detection section that performs interval detection based on the parameters extracted by the parameter extraction section 150 and creates either an input continuous speech pattern or a standard speech pattern for registration.

170 is a switching circuit that switches the output path depending on whether an input continuous speech pattern or a standard speech pattern for registration is being handled.

In the speech-level processing means 110, 111 is a standard speech pattern dictionary in which the standard patterns of the speech to be recognized are registered in units of words or syllables.

112 is a speech-level DP calculation section. It cuts out a plurality of subinterval patterns from the input continuous speech pattern supplied by the interval detection section 160 and, for each subinterval pattern, determines the distance to each standard speech pattern registered in the standard speech pattern dictionary 111; from the minimum of these distances it calculates the distance between that subinterval pattern and the standard speech.

In the sentence-level processing means 130, 131_1 to 131_p are recurrence formula calculation sections, and 132 is a continuous speech determination section.

The recurrence formula calculation section 131_i divides the input continuous speech pattern into $i$ segments to create an $i$-digit subinterval pattern sequence. In doing so, it segments only within the speech length range set by the speech length range setting means 120 for the number of digits $i$. The distance of the $i$-digit speech sequence corresponding to each $i$-digit subinterval pattern sequence formed in this way is obtained as the sum of the distances between each subinterval pattern in that sequence and the standard speech. The $i$-digit speech sequence corresponding to the segmentation that minimizes this sum is taken as the $i$-digit speech candidate sequence, and the minimum value of the sum is output as its distance.

The continuous speech determination section 132 determines, among the 1-digit to $p$-digit speech candidate sequences input from the recurrence formula calculation sections 131_1 to 131_p, the candidate sequence with the smallest distance, that is, the optimal speech sequence, to be the recognized continuous speech.

The speech length range setting means 120 sets a speech length range for each of the recurrence formula calculation sections 131_1 to 131_p according to its number of digits, that is, the number of subinterval patterns into which the pattern is segmented, and supplies it to that section.

(B) Operation of the embodiment

The operation of the embodiment will be described separately for each operation performed during continuous speech recognition.

(B-1) Registration operation

Before recognition processing is performed on continuous speech uttered by a speaker, the standard pattern of each speech unit is registered in the standard speech pattern dictionary 111 in units of words or syllables.

To register a standard speech pattern in the standard speech pattern dictionary 111, the switching circuit 170 is connected to the standard speech pattern dictionary 111 side, and speech uttered word by word or syllable by syllable into the microphone 140 is input to the parameter extraction section 150.

The parameter extraction section 150 extracts parameters representing the features of each input speech unit, and the interval detection section 160 performs interval detection to create the standard pattern of each speech unit (standard speech pattern) and registers it in the standard speech pattern dictionary 111. The configurations and operations of the parameter extraction section 150 and the interval detection section 160 are well known, so a detailed description of them is omitted.

(B-2) Input continuous speech pattern creation operation

To recognize input continuous speech, the switching circuit 170 is connected to the speech-level DP calculation section 112 side.

When continuous speech is input from the microphone 140, the parameter extraction section 150 and the interval detection section 160 create an input continuous speech pattern in the same way as when registering standard speech patterns, and input it to the speech-level DP calculation section 112. The interval detection section 160 also inputs the number of frames of the detected input continuous speech pattern to the speech length range setting means 120.

Like the input continuous word speech pattern A described above, the created input continuous speech pattern (denoted S) is expressed as time-series data of feature vectors by the following equation (6).

$$S = s(1),\ s(2),\ \ldots,\ s(m),\ \ldots,\ s(U) \tag{6}$$

Here, $s(m)$ is a vector quantity representing the parameters that indicate the spectral features of the input speech pattern S in frame $m$. $U$ is the terminal frame of the input speech pattern S and also represents its speech length (number of frames).

Similarly, the standard speech pattern $R^n$ of speech unit $n$ is expressed as time-series data of feature vectors by the following equation (7).

$$R^n = r^n(1),\ r^n(2),\ \ldots,\ r^n(j),\ \ldots,\ r^n(V_n) \tag{7}$$

Here, $r^n(j)$ is a vector quantity representing the parameters that indicate the spectral features of the standard speech pattern $R^n$ in frame $j$. $V_n$ is the terminal frame of the standard speech pattern $R^n$ and also represents the speech length (number of frames) of the standard speech pattern of speech unit $n$.

(B-3) Setting the speech length range

When the number of frames $U$ of the input speech pattern S is supplied by the interval detection section 160, the speech length range setting means 120 determines the numbers of digits that can be assumed for the input speech pattern S, based on the minimum number of frames $l_{\min}$ and the maximum number of frames $l_{\max}$ allowed for one speech unit in the input continuous speech pattern S. The maximum assumed number of digits is the integer below $U/l_{\min}$ that is closest to it, and the minimum assumed number of digits is the integer above $U/l_{\max}$ that is closest to it. $l_{\min}$ and $l_{\max}$ may be given in advance, or may be determined automatically as a result of the DP path restrictions in the speech-level DP processing.
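The assumed digit range reduces to two integer divisions. This sketch assumes that "closest integer below/above" means floor and ceiling, so that a pattern made entirely of $l_{\min}$-frame units is still allowed when $U/l_{\min}$ is exact; the helper name is hypothetical.

```python
def assumed_digit_range(U, l_min, l_max):
    """(k_min, k_max) with k_min = ceil(U / l_max) and k_max = floor(U / l_min)."""
    k_min = -(-U // l_max)  # ceiling division on integers
    k_max = U // l_min      # floor division
    return k_min, k_max


print(assumed_digit_range(180, 50, 200))  # (1, 3), the FIG. 5 setting
```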

Next, for each assumed number of digits, the range of speech lengths of the subinterval pattern corresponding to each digit is set and supplied to the recurrence formula calculation section for that assumed number of digits.

The speech length range is set, for example, by the following equation (8) or (9).

$$f_{\min}(k) = \frac{U}{k}\,(1 - \alpha), \qquad f_{\max}(k) = \frac{U}{k}\,(1 + \alpha) \tag{8}$$

$$f_{\min}(k) = \frac{U}{k}\,\alpha, \qquad f_{\max}(k) = \frac{U}{k} \cdot \frac{1}{\alpha} \tag{9}$$

where:

$k$: assumed number of digits, taking values from the minimum $k_{\min}$ to the maximum $k_{\max}$
$f_{\max}$: upper limit of the speech length range
$f_{\min}$: lower limit of the speech length range
$U$: number of frames of the input continuous speech pattern
$\alpha$: a constant, usually in the range $0 < \alpha < 1$
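Equation (8) translates directly into code. The sketch uses `fractions.Fraction` so that the range endpoints are exact; equation (9) is left out because its printed form is only partly legible in the source. The function name is hypothetical, and $\alpha = 1/5$ is an illustrative value, not one taken from the patent.

```python
from fractions import Fraction


def speech_length_range(U, k, alpha):
    """Eq. (8): f_min(k) = (U/k)(1 - alpha), f_max(k) = (U/k)(1 + alpha)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha is normally chosen in the open interval (0, 1)")
    mean = Fraction(U, k)  # average frames per unit under a k-digit hypothesis
    return mean * (1 - alpha), mean * (1 + alpha)


alpha = Fraction(1, 5)
for k in (1, 2, 3):
    f_min, f_max = speech_length_range(180, k, alpha)
    print(k, f_min, f_max)
# 1 144 216
# 2 72 108
# 3 48 72
```

Unlike the fixed interval $[l_{\min}, l_{\max}] = [50, 200]$, the allowed unit length now tracks the average length $U/k$ of the hypothesis being tested.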

In general, bringing $\alpha$ closer to 0 reduces the amount of computation but lowers the recognition rate. When the speech units are of uniform length, however, making $\alpha$ small causes little drop in the recognition rate. $\alpha$ is therefore determined experimentally, taking into account the recognition rate, the amount of computation, and the category of the input continuous speech.

If the number of digits of the input continuous speech pattern is known, the processing to calculate the assumed numbers of digits described above is unnecessary.

The way $f_{\min}$ and $f_{\max}$ obtained in this manner are used is explained with recurrence formula (10) below.

(B-4) Speech-level DP processing

The speech-level DP calculation section 112 cuts out a plurality of subinterval patterns from the input continuous speech pattern S supplied by the interval detection section 160 and, assuming that each subinterval pattern corresponds to one speech pattern, performs DP matching against each standard speech pattern registered in the standard speech pattern dictionary 111 to calculate the respective distances.

As in the conventional method described earlier, the distance between the partial-interval pattern of one partial interval $(\ell, m)$ and each standard speech pattern is computed. The speech unit with the smallest distance is taken as the recognized speech $N_s(\ell, m)$. Its distance $D_s(\ell, m)$ is taken as the distance between that partial-interval pattern and the speech.

This is done for every partial-interval pattern in the input continuous speech pattern S, that is, for each pattern formed by varying $\ell$ and $m$ over all combinations that satisfy $0 \le \ell < m \le U$. For each pattern, the distance between the partial-interval pattern and the speech is computed together with the corresponding optimal speech unit, and both are stored in a table (not shown).

(B-5) Calculation of the recurrence in sentence-level DP processing

The recurrence calculation unit $131_i$ divides the input continuous speech pattern S into $i$ segments to create an $i$-digit partial-interval pattern sequence. It then finds the $i$-digit speech candidate sequence that corresponds to this pattern sequence.

Let $T_i(U)$ denote the distance of the $i$-digit speech candidate sequence. $T_i(U)$ is obtained by solving the following recurrence (10) with the DP method.

Recurrence (10) corresponds to recurrence (3)′ of the conventional method described earlier. $T_i^x(m)$ is the minimum distance of the speech sequence when the input continuous speech pattern S is divided into $i$ partial-interval patterns and $x$ speech units are assumed in the section from the start of S up to frame $m$.

$T_i(U)$ is obtained by evaluating recurrence (10) with the DP method for $m$ from 1 to $U$ (the number of frames in S) and for $x$ from 1 to $i$ (the number of digits).

Each of the recurrence calculation units $131_1$ to $131_p$ solves recurrence (10) for its own number of digits and calculates the distance of the speech candidate sequence for that number of digits. It passes this distance, together with the candidate sequence, to the continuous speech determination unit 132. The number of digits $i$ in recurrence (10) varies with the number of digits handled by each recurrence calculation unit.

Figure 3 shows an example of the sentence-level DP processing method for an input continuous speech pattern S with $U = 180$ frames, $\ell_{\max} = 200$, $\ell_{\min} = 50$ and $\alpha = 0.2$. Panel (a) shows the one-digit case, (b) the two-digit case and (c) the three-digit case.

Since $\ell_{\max} = 200$ and $\ell_{\min} = 50$, three digit counts are assumed for the input continuous speech pattern S: 1, 2 and 3.

$f_{\min}$ and $f_{\max}$ are obtained from equation (8). Writing them for one, two and three digits as $f_{\min 1}, f_{\max 1}$, $f_{\min 2}, f_{\max 2}$ and $f_{\min 3}, f_{\max 3}$, equation (8) gives:

$$f_{\max 1} = 216, \quad f_{\max 2} = 108, \quad f_{\max 3} = 72$$

$$f_{\min 1} = 144, \quad f_{\min 2} = 72, \quad f_{\min 3} = 48$$

In Fig. 3(a) the number of digits is 1, so $m = U$ (= 180) and $\ell = 0$. Recurrence (10) reduces to $T_1(m) = D_s(0, m)$ with $m = U$.

There is therefore no search range for $\ell$.

In Fig. 3(b) the number of digits is 2, so $m = U$ (= 180). The upper end of the search range for the first-digit boundary $\ell$ is $\ell_{1\max} = m - f_{\min 2} = 108$ (= 180 − 72).

The lower end of the same search range is $\ell_{1\min} = m - f_{\max 2} = 72$ (= 180 − 108).

In Fig. 3(c) the number of digits is 3, so $m = U$ (= 180). The upper end of the search range for the second-digit boundary is $\ell_{2\max} = m - f_{\min 3} = 132$ (= 180 − 48), and its lower end is $\ell_{2\min} = m - f_{\max 3} = 108$ (= 180 − 72). The search range for the first-digit boundary is measured from the start of the input continuous speech pattern. Its upper end $\ell_{1\max}$ is therefore $f_{\max 3} = 72$, and its lower end $\ell_{1\min}$ is $f_{\min 3} = 48$.

Fig. 3, obtained as above, can be compared with the conventional Fig. 5 under the same conditions. With two digits, the search range of $\ell$ spans 36 frames (= 108 − 72) in the method of the present invention and 80 frames (= 130 − 50) in the conventional method. With three digits, it spans 24 frames (= 132 − 108 = 72 − 48) in the present method and 30 frames (= 130 − 100 = 80 − 50) in the conventional method.

As a result, the present invention avoids the unnatural segmentation seen in the conventional method, that is, unnatural DP paths. This improves the recognition rate and reduces the amount of computation.

(B-6) Recognition of continuous speech

The continuous speech determination unit 132 receives, from the recurrence calculation units $131_1$ to $131_p$, the speech candidate sequence for each number of digits together with the distances $T_1(U)$ to $T_p(U)$. It detects the candidate sequence with the smallest distance, that is, the optimal speech sequence, and outputs it as the recognized continuous speech.

The processing of the continuous speech determination unit 132 is given by the following equation (11), which corresponds to equation (4) of the conventional method cited above.

$$k = \operatorname*{arg\,min}_{i} T_i(U) \qquad \cdots (11)$$

where $k$ is the number of digits of the optimal speech sequence. The recurrence calculation units $131_1$ to $131_p$ evaluate recurrence (10) in parallel, so equation (11) is solved in a single pass. The optimal speech sequence can therefore be detected quickly, which means the continuous words are recognized quickly.

One embodiment of the present invention has been described above, but the configuration of the invention is not limited to that of this embodiment. For example, a single recurrence calculation unit may be used to solve recurrence (10) serially for each number of digits.

As already mentioned, continuous speech includes continuous syllable speech as well as continuous word speech.

The present invention improves the recognition rate for input continuous speech. The effect is especially large when the registered standard speech patterns vary little in length. Two examples are recognizing connected digits with the decimal digits registered as standard speech patterns, and recognizing continuous syllable speech with single syllables registered as standard speech patterns. In both cases the registered patterns vary little in length and their durations are relatively uniform, so the gain in recognition rate is large.

[Effect of the invention]

As explained above, the present invention provides the following effects.

(a) Unnatural segmentation of the input continuous speech pattern is prevented, so the recognition rate for continuous speech improves. The effect is particularly large when the speech lengths vary little and their durations are uniform.

(b) When the input continuous speech pattern is divided into partial-interval patterns in sentence-level processing, the speech-length range of those patterns is set according to the number of divisions. This reduces the amount of processing.


[Brief explanation of drawings]

Fig. 1 is an explanatory diagram of the basic configuration of the present invention. Fig. 2 is an explanatory diagram of the configuration of an embodiment of the present invention. Fig. 3 is an explanatory diagram of the sentence-level DP processing method of that embodiment. Fig. 4 is an explanatory diagram of a conventional continuous word speech recognition method. Fig. 5 is an explanatory diagram of the sentence-level DP processing method in the conventional continuous word speech recognition method.

In Figs. 1 and 2: 110 ... speech level processing means; 120 ... speech length range setting means; 130 ... sentence level processing means; 140 ... microphone; 150 ... parameter extraction unit; 160 ... interval detection unit; 170 ... switching circuit.

Claims

[Claims]

(1) A continuous speech recognition device that recognizes speech such as continuously uttered words and syllables, comprising:

(a) speech level processing means (110) that cuts out a plurality of partial-interval patterns from an input continuous speech pattern created from input continuous speech, calculates for each partial-interval pattern the distance to each pre-registered standard speech pattern, and takes the minimum of those distances as the distance between the partial-interval pattern and the standard speech;

(b) speech length range setting means (120) that, when the input continuous speech pattern is divided into a plurality of partial-interval patterns, sets a range of speech length for the partial-interval patterns corresponding to the number of divisions; and

(c) sentence level processing means (130) that divides the input continuous speech pattern into a plurality of partial-interval patterns within the speech-length range set by the speech length range setting means (120) for that number of divisions, calculates the distance of the speech sequence corresponding to each partial-interval pattern sequence formed by the division as the sum of the distances between each partial-interval pattern and the standard speech, and takes the optimal speech sequence corresponding to the division that minimizes the sum as the recognized continuous speech.

(2) The continuous speech recognition device according to claim 1, wherein the input continuous speech pattern is an input continuous word speech pattern created from input continuous word speech.

(3) The continuous speech recognition device according to claim 1, wherein the input continuous speech pattern is an input continuous syllable speech pattern created from input continuous syllable speech.