JPS619696A

JPS619696A - Voice recognition equipment

Info

Publication number: JPS619696A
Application number: JP59130714A
Authority: JP
Inventors: 曜一郎佐古; 平岩　篤信; 誠赤羽; 雅男渡
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1984-06-25
Filing date: 1984-06-25
Publication date: 1986-01-17
Anticipated expiration: 2009-08-31
Also published as: JPH0668678B2

Abstract

(57) [Summary] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] Industrial field of application: The present invention relates to a speech recognition device that recognizes speech.

Background art and its problems: Conventionally, as a speech recognition device that copes with variations in speaking rate, a device that performs DP matching processing has been proposed, for example as disclosed in Japanese Patent Laid-Open No. 50-96104.

First, a speech recognition device that performs speech recognition by this DP matching process is described.

In FIG. 1, (1) denotes a microphone serving as the audio signal input section. The audio signal from this microphone (1) is supplied to an acoustic analysis section (2), where an acoustic parameter time series Pi(n) is obtained.

In this acoustic analysis section (2), for example, the rectified and smoothed output of a bandpass filter bank is obtained as the acoustic parameter time series Pi(n) (i = 1, ..., I, where I is the number of channels of the bandpass filter bank; n = 1, ..., N, where N is the number of frames cut out by speech section determination).

The acoustic parameter time series Pi(n) from the acoustic analysis section (2) is routed by a mode changeover switch (3). In the registration mode it is stored in a standard pattern memory (4) for each word to be recognized; in the recognition mode it is supplied to one input of a DP matching distance calculation section (5). Also in the recognition mode, the standard patterns stored in the standard pattern memory (4) are supplied to the other input of the DP matching distance calculation section (5).

The DP matching distance calculation section (5) computes the DP matching distance between the input pattern, which consists of the acoustic parameter time series Pi(n) of the speech currently being input, and each standard pattern in the standard pattern memory (4). A distance signal indicating this DP matching distance is supplied to a minimum distance determination section (6), which determines the standard pattern whose DP matching distance to the input pattern is smallest. From this determination, a recognition result indicating the input speech is obtained at an output terminal (7).

In general, the number of frames N of a standard pattern stored in the standard pattern memory (4) differs with variations in speaking rate and differences in word length. The DP matching process performs time-axis normalization to cope with these variations in speaking rate and differences in word length.

The DP matching process is described below. For simplicity, the dimension corresponding to the frequency-axis direction i of the acoustic parameter time series Pi(n) is omitted. The parameter time series of the standard pattern is written b1, ..., bN and that of the input pattern a1, ..., aM, and the DP matching process is described for the case of a DP path with fixed end points.

FIG. 2 is a conceptual diagram of the DP matching process. The input parameters (M = 19) are arranged along the horizontal axis and the standard parameters (N = 12) along the vertical axis. The (M, N) lattice plane shown in FIG. 2 contains M x N points, and one distance corresponds to each point. For example, the distance between a3 and b5 corresponds to the point at the intersection of the vertical line drawn from a3 and the horizontal line drawn from b5. If, for example, the Chebyshev distance is used as the distance, the Chebyshev distance d(3, 5) between a3 and b5 is

d(3, 5) = |a3 - b5|

(here I = 1, because the dimension corresponding to the frequency-axis direction i is omitted).

For a DP path with fixed end points, suppose that only three predecessors are permitted for a lattice point (m, n): the lattice point to its left (m-1, n), the lattice point diagonally below and to the left (m-1, n-1), and the lattice point below (m, n-1). The path starts from the start point, i.e. the point representing the Chebyshev distance d(1, 1) between a1 and b1, proceeds in these three directions, and reaches the end point, i.e. the point representing the Chebyshev distance d(M, N) between aM and bN. Among such paths, the one for which the sum of the distances of the lattice points passed through is smallest is found. This sum of distances, divided by (M + N - 1), i.e. the sum of the number of input parameters M and the number of standard parameters N minus 1, is taken as the DP matching distance between the parameter time series a1, ..., aM of the input pattern and the parameter time series b1, ..., bN of the standard pattern. This processing is expressed by the initial condition

g(1, 1) = d(1, 1)

and the recurrence formula

g(m, n) = min{ g(m-1, n) + d(m, n), g(m-1, n-1) + 2 d(m, n), g(m, n-1) + d(m, n) }

from which the DP matching distance D(A, B) is

D(A, B) = g(M, N) / (M + N - 1)

(g(M, N) is divided by (M + N - 1) to correct for differences in the distance value caused by differences in the number of frames N of the standard pattern).
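The endpoint-fixed DP matching distance described above can be sketched in Python for one-dimensional parameters. This is a minimal sketch under stated assumptions, not the patent's implementation: the function name `dp_matching_distance` is hypothetical, and the step weights (a diagonal step counts d(m, n) twice, a horizontal or vertical step once) are an assumption chosen because they agree with the normalization by (M + N - 1) and with the path totals of 24 and 15 quoted for FIGS. 7 and 8.

```python
def dp_matching_distance(a, b):
    """DP matching distance between 1-D sequences a (input) and b (standard)."""
    m_len, n_len = len(a), len(b)
    inf = float("inf")
    # g[m][n] holds the minimum weighted distance sum from (0, 0) to (m, n).
    g = [[inf] * n_len for _ in range(m_len)]
    g[0][0] = abs(a[0] - b[0])          # initial condition g(1, 1) = d(1, 1)
    for m in range(m_len):
        for n in range(n_len):
            if m == 0 and n == 0:
                continue
            d = abs(a[m] - b[n])        # distance for I = 1
            candidates = []
            if m > 0:
                candidates.append(g[m - 1][n] + d)          # from the left
            if m > 0 and n > 0:
                candidates.append(g[m - 1][n - 1] + 2 * d)  # diagonal, assumed weight 2
            if n > 0:
                candidates.append(g[m][n - 1] + d)          # from below
            g[m][n] = min(candidates)
    # Normalize by (M + N - 1) so patterns of different lengths are comparable.
    return g[-1][-1] / (m_len + n_len - 1)


# A slowly spoken copy of the same pattern warps onto it at zero cost.
print(dp_matching_distance([1, 1, 2, 2, 3, 3], [1, 2, 3]))  # 0.0
```

Because every step contributes at least one unit of weight, the weights along any admissible path add up to M + N - 1, which is why that quantity is a natural normalizer under this assumption.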

With this processing, when there are five standard patterns, five DP matching distances to the input pattern are obtained, and the standard pattern with the smallest of these five DP matching distances is taken as the recognition result.

A speech recognition device using this DP matching process can cope with variations in speaking rate and differences in word length; that is, it can perform time-axis-normalized speech recognition.

However, it has become clear that, in a device that performs speech recognition by this DP matching process, the stationary portions of the speech are strongly reflected in the DP matching distance, so that misrecognition occurs easily between words that are partially similar.

That is, the acoustic parameter time series Pi(n) can be regarded as tracing a trajectory in its parameter space. Strictly speaking, the parameters of each frame n correspond to one point in the parameter space, so the series is a sequence of points; but if these points are joined by a curve in time-series order, a single trajectory from the start point to the end point can be considered.

For example, when the two words "SAN" and "HAI" are registered, the respective standard patterns A' and B' trace trajectories passing through the phoneme regions "S", "A", "N" and "H", "A", "I", as shown in FIG. 3. When "SAN" is uttered in the recognition mode, the input pattern A as a whole has very few portions similar to standard pattern B'. However, the "A" portion of "SAN" in input pattern A may be more similar to the "A" portion of "HAI" in standard pattern B' than to the "A" portion of "SAN" in standard pattern A', and that portion (a quasi-stationary portion) may contain many points.

A case in which the DP matching process causes misrecognition, when the parameters of input pattern A are similar overall to those of standard pattern A' but partially similar to those of standard pattern B' as shown in FIG. 3, is now described using one-dimensional parameters as an example. As one-dimensional parameter time series reproducing the situation of FIG. 3, i.e. the relationship between partially similar words, consider the following:

Input pattern A (FIG. 4): 2, 4, 6, 8, 8, 8, 8, 6, 4, 4, 4, 6, 8
Standard pattern A' (FIG. 5): 3, 5, 7, 9, 9, 9, 9, 7, 5, 5, 7, 9
Standard pattern B' (FIG. 6): 7, 6, 6, 8, 8, 8, 8, 6, 4, 4, 4

As is clear from the patterns of FIGS. 4 to 6, input pattern A is a pattern that should be judged to be standard pattern A'.

However, when the DP matching distances of standard patterns A' and B' to input pattern A are calculated, input pattern A turns out to be closer to standard pattern B'.

That is, the DP matching of standard pattern A' against input pattern A is carried out as in FIG. 2. As shown in FIG. 7, the parameter time series of input pattern A (2, 4, 6, 8, 8, 8, 8, 6, 4, 4, 4, 6, 8) is arranged along the horizontal axis and that of standard pattern A' (3, 5, 7, 9, 9, 9, 9, 7, 5, 5, 7, 9) along the vertical axis, and the Chebyshev distance between each parameter of input pattern A and each parameter of standard pattern A' is obtained at the corresponding intersection of the lattice plane. The start point is the point d(1, 1) = 1, the Chebyshev distance between the first parameter 2 of input pattern A and the first parameter 3 of standard pattern A'. The end point is the point d(13, 12) = 1, the Chebyshev distance between the 13th parameter 8 of input pattern A and the 12th parameter 9 of standard pattern A'. As in FIG. 2, the DP path may reach any point only from the point to its left, the point below it, or the point diagonally below and to the left (the path is shown by solid arrows). The points on the path are then the 14 points

d(1, 1) - d(2, 2) - d(3, 3) - d(4, 4) - d(5, 5) - d(6, 6) - d(7, 7) - d(8, 8) - d(9, 9) - d(10, 10) - d(11, 10) - d(12, 10) - d(13, 11) - d(13, 12)

The sum of their distances is 24, and the DP matching distance D(A, A') is 1.

The DP matching of standard pattern B' against input pattern A is carried out in the same way as in FIG. 7, as shown in FIG. 8. That is, the Chebyshev distances between the individual parameters of input pattern A (2, 4, 6, 8, 8, 8, 8, 6, 4, 4, 4, 6, 8) and the individual parameters of standard pattern B' (7, 6, 6, 8, 8, 8, 8, 6, 4, 4, 4) are obtained. When the DP path may reach any point only from the point to its left, the point below it, or the point diagonally below and to the left (the path is shown by solid arrows), the points on the path are the 13 points

d(1, 1) - d(2, 2) - d(3, 3) - d(4, 4) - d(5, 5) - d(6, 6) - d(7, 7) - d(8, 8) - d(9, 9) - d(10, 10) - d(11, 11) - d(12, 11) - d(13, 11)

The sum of their distances is 15, and the DP matching distance D(A, B') is 0.65.

As is clear from these results with the DP path restricted to three directions, input pattern A is judged to be standard pattern B', which has the smaller DP matching distance, and the result that should have been obtained is not obtained. Thus the DP matching process easily misrecognizes between words that are partially similar.
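The two path totals quoted above can be checked with a short Python sketch. It only scores the paths listed in the text and does not search for the minimum-cost path. The step weights are an assumption (a diagonal step counts its distance twice, any other step once); they are chosen because they reproduce the quoted totals of 24 and 15. The names `path_cost`, `A_STD`, `B_STD`, `PATH_A`, and `PATH_B` are hypothetical.

```python
A = [2, 4, 6, 8, 8, 8, 8, 6, 4, 4, 4, 6, 8]      # input pattern A
A_STD = [3, 5, 7, 9, 9, 9, 9, 7, 5, 5, 7, 9]     # standard pattern A'
B_STD = [7, 6, 6, 8, 8, 8, 8, 6, 4, 4, 4]        # standard pattern B'

# Paths as listed in the text, using 1-based (m, n) indices.
PATH_A = [(k, k) for k in range(1, 11)] + [(11, 10), (12, 10), (13, 11), (13, 12)]
PATH_B = [(k, k) for k in range(1, 12)] + [(12, 11), (13, 11)]


def path_cost(a, b, path):
    """Weighted sum of |a_m - b_n| along a path (diagonal steps count twice)."""
    total = 0
    prev = None
    for m, n in path:
        d = abs(a[m - 1] - b[n - 1])
        diagonal = prev is not None and (m - prev[0], n - prev[1]) == (1, 1)
        total += 2 * d if diagonal else d
        prev = (m, n)
    return total


cost_a = path_cost(A, A_STD, PATH_A)
cost_b = path_cost(A, B_STD, PATH_B)
print(cost_a, cost_a / (len(A) + len(A_STD) - 1))            # 24 1.0
print(cost_b, round(cost_b / (len(A) + len(B_STD) - 1), 2))  # 15 0.65
```

Every element of A differs from every element of A' by at least 1, so no alignment can bring D(A, A') below 1, whereas the long run of identical values 8, 8, 8, 8, 6, 4, 4, 4 shared by A and B' contributes nothing to D(A, B').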

In addition, in the DP matching process the number of frames N of a standard pattern is variable, as described above, and every standard pattern must be DP-matched against the input pattern. As the vocabulary grows, the amount of computation therefore increases dramatically, which poses problems for the storage capacity of the standard pattern memory (4) and for the amount of computation.

For this reason, a speech recognition device such as that shown in FIG. 9 has been considered, in which misrecognition is relatively rare even between partially similar words and in which the storage capacity of the standard pattern memory (4) and the amount of computation required for processing are relatively small.

In FIG. 9, (1) denotes a microphone serving as the audio signal input section. The audio signal from this microphone (1) is supplied to an amplifier (8) in the acoustic analysis section (2). The output of the amplifier (8) passes through a low-pass filter (9) with a cutoff frequency of 5.5 kHz to a 12-bit A/D converter (10) with a sampling frequency of 12.5 kHz, and the digital audio signal from the A/D converter (10) is supplied to a 15-channel digital bandpass filter bank (11A), (11B), ..., (11O).

The 15-channel digital bandpass filter bank (11A), (11B), ..., (11O) is composed of, for example, fourth-order Butterworth digital filters, with the bands allocated so that the range from 250 Hz to 5.5 kHz is divided at equal intervals on a logarithmic axis. The output signals of the digital bandpass filters (11A), (11B), ..., (11O) are supplied to 15-channel rectifiers (12A), (12B), ..., (12O), respectively, and the squared outputs of these rectifiers (12A), (12B), ..., (12O) are supplied to 15-channel digital low-pass filters (13A), (13B), ..., (13O), respectively. These digital low-pass filters (13A), (13B), ..., (13O) are FIR (finite impulse response) low-pass filters with a cutoff frequency of 52.8 Hz.
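The band allocation can be illustrated with a short Python sketch. It assumes that "equal intervals on a logarithmic axis" means that the 16 band edges of the 15 channels form a geometric progression from 250 Hz to 5.5 kHz; the text does not give the actual edge or center frequencies, so the function name and the resulting values are illustrative only.

```python
def log_spaced_band_edges(f_low=250.0, f_high=5500.0, channels=15):
    """Band edges (Hz) that split [f_low, f_high] into equal-ratio bands."""
    ratio = (f_high / f_low) ** (1.0 / channels)   # constant edge-to-edge ratio
    return [f_low * ratio ** k for k in range(channels + 1)]


edges = log_spaced_band_edges()
for channel, (lo, hi) in enumerate(zip(edges, edges[1:]), start=1):
    print(f"channel {channel:2d}: {lo:7.1f} Hz to {hi:7.1f} Hz")
```

Under this assumption each band is the same fraction of an octave wide, so the low-frequency channels are much narrower in hertz than the high-frequency ones.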

The output signals of the digital low-pass filters (13A), (13B), ..., (13O) are supplied to a sampler (14) with a sampling period of 5.12 ms.

The sampler (14) samples the output signals of the digital low-pass filters (13A), (13B), ..., (13O) every frame period of 5.12 ms, and the sampled signal is supplied to a sound source information normalizer (15). The sound source information normalizer (15) removes speaker-dependent differences in the glottal source characteristics of the speech to be recognized.

That is, for the sampled signal Ai(n) (i = 1, ..., 15; n: frame number) supplied from the sampler (14) every frame period, the logarithmic transformation

Ai(n) = log(Ai(n) + B) ... (1)

is applied. In equation (1), B is a bias, set to a value large enough to mask the noise level. The glottal source characteristic is then approximated by the expression yi = a·i + b. The coefficients a and b are determined by equations (2) and (3) (N = 15) [equations not reproduced in the source text].

Letting Pi(n) be the source-normalized parameter, when a(n) < 0 the parameter Pi(n) is expressed as

Pi(n) = Ai(n) - (a(n)·i + b(n)) ... (4)

When a(n) >= 0, only level normalization is performed, and the parameter Pi(n) is expressed by equation (5) [equation not reproduced in the source text].
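The source normalization can be sketched in Python as follows. Because equations (2), (3), and (5) are not reproduced in the text, this sketch makes two explicit assumptions: the coefficients a and b of yi = a·i + b are obtained by an ordinary least-squares fit over the channel index i, and "level normalization only" is taken to mean subtracting the mean log level of the frame. The function name `normalize_source` and the default bias value are hypothetical.

```python
import math


def normalize_source(frame, bias=1.0):
    """Return (P, a, b) for one frame of channel outputs A_1 .. A_N (N >= 2)."""
    logged = [math.log(x + bias) for x in frame]        # eq. (1)
    n = len(logged)
    idx = range(1, n + 1)                               # channel index i
    mean_i = (n + 1) / 2
    mean_y = sum(logged) / n
    # Assumed least-squares line y_i = a*i + b through the log spectrum.
    sxx = sum((i - mean_i) ** 2 for i in idx)
    sxy = sum((i - mean_i) * (y - mean_y) for i, y in zip(idx, logged))
    a = sxy / sxx
    b = mean_y - a * mean_i
    if a < 0:
        # Falling spectral tilt: remove the fitted line, as in eq. (4).
        p = [y - (a * i + b) for i, y in zip(idx, logged)]
    else:
        # Assumed form of eq. (5): remove only the mean level.
        p = [y - mean_y for y in logged]
    return p, a, b


# A frame whose log spectrum is exactly 5 - 0.2*i is flattened to zero.
frame = [math.exp(5 - 0.2 * i) - 1.0 for i in range(1, 16)]
p, a, b = normalize_source(frame)
print(round(a, 6), round(b, 6), max(abs(v) for v in p) < 1e-9)
```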

The parameters Pi(n), whose glottal source characteristics have been normalized by this processing, are supplied to a speech-section parameter memory (16). On receiving a speech section determination signal from a speech section determination section (17), described later, the speech-section parameter memory (16) stores the source-normalized parameters Pi(n) for each speech section.

Meanwhile, the digital audio signal from the A/D converter (10) is supplied to a zero-cross counter (18) and a power calculator (19) in the speech section determination section (17). Every 5.12 ms, the zero-cross counter (18) counts the zero crossings of the 64 samples of the digital audio signal in that interval and supplies the count to a first input of a speech section determiner (20). Every 5.12 ms, the power calculator (19) obtains the power of the digital audio signal in that interval, i.e. the sum of squares, and supplies a power signal indicating this in-interval power to a second input of the speech section determiner (20). In addition, the source normalization information a(n) and b(n) from the sound source information normalizer (15) is supplied to a third input of the speech section determiner (20). The speech section determiner (20) processes the zero-cross count, the in-interval power, and the source normalization information a(n), b(n) in combination, discriminates silence, unvoiced sound, and voiced sound, and determines the speech section. A speech section determination signal indicating this speech section is supplied, as the output of the speech section determination section (17), to the speech-section parameter memory (16).
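The two per-frame features named above can be sketched in Python. At a 12.5 kHz sampling frequency, a 5.12 ms frame holds 64 samples, which matches the "64 points" in the text. How the determiner (20) combines these features with a(n) and b(n) is not specified, so no decision rule is shown. The function names are hypothetical, and treating a sample of exactly zero as positive is an assumption.

```python
SAMPLES_PER_FRAME = round(12_500 * 5.12e-3)   # 64 samples per 5.12 ms frame


def zero_crossings(samples):
    """Number of sign changes between consecutive samples (0 counts as positive)."""
    signs = [s >= 0 for s in samples]
    return sum(1 for prev, cur in zip(signs, signs[1:]) if prev != cur)


def frame_power(samples):
    """In-frame power, taken as the sum of squared samples."""
    return sum(s * s for s in samples)


def frame_features(signal, frame_len=SAMPLES_PER_FRAME):
    """Yield (zero_crossings, power) for each complete frame of the signal."""
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        yield zero_crossings(frame), frame_power(frame)


# Two frames of a signal that alternates sign on every sample.
print(list(frame_features([1, -1] * 64)))  # [(63, 64), (63, 64)]
```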

The source-normalized acoustic parameters Pi(n), stored in the speech-section parameter memory (16) for each speech section, are supplied in time-series order to a NAT (Normalization Along Trajectory) processing section (21). As NAT processing, the NAT processing section (21) estimates, by straight-line approximation, the trajectory of the acoustic parameter time series Pi(n) in its parameter space, and forms a new acoustic parameter time series Qi(m) by linear interpolation along this trajectory.

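The NAT processing section (21) is now described further. The acoustic parameter time series Pi(n) (i = 1, ..., I; n = 1, ..., N) traces a point sequence in its parameter space. FIG. 10 shows an example of a point sequence distributed in a two-dimensional parameter space. As shown in FIG. 10, the points of non-stationary portions of speech are sparsely distributed, whereas those of quasi-stationary portions are densely distributed. This is also evident from the fact that, if the speech were perfectly stationary, the parameters would not change and the point sequence would stay at one place in the parameter space.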
...+I i n-1+ ..., N) draws a point sequence in the parameter space. FIG. 10 shows an example of a point sequence distributed in a two-dimensional parameter space. As shown in FIG. 10, the point sequence of the non-stationary part of the voice is roughly distributed, and the quasi-constant part is densely distributed. This is clear from the fact that if it is completely stationary, the parameters will not change, and in that case the point sequence will remain in the parameter space.

FIG. 11 shows an example in which a trajectory consisting of a smooth curve has been estimated and drawn over a point sequence such as that of FIG. 10. If a trajectory can be estimated for a point sequence as in FIG. 11, the trajectory can be regarded as almost invariant to variations in speaking rate. This is because most of the difference in duration caused by variations in speaking rate is due to temporal expansion and contraction of the quasi-stationary portions (which, in a point sequence such as that of FIG. 10, corresponds to a difference in point density in the quasi-stationary portions), while the duration of the non-stationary portions is considered to have little effect.

The NAT processing section (21) performs time-axis normalization by exploiting this invariance of the trajectory to variations in speaking rate.

That is, first, for the acoustic parameter time series Pi(n), a trajectory drawn as a continuous curve from the start point Pi(1) to the end point Pi(N) is estimated, and the curve representing this trajectory is denoted P~i(s) (0 <= s <= S). The curve need not satisfy end conditions such as P~i(0) = Pi(1) exactly; it suffices that it passes approximately through the entire point sequence.

Second, the trajectory length SL is obtained from the estimated P~i(s), and a new point sequence is resampled at constant intervals of length along the trajectory, as indicated by the circles in FIG. 12. For example, when resampling to M points, the trajectory is resampled using a constant length, i.e. the resampling interval T = SL/(M-1). This resampled point sequence is denoted Qi(m) (i = 1, ..., I; m = 1, ..., M). The new parameter time series Qi(m) obtained in this way retains the basic information of the trajectory and is almost invariant to variations in speaking rate. In other words, Qi(m) is a time-axis-normalized parameter time series.

For this processing, the acoustic parameter time series Pi(n) in the speech-section parameter memory (16) is supplied to a trajectory length calculator (22). The trajectory length calculator (22) computes the length of the trajectory that the acoustic parameter time series Pi(n) traces in its parameter space under straight-line approximation, i.e. the trajectory length. If, for example, the Euclidean distance D(ai, bi) is taken as the distance between I-dimensional vectors ai and bi, then

D(ai, bi) = sqrt( (a1 - b1)^2 + (a2 - b2)^2 + ... + (aI - bI)^2 ) ... (6)

The Chebyshev distance, the squared distance, or the like may also be used as this distance. For the I-dimensional acoustic parameter time series Pi(n) (i = 1, ..., I; n = 1, ..., N), when the trajectory is estimated by straight-line approximation, the distance S(n) between parameters adjacent in the time-series direction is

S(n) = D(Pi(n+1), Pi(n)) (n = 1, ..., N-1) ... (7)

The distance SL(n) from the first parameter Pi(1) to the n-th parameter Pi(n) in the time-series direction is

SL(n) = S(1) + S(2) + ... + S(n-1) ... (8)

with SL(1) = 0. The trajectory length SL is

SL = S(1) + S(2) + ... + S(N-1) ... (9)

The trajectory length calculator (22) is configured to perform the signal processing of equations (7), (8), and (9).
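The trajectory length computation can be sketched in Python. This is a minimal sketch assuming the Euclidean distance between adjacent frames and a straight-line (polyline) trajectory; the function names `trajectory_lengths` and `resampling_interval` are hypothetical.

```python
import math


def trajectory_lengths(points):
    """Return (S, SL) for a list of I-dimensional points P(1) .. P(N).

    S[k] is the distance between adjacent points, as in eq. (7).
    SL[k] is the distance along the polyline from P(1) to P(k + 1), as in
    eq. (8), with SL[0] = 0. SL[-1] is the total trajectory length, eq. (9).
    """
    seg = [math.dist(points[k + 1], points[k]) for k in range(len(points) - 1)]
    cum = [0.0]
    for s in seg:
        cum.append(cum[-1] + s)
    return seg, cum


def resampling_interval(total_length, m_points):
    """Constant spacing T = SL / (M - 1) for M resampled points."""
    return total_length / (m_points - 1)


seg, cum = trajectory_lengths([(0, 0), (3, 4), (3, 10)])
print(seg, cum)                          # [5.0, 6.0] [0.0, 5.0, 11.0]
print(resampling_interval(cum[-1], 12))  # 1.0
```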

A trajectory length signal indicating the trajectory length SL from the trajectory length calculator (22) is supplied to an interpolation interval calculator (23). The interpolation interval calculator (23) computes the constant resampling interval T used to resample a new point sequence by linear interpolation along the trajectory. If the trajectory is resampled to M points, the resampling interval T is

T = SL / (M - 1) ... (10)

The interpolation interval calculator (23) is configured to perform the signal processing of equation (10).

A resampling interval signal indicating the resampling interval T from the interpolation interval calculator (23) is supplied to one input of an interpolation point extractor (24), and the acoustic parameter time series Pi(n) from the speech-section parameter memory (16) is supplied to its other input. The interpolation point extractor (24) resamples a new point sequence at the resampling interval T along the trajectory of the acoustic parameter time series Pi(n) in its parameter space, for example a trajectory obtained by straight-line approximation between the parameters, and forms a new acoustic parameter time series Qi(m) from this new point sequence.

The signal processing in the interpolation point extractor (24) will now be explained along the flowchart shown in FIG. 13. First, in block (24a), the variable J, which indicates the number of the resampling point in the time-series direction, is set to 1, and the variable IC, which indicates the number in the time-series direction of the acoustic parameter time series Pi(n), is also set to 1. In block (24b) the variable J is incremented, and in block (24c) it is determined, from whether the variable J is (M-1) or less, whether the number of the current resampling point has passed the last number that needs to be resampled. If it has, the signal processing of the interpolation point extractor (24) ends. If it has not, block (24d) calculates the resample distance DL from the first resampling point to the J-th resampling point, and block (24e) increments the variable IC. Block (24f) then determines, from whether the resample distance DL is smaller than the distance SL(IC) from the first parameter Pi(1) of the acoustic parameter time series Pi(n) to the IC-th parameter Pi(IC), whether the current resampling point lies nearer to the starting end of the trajectory than the parameter Pi(IC). If it does not, the variable IC is incremented again in block (24e) and the positions of the resampling point and the parameter Pi(IC) on the trajectory are compared again in block (24f). When the resampling point is judged to lie nearer to the starting end than the parameter Pi(IC), block (24g) forms a new acoustic parameter Qi(J) along the trajectory by resampling.

That is, the distance SL(IC-1) to the (IC-1)-th parameter Pi(IC-1), which lies on the starting-end side of the J-th resampling point, is first subtracted from the resample distance DL of the J-th resampling point to obtain the distance SS from the (IC-1)-th parameter Pi(IC-1) to the J-th resampling point. Next, this distance SS is divided by the distance S(IC-1) between the parameters Pi(IC-1) and Pi(IC) that lie on either side of the J-th resampling point on the trajectory (the distance S(IC-1) is obtained by the signal processing of equation (7)), giving SS/S(IC-1). This quotient is multiplied by the difference between those two parameters, giving (Pi(IC) - Pi(IC-1)) * SS/S(IC-1), which is the amount of interpolation from the (IC-1)-th parameter Pi(IC-1) adjacent to the J-th resampling point on its starting-end side. Adding this amount of interpolation to Pi(IC-1) forms the new acoustic parameter Qi(J) along the trajectory.

FIG. 14 shows an example in which a trajectory is estimated for a two-dimensional acoustic parameter time series P(1), P(2), ..., P(8) by linear approximation between the parameters, and a new acoustic parameter time series Q(1), Q(2), ..., Q(6) of six points is formed by linear interpolation along this trajectory.
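The resampling loop of blocks (24a) to (24g) can be sketched as follows. This is a minimal illustration rather than the circuit of the embodiment: the function names are hypothetical, the segment length S(n) is assumed to be the Euclidean distance between adjacent frames, and the resampling interval is assumed to be T = SL/(M-1), so that DL = (J-1)·T.

```python
import math

def cumulative_lengths(p):
    """SL(n): length of the piecewise-linear trajectory from p[0] to p[n]."""
    sl = [0.0]
    for a, b in zip(p, p[1:]):
        sl.append(sl[-1] + math.dist(a, b))  # assumed Euclidean segment length
    return sl

def nat_resample(p, m):
    """Resample the trajectory through the frames p into m equally spaced points."""
    sl = cumulative_lengths(p)
    t = sl[-1] / (m - 1)           # assumed resampling interval T
    q = [list(p[0])]               # start point: Q(1) = P(1)
    ic = 0                         # 0-based counterpart of the variable IC
    for j in range(1, m - 1):      # the (M-2) interior resampling points
        dl = j * t                 # resample distance DL from the start
        while ic < len(p) - 1 and not dl < sl[ic]:
            ic += 1                # blocks (24e)/(24f): advance until DL < SL(IC)
        ss = dl - sl[ic - 1]       # distance SS past frame IC-1
        seg = sl[ic] - sl[ic - 1]  # segment length S(IC-1)
        frac = ss / seg if seg > 0 else 0.0
        q.append([a + (b - a) * frac for a, b in zip(p[ic - 1], p[ic])])
    q.append(list(p[-1]))          # end point: Q(M) = P(N)
    return q

# One-dimensional input pattern A from the text, resampled to 8 points.
pattern_a = [[v] for v in (2, 4, 6, 8, 8, 8, 8, 6, 4, 4, 4, 6, 8)]
resampled_a = [round(x[0], 6) for x in nat_resample(pattern_a, 8)]
```

Under these assumptions the 13-frame series 2, 4, 6, 8, 8, 8, 8, 6, 4, 4, 4, 6, 8 resamples to 2, 4, 6, 8, 6, 4, 6, 8. Stationary stretches have zero trajectory length and therefore contribute no resampling points, which is how the time axis is normalized.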

In block (24g), this signal processing is performed for all I dimensions in the frequency-series direction (i = 1, ..., I).

In this way, blocks (24b) to (24g) form the new acoustic parameter time series Qi(m) by resampling (M-2) points, excluding the start point and the end point (these are Qi(1) = Pi(1) and Qi(M) = Pi(N)).

The new acoustic parameter time series Qi(m) from the NAT processing section (21) is routed by the mode changeover switch (3): in registration mode it is stored in the standard pattern memory (4) for each word to be recognized, and in recognition mode it is supplied to one input of the Chebyshev distance calculation section (25). In recognition mode, the standard patterns stored in the standard pattern memory (4) are also supplied to the other input of the Chebyshev distance calculation section (25). The Chebyshev distance calculation section (25) calculates the Chebyshev distance between the input pattern, which consists of the time-axis-normalized new acoustic parameter time series Qi(m) of the speech being input at that time, and each standard pattern in the standard pattern memory (4).

A distance signal indicating this Chebyshev distance is supplied to the minimum distance determination section (6), which determines the standard pattern whose Chebyshev distance to the input pattern is smallest and, from this result, supplies a recognition result indicating the input speech to the output terminal (7).

The operation of the speech recognition device configured in this way will now be explained.

The audio signal from the microphone (1) is converted by the acoustic analysis section (2), for each speech interval, into an acoustic parameter time series Pi(n) in which the vocal cord sound source characteristics have been normalized. This acoustic parameter time series Pi(n) is supplied to the NAT processing section (21), which estimates a trajectory in the parameter space by linear approximation and forms, by linear interpolation along this trajectory, a new time-axis-normalized acoustic parameter time series Qi(m). In registration mode, this new acoustic parameter time series Qi(m) is stored in the standard pattern memory (4) via the mode changeover switch (3).

In recognition mode, the new acoustic parameter time series Qi(m) from the NAT processing section (21) is supplied to the Chebyshev distance calculation section (25) via the mode changeover switch (3), and the standard patterns in the standard pattern memory (4) are also supplied to the Chebyshev distance calculation section (25). The one-dimensional parameter time series shown in FIGS. 4 to 6 are:

    input pattern A:      2, 4, 6, 8, 8, 8, 8, 6, 4, 4, 4, 6, 8
    standard pattern A':  3, 5, 7, 9, 9, 9, 9, 7, 5, 5, 7, 9
    standard pattern B':  7, 6, 6, 8, 8, 8, 8, 6, 4, 4, 4

FIGS. 15 to 17 show the one-dimensional parameter time series obtained when the NAT processing section (21) estimates the trajectory of each of these by linear approximation and resamples it at eight points:

    input pattern A:      2, 4, 6, 8, 6, 4, 6, 8
    standard pattern A':  3, 5, 7, 9, 7, 5, 7, 9
    standard pattern B':  7, 6, 7, 8, 7, 6, 5, 4

In this case, a trajectory in the parameter space is estimated from the acoustic parameter time series Pi(n) and the new acoustic parameter time series Qi(m) is formed along this trajectory, so the time axis is normalized by the acoustic parameter time series Pi(n) of the input speech itself. The Chebyshev distance calculation section (25) calculates a Chebyshev distance of 8 between input pattern A and standard pattern A' and a Chebyshev distance of 16 between input pattern A and standard pattern B'. Distance signals indicating the distances 8 and 16 are supplied to the minimum distance determination section (6). Because the distance 8 is smaller than the distance 16, the minimum distance determination section (6) determines that input pattern A corresponds to standard pattern A', and a recognition result indicating that the input speech is standard pattern A' is obtained at the output terminal (7). Speech recognition with relatively few recognition errors is therefore possible even between words that are partially similar.
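The distances 8 and 16 quoted above are reproduced by summing the absolute differences between corresponding resampled parameters, so the sketch below assumes that this sum is what the text calls the Chebyshev distance (in current usage the term more often denotes the maximum absolute difference). The function and variable names are hypothetical.

```python
def sum_abs_distance(x, y):
    """Sum of absolute differences between two equal-length parameter series."""
    if len(x) != len(y):
        raise ValueError("patterns must have the same number of resampled points")
    return sum(abs(a - b) for a, b in zip(x, y))

def nearest_pattern(input_pattern, standard_patterns):
    """Return (label, distance) of the standard pattern closest to the input."""
    distances = {label: sum_abs_distance(input_pattern, ref)
                 for label, ref in standard_patterns.items()}
    best = min(distances, key=distances.get)  # minimum distance determination
    return best, distances[best]

# Resampled one-dimensional patterns from the text.
input_a = [2, 4, 6, 8, 6, 4, 6, 8]
standards = {"A'": [3, 5, 7, 9, 7, 5, 7, 9],
             "B'": [7, 6, 7, 8, 7, 6, 5, 4]}
label, distance = nearest_pattern(input_a, standards)
```

Because NAT processing gives every pattern the same number of points, the comparison is a single pass over the series with no time warping, which is the source of the computational saving discussed next.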

The difference in the amount of computation between a speech recognition device that performs NAT processing and one that performs DP matching processing will now be explained.

Let α be the average amount of computation in the DP matching distance calculation section (5) per standard pattern for an input pattern, β the average amount of computation in the Chebyshev distance calculation section (25), and γ the average amount of computation in the NAT processing section (21). The amount of computation C1 for DP matching processing against J standard patterns is

$C_1 = \alpha \cdot J$ ... (11)

and the amount of computation C2 when NAT processing is used against J standard patterns is

$C_2 = \beta \cdot J + \gamma$ ... (12)

In general the average amounts of computation satisfy α ≫ β. Therefore the relationship

$J > \dfrac{\gamma}{\alpha - \beta}$ ... (13)

holds; that is, as the number of words to be recognized increases, C1 ≫ C2, and a speech recognition device that performs NAT processing can greatly reduce the amount of computation.
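The break-even point between the two methods follows directly from C1 = α·J and C2 = β·J + γ. The short derivation below is added for illustration and assumes α > β, so that dividing by (α - β) preserves the inequality.

```latex
\begin{aligned}
C_1 > C_2
&\iff \alpha J > \beta J + \gamma \\
&\iff (\alpha - \beta)\,J > \gamma \\
&\iff J > \frac{\gamma}{\alpha - \beta} \qquad (\alpha > \beta)
\end{aligned}
```

As a purely hypothetical example, with α = 100, β = 1 and γ = 200 the threshold is 200/99 ≈ 2.02, so NAT processing would already be cheaper from three standard patterns onward (C1 = 300 against C2 = 203), and the gap widens linearly with J.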

Furthermore, because the new acoustic parameter time series Qi(m) obtained from the NAT processing section (21) can be set to a fixed number of parameters in the time-series direction, the storage area of the standard pattern memory (4) can be used efficiently and its storage capacity can be kept relatively small.

In a speech recognition device that performs such NAT processing, however, in a situation such as that shown in FIG. 18, standard pattern B', which should not be selected for input pattern A, becomes the determination result.

FIG. 18 shows input pattern A ("A"), standard pattern A' ("A"), and standard pattern B' ("SAN") in the parameter space, cut at the quasi-stationary part indicating silence and unfolded. In this case, input pattern A contains the same phoneme "A" as standard pattern B', and in the quasi-stationary parts indicating silence and "A" the input pattern A is more similar to standard pattern B' than to standard pattern A'. Although the overall trajectories differ, the resampling points come close to standard pattern B', which should not be selected.

At this time, the Chebyshev distance calculation section (25) yields a Chebyshev distance from input pattern A to standard pattern B' that is smaller than the distance to standard pattern A', and standard pattern B', which should not be selected, becomes the determination result. Thus, in a speech recognition device that performs NAT processing, patterns that contain the same phoneme may, as shown in FIG. 18, have different overall trajectories and yet have resampling points that come close to a standard pattern B' that should not be selected. Misrecognition then occurs and the recognition rate falls.

[Purpose of the invention]

In view of the above, an object of the present invention is to obtain a speech recognition device that makes relatively few recognition errors even when patterns contain the same phoneme and have different overall trajectories but resampling points that come close to a standard pattern that should not be selected.

[Summary of the invention]

The present invention has an audio signal input section. The audio signal from this audio signal input section is supplied to an acoustic analysis section, and the acoustic parameter series obtained on the basis of this acoustic analysis section is supplied to a trajectory length calculator, which calculates from the acoustic parameter series the length of its trajectory in the parameter space. The result of matching an input pattern against a standard pattern is judged according to the trajectory lengths of the input pattern and the standard pattern, and speech is thereby recognized. This speech recognition device has the advantage that recognition errors can be made relatively rare when patterns contain the same phoneme and have different overall trajectories but resampling points that come close to a standard pattern that should not be selected.

[Embodiment]

An embodiment of the speech recognition device of the present invention will now be described with reference to FIG. 19. In FIG. 19, parts corresponding to those in FIGS. 1 to 18 are given the same reference numerals, and their detailed description is omitted.

In this example, as shown in FIG. 19, the new acoustic parameter time series Qi(m) from the interpolation point extractor (24) of the NAT processing section (21) is supplied to one input of a trajectory length signal adder (26), and the trajectory length signal from the trajectory length calculator (22) of the NAT processing section (21) is supplied to the other input of the trajectory length signal adder (26) and to one input of a distance signal corrector (27) described later.

The trajectory length signal adder (26) adds, to each new acoustic parameter time series Qi(m) from the NAT processing section (21), a trajectory length signal indicating the trajectory length SL, in the parameter space, of the acoustic parameter time series Pi(n) from the acoustic analysis section (2) from which that new time series Qi(m) was derived.

The new acoustic parameter time series Qi(m), with the trajectory length signal added by the trajectory length signal adder (26), is routed by the mode changeover switch (3): in registration mode it is stored in the standard pattern memory (4) for each word to be recognized, and in recognition mode it is supplied to one input of the Chebyshev distance calculation section (25). In recognition mode, the standard patterns stored in the standard pattern memory (4) are also supplied to the other input of the Chebyshev distance calculation section (25). The Chebyshev distance calculation section (25) is arranged to form a signal in which the trajectory length signal of the standard pattern corresponding to each Chebyshev distance is added to the distance signal indicating that Chebyshev distance.

The distance signal with the trajectory length signal added, from the Chebyshev distance calculation section (25), is supplied to the other input of the distance signal corrector (27). The distance signal corrector (27) compares the trajectory length signal added to the new acoustic parameter time series Qi(m) that forms the current input pattern with the trajectory length signal of the standard pattern corresponding to the distance signal, and corrects the distance signal on the basis of this comparison.

The distance signal corrector (27) will now be explained further. In general, for the same word, the acoustic parameter series is considered to trace trajectories of approximately equal shape and length in the parameter space.

Focusing on this point, the distance signal corrector (27) corrects the distance between the input pattern and the standard pattern (the Chebyshev distance in this example) according to the difference between the trajectory lengths of the input pattern and the standard pattern. Specifically, with the trajectory length of the standard pattern denoted TRLS and that of the input pattern denoted TRLI, the trajectory length deviation TRL between them is calculated by signal processing such as that of equation (14). As is clear from equation (14), the trajectory length deviation TRL takes its minimum value of 2 when the trajectory length TRLS of the standard pattern equals the trajectory length TRLI of the input pattern (TRLS = TRLI). Then, with the distance signal denoted Chbs, this distance signal is corrected using the trajectory length deviation TRL by the signal processing of the following equation to obtain the corrected distance signal CHBS.

$CHBS = Chbs \cdot TRL^{a} \quad (a > 0)$ ... (15)

In this example, a is set to 2.

The corrected distance signal CHBS from the distance signal corrector (27) is supplied to the minimum distance determination section (6). The acoustic analysis section (2) and the other parts are configured in the same way as in the speech recognition device shown in FIG. 9.

The operation of the speech recognition device configured in this way will now be explained.

The audio signal from the microphone (1) is converted by the acoustic analysis section (2), for each speech interval, into an acoustic parameter time series Pi(n) in which the vocal cord sound source characteristics have been normalized, and this time series is supplied to the NAT processing section (21). The NAT processing section (21) estimates from Pi(n) a trajectory in the parameter space by linear approximation and, on the basis of this trajectory, forms a new time-axis-normalized acoustic parameter time series Qi(m). The trajectory length signal adder (26) then adds a trajectory length signal indicating the length of the linearly approximated trajectory, in the parameter space, of the acoustic parameter time series Pi(n) from the acoustic analysis section (2) from which this new time series Qi(m) was derived.

In registration mode, the new acoustic parameter time series Qi(m) with the trajectory length signal added by the trajectory length signal adder (26) is stored in the standard pattern memory (4) via the mode changeover switch (3).

In recognition mode, the new acoustic parameter time series Qi(m) from the trajectory length signal adder (26) is supplied as the input pattern to the Chebyshev distance calculation section (25) via the mode changeover switch (3), and the standard patterns in the standard pattern memory (4) are also supplied to the Chebyshev distance calculation section (25). The Chebyshev distance calculation section (25) calculates the Chebyshev distance between the input pattern and each standard pattern, and a signal in which the trajectory length signal of the corresponding standard pattern is added to the distance signal Chbs indicating this Chebyshev distance is supplied to the distance signal corrector (27).

Meanwhile, the trajectory length signal from the trajectory length calculator (22), as added to the new acoustic parameter time series Qi(m) forming the current input pattern, is supplied to the distance signal corrector (27). The distance signal corrector (27) obtains the deviation TRL between the trajectory length TRLI of the input pattern and the trajectory length TRLS of the standard pattern by the signal processing of equation (14), applies the signal processing of equation (15) using this deviation TRL, and obtains the distance signal CHBS corrected on the basis of the trajectory length deviation TRL.

Consider the case shown in FIG. 18, in which standard pattern B', representing a different word from input pattern A, contains the same phoneme "A" as the input pattern, has a different overall trajectory but nearby resampling points, and has a smaller Chebyshev distance than standard pattern A' and the others, which represent the same word. Even then, the trajectory length deviation TRL of standard pattern A' (the same word) with respect to input pattern A is approximately equal to the minimum value 2, whereas the deviation TRL of standard pattern B' (a different word) with respect to input pattern A takes a relatively large value. The distance signal corrector (27) therefore yields corrected distance signals CHBS among which that of standard pattern A', representing the same word as input pattern A, is the smallest. These corrected distance signals CHBS are processed by the minimum distance determination section (6), and standard pattern A', which should be selected for input pattern A, is obtained at the output terminal (7) as the determination result.

As described above, the speech recognition device of this example has the microphone (1) as an audio signal input section. The audio signal from this audio signal input section (1) is supplied to the acoustic analysis section (2), and the resulting acoustic parameter time series Pi(n) is supplied to the trajectory length calculator (22), which calculates from Pi(n) the length of its trajectory in the parameter space. The result of matching the input pattern against a standard pattern is judged according to the trajectory lengths of the input pattern and the standard pattern, and speech is thereby recognized. This gives the advantage that recognition errors can be made relatively rare when patterns contain the same phoneme and have different overall trajectories but resampling points that come close to a standard pattern that should not be selected.

In the above embodiment, the distance signal corrector (27) performs the signal processing expressed by equations (14) and (15), but the processing is not limited to these equations and may be signal processing expressed by any suitable function. Also, in the above embodiment the trajectory length in the parameter space is calculated from the acoustic parameter time series Pi(n), but it will be readily understood that the same effects can be obtained by calculating the trajectory length in the parameter space from an acoustic parameter frequency series. Further, in the above embodiment the trajectory length is calculated from the acoustic parameter time series for a trajectory obtained by linear approximation in the parameter space, but it will be readily understood that the same effects can be obtained by calculating the trajectory length of a trajectory obtained by circular-arc approximation, spline approximation, or the like.

Furthermore, in the above embodiment the acoustic parameter time series Pi(n) from the acoustic analysis section (2) is supplied to the trajectory length calculator (22) of the NAT processing section (21), and this trajectory length calculator (22) calculates the trajectory length in the parameter space from Pi(n). It will be readily understood, however, that the same effects can be obtained by providing a trajectory length calculator separate from the trajectory length calculator (22) of the NAT processing section (21), supplying it with the new acoustic parameter time series Qi(m) from the NAT processing section (21), calculating the trajectory length in the parameter space from Qi(m), and correcting the distance signal on the basis of this trajectory length. Moreover, even in a speech recognition device that performs DP matching processing as shown in FIG. 1, misrecognition can be made relatively rare by supplying the acoustic parameter series from the acoustic analysis section (2) to a trajectory length calculator, adding the trajectory length signal from this calculator to the acoustic parameter series, and correcting the DP matching distance according to the trajectory lengths of the input pattern and the standard pattern. The present invention is of course not limited to the above embodiment and can take various other configurations without departing from its gist.
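The DP matching variant mentioned above can be sketched in the same way. The code below is an illustration only: it uses a generic dynamic time warping recurrence for one-dimensional series, not the specific DP matching constraints of the device in FIG. 1, and it reuses the assumed deviation form (ratio plus inverse ratio of the trajectory lengths) with a multiplicative correction. All names are hypothetical.

```python
def dtw_distance(x, y):
    """Generic dynamic-programming (DTW) distance between two 1-D series."""
    inf = float("inf")
    d = [[inf] * (len(y) + 1) for _ in range(len(x) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of the three admissible predecessor paths
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[-1][-1]

def trajectory_length(x):
    """Length of the piecewise-linear trajectory of a 1-D series."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))

def corrected_dp_distance(x, y, a=2):
    """DTW distance scaled by an assumed trajectory-length deviation term."""
    lx, ly = trajectory_length(x), trajectory_length(y)
    if lx == 0 or ly == 0:
        raise ValueError("trajectory lengths must be non-zero")
    deviation = lx / ly + ly / lx   # assumed form, minimum 2 when lx == ly
    return dtw_distance(x, y) * deviation ** a
```

Here the trajectory lengths are computed from the unwarped series, so the correction term penalizes a reference whose trajectory is much shorter or longer than that of the input regardless of how well the warped frames happen to align.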

[Effects of the invention]

The speech recognition device of the present invention has an audio signal input section. The audio signal from this audio signal input section is supplied to an acoustic analysis section, and the acoustic parameter series obtained on the basis of this acoustic analysis section is supplied to a trajectory length calculator, which calculates from the acoustic parameter series the length of its trajectory in the parameter space. The result of matching the input pattern against a standard pattern is judged according to the trajectory lengths of the input pattern and the standard pattern, and speech is thereby recognized. This gives the advantage that recognition errors can be made relatively rare when patterns contain the same phoneme and have different overall trajectories but resampling points that come close to a standard pattern that should not be selected.

[Brief description of the drawings]

第１図はＤＰマツチング処理により音声認識を行なうよ
うにした音声認識装置の例を示す構成図、第２図はＤＰ
マツチング処理の説明に供する概念図、第３図は音響パ
ラメータ空間における軌跡の説明に供する線図、第４図
、第５図及び第６図は夫々１次元の入力パターンＡ、標
準パターンＡ′及び標準パターンＢ′の例を示す線図、
第７図は入力パターンＡのパラメータ時系列と標準パタ
ーンＡ′のパラメータ時系列とのＤＰマツチング処理に
よる時間軸正規化の説明に供する線図、第８図は入力パ
ターンＡのパラメータ時系列と標準パターンＢ′のパラ
メータ時系列とのＤＰマツチング処理による時間軸正規
化の説明に供する線図、第９図はＮＡＴ処理をして音声
認識を行なうようにした音声認識装置の例を示す構成図
、第１０図、第１１図、第１２図及び第１４図は夫々Ｎ
ＡＴ処理部の説明に供する線図、第１３図は補間点抽出
器の説明に供する流れ図、第１５図、第１６図及び第１
７図は夫々ＮＡＴ処理部にてＮＡＴ処理した入力パター
ンＡ、標準パターンＡ′及び標準パターンＢ′の１次元
の音響パラメータ時系列を示す線図、第１８図は同一の
音韻を含み全体の軌跡は異なるかりサンプリング点が近
い関係にあるパラメータ時系列の例を示ず路線図、第１
９図は本発明音声認識装置の一実施例を示す構成図であ
る。（１）は音声信号入力部としてのマイクロホン、（２）
は音響分析部、（３）はモード切換スイッチ、（４）は
標準パターンメモリ、（６）は最小距離判定部、（１１
八）。（１１Ｂ）　、　・・・・、　　（ｌｌｏ　）は１５チ
ヤンネルのデジタルバンドパスフィルタバンク、（１６
）は音声区間内パラメータメモリ、（２１）はＮＡＴ処
理部、（２２）は軌跡長算出器、（２３）は補間間隔算
出器、（２４）は補間点抽出器、（２５）はチェ・ビシ
エフ距１１１１ｔ算出部、（２６）は軌跡長信号付加器
、（２７）は距離信号補正器である。
第１図　第１３図　第１４図　第１８図　第１９図
手続補正書
昭和５９年１０月１１日
昭和５９年特許願第１３０７１４号
２、発明の名称　音声認識装置
３、補正をする者
　事件との関係　特許出願人
　住所　東京都品川区北品川６丁目７番３５号
　名称（２１８）ソニー株式会社
　代表取締役　大賀典雄
４、代理人
６、補正により増加する発明の数
７、補正の対象　明細書の発明の詳細な説明の欄。
８、補正の内容
（１）明細書中、第３４頁第３４行〜第７行「Ｎ（Ｎ＋１）（Ｎ−１）（Ｎ＝１５）・・・（２）　Ｎ（Ｎ−１）（Ｎ＝１５）・・・（３）」とあるを下記の通りに訂正する。「Ｉ（Ｉ＋１）（Ｉ−１）（Ｉ＝１５）・・・（２）　（Ｉ＝１５）・・・（３）」
（２）同、同頁第１４行〜第１５行とあるを下記の通りに訂正する。
（３）同、第３４頁第３行「ＴＲ３Ｉ」とあるを「ＴＲＬＩ」に訂正する。
（４）同、第３６頁第７行「チェジヒエフ距離」と
あるを「チェビシェフ距離」に訂正する。以上
Fig. 1 is a block diagram showing an example of a speech recognition device that performs speech recognition by DP matching. Fig. 2 is a conceptual diagram for explaining DP matching. Fig. 3 is a diagram for explaining a trajectory in the acoustic parameter space. Figs. 4, 5, and 6 are diagrams showing examples of a one-dimensional input pattern A, standard pattern A', and standard pattern B', respectively. Fig. 7 is a diagram for explaining time-axis normalization by DP matching between the parameter time series of input pattern A and that of standard pattern A'. Fig. 8 is a diagram for explaining time-axis normalization by DP matching between the parameter time series of input pattern A and that of standard pattern B'. Fig. 9 is a block diagram showing an example of a speech recognition device that performs speech recognition using NAT processing. Figs. 10, 11, 12, and 14 are diagrams for explaining the NAT processing unit. Fig. 13 is a flowchart for explaining the interpolation point extractor. Figs. 15, 16, and 17 are diagrams showing the one-dimensional acoustic parameter time series of input pattern A, standard pattern A', and standard pattern B', respectively, after NAT processing by the NAT processing unit. Fig. 18 is a diagram showing an example of parameter time series that contain the same phoneme and differ in overall trajectory but have resampling points close to one another. Fig. 19 is a block diagram showing an embodiment of the speech recognition device of the present invention.

(1) is a microphone serving as the audio signal input section, (2) is the acoustic analysis section, (3) is the mode changeover switch, (4) is the standard pattern memory, (6) is the minimum distance determination section, (11A), (11B), ..., (11O) are the 15-channel digital bandpass filter bank, (16) is the in-speech-interval parameter memory, (21) is the NAT processing unit, (22) is the trajectory length calculator, (23) is the interpolation interval calculator, (24) is the interpolation point extractor, (25) is the Chebyshev distance calculation section, (26) is the trajectory length signal adder, and (27) is the distance signal corrector.

Fig. 1, Fig. 13, Fig. 14, Fig. 18, Fig. 19

Procedural Amendment
October 11, 1984 (Showa 59)
Patent Application No. 130714 of 1984 (Showa 59)
2. Title of the invention: Speech recognition device
3. Person making the amendment
   Relationship to the case: Patent applicant
   Address: 6-7-35 Kitashinagawa, Shinagawa-ku, Tokyo
   Name: (218) Sony Corporation
   Representative Director: Norio Ohga
4. Agent
6. Number of inventions increased by the amendment
7. Subject of the amendment: the "Detailed description of the invention" section of the specification.
8. Contents of the amendment
(1) In the specification, page 34, line 34 [sic] to line 7, the passage "N(N+1)(N-1) (N = 15) ... (2), N(N-1) (N = 15) ... (3)" is corrected to read as follows: "I(I+1)(I-1) (I = 15) ... (2), (I = 15) ... (3)".
(2) Same, same page, lines 14 to 15 are corrected as follows.
(3) Same, page 34, line 3: "TR3I" is corrected to "TRLI".
(4) Same, page 36, line 7: the misspelling 「チェジヒエフ距離」 is corrected to 「チェビシェフ距離」 (Chebyshev distance).
End

Claims

[Claims]

A speech recognition device comprising an audio signal input section, wherein an audio signal from the audio signal input section is supplied to an acoustic analysis section, an acoustic parameter series obtained by the acoustic analysis section is supplied to a trajectory length calculator, the trajectory length calculator calculates from the acoustic parameter series the length of the trajectory in the parameter space, and the result of matching an input pattern against a standard pattern is judged according to the trajectory lengths of the input pattern and the standard pattern, whereby the speech is recognized.