JPS60249199A

JPS60249199A - Voice recognition equipment

Info

Publication number: JPS60249199A
Application number: JP59106178A
Authority: JP
Inventors: 曜一郎佐古; 平岩　篤信; 誠赤羽; 雅男渡
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1984-05-25
Filing date: 1984-05-25
Publication date: 1985-12-09
Anticipated expiration: 2009-05-02
Also published as: JPH0634182B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

Detailed Description of the Invention

Field of Industrial Application

The present invention relates to a speech recognition device that recognizes speech.

Background Art and Its Problems

Conventionally, as a speech recognition device that copes with variations in speaking rate, a device that performs DP matching has been proposed, as disclosed for example in Japanese Patent Application Laid-Open No. 50-96104.

First, a speech recognition device that performs speech recognition by this DP matching process will be described.

In FIG. 1, (1) denotes a microphone serving as a speech signal input section. The speech signal from the microphone (1) is supplied to an acoustic analysis section (2), which produces an acoustic parameter time series Pi(n).

In the acoustic analysis section (2), for example, the rectified and smoothed outputs of a bandpass filter bank are obtained as the acoustic parameter time series Pi(n) (i = 1, ..., I, where I is the number of channels of the bandpass filter bank; n = 1, ..., N, where N is the number of frames extracted by speech interval detection).

The acoustic parameter time series Pi(n) from the acoustic analysis section (2) is routed by a mode changeover switch (3). In registration mode it is stored in a standard pattern memory (4) for each word to be recognized; in recognition mode it is supplied to one input of a DP matching distance calculation section (5). In recognition mode, the standard patterns stored in the standard pattern memory (4) are supplied to the other input of the DP matching distance calculation section (5).

The DP matching distance calculation section (5) computes the DP matching distance between the input pattern, which consists of the acoustic parameter time series Pi(n) of the speech currently being input, and each standard pattern in the standard pattern memory (4). A distance signal indicating this DP matching distance is supplied to a minimum distance determination section (6), which determines the standard pattern whose DP matching distance to the input pattern is smallest. The recognition result indicating the input speech is obtained from this determination at an output terminal (7).

In general, the number of frames N of a standard pattern stored in the standard pattern memory (4) differs with variations in speaking rate and with word length. The DP matching process performs time axis normalization to cope with these variations in speaking rate and differences in word length.

The DP matching process is explained below. For simplicity, the dimension corresponding to the frequency axis direction i of the acoustic parameter time series Pi(n) is omitted. The parameter time series of the standard pattern is written b1, ..., bN and that of the input pattern a1, ..., aM, and the DP matching process is described for the case of a DP path with fixed end points.

FIG. 2 is a conceptual diagram of the DP matching process. The input parameters (M = 19) are arranged along the horizontal axis and the standard parameters (N = 12) along the vertical axis. The (M, N) grid plane shown in FIG. 2 contains M x N points, and one distance corresponds to each point. For example, the distance between a3 and b5 corresponds to the point at the intersection of the vertical line drawn from a3 and the horizontal line drawn from b5. If, for example, the Chebyshev distance is used, the distance between a3 and b5 is D(3,5).

As the DP path with fixed end points, suppose that only three predecessors are allowed for a grid point (m, n): the grid point to its left (m-1, n), the grid point diagonally below and to the left (m-1, n-1), and the grid point below (m, n-1). Starting from the start point, i.e. the point representing the Chebyshev distance D(1,1) between a1 and b1, the distances of the points lying in the three directions allowed for the path are calculated, and the path advances in the direction of the smallest of the three distances. The path is advanced in the same way until it reaches the end point, i.e. the point representing the Chebyshev distance D(19,12) between a19 and b12. The sum of the distances of the points on the path (or this sum divided by the number of points passed) is taken as the DP matching distance between the input pattern parameter time series a1, ..., aM and the standard pattern parameter time series b1, ..., bN. Accordingly, when there are L standard patterns, L DP matching distances are obtained for the input pattern, and the standard pattern giving the smallest of these L distances is taken as the recognition result.
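The matching procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of the cited device. It assumes one-dimensional parameters with the absolute difference as the local distance, and it realizes "advancing in the direction of the smallest distance" with the usual cumulative-minimum recurrence over the three allowed predecessors. The function names `dp_matching_distance` and `recognize` are hypothetical.

```python
def dp_matching_distance(a, b):
    """Smallest summed |a[m] - b[n]| over paths from (0, 0) to (M-1, N-1).

    Allowed predecessors of (m, n): (m-1, n), (m-1, n-1), (m, n-1).
    End points are fixed, as in the description above.
    """
    M, N = len(a), len(b)
    INF = float("inf")
    g = [[INF] * N for _ in range(M)]  # g[m][n]: best cumulative distance
    for m in range(M):
        for n in range(N):
            d = abs(a[m] - b[n])  # local distance (1-D assumption)
            if m == 0 and n == 0:
                g[m][n] = d
                continue
            best = INF
            if m > 0:
                best = min(best, g[m - 1][n])      # from the left
            if n > 0:
                best = min(best, g[m][n - 1])      # from below
            if m > 0 and n > 0:
                best = min(best, g[m - 1][n - 1])  # from the lower left
            g[m][n] = d + best
    return g[M - 1][N - 1]


def recognize(a, templates):
    """Return the name of the standard pattern closest to input a."""
    return min(templates, key=lambda name: dp_matching_distance(a, templates[name]))
```

With L standard patterns, `recognize` evaluates the full M x N grid L times, which is the computational cost the text goes on to criticize.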

A speech recognition device using such DP matching can cope with variations in speaking rate and differences in word length; that is, it performs time-axis-normalized speech recognition.

However, it has become clear that in speech recognition by such DP matching, the stationary parts of the speech weigh heavily in the DP matching distance, so that misrecognition occurs easily between words that are partially similar.

That is, the acoustic parameter time series Pi(n) can be regarded as tracing a trajectory in its parameter space. Strictly, the parameters of each frame n correspond to one point in the parameter space, so the result is a sequence of points; but if these points are joined by a curve in time order, a single trajectory from the start point to the end point can be considered.

For example, when the two words "HAI" and "SAI" are registered, their standard patterns A' and B' trace trajectories passing through the phoneme regions "H", "A", "I", and "S", as shown in FIG. 3. When "SAI" is uttered in recognition mode, the "A→I" portion of the input pattern A may trace a trajectory closer to the "A→I" portion of standard pattern B' ("HAI") than to the "A→I" portion of standard pattern A' ("SAI"). In speaker-dependent recognition, this can happen if the voice quality differs slightly between registration and recognition.

Using a one-dimensional parameter as an example, the following shows how DP matching leads to misrecognition when, as in FIG. 3, the input pattern A follows standard pattern A' overall but follows standard pattern B' in part. As one-dimensional parameter time series with the same relationship as the situation in FIG. 3, i.e. the relationship between partially similar words, consider the input pattern A shown in FIG. 4: 2, 4, 6, 8, 8, 8, 8, 6, 4, 4, 4, 6, 8; the standard pattern A' shown in FIG. 5: 3, 5, 7, 9, 9, 9, 9, 7, 5, 5, 7, 9; and the standard pattern B' shown in FIG. 6: 7, 6, 6, 8, 8, 8, 8, 6, 4, 4, 4. As is clear from the patterns in FIGS. 4 to 6, the input pattern A should be judged to be standard pattern A'. However, when the DP matching distances of standard patterns A' and B' to the input pattern A are calculated, the input pattern A turns out to be closer to standard pattern B'.

That is, for the DP matching of standard pattern A' against the input pattern A, the parameter time series of the input pattern A (2, 4, 6, 8, 8, 8, 8, 6, 4, 4, 4, 6, 8) is arranged along the horizontal axis and that of standard pattern A' (3, 5, 7, 9, 9, 9, 9, 7, 5, 5, 7, 9) along the vertical axis, as shown in FIG. 7 in the same manner as FIG. 2. The Chebyshev distance between each parameter of the input pattern A and each parameter of standard pattern A' is obtained at the corresponding intersection of the grid plane.

The start point is the point with Chebyshev distance D(1,1) = 1 between the first parameter, 2, of the input pattern A and the first parameter, 3, of standard pattern A'. The end point is the point with Chebyshev distance D(13,12) = 1 between the 13th parameter, 8, of the input pattern A and the 12th parameter, 9, of standard pattern A'. When, as in FIG. 2, the DP path allows as predecessors of any point the point to its left, the point below it, and the point diagonally below and to the left (this path is shown by solid arrows), the points on the path are the 14 points D(1,1), D(2,2), D(3,3), D(4,4), D(5,5), D(6,6), D(7,7), D(8,8), D(9,9), D(10,10), D(11,10), D(12,10), D(13,11), D(13,12). The sum of their distances is 14, and (sum of distances / number of points) is 1.

When, in FIG. 7, the DP path allows as predecessors of any point only the point to its left and the point below it (this path is shown by broken arrows), the points on the path are the 24 points D(1,1), D(2,1), D(2,2), D(3,2), D(3,3), D(4,3), D(5,3), D(6,3), D(7,3), D(7,4), D(7,5), D(7,6), D(7,7), D(7,8), D(8,8), D(8,9), D(9,9), D(10,9), D(11,9), D(12,9), D(12,10), D(12,11), D(13,11), D(13,12). The sum of the distances is 24, and (sum of distances / number of points) is 1.

The DP matching of standard pattern B' against the input pattern A is carried out as shown in FIG. 8, in the same way as in FIG. 7. That is, the Chebyshev distances between the individual parameters 2, 4, 6, 8, 8, 8, 8, 6, 4, 4, 4, 6, 8 of the input pattern A and the individual parameters 7, 6, 6, 8, 8, 8, 8, 6, 4, 4, 4 of standard pattern B' are obtained. When the DP path allows as predecessors of any point the point to its left, the point below it, and the point diagonally below and to the left (solid arrows), the points on the path are the 13 points D(1,1), D(2,2), D(3,3), D(4,4), D(5,5), D(6,6), D(7,7), D(8,8), D(9,9), D(10,10), D(11,11), D(12,11), D(13,11). The sum of their distances is 13, and (sum of distances / number of points) is 1.

When, in FIG. 8, the DP path allows as predecessors of any point only the point to its left and the point below it (broken arrows), the points on the path are the 23 points D(1,1), D(2,1), D(3,1), D(3,2), D(3,3), D(4,3), D(4,4), D(5,4), D(6,4), D(7,4), D(7,5), D(7,6), D(7,7), D(8,7), D(8,8), D(9,8), D(9,9), D(9,10), D(9,11), D(10,11), D(11,11), D(12,11), D(13,11). The sum of the distances is 21, and (sum of distances / number of points) is 0.913.

As these results show, when the two-direction DP path is used and (sum of distances / number of points) is taken as the DP matching distance, the input pattern A is judged to be standard pattern B' (0.913 against 1). When the three-direction DP path is used and (sum of distances / number of points) is taken as the DP matching distance, the input pattern A is at the same distance, 1, from standard patterns A' and B', and the judgment that should be made is not obtained. In this way, DP matching easily misrecognizes words that are partially similar.
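The sums quoted for FIGS. 7 and 8 can be checked by adding the absolute differences along each path. The sketch below is only a verification aid. The path coordinates are a reading of the partly illegible listing in the source text, chosen so that the point counts (14, 24, 13, 23) match those stated, and should be treated as an assumption. The helper name `path_cost` is hypothetical.

```python
A = [2, 4, 6, 8, 8, 8, 8, 6, 4, 4, 4, 6, 8]    # input pattern A (FIG. 4)
A_STD = [3, 5, 7, 9, 9, 9, 9, 7, 5, 5, 7, 9]   # standard pattern A' (FIG. 5)
B_STD = [7, 6, 6, 8, 8, 8, 8, 6, 4, 4, 4]      # standard pattern B' (FIG. 6)


def path_cost(inp, ref, path):
    """Return (sum of distances, number of points, sum / points).

    path holds 1-based (m, n) grid points; the distance at (m, n) is
    |inp[m] - ref[n]|.
    """
    total = sum(abs(inp[m - 1] - ref[n - 1]) for m, n in path)
    return total, len(path), total / len(path)


# Three-direction paths (solid arrows), as read from the text.
A3 = [(k, k) for k in range(1, 11)] + [(11, 10), (12, 10), (13, 11), (13, 12)]
B3 = [(k, k) for k in range(1, 12)] + [(12, 11), (13, 11)]

# Two-direction paths (broken arrows), as read from the text.
A2 = [(1, 1), (2, 1), (2, 2), (3, 2), (3, 3), (4, 3), (5, 3), (6, 3),
      (7, 3), (7, 4), (7, 5), (7, 6), (7, 7), (7, 8), (8, 8), (8, 9),
      (9, 9), (10, 9), (11, 9), (12, 9), (12, 10), (12, 11), (13, 11),
      (13, 12)]
B2 = [(1, 1), (2, 1), (3, 1), (3, 2), (3, 3), (4, 3), (4, 4), (5, 4),
      (6, 4), (7, 4), (7, 5), (7, 6), (7, 7), (8, 7), (8, 8), (9, 8),
      (9, 9), (9, 10), (9, 11), (10, 11), (11, 11), (12, 11), (13, 11)]

for label, ref, path in [("A' 3-dir", A_STD, A3), ("A' 2-dir", A_STD, A2),
                         ("B' 3-dir", B_STD, B3), ("B' 2-dir", B_STD, B2)]:
    total, points, ratio = path_cost(A, ref, path)
    print(f"{label}: sum={total} points={points} ratio={ratio:.3f}")
```

Under this reading the two-direction ratio for B' is 21/23, about 0.913, below the ratio of 1 for A', which is the misjudgment the text describes.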

Furthermore, in DP matching the number of frames N of a standard pattern is not fixed, as described above, and every standard pattern must be DP-matched against the input pattern. As the vocabulary grows, the amount of computation increases sharply, so there have been problems with the storage capacity of the standard pattern memory (4) and with the amount of computation.

Object of the Invention

In view of the above, an object of the present invention is to obtain a speech recognition device that misrecognizes relatively rarely even between partially similar words, and that requires relatively little standard pattern memory capacity and computation.

Summary of the Invention

The present invention has a speech signal input section, converts the speech signal from this input section into an acoustic parameter series, estimates from this series a trajectory in its parameter space, and recognizes the speech signal on the basis of this trajectory. The speech recognition device of the present invention has the advantage that misrecognition is relatively rare even between partially similar words, and that the storage capacity of the standard pattern memory and the amount of computation are relatively small.

Embodiment

An embodiment of the speech recognition device of the present invention will now be described with reference to FIGS. 9 to 17. In FIGS. 9 to 17, parts corresponding to those in FIGS. 1 to 8 are given the same reference numerals, and their detailed description is omitted.

In FIG. 9, (1) denotes a microphone serving as a speech signal input section. The speech signal from the microphone (1) is supplied to an amplifier (8) of the acoustic analysis section (2). The output of the amplifier (8) is supplied through a low-pass filter (9) with a cutoff frequency of 5.5 kHz to a 12-bit A/D converter (10) with a sampling frequency of 12.5 kHz, and the digital speech signal from the A/D converter (10) is supplied to a 15-channel digital bandpass filter bank (11A), (11B), ..., (11O).

The 15-channel digital bandpass filter bank (11A), (11B), ..., (11O) consists of, for example, fourth-order Butterworth digital filters, with the band from 250 Hz to 5.5 kHz divided so that the channels are equally spaced on a logarithmic axis. The output signals of the digital bandpass filters (11A), (11B), ..., (11O) are supplied to 15 channels of rectifiers (12A), (12B), ..., (12O), respectively, and the squared outputs of these rectifiers are supplied to 15 channels of digital low-pass filters (13A), (13B), ..., (13O), respectively. These digital low-pass filters (13A), (13B), ..., (13O) are FIR (finite impulse response) low-pass filters with a cutoff frequency of 52.8 Hz.
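As a rough illustration of "equally spaced on a logarithmic axis", the band edges of the 15 channels can be computed as a geometric sequence between 250 Hz and 5.5 kHz. The text does not list the actual edge frequencies, so the values produced here are an assumption about how the division might be made; `log_spaced_band_edges` is a hypothetical helper.

```python
def log_spaced_band_edges(f_low=250.0, f_high=5500.0, channels=15):
    """Return channels + 1 band edges in Hz with a constant edge-to-edge ratio.

    Equal spacing on a logarithmic axis means each edge is the previous
    edge multiplied by the same factor (an assumed interpretation).
    """
    ratio = (f_high / f_low) ** (1.0 / channels)
    return [f_low * ratio ** k for k in range(channels + 1)]


edges = log_spaced_band_edges()
for ch, (lo, hi) in enumerate(zip(edges, edges[1:]), start=1):
    print(f"channel {ch:2d}: {lo:7.1f} Hz to {hi:7.1f} Hz")
```

Each channel then spans the same fraction of an octave, so the low channels are narrow in Hz and the high channels wide.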

The output signals of the digital low-pass filters (13A), (13B), ..., (13O) are supplied to a sampler (14) with a sampling period of 5.12 ms.

The sampler (14) samples the output signals of the digital low-pass filters (13A), (13B), ..., (13O) every frame period of 5.12 ms, and the sampled signal is supplied to a sound source information normalizer (15). The sound source information normalizer (15) removes speaker-dependent differences in individual characteristics from the speech to be recognized.

That is, the sampled signal A_i(n) (i = 1, ..., 15; n: frame number) supplied from the sampler (14) every frame period is logarithmically transformed as

\bar{A}_i(n) = \log(A_i(n) + B)   ... (1)

In equation (1), B is a bias, set to a value large enough to hide the noise level. The glottal sound source characteristic is then approximated by the expression y_i = a \cdot i + b. The coefficients a and b are determined by equations (2) and (3) (with N = 15), which are not reproduced in this text. Letting P_i(n) be the parameter normalized for the sound source, when a(n) < 0 the parameter P_i(n) is expressed as

P_i(n) = \bar{A}_i(n) - \{a(n) \cdot i + b(n)\}   ... (4)

When a(n) ≥ 0, only level normalization is performed, and the parameter P_i(n) is expressed by equation (5), which is not reproduced in this text.

The parameters Pi(n), normalized for the sound source characteristic by this processing, are supplied to a speech-interval parameter memory (16). On receiving a speech interval determination signal from a speech interval determination section (17), described later, the speech-interval parameter memory (16) stores the source-normalized parameters Pi(n) for each speech interval.

Meanwhile, the digital speech signal from the A/D converter (10) is supplied to a zero-cross counter (18) and a power calculator (19) of the speech interval determination section (17). Every 5.12 ms, the zero-cross counter (18) counts the number of zero crossings in the 64 samples of the digital speech signal in that interval and supplies the count to the first input of a speech interval determiner (20). Every 5.12 ms, the power calculator (19) obtains the power of the digital speech signal in that interval, i.e. the sum of squares, and supplies a power signal indicating the power within the interval to the second input of the speech interval determiner (20). In addition, the sound source normalization information a(n) and b(n) from the sound source information normalizer (15) is supplied to the third input of the speech interval determiner (20). The speech interval determiner (20) processes the zero-cross count, the power within the interval, and the sound source normalization information a(n) and b(n) in combination, discriminates silence, unvoiced sound, and voiced sound, and determines the speech interval. The speech interval determination signal indicating this speech interval is supplied, as the output of the speech interval determination section (17), to the speech-interval parameter memory (16).
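The two per-frame measurements fed to the speech interval determiner, the zero-cross count and the sum of squares over 64 samples (5.12 ms at 12.5 kHz), can be sketched as follows. This covers only those two features. How the determiner combines them with a(n) and b(n) is not specified in the text and is not modeled here. Counting a zero crossing as a sign change between consecutive samples is an assumption, and `frame_features` is a hypothetical name.

```python
FRAME = 64  # 64 samples at 12.5 kHz correspond to one 5.12 ms frame


def frame_features(samples, frame=FRAME):
    """Return a list of (zero_cross_count, power) tuples, one per full frame.

    zero_cross_count: number of sign changes between consecutive samples
    within the frame (assumed definition).
    power: sum of squared sample values within the frame.
    """
    feats = []
    for start in range(0, len(samples) - frame + 1, frame):
        x = samples[start:start + frame]
        zero_cross = sum(
            1 for k in range(1, frame) if (x[k - 1] < 0) != (x[k] < 0)
        )
        power = sum(v * v for v in x)
        feats.append((zero_cross, power))
    return feats
```

A high zero-cross count with low power is typical of unvoiced sounds, while voiced sounds tend to show the opposite pattern, which is why the two are usually examined together.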

The source-normalized acoustic parameters Pi(n) stored for each speech interval in the speech-interval parameter memory (16) are supplied in time order to a NAT processing section (21). As NAT processing, the NAT processing section (21) estimates from the acoustic parameter time series Pi(n) a trajectory in its parameter space, and forms a new acoustic parameter time series Qi(m) on the basis of this trajectory.

The NAT processing section (21) is now described further. The acoustic parameter time series Pi(n) (i = 1, ..., I; n = 1, ..., N) traces a sequence of points in its parameter space. FIG. 10 shows an example of a point sequence distributed in a two-dimensional parameter space. As shown in FIG. 10, the points of non-stationary parts of the speech are sparsely distributed and those of quasi-stationary parts are densely distributed. This is also clear from the fact that if the speech were completely stationary the parameters would not change, and the point sequence would stay at one place in the parameter space.

FIG. 11 shows an example in which a trajectory is drawn as a smooth curve through the point sequence of FIG. 10. If a trajectory can be estimated for the point sequence as in FIG. 11, the trajectory can be regarded as almost unchanged by variations in speaking rate. This is because differences in duration caused by speaking-rate variation are due mostly to temporal expansion and contraction of the quasi-stationary parts (which, in a point sequence like that of FIG. 10, corresponds to differences in point density in the quasi-stationary parts), and the influence of the duration of the non-stationary parts is considered small.

The NAT processing section (21) performs time axis normalization by exploiting this invariance of the trajectory to variations in speaking rate.

That is, first, for the acoustic parameter time series Pi(n), a trajectory drawn as a continuous curve from the start Pi(1) to the end Pi(N) is estimated, and the curve representing this trajectory is denoted \tilde{P}_i(s) (0 ≤ s ≤ S). Basically, any curve that passes approximately through the entire point sequence will do.

Second, the trajectory length SL is obtained from the estimated \tilde{P}_i(s), and a new point sequence is resampled at a constant length along the trajectory, as shown by the circles in FIG. 12. For example, when resampling to M points, the trajectory is resampled on the basis of a constant length, i.e. the resampling interval T = SL / (M - 1). Letting the resampled point sequence be Qi(m) (i = 1, ..., I; m = 1, ..., M), Qi(1) = \tilde{P}_i(0) and Qi(M) = \tilde{P}_i(S).

The new parameter time series Qi(m) obtained in this way carries the basic information of the trajectory and is almost invariant to variations in speaking rate. In other words, the new parameter time series Qi(m) is a time-axis-normalized parameter time series.

For this processing, the acoustic parameter time series Pi(n) in the speech-interval parameter memory (16) is supplied to a trajectory length calculator (22). The trajectory length calculator (22) calculates the length of the straight-line-approximated trajectory that the acoustic parameter time series Pi(n) traces in its parameter space, i.e. the trajectory length. The Euclidean distance D(a_i, b_i) between I-dimensional vectors a_i and b_i is

D(a_i, b_i) = \sqrt{\sum_{i=1}^{I} (a_i - b_i)^2}   ... (6)

Accordingly, for the I-dimensional acoustic parameter time series Pi(n) (i = 1, ..., I; n = 1, ..., N), when the trajectory is estimated by straight-line approximation, the distance S(n) between parameters adjacent in the time series direction is

S(n) = D(P_i(n+1), P_i(n))   (n = 1, ..., N-1)   ... (7)

The distance SL(n) from the first parameter Pi(1) to the n-th parameter Pi(n) in the time series direction is

SL(n) = \sum_{n'=1}^{n-1} S(n')   (n = 2, ..., N)   ... (8)

with SL(1) = 0. The trajectory length SL is

SL = \sum_{n'=1}^{N-1} S(n')   ... (9)

The trajectory length calculator (22) performs the signal processing given by equations (7), (8), and (9).
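The trajectory length calculation can be sketched as below: segment lengths between adjacent frames, the cumulative distance to each frame, and the total length. The sketch assumes each frame is given as a tuple of I parameter values and that the cumulative distance to the first frame is zero, as the text states. The function names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def segment_lengths(P):
    """S(n): Euclidean distance between frame n and frame n+1."""
    return [math.dist(P[n + 1], P[n]) for n in range(len(P) - 1)]


def cumulative_lengths(P):
    """SL(n): distance along the piecewise-linear trajectory from the
    first frame to each frame; the first entry is 0."""
    SL = [0.0]
    for s in segment_lengths(P):
        SL.append(SL[-1] + s)
    return SL


# Example: a frame repeated in a stationary part adds nothing to the length.
P = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0), (6.0, 8.0)]
print(segment_lengths(P))         # [5.0, 0.0, 5.0]
print(cumulative_lengths(P)[-1])  # 10.0, the trajectory length SL
```

Because repeated or nearly repeated frames contribute almost no length, the total length SL depends mainly on the non-stationary parts of the utterance.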

The trajectory length signal indicating the trajectory length SL from the trajectory length calculator (22) is supplied to an interpolation interval calculator (23). The interpolation interval calculator (23) calculates the constant resampling interval T at which a new point sequence is resampled by linear interpolation along the trajectory. When resampling to M points, the resampling interval T is

T = SL / (M - 1)   ... (10)

The interpolation interval calculator (23) performs the signal processing given by equation (10).

The resampling interval signal indicating the resampling interval T from the interpolation interval calculator (23) is supplied to one input of an interpolation point extractor (24), and the acoustic parameter time series Pi(n) from the speech-interval parameter memory (16) is supplied to its other input. The interpolation point extractor (24) resamples a new point sequence at the resampling interval T along the trajectory of the acoustic parameter time series Pi(n) in its parameter space, for example the trajectory obtained by straight-line approximation between the parameters, and forms the new acoustic parameter time series Qi(m) from this new point sequence.
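The resampling step can be illustrated with a short sketch that places M points at equal arc-length spacing along the piecewise-linear trajectory, using linear interpolation within each segment. It is a simplified illustration under stated assumptions, not the device's flow chart: frames are tuples of parameter values, the first and last output points are taken to be the first and last frames, and zero-length segments are skipped. The name `nat_resample` is hypothetical.

```python
import math


def nat_resample(P, M):
    """Resample frames P (list of equal-length tuples) to M points spaced
    at equal distance T = SL / (M - 1) along the piecewise-linear trajectory."""
    N = len(P)
    if N < 2 or M < 2:
        raise ValueError("need at least two frames and two output points")
    # Cumulative distance from the first frame to each frame.
    cum = [0.0]
    for n in range(N - 1):
        cum.append(cum[-1] + math.dist(P[n + 1], P[n]))
    T = cum[-1] / (M - 1)            # resampling interval
    Q = [tuple(P[0])]                # first point: start of the trajectory
    ic = 1                           # index of the frame just past the point
    for j in range(1, M - 1):
        dl = j * T                   # distance of the j-th new point from the start
        while ic < N - 1 and cum[ic] <= dl:
            ic += 1                  # advance to the segment containing dl
        seg = cum[ic] - cum[ic - 1]
        w = (dl - cum[ic - 1]) / seg if seg > 0 else 0.0
        Q.append(tuple(p0 + (p1 - p0) * w
                       for p0, p1 in zip(P[ic - 1], P[ic])))
    Q.append(tuple(P[-1]))           # last point: end of the trajectory
    return Q


fast = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
slow = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
print(nat_resample(fast, 5))
print(nat_resample(slow, 5))
```

In this example the `slow` sequence dwells for three frames at one point, as a slower utterance would in a stationary part, yet both sequences resample to the same five points. This is the time axis normalization the text attributes to NAT processing, and every pattern ends up with the same fixed length M.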

The signal processing in the interpolation point extractor (24) will now be described with reference to the flowchart of Fig. 13. First, in block (24a), the variable J, which indicates the number of the resampling point in the time-series direction, is set to 1, and the variable IC, which indicates the number of the parameter of the acoustic parameter time series Pi(n) in the time-series direction, is also set to 1. In block (24b) the variable J is incremented, and in block (24c) it is determined, from whether J is less than or equal to (M-1), whether the current resampling point is still within the range of points that need to be resampled. If it is, the resample distance DL from the first resampling point to the J-th resampling point is calculated in block (24d), and the variable IC is incremented in block (24e). In block (24f) it is determined whether the resampling point lies on the start-point side of the parameter Pi(IC) on the trajectory, by checking whether the resample distance DL is smaller than the distance SL(IC) from the first parameter Pi(1) of the acoustic parameter time series Pi(n) to the IC-th parameter Pi(IC). If it does not, the variable IC is incremented again in block (24e) and the positions of the resampling point and the parameter Pi(IC) on the trajectory are compared again in block (24f). When the resampling point is determined to lie on the start-point side of the parameter Pi(IC), a new acoustic parameter Qi(J) along the trajectory is formed by resampling in block (24g).

In block (24g), the distance SL(IC-1) to the (IC-1)-th parameter Pi(IC-1), which lies on the start-point side of the J-th resampling point, is first subtracted from the resample distance DL of the J-th resampling point to obtain the distance SS from the (IC-1)-th parameter Pi(IC-1) to the J-th resampling point. Next, this distance SS is divided by the distance S(IC-1) between the parameters Pi(IC-1) and Pi(IC) located on either side of the J-th resampling point on the trajectory (the distance S(n) is obtained by the signal processing of equation (7)). The quotient SS / S(IC-1) is multiplied by the difference (Pi(IC) - Pi(IC-1)) between those two parameters, giving the interpolation amount (Pi(IC) - Pi(IC-1)) * SS / S(IC-1) measured from the (IC-1)-th parameter Pi(IC-1), which adjoins the J-th resampling point on its start-point side. Adding this interpolation amount to the (IC-1)-th parameter Pi(IC-1) yields the new acoustic parameter Qi(J) along the trajectory.

Fig. 14 shows an example in which a trajectory is estimated by linear approximation between the parameters of a two-dimensional acoustic parameter time series P(1), P(2), ..., P(8), and a new acoustic parameter time series Q(1), Q(2), ..., Q(6) of six points is formed by linear interpolation along this trajectory. In block (24g), this signal processing is carried out for all I dimensions in the frequency-series direction (i = 1, ..., I).
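
The resampling performed in blocks (24a) to (24g) can be sketched in Python as follows. This is an illustrative reading of the flowchart description, not the circuit itself. The names `distance` and `nat_resample` are hypothetical, and the inter-frame metric is assumed to be the sum of absolute channel differences. The sketch also advances the frame index only while the resampling point has not yet fallen on the start side of the next frame, which is an assumption made so that two resampling points may share one segment and zero-length segments are skipped.

```python
from typing import List, Sequence

Frame = Sequence[float]


def distance(a: Frame, b: Frame) -> float:
    """Assumed inter-frame metric: sum of absolute channel differences."""
    return sum(abs(y - x) for x, y in zip(a, b))


def nat_resample(frames: Sequence[Frame], m_points: int) -> List[List[float]]:
    """Resample `frames` at M points equally spaced along the trajectory."""
    if len(frames) < 2 or m_points < 2:
        raise ValueError("need at least two frames and two resampling points")

    seg = [distance(a, b) for a, b in zip(frames, frames[1:])]  # S(n)
    cum = [0.0]                      # cum[k]: distance from the start to frame k
    for s in seg:
        cum.append(cum[-1] + s)
    if cum[-1] == 0:
        raise ValueError("trajectory has zero length")
    step = cum[-1] / (m_points - 1)  # resampling interval T

    out = [list(frames[0])]          # the start point is kept as Q(1)
    ic = 1                           # 0-based index of the frame ahead of the point
    for j in range(1, m_points - 1):
        dl = step * j                # resample distance DL of this point
        # Advance until the point lies on the start side of frame `ic`.
        while ic < len(frames) - 1 and cum[ic] <= dl:
            ic += 1
        ss = dl - cum[ic - 1]        # distance SS past the previous frame
        ratio = ss / seg[ic - 1] if seg[ic - 1] else 0.0
        prev, nxt = frames[ic - 1], frames[ic]
        out.append([p + (q - p) * ratio for p, q in zip(prev, nxt)])
    out.append(list(frames[-1]))     # the end point is kept as Q(M)
    return out


# One-dimensional standard pattern B' from this description.
pattern_b = [[v] for v in (7, 6, 6, 8, 8, 8, 8, 6, 4, 4, 4)]
print([q[0] for q in nat_resample(pattern_b, 8)])
```

With eight resampling points, the one-dimensional pattern B' used in this description yields the values 7, 6, 7, 8, 7, 6, 5, 4, in agreement with the processed series given for Fig. 17.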

In this way, blocks (24b) to (24g) form the new acoustic parameter time series Qi(m) by resampling (M-2) points, excluding the start point and the end point (these are Qi(1) and Qi(M), which coincide with the first and last parameters of Pi(n)).

By means of the mode changeover switch (3), the new acoustic parameter time series Qi(m) from the NAT processing unit (21) is stored in the standard pattern memory (4) for each word to be recognized in the registration mode, and is supplied to one input of the Chebyshev distance calculation unit (25) in the recognition mode. In the recognition mode, the standard patterns stored in the standard pattern memory (4) are also supplied to the other input of the Chebyshev distance calculation unit (25). The Chebyshev distance calculation unit (25) calculates the Chebyshev distance between the input pattern, which consists of the time-axis-normalized new acoustic parameter time series Qi(m) of the speech currently being input, and each standard pattern in the standard pattern memory (4).

A distance signal indicating this Chebyshev distance is supplied to the minimum distance determination unit (6), which determines the standard pattern having the minimum Chebyshev distance to the input pattern. A recognition result indicating the input speech, based on this determination, is supplied to the output terminal (7).

The operation of the speech recognition device configured in this way will now be described.

The audio signal from the microphone (1) is converted in the acoustic analysis unit (2) into an acoustic parameter time series Pi(n), normalized for sound source characteristics, for each speech section, and this acoustic parameter time series Pi(n) is supplied to the NAT processing unit (21). In the NAT processing unit (21), a trajectory is estimated from the acoustic parameter time series Pi(n) by linear approximation in its parameter space, and a new, time-axis-normalized acoustic parameter time series Qi(m) is formed on the basis of this trajectory. In the registration mode, this new acoustic parameter time series Qi(m) is stored in the standard pattern memory (4) via the mode changeover switch (3).

In the recognition mode, the new acoustic parameter time series Qi(m) from the NAT processing unit (21) is supplied via the mode changeover switch (3) to the Chebyshev distance calculator (25), and the standard patterns in the standard pattern memory (4) are also supplied to the Chebyshev distance calculator (25).

Figs. 15 to 17 show the result of processing the one-dimensional patterns of Figs. 4 to 6 in the NAT processing unit (21). The original parameter time series are: input pattern A: 2, 4, 6, 8, 8, 8, 8, 6, 4, 4, 4, 6, 8; standard pattern A': 3, 5, 7, 9, 9, 9, 9, 7, 5, 5, 7, 9; standard pattern B': 7, 6, 6, 8, 8, 8, 8, 6, 4, 4, 4. With the trajectory estimated by linear approximation and the number of resampling points set to eight, the processed one-dimensional parameter time series are: input pattern A: 2, 4, 6, 8, 6, 4, 6, 8; standard pattern A': 3, 5, 7, 9, 7, 5, 7, 9; standard pattern B': 7, 6, 7, 8, 7, 6, 5, 4.

In this case, the trajectory in the parameter space is estimated from the acoustic parameter time series Pi(n) and the new acoustic parameter time series Qi(m) is formed along this trajectory, so time-axis normalization is performed by the acoustic parameter time series Pi(n) itself, obtained by converting the input speech. The Chebyshev distance calculator (25) calculates a Chebyshev distance of 8 between input pattern A and standard pattern A' and a Chebyshev distance of 16 between input pattern A and standard pattern B'. Distance signals indicating these distances 8 and 16 are supplied to the minimum distance determination unit (6). Because distance 8 is smaller than distance 16, the minimum distance determination unit (6) determines that input pattern A is standard pattern A', and a recognition result indicating that the input speech is standard pattern A' is obtained at the output terminal (7). Speech recognition can therefore be performed with relatively few recognition errors even between partially similar words.
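
The distance comparison in this worked example can be checked with a short Python sketch. The function names are hypothetical and the patterns are one-dimensional (I = 1). The distances of 8 and 16 quoted in the description correspond to the sum of absolute differences between corresponding points, so that is the metric assumed here for what the description calls the Chebyshev distance.

```python
from typing import Dict, Sequence, Tuple


def pattern_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of absolute differences between two equal-length patterns."""
    if len(x) != len(y):
        raise ValueError("patterns must have the same number of points")
    return sum(abs(a - b) for a, b in zip(x, y))


def recognise(
    input_pattern: Sequence[float],
    standards: Dict[str, Sequence[float]],
) -> Tuple[str, Dict[str, float]]:
    """Return the name of the closest standard pattern and all distances."""
    distances = {
        name: pattern_distance(input_pattern, ref)
        for name, ref in standards.items()
    }
    best = min(distances, key=distances.get)  # minimum distance determination
    return best, distances


# Time-axis-normalized series from the description (eight resampling points).
input_a = [2, 4, 6, 8, 6, 4, 6, 8]
standards = {
    "A'": [3, 5, 7, 9, 7, 5, 7, 9],
    "B'": [7, 6, 7, 8, 7, 6, 5, 4],
}
best, distances = recognise(input_a, standards)
print(best, distances)
```

Run on the normalized series, the sketch selects standard pattern A' with distances 8 and 16. Because every pattern has the same number of points after NAT processing, the comparison is a single pass over the points with no path search.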

As described above, the speech recognition device of the present invention has a microphone (1) as an audio signal input section, converts the audio signal from this audio signal input section (1) into an acoustic parameter time series Pi(n), estimates a trajectory by linear approximation in the parameter space from this acoustic parameter time series Pi(n), and recognizes the audio signal on the basis of this trajectory. It therefore has the advantage of recognizing speech with relatively few recognition errors even between partially similar words. It also has the advantage that the amount of computation required for processing can be made relatively small.

The difference in the amount of computation between the speech recognition device of the present invention, which performs NAT processing, and the conventional speech recognition device, which performs DP matching processing, will now be described.

Let α be the average amount of computation in the DP matching distance calculation unit (5) per standard pattern for an input pattern, β the average amount of computation in the Chebyshev distance calculation unit (25), and γ the average amount of computation in the NAT processing unit (21). The amount of computation C1 for DP matching processing against J standard patterns is

C1 = α · J ... (11)

and the amount of computation C2 when NAT processing is performed against J standard patterns is

C2 = β · J + γ ... (12)

In general, the average amount of computation α is much larger than the average amount of computation β (α ≫ β). Accordingly, the relation

J ≫ γ / (α - β) ... (13)

holds; that is, as the number of words to be recognized increases, C1 ≫ C2, and the speech recognition device of the present invention, which performs NAT processing, has the advantage that the amount of computation can be greatly reduced.
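
Equations (11) to (13) can be illustrated numerically with a small Python sketch. The function names and the values chosen for α, β and γ are hypothetical and serve only to show how the two costs scale with the number of standard patterns J; the description gives no concrete figures.

```python
def dp_matching_cost(alpha: float, num_patterns: int) -> float:
    """C1 = alpha * J, equation (11)."""
    return alpha * num_patterns


def nat_cost(beta: float, gamma: float, num_patterns: int) -> float:
    """C2 = beta * J + gamma, equation (12)."""
    return beta * num_patterns + gamma


def break_even_patterns(alpha: float, beta: float, gamma: float) -> float:
    """Value of J at which C1 equals C2, i.e. gamma / (alpha - beta)."""
    if alpha <= beta:
        raise ValueError("alpha must exceed beta for a break-even point to exist")
    return gamma / (alpha - beta)


# Hypothetical per-pattern costs, chosen only for illustration.
alpha, beta, gamma = 100.0, 1.0, 50.0
print(break_even_patterns(alpha, beta, gamma))
for j in (1, 10, 100):
    print(j, dp_matching_cost(alpha, j), nat_cost(beta, gamma, j))
```

The one-off NAT cost γ is paid once per input, while the per-pattern cost falls from α to β, so the saving grows in proportion to J once J exceeds γ / (α - β).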

Furthermore, the new acoustic parameter time series Qi(m) obtained from the NAT processing unit (21) can be set to a fixed number of parameters in the time-series direction, so the storage area of the standard pattern memory (4) can be used efficiently and its storage capacity can be made relatively small.

In the embodiment described above, the trajectory in the parameter space is estimated from the acoustic parameter time series Pi(n) by linear approximation, and the new acoustic parameter time series Qi(m) is formed from this trajectory by linear interpolation. It will be readily understood, however, that the same effects as in the above embodiment can be obtained if the trajectory is estimated by circular-arc approximation, spline approximation, or the like, and the new acoustic parameter time series Qi(m) is formed from the trajectory by circular-arc interpolation, spline interpolation, or the like.

Also, although the above embodiment estimates the trajectory in the parameter space from the acoustic parameter time series Pi(n) and forms the new acoustic parameter time series Qi(m) on the basis of this trajectory, the frequency characteristics of the audio signal can be normalized by estimating a trajectory in the parameter space from an acoustic parameter frequency series and forming a new acoustic parameter frequency series on the basis of that trajectory.

Furthermore, although the above embodiment forms the new acoustic parameter time series Qi(m) by estimating the trajectory in the parameter space from the acoustic parameter time series Pi(n) and resampling along it, speech recognition can also be performed by extracting coefficients that indicate the shape of the trajectory and using these coefficients as the new acoustic parameter time series. The present invention is, of course, not limited to the above embodiment and can take various other configurations without departing from the gist of the invention.

[Effects of the invention]

The speech recognition device of the present invention has an audio signal input section, converts the audio signal from this audio signal input section into an acoustic parameter series, estimates a trajectory in the parameter space from this acoustic parameter series, and recognizes the audio signal on the basis of this trajectory. It therefore has the advantage of providing a device that makes relatively few recognition errors even between partially similar words and that requires a relatively small standard pattern memory capacity and amount of computation.

[Brief explanation of drawings]

Fig. 1 is a block diagram showing an example of a speech recognition device that performs speech recognition by DP matching processing; Fig. 2 is a conceptual diagram for explaining DP matching processing; Fig. 3 is a diagram for explaining a trajectory in the acoustic parameter space; Figs. 4, 5 and 6 are diagrams showing examples of the one-dimensional input pattern A, standard pattern A' and standard pattern B', respectively; Fig. 7 is a diagram for explaining time-axis normalization by DP matching processing between the parameter time series of input pattern A and that of standard pattern A'; Fig. 8 is a diagram for explaining time-axis normalization by DP matching processing between the parameter time series of input pattern A and that of standard pattern B'; Fig. 9 is a block diagram showing an embodiment of the speech recognition device of the present invention; Figs. 10, 11, 12 and 14 are diagrams for explaining the NAT processing unit; Fig. 13 is a flowchart for explaining the interpolation point extractor; and Figs. 15, 16 and 17 are diagrams showing the one-dimensional acoustic parameter time series of input pattern A, standard pattern A' and standard pattern B', respectively, after processing in the NAT processing unit.

(1) is a microphone serving as the audio signal input section, (2) is an acoustic analysis unit, (3) is a mode changeover switch, (4) is a standard pattern memory, (6) is a minimum distance determiner, (11A), (11B), ..., (11O) are a 15-channel digital bandpass filter bank, (16) is a speech-section parameter memory, (21) is an NAT processing unit, (22) is a trajectory length calculator, (23) is an interpolation interval calculator, (24) is an interpolation point extractor, and (25) is a Chebyshev distance calculator.

[Figs. 1, 13 and 14: drawings not reproduced]

Procedural amendment

1. Indication of the case: Patent Application No. 106178 of 1984 (Showa 59)
3. Person making the amendment
   Relationship to the case: patent applicant
   Address: 6-7-35 Kitashinagawa, Shinagawa-ku, Tokyo
   Name: (218) Sony Corporation
   Representative Director: Norio Ohga
4. Agent
6. Number of inventions increased by the amendment

(1) In the specification, the passage from the last line of page 3 to line 1 of page 11, "Fig. 2 ... prone to misrecognition.", is corrected as follows.

"Fig. 2 is a conceptual diagram of DP matching processing. The input parameters (M = 19) are arranged along the horizontal axis and the standard parameters (N = 12) along the vertical axis. The (M, N) lattice plane shown in Fig. 2 contains M x N points, and one distance corresponds to each point. For example, the distance between a3 and b5 corresponds to the point located at the intersection of the straight line extended vertically from a3 and the straight line extended horizontally from b5. If, for example, the Chebyshev distance is taken as the distance, the Chebyshev distance d(3, 5) between a3 and b5 is [equation missing in source] (in this case I = 1, because the dimension corresponding to the frequency-axis direction i is omitted).

"As an endpoint-fixed DP path, only three directions are permitted as the state preceding a lattice point (m, n): the lattice point (m-1, n) on its left, the lattice point (m-1, n-1) diagonally below and to its left, and the lattice point (m, n-1) below it. Starting from the start point, that is, the point indicating the Chebyshev distance d(1, 1) between a1 and b1, a path is chosen from these three directions to the end point, that is, the point indicating the Chebyshev distance d(M, N) between aM and bN, so as to minimize the sum of the distances of the lattice points passed. The result of dividing this sum of distances by (M + N - 1), the sum of the number of input parameters M and the number of standard parameters N minus 1, is taken as the DP matching distance between the parameter time series a1, ..., aM of the input pattern and the parameter time series b1, ..., bN of the standard pattern. The initial condition and recurrence formula for this processing are expressed as

initial condition: g(1, 1) = d(1, 1)
recurrence formula: [equation missing in source]

"from which the DP matching distance DM(A, B) is expressed as

DM(A, B) = g(M, N) / (M + N - 1)

"(g(M, N) is divided by (M + N - 1) in order to correct for differences in the distance value caused by differences in the number of frames N of the standard patterns).

"By this processing, when there are L standard patterns, L DP matching distances are obtained for the input pattern, and the standard pattern giving the smallest of these L DP matching distances is taken as the recognition result.

"A speech recognition device using such DP matching processing can cope with variations in speaking rate and differences in word length; that is, it can perform time-axis-normalized speech recognition.

"However, it has become clear that, in a device performing speech recognition by such DP matching processing, the stationary parts of the speech are strongly reflected in the DP matching distance, so that misrecognition is likely between partially similar words.

"That is, the acoustic parameter time series Pi(n) can be regarded as tracing a trajectory in its parameter space. In practice, the parameters of each frame n correspond to one point in the parameter space, so the series is a sequence of points, but if these points are connected by a curve in the time-series direction, a single trajectory from the start point to the end point can be considered. For example, when the two words SAN and HAI are registered, their standard patterns A' and B' trace trajectories passing through the phoneme regions S, A, N and H, A, I, respectively, as shown in Fig. 3. When SAN is uttered in the recognition mode, input pattern A as a whole has very little in common with standard pattern B', but the phoneme A of SAN in input pattern A may be more similar to the phoneme A of HAI in standard pattern B' than to the phoneme A of SAN in standard pattern A', and that portion (a quasi-stationary part) may contain many points.

"Using one-dimensional parameters as an example, a case will now be described in which DP matching processing leads to misrecognition when, as shown in Fig. 3, the parameters of input pattern A are similar overall to those of standard pattern A' but partially similar to those of standard pattern B'. As one-dimensional parameter time series reflecting the situation of Fig. 3, that is, the relationship between partially similar words, consider input pattern A: 2, 4, 6, 8, 8, 8, 8, 6, 4, 4, 4, 6, 8 as shown in Fig. 4; standard pattern A': 3, 5, 7, 9, 9, 9, 9, 7, 5, 5, 7, 9 as shown in Fig. 5; and standard pattern B': 7, 6, 6, 8, 8, 8, 8, 6, 4, 4, 4 as shown in Fig. 6. As is clear from the patterns of Figs. 4 to 6, input pattern A is a pattern that should be determined to be standard pattern A'. However, when the DP matching distances of standard patterns A' and B' to input pattern A are calculated, input pattern A is shown to be closer to standard pattern B'.

"That is, in the DP matching processing of standard pattern A' against input pattern A, as in Fig. 2, the parameter time series of input pattern A (2, 4, 6, 8, 8, 8, 8, 6, 4, 4, 4, 6, 8) is arranged along the horizontal axis and that of standard pattern A' (3, 5, 7, 9, 9, 9, 9, 7, 5, 5, 7, 9) along the vertical axis, as shown in Fig. 7, and the Chebyshev distance between each parameter of input pattern A and each parameter of standard pattern A' is obtained for the corresponding intersection on the lattice plane. The start point is the point of Chebyshev distance d(1, 1) = 1 between the first parameter, 2, of input pattern A and the first parameter, 3, of standard pattern A', and the end point is the point of Chebyshev distance d(13, 12) = 1 between the thirteenth parameter, 8, of input pattern A and the twelfth parameter, 9, of standard pattern A'. If, as in Fig. 2, the DP path is permitted to take the point to the left, the point below, or the point diagonally below and to the left as the state preceding any given point (this path is shown by solid arrows), the points on the path are the 14 points d(1, 1) - d(2, 2) - d(3, 3) - d(4, 4) - d(5, 5) - d(6, 6) - d(7, 7) - d(8, 8) - d(9, 9) - d(10, 10) - d(11, 10) - d(12, 10) - d(13, 11) - d(13, 12); the sum of their distances is 24, and the DP matching distance DM(A, A') is 1.

"Meanwhile, the DP matching processing of standard pattern B' against input pattern A is performed as shown in Fig. 8, in the same way as in Fig. 7. That is, the Chebyshev distances between the individual parameters of input pattern A (2, 4, 6, 8, 8, 8, 8, 6, 4, 4, 4, 6, 8) and those of standard pattern B' (7, 6, 6, 8, 8, 8, 8, 6, 4, 4, 4) are obtained. If the DP path is permitted to take the point to the left, the point below, or the point diagonally below and to the left as the state preceding any given point (this path is shown by solid arrows), the points on the path are the 13 points d(1, 1) - d(2, 2) - d(3, 3) - d(4, 4) - d(5, 5) - d(6, 6) - d(7, 7) - d(8, 8) - d(9, 9) - d(10, 10) - d(11, 11) - d(12, 11) - d(13, 11); the sum of their distances is 15, and the DP matching distance DM(A, B') is 0.65.

"As is clear from this result with the three-direction DP path, input pattern A is determined to be standard pattern B', whose DP matching distance is smaller, and the result that should be obtained is not obtained. DP matching processing is thus prone to misrecognition between partially similar words."

(2) Page 14, lines 4 to 5: "differences in personal characteristics between speakers" is corrected to "differences in vocal cord sound source characteristics between speakers".

(3) Page 14, line 15 to page 15, line 2: the expressions "... (N = 15) ... (2)" and "... (N = 15) ... (3)" are corrected as follows: "... (I = 15) ... (2)" [remaining expressions missing in source].

(4) [location missing in source]: the expression "... (5)" is corrected as follows: "... (5)" [expressions missing in source].

(5) Page 17, lines 3 to 4: "is supplied to the NAT processing unit (21)." is corrected to "is supplied to the NAT processing unit (21) (NAT is an acronym for Normalization Along Trajectory)."

(6) Page 19, line 5: "Qi(1) = Pi(0), Qi(M) = Pi(S)" [correction text missing in source].

(7) Page 20, lines 5 to 6: "S(n) = D(Pi(n+1) + Pi(n)) (n = 1, ..., N) ... (7)" is corrected to "S(n) = D(Pi(n+1), Pi(n)) (n = 1, ..., N-1) ... (7)".

(8) Page 26, lines 16 and 18, and page 27, lines 16 to 17: each occurrence of "Chebyshev distance calculator (25)" is corrected to "Chebyshev distance calculation unit (25)".

(9) Page 33, lines 16 to 17: "Chebyshev distance calculator" is corrected to "Chebyshev distance calculation unit".

(10) In the drawings, Figs. 3, 7 and 8 are each corrected as shown in the attached sheets.

End
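
The path sums quoted in the amended DP matching passage can be checked with a short Python sketch. The recurrence formula itself is missing from the source, so the weighting used here is an assumption inferred from the numbers: counting every lattice point once would give 14 and 13 for the two quoted paths, whereas counting the distance of a diagonal step twice (and of the start point, horizontal steps and vertical steps once) reproduces the quoted sums of 24 and 15. The function name `weighted_path_cost` is hypothetical, and the sketch only evaluates the paths listed in the text; it does not search for the minimum-cost path.

```python
from typing import List, Sequence, Tuple


def weighted_path_cost(
    a: Sequence[int],
    b: Sequence[int],
    path: List[Tuple[int, int]],
) -> int:
    """Sum of d(m, n) = |a_m - b_n| along `path`, using 1-based indices.

    Assumed weights: 1 for the start point and for horizontal or vertical
    steps, 2 for diagonal steps.
    """
    total = 0
    previous = None
    for m, n in path:
        d = abs(a[m - 1] - b[n - 1])
        if previous is None:
            weight = 1
        else:
            step = (m - previous[0], n - previous[1])
            if step not in ((1, 0), (0, 1), (1, 1)):
                raise ValueError(f"step {step} is not a permitted direction")
            weight = step[0] + step[1]
        total += weight * d
        previous = (m, n)
    return total


A = [2, 4, 6, 8, 8, 8, 8, 6, 4, 4, 4, 6, 8]
A_REF = [3, 5, 7, 9, 9, 9, 9, 7, 5, 5, 7, 9]
B_REF = [7, 6, 6, 8, 8, 8, 8, 6, 4, 4, 4]

# Paths as listed in the amended passage for Figs. 7 and 8.
path_a = [(k, k) for k in range(1, 11)] + [(11, 10), (12, 10), (13, 11), (13, 12)]
path_b = [(k, k) for k in range(1, 12)] + [(12, 11), (13, 11)]

for ref, path in ((A_REF, path_a), (B_REF, path_b)):
    total = weighted_path_cost(A, ref, path)
    print(total, round(total / (len(A) + len(ref) - 1), 2))
```

Under this assumed weighting the two paths give sums of 24 and 15 and normalized distances of 1.0 and 0.65, matching the values in the text, so DP matching ranks B' ahead of A' for input pattern A.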

Claims

[Claims]

A speech recognition device characterized by having an audio signal input section, converting the audio signal from the audio signal input section into an acoustic parameter series, estimating a trajectory in the parameter space from the acoustic parameter series, and recognizing the audio signal on the basis of the trajectory.