JPS62136700A - Voice recognition equipment

Info

Publication number: JPS62136700A
Application number: JP60277714A
Authority: JP
Inventors: 曜一郎佐古; 正照赤羽; 誠赤羽; 平岩　篤信; 田村　震一; 雅男渡
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1985-12-10
Filing date: 1985-12-10
Publication date: 1987-06-19
Anticipated expiration: 2009-07-20
Also published as: JPH0654439B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] The invention is described in the following order.

A. Field of Industrial Application
B. Summary of the Invention
C. Prior Art
D. Problems to Be Solved by the Invention
E. Means for Solving the Problems
F. Operation
G. Embodiment
G1. Description of the acoustic analysis circuit (Fig. 1)
G2. Description of the time normalization processing (Figs. 1 and 2)
G3. Description of a specific example of bias assignment (Figs. 1 and 3)
G4. Description of the pattern matching processing (Fig. 1)
H. Effects of the Invention

A. Field of Industrial Application

The present invention relates to an apparatus that performs speech recognition by pattern matching between standard patterns of the words to be recognized, created and stored in advance, and the input pattern of the word to be recognized.

B. Summary of the Invention

This invention concerns an apparatus that performs speech recognition by pattern matching between a standard pattern and an input pattern obtained by estimating the trajectory traced by the acoustic parameter sequence of a word to be recognized. A bias is applied to the distances between parameters that are adjacent in the time-series direction of the acoustic parameter sequence used to estimate the trajectory. This makes it possible either to remove the influence of fluctuations in quasi-stationary parts or to extract the features of quasi-stationary parts more effectively.

C. Prior Art

Speech is a phenomenon that changes along the time axis; distinct words and phrases arise from uttering sounds so that the spectrum pattern changes from moment to moment. Speech recognition is the technology for automatically recognizing the words and phrases that humans utter, but achieving recognition comparable to the human auditory function is at present extremely difficult. For this reason, most speech recognition currently in practical use works, under fixed conditions of use, by pattern matching between standard patterns of the words to be recognized and an input pattern.

Fig. 4 outlines such a speech recognition apparatus. Speech input from a microphone (1) is supplied to an acoustic analysis circuit (2), which extracts acoustic parameters representing the features of the input speech pattern. Various acoustic analysis methods can be used to extract these parameters. In one known method, a bandpass filter and a rectifier circuit form one channel, several such channels with different passbands are arranged in parallel, and the time variation of the spectrum pattern is extracted as the output of this bandpass filter group. In this case, the acoustic parameters can be written as the time series Pi(n) (i = 1, 2, ..., I, where I is, for example, the number of bandpass filter channels; n = 1, 2, ..., N, where N is the number of frames used for recognition within the interval found by speech interval determination).

The acoustic parameter time series Pi(n) from the acoustic analysis circuit (2) is supplied to a mode switching circuit (3) consisting of, for example, a switch. When the switch of this circuit (3) is set to terminal A, the apparatus is in registration mode, and the acoustic parameter time series Pi(n) is stored as recognition parameters in a standard pattern memory (4). That is, before recognition, the speaker's speech patterns are stored in this memory (4) as standard patterns. Because of variations in speaking rate and differences in word length, the registered standard patterns generally have different numbers of frames.

When the switch (3) is set to terminal B, the apparatus is in recognition mode. In this mode, the acoustic parameter time series of the current input speech from the acoustic analysis circuit (2) is supplied to an input speech pattern memory (5) and temporarily stored. A distance calculation circuit (6) computes the magnitude of the difference between this input pattern and each of the standard patterns of the words to be recognized, read from the standard pattern memory (4). A minimum value determination circuit (7) then detects the word whose standard pattern differs least from the input pattern, and the input word is thereby recognized.
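The recognition-mode flow of Fig. 4 (distance calculation circuit (6) followed by minimum value determination circuit (7)) can be sketched as follows. This is a minimal illustration, not the circuit described in the text: the function name `recognize`, the dictionary of templates, and the use of a sum of absolute differences as the distance are all assumptions, and the sketch assumes every pattern already has the same number of frames, which is not true in general.

```python
import numpy as np

def recognize(input_pattern, templates):
    """Return (label, distance) of the stored template closest to the input.

    input_pattern: (frames, channels) array.
    templates: dict mapping a word label to an array of the same shape.
    The distance (sum of absolute differences) is an illustrative assumption.
    """
    best_label, best_dist = None, float("inf")
    for label, template in templates.items():
        dist = float(np.abs(input_pattern - template).sum())
        if dist < best_dist:              # keep the smallest distance so far
            best_label, best_dist = label, dist
    return best_label, best_dist
```

Keeping the distance function separate from the minimum search mirrors the split between circuits (6) and (7), so a different distance measure can be substituted without touching the decision step.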

Input speech is thus recognized by pattern matching between registered standard patterns and the input pattern. However, even when the same word is uttered in the same way, its spectrum pattern may shift or stretch and shrink along the time axis, and this must be taken into account. For example, suppose the word "hai" is to be recognized and the standard pattern was registered as "hai". If the input speech is drawn out along the time axis as "haai", the distance becomes large, the input is treated as a completely different word, and recognition fails. Pattern matching for speech recognition therefore requires time normalization to correct this shift and stretching along the time axis, and this time normalization is an important step in improving recognition accuracy.

One method of time normalization is DP (dynamic programming) matching (see, for example, Japanese Unexamined Patent Publication No. S50-96104).

Rather than preparing many standard patterns that account for shifts along the time axis, DP matching uses a warping function to generate many time-normalized standard patterns, computes the distance between each of them and the input pattern, and recognizes the speech by detecting the one with the minimum distance.
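For orientation, the idea behind DP matching can be sketched with the textbook dynamic time warping recurrence. This is a generic formulation, not necessarily the one used in the cited publication; the function name `dtw_distance`, the symmetric step pattern, and the Euclidean local cost are assumptions of the sketch.

```python
import numpy as np

def dtw_distance(x, y):
    """Accumulated distance of the best monotonic alignment of x and y.

    x: (N, I) array, y: (M, I) array of per-frame parameter vectors.
    Allowed steps: diagonal match, or repeating a frame of either pattern.
    """
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])   # local frame distance
            acc[i, j] = cost + min(acc[i - 1, j - 1],    # match
                                   acc[i - 1, j],        # repeat a frame of y
                                   acc[i, j - 1])        # repeat a frame of x
    return float(acc[n, m])
```

The double loop makes the cost of one comparison proportional to N times M, and it must be repeated for every registered pattern.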

With DP matching, however, the number of frames in a registered standard pattern is not fixed, and DP matching must be performed between the input pattern and every registered standard pattern. The amount of computation therefore grows rapidly as the vocabulary increases.

In addition, because DP matching emphasizes stationary parts (parts where the spectrum pattern does not change over time), it can misrecognize patterns that are partially similar.

The present applicant has previously proposed a time normalization method that does not have these drawbacks (for example, Japanese Patent Application No. S59-106177).

Considered in its parameter space, the acoustic parameter time series Pi(n) traces a sequence of points. For example, if the word to be recognized is "HAI", the acoustic analysis uses two bandpass filters, and Pi(n) = (P1, P2), then the acoustic parameter time series of the input speech traces a point sequence in the two-dimensional parameter space as shown in Fig. 5.

As the figure shows, the points are sparsely distributed in non-stationary parts of the speech and densely distributed in quasi-stationary parts. If the speech were perfectly stationary, the parameters would not change and the points would stay at a single location in the parameter space. In practice, even when a person utters the same sound, fluctuations in the voice prevent it from being perfectly stationary, and the fluctuations show up as a quasi-stationary part, as in the figure.

It follows that shifts along the time axis caused by variations in speaking rate are due almost entirely to differences in point density in the quasi-stationary parts, and that the duration of the non-stationary parts has little influence. Therefore, if a trajectory is estimated from the point sequence of the input parameter time series Pi(n) as a continuous curve that passes approximately through the whole sequence, as shown in Fig. 6, this trajectory is almost invariant to variations in speaking rate.

On this basis, the applicant proposed the following time-axis normalization method. First, a trajectory is estimated as a continuous curve running from the start point Pi(1) to the end point Pi(N) of the input parameter time series Pi(n). The trajectory is estimated, for example, by linearly approximating the acoustic parameter time series as shown in Fig. 7. The length S of the trajectory is then obtained from the estimated curve.

The trajectory is then resampled along its length at a fixed interval T, as indicated by the circles in Fig. 7. For example, to resample to M points, the trajectory is resampled using the length

$$T = \frac{S}{M - 1} \qquad (1)$$

as the unit.

Let the parameter time series describing the resampled point sequence be Qi(m) (i = 1, 2, ..., I; m = 1, 2, ..., M). This time series Qi(m) carries the essential information of the trajectory and is almost invariant to variations in speaking rate.

In other words, it is a recognition parameter time series whose time axis has been normalized.
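The resampling just described (piecewise-linear trajectory, length S, interval T = S/(M - 1)) can be sketched as below. This is a minimal sketch under stated assumptions: the function name `resample_along_trajectory` is hypothetical, NumPy's `np.interp` stands in for the interpolation step, and the input is assumed to contain at least two distinct points.

```python
import numpy as np

def resample_along_trajectory(p, m_points):
    """Resample the piecewise-linear trajectory through p at equal arc length.

    p: (N, I) array of acoustic parameter vectors P_i(n), N >= 2.
    Returns an (m_points, I) array Q_i(m), with m_points >= 2.
    """
    p = np.asarray(p, dtype=float)
    seg = np.linalg.norm(np.diff(p, axis=0), axis=1)   # length of each segment
    cum = np.concatenate(([0.0], np.cumsum(seg)))      # arc length at each P(n)
    total = cum[-1]                                     # trajectory length S
    targets = np.linspace(0.0, total, m_points)         # spacing T = S / (M - 1)
    q = np.empty((m_points, p.shape[1]))
    for ch in range(p.shape[1]):
        # linear interpolation of each channel as a function of arc length
        q[:, ch] = np.interp(targets, cum, p[:, ch])
    return q
```

Two utterances that follow the same path at different speeds differ only in how densely their points sit along the path, so they resample to the same M points.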

Accordingly, if this parameter time series Qi(m) is registered as the standard pattern, the input pattern is likewise obtained as a parameter time series Qi(m), the distance between the two patterns is computed from these time series, and the pattern with the minimum distance is detected, then speech recognition is always performed with shifts along the time axis normalized out.

With this processing method, the number of frames in the recognition parameter time series Qi(m) is always M, regardless of variations in speaking rate at registration or differences in word length. Moreover, because the recognition parameter time series Qi(m) is time-normalized, good results can be expected even when the distance between the input pattern and a registered standard pattern is computed as the simplest Chebyshev distance.

This method is also a time normalization technique that gives more weight to the non-stationary parts of speech, so it produces fewer misrecognitions between partially similar patterns than DP matching does.

Furthermore, the normalized parameter time series Qi(m) contains no information about speaking rate variation. This makes it easier to handle, for example, global features of the parameter transition structure laid out in the parameter space, and allows various methods that are also effective for speaker-independent recognition to be applied.

Below, the time normalization processing described above is called NAT (Normalization Along Trajectory) processing.

D. Problems to Be Solved by the Invention

Even with NAT processing as described above, some influence of fluctuations in the quasi-stationary parts remains.

Conversely, because the features of the quasi-stationary parts differ from speaker to speaker, the recognition rate might improve if those features could be extracted more fully.

This invention aims to provide an improved NAT processing method that can achieve both of these seemingly contradictory goals: removing the influence of the quasi-stationary parts as far as possible, and extracting more of the features of the quasi-stationary parts.

E. Means for Solving the Problems

The invention provides: acoustic analysis means (2) for obtaining an acoustic parameter sequence of an input speech signal; inter-parameter distance calculation means (91) for calculating the distances between parameters of the acoustic parameter sequence from the acoustic analysis means (2); bias applying means (92) for applying a bias to each distance obtained by the inter-parameter distance calculation means (91); normalized parameter generation means (93), (94), (95) for estimating, on the basis of the biased inter-parameter distances, the trajectory that the acoustic parameter sequence from the acoustic analysis means (2) traces in the parameter space, and for generating a recognition parameter sequence from that trajectory; a standard pattern memory (4) storing the recognition parameter sequences of the standard patterns of the words to be recognized; distance calculation means (6) for calculating the difference between the recognition parameter sequence of the input pattern formed from the acoustic parameter sequence and the recognition parameter sequence of each standard pattern from the standard pattern memory; and minimum value determination means (7) for detecting the word of the standard pattern for which the value calculated by the distance calculation means (6) is smallest and producing a recognition output.

F. Operation

Subtracting a predetermined bias value from the inter-parameter distances of the input acoustic parameter sequence makes the distances between the parameters in a quasi-stationary part zero or very small, so the quasi-stationary part can be regarded as a stationary part with almost no fluctuation.

Alternatively, adding a predetermined bias value to the inter-parameter distances of the input acoustic parameter sequence raises the distances in quasi-stationary parts, which are inherently small, to at least the predetermined value. The quasi-stationary parts can then be handled in the same way as non-stationary (transient) parts, which makes it possible to extract their features.
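The effect of the two bias directions on a sequence of inter-parameter distances can be illustrated as follows. This is a sketch only: the function name `apply_bias` and the single signed `bias` argument are hypothetical, and clipping negative results to zero is an assumption of the sketch, not something stated in the text above.

```python
import numpy as np

def apply_bias(s, bias):
    """Add a signed bias to every inter-parameter distance S(n).

    bias < 0 subtracts, suppressing small quasi-stationary distances;
    bias > 0 adds, lifting quasi-stationary distances to at least `bias`.
    Clipping negative results to zero is an assumption of this sketch.
    """
    return np.maximum(np.asarray(s, dtype=float) + bias, 0.0)
```

With distances `[0.1, 0.2, 2.0, 0.1]`, a bias of -0.3 leaves only the transient step (1.7) with nonzero length, while a bias of +0.5 gives every step a length of at least 0.5.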

G. Embodiment

Fig. 1 shows an embodiment of a speech recognition apparatus according to this invention. This example uses a 16-channel bandpass filter group for acoustic analysis, and parts corresponding to those in Fig. 4 carry the same reference numerals.

G1. Description of the acoustic analysis circuit (2)

In this example, in the acoustic analysis circuit (2), the speech signal from the microphone (1) is supplied through an amplifier (211) and a band-limiting low-pass filter (212) to an A/D converter (213), where it is converted into a 12-bit digital speech signal at a sampling frequency of, for example, 12.5 kHz. This digital speech signal is supplied to the digital bandpass filters (221_1), (221_2), ..., (221_16) of the channels of a 16-channel bandpass filter bank (22). Each digital bandpass filter (221_1), (221_2), ..., (221_16) is, for example, a fourth-order Butterworth digital filter, and the passbands of the filters are the bands obtained by dividing the range from 250 Hz to 5.5 kHz at equal intervals on a logarithmic axis. The output signals of the digital bandpass filters (221_1), (221_2), ..., (221_16) are supplied to rectifier circuits (222_1), (222_2), ..., (222_16), respectively, and the outputs of these rectifier circuits are supplied to digital low-pass filters (223_1), (223_2), ..., (223_16), respectively. Each of these digital low-pass filters is, for example, an FIR low-pass filter with a cutoff frequency of 52.8 Hz.

The output signals of the digital low-pass filters (223_1), (223_2), ..., (223_16), which are the outputs of the acoustic analysis circuit (2), are supplied to a sampler (231) forming part of a feature extraction circuit (23). The sampler (231) samples the output signals of the digital low-pass filters (223_1), (223_2), ..., (223_16) once every frame period of 5.12 msec. This yields the sample time series Ai(n) (i = 1, 2, ..., 16; n is the frame number, n = 1, 2, ..., N).

The output of the sampler (231), that is, the sample time series Ai(n), is supplied to a sound source information normalization circuit (232), which removes speaker-dependent differences in the vocal cord sound source characteristics of the speech to be recognized.

Specifically, the sample time series Ai(n) supplied from the sampler (231) every frame period undergoes the logarithmic transformation

$$A_i(n) = \log\bigl(A_i(n) + B\bigr) \qquad (2)$$

where A_i(n) on the right-hand side is the sampler output and A_i(n) on the left-hand side is its log-transformed value. In equation (2), B is a bias, set to a value large enough to mask the noise level.

The vocal cord sound source characteristic is then approximated by the expression y_i = a·i + b, and the coefficients a and b are determined by equations (3) and (4), each with I = 16.

[Equations (3) and (4) are not reproduced in the source text.]

If the source-normalized parameter is denoted Pi(n), then when a(n) < 0 the parameter Pi(n) is expressed as

$$P_i(n) = A_i(n) - \{a(n) \cdot i + b(n)\} \qquad (5)$$

When a(n) ≥ 0, only level normalization is performed, and the parameter Pi(n) is expressed by equation (6).

[Equation (6) is not reproduced in the source text.]
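The normalization of one frame can be sketched as follows. Several parts are assumptions, because equations (3), (4), and (6) do not survive in the text: the line y_i = a·i + b is fitted here by ordinary least squares using `np.polyfit`, and "level normalization only" is taken to mean subtracting the frame mean. The function name `normalize_source` and the default value of `noise_bias` are likewise hypothetical.

```python
import numpy as np

def normalize_source(a_raw, noise_bias=1.0):
    """Log-compress one frame of channel outputs and remove its spectral tilt.

    a_raw: length-I array of non-negative sampler outputs for one frame.
    noise_bias: the constant B of equation (2); the default is arbitrary.
    """
    a_log = np.log(np.asarray(a_raw, dtype=float) + noise_bias)  # equation (2)
    i = np.arange(1, len(a_log) + 1)
    # Least-squares line a*i + b (assumed form of equations (3) and (4)).
    slope, intercept = np.polyfit(i, a_log, 1)
    if slope < 0:
        return a_log - (slope * i + intercept)                   # equation (5)
    # Level normalization only; mean subtraction is an assumption here.
    return a_log - a_log.mean()
```

A frame whose log spectrum falls exactly linearly with channel number is flattened to zero by the first branch, which is the sense in which the tilt attributed to the sound source is removed.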

In this way, the sound source information normalization circuit (232) yields the acoustic parameter time series Pi(n), in which differences in vocal cord sound source characteristics have been normalized out.

The acoustic parameters Pi(n) from the sound source information normalization circuit (232) are supplied to an in-speech-interval parameter memory (8). On receiving a speech interval determination signal from a speech interval determination circuit (24), this parameter memory (8) stores the parameters Pi(n) for each determined speech interval.

The speech interval determination circuit (24) consists of a zero-cross counter (241), a power calculation circuit (242), and a speech interval decision circuit (243). The digital speech signal from the A/D converter (213) is supplied to the zero-cross counter (241) and the power calculation circuit (242). Every frame period of 5.12 msec, the zero-cross counter (241) counts the zero crossings of the 64 samples of the digital speech signal within that frame period, and the count is supplied to the first input of the speech interval decision circuit (243). Every frame period, the power calculation circuit (242) computes the power of the digital speech signal within that frame period, that is, the sum of squares, and its output power signal is supplied to the second input of the speech interval decision circuit (243). The third input of the speech interval decision circuit (243) receives sound source normalization information from the sound source information normalization circuit (232). The speech interval decision circuit (243) processes the zero-crossing count, the power within the interval, and the sound source normalization information in combination, classifies each portion as silence, unvoiced sound, or voiced sound, and decides the speech interval.
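The two per-frame measurements, zero-crossing count and sum of squares over 64 samples, can be sketched as below. The function name `frame_features` is hypothetical, crossings are counted only between samples of the same frame, and the decision logic that combines these values with the sound source normalization information is not modeled.

```python
import numpy as np

FRAME_LEN = 64  # samples per frame: 64 / 12.5 kHz = 5.12 ms

def frame_features(signal, frame_len=FRAME_LEN):
    """Return (zero_crossings, power) arrays, one value per complete frame."""
    x = np.asarray(signal, dtype=float)
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    negative = np.signbit(frames)
    # A zero crossing is a sign change between adjacent samples in a frame.
    zero_crossings = np.count_nonzero(negative[:, 1:] != negative[:, :-1], axis=1)
    power = np.sum(frames ** 2, axis=1)   # sum of squares within the frame
    return zero_crossings, power
```

The frame length follows from the figures in the text: 64 samples at a 12.5 kHz sampling frequency span 5.12 msec.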

The speech interval determination signal indicating the interval decided by the speech interval decision circuit (243) is supplied, as the output of the speech interval determination circuit (24), to the in-speech-interval parameter memory (200).

The acoustic parameter time series Pi(n) stored in the memory (200) for the determined speech interval is then supplied to the NAT processing circuit (9).

G2. Description of the time normalization processing

The NAT processing circuit (9) consists of an inter-parameter distance calculation circuit (91), a bias applying circuit (92), a trajectory length calculation circuit (93), an interpolation interval calculation circuit (94), and an interpolation point extraction circuit (95).

The parameter time series Pi(n) (i = 1, 2, ..., 16; n = 1, 2, ..., N) from the parameter memory (200) is supplied to the inter-parameter distance calculation circuit (91). This circuit calculates the distance between successive parameters along the linearly approximated trajectory that the acoustic parameter time series Pi(n) traces in its parameter space, as shown in Fig. 7 above.

Here, the Euclidean distance D(a_i, b_i) between I-dimensional vectors a_i and b_i is

$$D(a_i, b_i) = \sqrt{\sum_{i=1}^{I} (a_i - b_i)^2} \qquad (7)$$

so the inter-parameter distance S(n) is expressed as

$$S(n) = D\bigl(P_i(n+1), P_i(n)\bigr) \quad (n = 1, \ldots, N-1) \qquad (8)$$

The inter-parameter distances S(n) calculated in this way are supplied to the bias applying circuit (92).

In the bias applying circuit (92), a predetermined bias value is subtracted from or added to each inter-parameter distance S(n), as described later.

The biased inter-parameter distances BS(n), obtained by applying the bias value to the distances S(n), are supplied to the trajectory length calculation circuit (93), which uses them to calculate the total trajectory length SL from the first parameter Pi(1) to the N-th (last) parameter Pi(N) in the time-series direction.

That is, the distance SL(n) from the first parameter Pi(1) to the n-th parameter Pi(n) in the time-series direction is

$$SL(n) = \sum_{k=1}^{n-1} BS(k) \qquad (9)$$

and the total trajectory length SL is

$$SL = \sum_{n=1}^{N-1} BS(n) \qquad (10)$$

A signal indicating the trajectory length SL obtained by the trajectory length calculation circuit (93) is supplied to the interpolation interval calculation circuit (94).

The interpolation interval calculation circuit (94) calculates the resampling interval T used when resampling along the trajectory.

If the trajectory is resampled to M points, the resampling interval T is obtained as

$$T = \frac{SL}{M - 1} \qquad (11)$$

A signal indicating the resampling interval T from the interpolation interval calculation circuit (94) is supplied to the interpolation point extraction circuit (95). The acoustic parameter time series Pi(n) from the parameter memory (200) and the biased inter-parameter distances BS(n) from the bias applying circuit (92) are also supplied to this circuit. The interpolation point extraction circuit (95) resamples the trajectory of the acoustic parameter time series Pi(n) in its parameter space, for example the trajectory obtained by linear approximation between parameters, at the resampling interval T, as indicated by the circles in Fig. 7, and forms the recognition parameter time series Qi(m) from the new point sequence obtained. The biased values BS(n) are used as the distances between two parameters during interpolation.

Specifically, the interpolation point extraction circuit (95) performs processing according to the flowchart shown in Fig. 2 to form the recognition parameter time series Qi(m).

First, in step (101), the variable J, which indicates the number of the resampling point in the time series direction, is set to 1, and the variable IC, which indicates the frame number of the acoustic parameter time series Pi(n), is set to 1, completing initialization. Next, in step (102), the variable J is incremented. In step (103), it is determined whether the variable J is (M-1) or less, which establishes whether the number of the current resampling point in the time series direction has reached the last number that needs to be resampled. If it is the last number, the process proceeds to step (104) and resampling ends.

If it is not the last number, in step (105) the resampling distance DL from the first resampling point (this is, for example, a silent part) to the J-th resampling point is calculated.

Next, the process proceeds to step (106), where the variable IC is incremented.

Next, in step (107), it is determined whether the resampling distance DL is smaller than the distance SL(IC) from the first parameter Pi(1) of the acoustic parameter time series Pi(n) to the IC-th parameter Pi(IC). This establishes whether the current resampling point lies on the start-point side of the parameter Pi(IC) on the trajectory. If it does not, the process returns to step (106), the variable IC is incremented, and the positions of the resampling point and the parameter Pi(IC) on the trajectory are compared again in step (107). When the resampling point is judged to lie on the start-point side of the parameter Pi(IC), the process proceeds to step (108), where the recognition parameter Qi(J) is formed.

That is, the distance SL(IC-1) to the (IC-1)-th parameter Pi(IC-1), which lies on the start-point side of the J-th resampling point, is subtracted from the resampling distance DL of the J-th resampling point to obtain the distance SS from the (IC-1)-th parameter Pi(IC-1) to the J-th resampling point. This distance is, of course, obtained using the values BS(n) after the bias has been applied.

Next, the distance SS is divided by BS(IC-1), the biased form of the distance S(n) between the parameters Pi(IC-1) and Pi(IC) that lie on either side of the J-th resampling point on the trajectory. The quotient SS/BS(IC-1) is multiplied by the difference (Pi(IC) - Pi(IC-1)) between those two parameters, which gives the amount of interpolation measured from the (IC-1)-th parameter Pi(IC-1), the parameter adjacent to the J-th resampling point on the start-point side. This interpolation amount is then added to Pi(IC-1) to form a new recognition parameter Qi(J) along the trajectory.

In this way, the recognition parameter time series Qi(m) is formed by resampling (M-2) points, excluding the start point and the end point (when these are silent, Qi(1) = 0 and Qi(M) = 0).
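The flowchart steps (101) to (108) described above can be sketched in Python as follows. This is only an illustrative sketch, not the circuit's actual implementation: the function name `nat_resample` is hypothetical, the resampling interval is assumed to be T = (total biased trajectory length) / (M - 1), which this section does not state, and the first and last frames are simply copied as the start and end points.

```python
def nat_resample(P, BS, M):
    """Resample a trajectory at equal intervals of (biased) arc length.

    P  : list of N frames, each a list of floats (acoustic parameters Pi(n))
    BS : list of N-1 biased distances between adjacent frames
    M  : number of points in the output sequence Qi(m)
    """
    N = len(P)
    # SL[k] = cumulative biased distance from P[0] to P[k]
    SL = [0.0]
    for d in BS:
        SL.append(SL[-1] + d)
    T = SL[-1] / (M - 1)  # assumed definition of the resampling interval

    Q = [list(P[0])]      # start point (assumption: copy the first frame)
    ic = 1                # index of the frame just past the resampling point
    for j in range(1, M - 1):          # the (M-2) interior resampling points
        DL = j * T                     # step (105): distance to the j-th point
        # steps (106)/(107): advance until the point lies before frame ic
        while ic < N - 1 and DL >= SL[ic]:
            ic += 1
        seg = BS[ic - 1]
        SS = DL - SL[ic - 1]           # distance past frame ic-1
        ratio = SS / seg if seg > 0 else 0.0
        # step (108): linear interpolation between frames ic-1 and ic
        Q.append([a + ratio * (b - a) for a, b in zip(P[ic - 1], P[ic])])
    Q.append(list(P[-1]))  # end point (assumption: copy the last frame)
    return Q


if __name__ == "__main__":
    frames = [[0.0], [1.0], [3.0]]
    print(nat_resample(frames, [1.0, 2.0], 4))
```

Segments whose biased distance is zero are skipped by the `while` loop, so a stationary run collapses to a single point on the resampled trajectory.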

G3 Description of specific examples of bias value assignment

Bias values can be assigned in various ways. In the first example, the minimum value S(n)min of the inter-parameter distances obtained by the inter-parameter distance calculation circuit (91) is used as the bias value and is subtracted from each inter-parameter distance S(n). This is the case in which the quasi-stationary part is to be treated almost as a stationary part. That is, the bias applying circuit (92) performs the calculation

BS(n) = S(n) - S(n)min ... (12)

For example, consider a two-dimensional parameter time series as shown in FIG. 3. If the inter-parameter distances S(n) are as shown in the figure, their minimum value is S(n)min = 3.

Therefore, when the bias applying circuit (92) performs the calculation of equation (12), the inter-parameter distances BS(n) take the values shown in the lower part of the figure, and the inter-parameter distances BS(n) in the quasi-stationary part become zero or very small.
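As a small numerical illustration of equation (12), the sketch below subtracts the minimum distance from every inter-parameter distance. The function name `subtract_min_bias` and the distance values are hypothetical; they are not the values of FIG. 3 and were only chosen so that the minimum is 3, as in the text.

```python
def subtract_min_bias(S):
    """Equation (12): BS(n) = S(n) - S(n)min for every inter-parameter distance."""
    s_min = min(S)
    return [s - s_min for s in S]


# Hypothetical distances: large in transient parts, small in a quasi-stationary part.
S = [9, 8, 3, 4, 3, 10]
BS = subtract_min_bias(S)
print(BS)  # [6, 5, 0, 1, 0, 7]
```

The small distances (3, 4, 3) shrink to 0, 1, 0, so the quasi-stationary run contributes almost nothing to the trajectory length.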

The interpolation point extraction circuit (95) then uses these biased distances to extract interpolation points in step (108) of the flowchart of FIG. 2 described above. The trajectory is therefore estimated with the stationary part regarded as almost a single point, and resampling along that trajectory yields the recognition parameter time series Qi(m).

Thus, in NAT processing, which already reduces the influence of fluctuations in the quasi-stationary part, a recognition parameter time series Qi(m) that eliminates that influence more effectively can be obtained.

In the second example, a bias value a is added to the inter-parameter distance S(n). That is, the bias applying circuit (92) performs the calculation

BS(n) = S(n) + a ... (13)

In this example, the inter-parameter distances in the quasi-stationary part are stretched in the new inter-parameter distances BS(n), so the recognition parameter time series Qi(m) obtained from the interpolation point extraction circuit (95) also captures the features of the quasi-stationary part.

Adding a bias value in this way corresponds to the concept of the matching window in DP matching; when a = +∞, NAT processing becomes equivalent to linear expansion and contraction.

The above bias values may be varied according to the trajectory length, or the bias value may be determined from the mean inter-parameter distance in the quasi-stationary part.

In addition, both when a bias value is subtracted from the inter-parameter distance S(n) to obtain a new inter-parameter distance BS(n) that almost eliminates the influence of the quasi-stationary part, and when the features of the quasi-stationary part are to be extracted more fully, the bias value need not be the minimum value S(n)min of the distances S(n) or a controlled value as described above. A fixed value determined by experiment or the like may be used instead.

When the bias value is subtracted from the distance S(n), the subtraction is performed within the range in which the distance after subtraction satisfies BS(n) ≥ 0. Alternatively, if BS(n) < 0 results, the distance may be forcibly set to BS(n) = 0.
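The forced-to-zero variant can be sketched in one line. The function name `subtract_bias_clamped` and the sample values are hypothetical; the bias here stands for a fixed value such as one determined by experiment.

```python
def subtract_bias_clamped(S, bias):
    """Subtract a fixed bias from each distance, forcing negative results to 0."""
    return [max(s - bias, 0) for s in S]


print(subtract_bias_clamped([5, 2, 7], 3))  # [2, 0, 4]
```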

The above describes applying a bias to the value of a parameter consisting of 16 channels. More detailed feature extraction is possible by considering a parameter for each single channel or each group of several channels among the 16, that is, for each frequency band, and performing NAT processing with biasing on each such parameter.

G4 Description of pattern matching processing

The recognition parameter time series Qi(m) from the NAT processing circuit (9) is supplied to the mode switching circuit (3), and a signal indicating the trajectory length calculated by the trajectory length calculation circuit (91) is supplied to the mode switching circuit (31).

At registration time, the recognition parameter time series is stored in the standard pattern memory (4).

At speech recognition time, pattern matching processing is performed as follows.

That is, the recognition parameter time series Qi(m) obtained by NAT processing in the NAT processing circuit (9) as described above is supplied via the mode switching circuit (3) to the distance calculation circuit (6), where its distance from each standard pattern is calculated.

The distance in this case is calculated as, for example, a simple Chebyshev distance. The distances between each standard pattern and the input pattern calculated by the distance calculation circuit (6) are supplied to the minimum value determination circuit (7), which determines the standard pattern with the smallest calculated distance. The recognition result for the input speech is obtained from this determination at the output terminal (70).
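The matching stage can be sketched as a nearest-template search. This is a hedged illustration only: the formula this patent uses for its "simple Chebyshev distance" is not reproduced in this section, so the sketch assumes the textbook definition (the largest absolute difference over all frames and channels). The names `chebyshev_distance` and `recognize` and the template values are hypothetical.

```python
def chebyshev_distance(Q, R):
    """Textbook Chebyshev distance between two equal-length parameter sequences.

    Q and R are lists of M frames; each frame is a list of channel values.
    Assumption: maximum absolute difference over all frames and channels.
    """
    return max(abs(q - r) for fq, fr in zip(Q, R) for q, r in zip(fq, fr))


def recognize(Q, templates):
    """Return the word whose standard pattern is closest to the input pattern."""
    return min(templates, key=lambda word: chebyshev_distance(Q, templates[word]))


# Hypothetical standard patterns with M = 2 frames of 2 channels each.
templates = {
    "a": [[0, 0], [1, 1]],
    "b": [[5, 5], [6, 6]],
}
print(recognize([[0, 1], [1, 1]], templates))
```

Because NAT processing gives every pattern the same number of points M, the two sequences can be compared frame by frame without further time alignment.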

In the above embodiment, the trajectory length of the trajectory in parameter space was calculated from the acoustic parameter time series Pi(n), but it may instead be calculated from an acoustic parameter frequency series.

Also, in the above embodiment the trajectory length was calculated for a trajectory obtained by linear approximation, but it may instead be calculated for a trajectory obtained by circular arc approximation, spline approximation, or the like.

H Effect of the invention

As described above, according to this invention a bias is applied to the inter-parameter distances calculated in NAT processing. When the bias value is negative, a recognition parameter time series whose features are extracted only from the transient parts, excluding the stationary (quasi-stationary) parts, can be obtained. When the bias value is positive, the extreme time-axis normalization of the quasi-stationary parts is avoided, so the features of these quasi-stationary parts can also be extracted.

[Brief explanation of drawings]

FIG. 1 is a block diagram of an embodiment of the device of this invention; FIG. 2 is a flowchart for explaining the operation of its main part; FIG. 3 is a diagram for explaining the operation of the main part of this invention; FIG. 4 is a block diagram showing the basic configuration of a speech recognition device; and FIGS. 5 to 7 are diagrams for explaining NAT processing.

(2) is an acoustic analysis circuit, (4) is a standard pattern memory, (6) is a circuit for calculating the distance between the standard pattern and the input pattern, (7) is a minimum value determination circuit, (9) is a NAT processing circuit, (91) is an inter-parameter distance calculation circuit, (92) is a bias applying circuit, and (95) is an interpolation point extraction circuit.

[Figure captions: FIG. 2, flowchart of interpolation point extraction; FIG. 3, explanatory diagram of an example of the bias applying operation; FIG. 4, block diagram of the basic configuration of speech recognition; FIG. 5, diagram showing a point sequence drawn in parameter space; FIG. 6, diagram showing an example of a trajectory drawn in parameter space; FIG. 7, diagram showing an example of a trajectory estimated by linear approximation.]

Claims

[Scope of Claims] A speech recognition device comprising:
(a) acoustic analysis means for obtaining an acoustic parameter sequence of an input audio signal;
(b) inter-parameter distance calculating means for calculating the distance between parameters adjacent in the time-series direction of the acoustic parameter sequence from the acoustic analysis means;
(c) bias applying means for applying a bias to each distance calculated by the inter-parameter distance calculating means;
(d) means for forming a recognition parameter sequence of an input pattern from the acoustic parameter sequence of the acoustic analysis means, based on the inter-parameter distances to which the bias has been applied;
(e) a standard pattern memory in which a recognition parameter sequence of a standard pattern of each recognition target word is stored;
(f) distance calculating means for calculating the difference between the recognition parameter sequence of the input pattern and the recognition parameter sequence of the standard pattern read from the standard pattern memory; and
(g) minimum value determining means for obtaining a recognition output by detecting the standard pattern for which the value calculated by the distance calculating means is smallest.