JPH0668677B2

JPH0668677B2 - Speech recognition method and apparatus using vector division quantization

Info

Publication number: JPH0668677B2
Application number: JP59093572A
Authority: JP
Inventors: 聖一中川
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1984-05-10
Filing date: 1984-05-10
Publication date: 1994-08-31
Anticipated expiration: 2009-08-31
Also published as: JPS60237497A

Description

[Detailed description of the invention]

Technical field

The present invention relates to a speech recognition method and apparatus using vector division quantization.

Prior art

FIG. 1 is a basic circuit diagram of a speech recognition device. In the figure, 1 is a microphone, 2 is an analysis unit, 3 is a changeover switch, 4 is a standard pattern unit, 5 is an input speech pattern unit, 6 is a distance calculation unit, 7 is a minimum value detection unit, and 8 is a recognition result unit. The distance calculation unit 6 and the minimum value detection unit 7 together form a pattern matching unit. In FIG. 1, the speech entering through the microphone 1 is first analyzed to extract a pattern that characterizes the speech. In a speaker-dependent system, the analysis result of each recognition target word spoken by that speaker is registered in advance as a standard pattern. At recognition time, the parameters of the input speech pattern are compared with the standard pattern of each recognition target word, and the closest word, that is, the one with the smallest distance, is selected. For speaker-independent recognition, standard patterns that can absorb individual differences are used.
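The distance calculation and minimum value detection described above can be sketched as follows. This is only an illustration: the patent does not name a distance measure at this point, so the city-block distance, the function names `city_block` and `recognize`, the fixed-length patterns, and the three made-up standard patterns are all assumptions.

```python
def city_block(u, v):
    """Distance between two equal-length parameter vectors (assumed measure)."""
    return sum(abs(x - y) for x, y in zip(u, v))


def recognize(input_pattern, standard_patterns, distance=city_block):
    """Return (word, distance) for the registered word closest to the input."""
    # Distance calculation unit: one distance per registered word.
    scores = {word: distance(input_pattern, ref)
              for word, ref in standard_patterns.items()}
    # Minimum value detection unit: pick the smallest distance.
    best = min(scores, key=scores.get)
    return best, scores[best]


# Hypothetical registered standard patterns (values are made up).
standard_patterns = {
    "ichi": [1.0, 4.0, 2.0],
    "ni": [3.0, 1.0, 0.0],
    "san": [0.0, 2.0, 5.0],
}
word, dist = recognize([0.5, 2.5, 4.0], standard_patterns)
```

Real speech patterns differ in length from one utterance to the next, which is why the description goes on to introduce time normalization.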

FIG. 2 shows an example of analysis using a bank of band-pass filters (BPF). It plots the time variation of the spectrum pattern obtained by analyzing the utterance "3" (/san/) with a 16-channel band-pass filter bank (overall band 200 to 600 Hz). One unit of the time axis is 18 ms, and a cross section taken at a given time is the spectrum at that time. The actual recognition processing is entirely digital: the spectrum intensity values in one row at time $i$ form the feature vector $a_i = (a_{i1}, a_{i2}, a_{i3}, \dots, a_{i8}, \dots, a_{i16})$, and the input speech pattern (here, the pattern for "3") is $A = a_1 a_2 \dots a_i \dots a_I$ with $I = 32$.

Accordingly, a speech pattern is expressed as

$$A = a_1 a_2 \dots a_i \dots a_I \tag{1}$$

where $a_i$ is a quantity representing the features of the speech at time $i$ and is generally a vector. $A$ is the time series of the feature vectors $a_i$ ($i = 1$ to $32$ when $I = 32$), and $I$ is the length of the speech pattern $A$.

The vector $a_i$ is called a feature vector and is written

$$a_i = (a_{i1}, a_{i2}, \dots, a_{iq}, \dots, a_{iQ}) \tag{2}$$

where $Q$ is the dimension of the vector. In the example of FIG. 2 it equals the number of channels of the band-pass filter bank, 16.

Similarly, the standard pattern of word $n$ is denoted $B^n$ and written

$$B^n = b_1^n b_2^n \dots b_j^n \dots b_{J^n}^n \tag{3}$$

Here $b_j^n$ is the feature vector at time $j$ of the standard pattern of word $n$ and has the same dimension as the feature vector $a_i$ of the input pattern $A$. $J^n$ is the length of the standard pattern of word $n$, and $n$ is a serial number identifying the word. For a recognition vocabulary of $N$ words, denoted $\Sigma$,

$$\Sigma = \{\, n \mid n = 1, 2, \dots, N \,\} \tag{4}$$

When no particular word needs to be specified, the index $n$ is omitted:

$$B = b_1 b_2 \dots b_j \dots b_J \tag{5}$$

$$b_j = (b_{j1}, b_{j2}, \dots, b_{jq}, \dots, b_{jQ}) \tag{6}$$
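The notation of equations (1) to (6) maps naturally onto nested sequences. The sketch below is one possible in-memory layout, not something the patent prescribes: the helper `make_pattern`, the dictionary `vocabulary`, and all intensity values are hypothetical, while `Q = 16` and `I = 32` are taken from the FIG. 2 example.

```python
Q = 16  # vector dimension: number of filter channels in the FIG. 2 example
I = 32  # number of frames of the input pattern in the FIG. 2 example


def make_pattern(frames, q=Q):
    """Validate and return a pattern: a list of frames, each a tuple of q values."""
    pattern = [tuple(float(x) for x in frame) for frame in frames]
    for idx, frame in enumerate(pattern, start=1):
        if len(frame) != q:
            raise ValueError(f"frame {idx} has {len(frame)} values, expected {q}")
    return pattern


# Hypothetical input pattern A = a_1 ... a_I (made-up intensities).
A = make_pattern([[(i + ch) % 7 for ch in range(Q)] for i in range(I)])

# Hypothetical vocabulary: word number n -> standard pattern B^n.
# The lengths J^n may differ from word to word and from I.
vocabulary = {
    1: make_pattern([[ch % 5 for ch in range(Q)] for _ in range(28)]),
    2: make_pattern([[ch % 3 for ch in range(Q)] for _ in range(35)]),
}
```

Every frame has the same dimension $Q$, while the number of frames varies per pattern, which is the property that time normalization has to deal with.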

In speech recognition processing, the input pattern $A$ is pattern-matched against the standard patterns $B^n$ of all words in the recognition vocabulary while the time axis is normalized, and the word $n$ closest to the input pattern $A$ is found among the $N$ words.

FIG. 3 shows a mapping model for time normalization. In the example above, the standard pattern $B$ of the word "3" is aligned to the time axis of the input pattern by a mapping function. This mapping function is usually written

$$j = j(i) \tag{7}$$

and is called a warping function.

If the warping function were known, the time axis of the standard pattern $B$ could be transformed by equation (7) and aligned with the time axis $i$ of the input pattern $A$. In practice the warping function is unknown, so one pattern is artificially warped until it is most similar to the other, that is, until the distance is minimized, and the warping function that achieves this is taken as optimal.

FIG. 4 illustrates an example of the DP (dynamic programming) matching method that implements this principle. Consider a warping function $j(i)$ that warps the time axis of the standard pattern $B$. This warping function converts pattern $B$ into the following pattern $B'$.

$$B' = b_{j(1)} b_{j(2)} \dots b_{j(i)} \dots b_{j(I)} \tag{8}$$

Taking the time distortion of actual speech patterns into account, conditions such as the following are imposed on the warping function:

(a) $j(i)$ is (approximately) a monotonically increasing function;
(b) $j(i)$ is (approximately) a continuous function;
(c) $j(i)$ takes values in the neighborhood of $i$.

Almost infinitely many warping functions satisfy these conditions. Among them, the warping function $j(i)$ is chosen for which $B'$ is most similar to the input pattern $A$, that is, for which the distance is smallest. To this end, the time axis of the standard pattern $B$ is mapped onto the $i$ axis of the input pattern $A$ by $j(i)$ to obtain the pattern $B'$. The warping function that minimizes the distance between pattern $A$ and pattern $B'$ is the optimal one. The distance between the input pattern $A$ and the mapped pattern $B'$ is

$$\lVert A - B' \rVert = \sum_{i=1}^{I} \lVert a_i - b_{j(i)} \rVert \tag{9}$$

where $\lVert \cdot \rVert$ denotes the distance between two vectors. The problem of minimizing the distance of equation (9) is then defined by

$$D(A, B) = \min_{j(i)} \sum_{i=1}^{I} d(i, j(i)) \tag{10}$$

In general, $D(A, B)$ is called the time-normalized distance or the inter-pattern distance, and $d(i, j) = \lVert a_i - b_j \rVert$ is the distance between the vectors $a_i$ and $b_j$, usually called the inter-vector distance.
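To make the role of the warping function concrete, the sketch below evaluates the distance between $A$ and the warped pattern $B'$ for a given $j(i)$, reading equation (9) as the sum over $i$ of the inter-vector distance between $a_i$ and $b_{j(i)}$. The city-block distance, the function names, and the one-dimensional toy patterns are assumptions made for illustration.

```python
def vector_distance(u, v):
    """Inter-vector distance d(i, j); city-block distance is an assumption."""
    return sum(abs(x - y) for x, y in zip(u, v))


def warped_distance(A, B, j):
    """Sum over i of ||a_i - b_j(i)||, with j given as 1-based values j(1)..j(I)."""
    if len(j) != len(A):
        raise ValueError("the warping function needs one value per input frame")
    return sum(vector_distance(a, B[ji - 1]) for a, ji in zip(A, j))


# Toy patterns with one-dimensional feature vectors (made-up values).
A = [(0,), (1,), (1,), (3,)]   # input pattern, I = 4
B = [(0,), (1,), (3,)]         # standard pattern, J = 3

good_warp = [1, 2, 2, 3]       # repeats b_2 to absorb the longer middle of A
poor_warp = [1, 1, 2, 3]       # repeats b_1 instead

d_good = warped_distance(A, B, good_warp)
d_poor = warped_distance(A, B, poor_warp)
```

Both candidate functions are monotone and stay near the diagonal, yet they give different distances. The minimization over $j(i)$ selects the smaller one.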

FIG. 5 abstracts the $(i, j)$ plane of FIG. 4 into a lattice, in which the inter-vector distance $d(i, j)$ is obtained for each lattice point with coordinates $(i, j)$. Viewed on this plane, equation (10) amounts to searching for the optimal path from $(1, 1)$ to $(I, J)$. The path from stage $i-1$ to stage $i$ is often restricted to the three transitions shown in the figure. The matching window prevents extreme time distortion, and with it the three conditions (a) to (c) on time normalization are satisfied. Now consider optimally controlling, for each $i = 1, 2, \dots, I$, which state $j$ to move to next so as to minimize the evaluation function of equation (10). The initial condition is

$$g(1, 1) = d(1, 1) \tag{12}$$

the recurrence is

$$g(i, j) = d(i, j) + \min \{\, g(i-1, j),\ g(i-1, j-1),\ g(i-1, j-2) \,\} \tag{13}$$

and the inter-pattern distance is

$$D(A, B) = g(I, J) \tag{14}$$

Equation (13) is evaluated by traversing the lattice points of FIG. 5 in the direction of increasing $(i, j)$. That is, $g(i, j)$ is the minimum sum of distances from point $(1, 1)$ to point $(i, j)$, and equation (13) obtains $g(i, j)$ for state $j$ of stage $i$ from the values $g(i-1, j)$, $g(i-1, j-1)$ and $g(i-1, j-2)$ already obtained for $j$, $j-1$ and $j-2$ of stage $i-1$.
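A minimal sketch of this DP matching follows, assuming the recurrence takes the form $g(i,j) = d(i,j) + \min\{g(i-1,j), g(i-1,j-1), g(i-1,j-2)\}$ as the text describes. The matching window is left out for brevity, and the city-block distance and the names `dp_match` and `vector_distance` are assumptions. Only the previous stage is kept in memory, mirroring the single $G(j)$ memory of FIG. 6.

```python
import math


def vector_distance(u, v):
    """Inter-vector distance d(i, j); city-block distance is an assumption."""
    return sum(abs(x - y) for x, y in zip(u, v))


def dp_match(A, B, d=vector_distance):
    """Return D(A, B) = g(I, J) using the three-transition recurrence."""
    I, J = len(A), len(B)
    # prev[j] holds g(i-1, j) for j = 1..J; index 0 is an unreachable sentinel.
    prev = [math.inf] * (J + 1)
    prev[1] = d(A[0], B[0])                      # g(1, 1) = d(1, 1)
    for i in range(2, I + 1):
        cur = [math.inf] * (J + 1)
        for j in range(1, J + 1):
            skip = prev[j - 2] if j >= 2 else math.inf
            best = min(prev[j], prev[j - 1], skip)
            if best < math.inf:                  # only extend reachable points
                cur[j] = d(A[i - 1], B[j - 1]) + best
        prev = cur
    return prev[J]                               # g(I, J)


A = [(0,), (1,), (1,), (3,)]
B = [(0,), (1,), (3,)]
D = dp_match(A, B)
```

A result of `math.inf` means that no path permitted by the three transitions connects $(1, 1)$ to $(I, J)$, for example when the standard pattern is longer than the skips allow.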

FIG. 6 is a block diagram of a processor that executes the DP matching described above. In the figure, 11 is the A memory, 12 is the B memory, 13 is the $d(i, j)$ calculation unit, 14 is the $g(i, j)$ calculation unit, 15 is the $G(j)$ memory, and 16 is the control unit. The $d(i, j)$ calculation unit 13 computes the inter-vector distance between $a_i$ and $b_j$, the $g(i, j)$ calculation unit 14 computes the shortest distance $g(i, j)$ to $(i, j)$, and the two operate in parallel. When $g(i, j)$, $j = 1$ to $J$, is computed, the $G(j)$ memory 15 holds $g(i-1, j)$, $j = 1$ to $J$. The min block detects the smaller of $g_1$ and $g_2$ and stores it in $g$.

With this DP matching method, as is clear from the first term of equation (13), at least $I \times J \times N$ inter-vector distance calculations are required (where $N$ is the number of registered words) if no matching window is used.

To reduce the distance computation of the DP method, the split method, which uses pseudo-phoneme units, has been proposed. In the split method, the distance for each frame of the input speech is computed in advance only against a finite number ($K$) of pseudo-phonemes (the codebook), and the results are stored as a matrix. During DP matching the matrix is simply looked up, which reduces the amount of distance computation. In the split method only the word standard patterns are vector-quantized. Vector quantization is not applied to the input speech. The split method thus builds a distance matrix between the analysis frames of the input speech and the pre-stored pseudo-phonemes (vectors). The horizontal axis of this matrix is the frame number of the input speech and the vertical axis is the pseudo-phoneme (vector) number. DP matching between the input speech and the standard patterns, which are stored as sequences of vector numbers, is performed by referring to this distance matrix.

FIG. 7 is a block diagram showing an example of a recognition system based on the split method. In the figure, 20 is the input unit, 21 is the analysis unit, 22 is the inter-vector distance table, 23 is the set of pseudo-phoneme standard patterns (also called the codebook), 24 is the word dictionary storage unit, 25 is the DP matching unit, and 26 is the word identification unit.

The input speech 20 is spectrally analyzed by the analysis unit 21, and for each frame the distance to each pseudo-phoneme standard pattern 23 is computed to create the distance table 22. The DP matching unit 25 matches the input speech frames against the word dictionary 24, and the word identification unit 26 outputs the word with the minimum-distance pattern as the recognition result. With the split method the number of inter-vector distance calculations is $I \times K$, far fewer than the $I \times J \times N$ of the conventional method without vector quantization.
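The split method described above can be sketched as a two-step procedure: build the distance table once per utterance, then run DP matching with table lookups in place of distance calculations. The code below is an illustration under assumptions: city-block distance, 0-based codebook indices in the word dictionary, no matching window, and made-up codebook and dictionary contents.

```python
import math


def vector_distance(u, v):
    """Inter-vector distance; city-block distance is an assumption."""
    return sum(abs(x - y) for x, y in zip(u, v))


def build_distance_table(A, codebook, d=vector_distance):
    """table[i][k]: distance from input frame i to pseudo-phoneme k (I x K entries)."""
    return [[d(a, c) for c in codebook] for a in A]


def dp_match_indexed(table, word):
    """DP matching in which d(i, j) is looked up as table[i][word[j]]."""
    I, J = len(table), len(word)
    prev = [math.inf] * (J + 1)          # prev[j] = g(i-1, j), index 0 unused
    prev[1] = table[0][word[0]]
    for i in range(1, I):
        cur = [math.inf] * (J + 1)
        for j in range(1, J + 1):
            skip = prev[j - 2] if j >= 2 else math.inf
            cur[j] = table[i][word[j - 1]] + min(prev[j], prev[j - 1], skip)
        prev = cur
    return prev[J]


# Hypothetical codebook (K = 3) and word dictionary of codebook-index sequences.
codebook = [(0, 0), (2, 2), (5, 5)]
dictionary = {"a": [0, 1, 2], "b": [2, 1, 0]}

A = [(0, 0), (2, 1), (5, 5), (5, 4)]     # input frames, I = 4
table = build_distance_table(A, codebook)
scores = {w: dp_match_indexed(table, seq) for w, seq in dictionary.items()}
best = min(scores, key=scores.get)
```

The table holds $I \times K$ distances regardless of how many words are in the dictionary, which is where the saving over computing a distance at every lattice point of every word comes from.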

Object

The object of the present invention is, in a speech recognition method and apparatus based on the split method in which feature vectors are vector-quantized, to divide the standard pattern vectors and the input pattern vectors, for example dividing a 16-dimensional vector of 16 elements into an 8-dimensional vector consisting of the first 8 elements and an 8-dimensional vector consisting of the last 8 elements, thereby reducing the computation required for pattern matching even further than the split method and so increasing recognition speed.

Configuration

To achieve the above object, the present invention is characterized by (1) a speech recognition method based on the split method in which feature vectors are vector-quantized, wherein the input speech pattern vector is divided, the distances between the divided input speech pattern vectors and the divided pseudo-phoneme standard patterns are computed to create an inter-vector distance table for each, and DP matching between the standard patterns and the input speech pattern is performed in units of divided vectors; or (2) a speech recognition apparatus based on the split method in which feature vectors are vector-quantized, comprising a division unit that divides the input speech pattern vector, inter-vector distance tables created by computing, for each divided vector of the divided input speech pattern, the distance to the pseudo-phoneme standard patterns, a word dictionary consisting of vector number sequences of the divided pseudo-phoneme standard patterns, and a plurality of DP matching units that refer to the plurality of inter-vector distance tables and perform matching with the word dictionary in units of divided vectors. The configuration of the present invention is described below on the basis of an embodiment.

FIG. 8 is a block diagram illustrating an embodiment of the present invention in which the number of vector divisions is 2. In the figure, 23a is one of the two divided sets of pseudo-phoneme standard patterns and 23b is the other; 22a is the inter-vector distance table corresponding to the standard patterns 23a and 22b is the inter-vector distance table corresponding to the standard patterns 23b; 24 is the word dictionary storage unit, composed of the vector number sequences of the two divided pseudo-phoneme standard patterns 23a and 23b; and 25a and 25b are DP matching units that refer to the distance tables 22a and 22b, respectively. The input speech 20 is spectrally analyzed by the analysis unit 21, each input frame vector is divided in two, and the distances between the two halves and the standard patterns 23a and 23b, respectively, are computed to create the distance tables 22a and 22b. Matching between the input speech frames and the word dictionary 24 is performed in units of the divided vectors, the results are added, and matching is carried out by the DP matching units 25a and 25b. The word identification unit 26 outputs the word with the minimum-distance pattern as the recognition result.
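One way to read the embodiment is sketched below for a division into two halves. Each half has its own small codebook (standing in for 23a and 23b) and its own distance table (22a and 22b), and each dictionary entry is a sequence of index pairs. The patent text does not spell out exactly where the two partial distances are added, so summing them per lattice point before a single recurrence is an assumption, as are the city-block distance, the function names, and all data values.

```python
import math


def vector_distance(u, v):
    """Inter-vector distance; city-block distance is an assumption."""
    return sum(abs(x - y) for x, y in zip(u, v))


def split(vec):
    """Divide a feature vector into its first and second halves."""
    half = len(vec) // 2
    return vec[:half], vec[half:]


def build_tables(A, codebook_a, codebook_b):
    """Build one distance table per half (stand-ins for tables 22a and 22b)."""
    table_a, table_b = [], []
    for frame in A:
        first, second = split(frame)
        table_a.append([vector_distance(first, c) for c in codebook_a])
        table_b.append([vector_distance(second, c) for c in codebook_b])
    return table_a, table_b


def dp_match_divided(table_a, table_b, word):
    """DP matching with d(i, j) = table_a[i][ka] + table_b[i][kb] (assumed sum)."""
    I, J = len(table_a), len(word)

    def d(i, j):
        ka, kb = word[j]
        return table_a[i][ka] + table_b[i][kb]

    prev = [math.inf] * (J + 1)          # prev[j] = g(i-1, j), index 0 unused
    prev[1] = d(0, 0)
    for i in range(1, I):
        cur = [math.inf] * (J + 1)
        for j in range(1, J + 1):
            skip = prev[j - 2] if j >= 2 else math.inf
            cur[j] = d(i, j - 1) + min(prev[j], prev[j - 1], skip)
        prev = cur
    return prev[J]


# Hypothetical half codebooks and a dictionary of (index_a, index_b) sequences.
codebook_a = [(0, 0), (4, 4)]
codebook_b = [(1, 1), (3, 3)]
dictionary = {"x": [(0, 0), (1, 1)], "y": [(1, 0), (0, 1)]}

A = [(0, 0, 1, 1), (4, 4, 3, 3), (4, 4, 3, 2)]   # 4-dimensional input frames
table_a, table_b = build_tables(A, codebook_a, codebook_b)
scores = {w: dp_match_divided(table_a, table_b, seq)
          for w, seq in dictionary.items()}
```

With two entries per half, the four stored half-vectors can represent $2 \times 2 = 4$ distinct full vectors, which illustrates how dividing the vector lets small codebooks cover a larger set of patterns.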

Effect

As described above, according to the present invention, vector division reduces the size of the pseudo-phoneme standard pattern set, so the amount of inter-vector distance computation is reduced further than in the conventional split method and recognition speed is improved. Specifically, if the number of pseudo-phoneme patterns in the split method is $k$ and the number of vector divisions is 2, the number of pseudo-phoneme patterns per divided vector becomes $\sqrt{k}$ in the best case. Since the dimension of each divided vector is also halved, the total amount of distance computation becomes $2 \times \sqrt{k} \times \tfrac{1}{2} = \sqrt{k}$, that is, $1/\sqrt{k}$ of that of the split method. For example, with $k = 256$ the computation becomes 1/16. In this way the amount of inter-vector distance computation can be reduced.
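The 1/16 figure can be checked with a short calculation. The sketch assumes that cost is counted in element-wise operations per input frame, that the best case means each half needs only $\sqrt{k}$ patterns, and that $k$ is a perfect square. The function names and the choice of $Q = 16$ are illustrative.

```python
import math
from fractions import Fraction


def split_method_cost(k, q):
    """Element operations per input frame: k codebook vectors of dimension q."""
    return k * q


def divided_cost_best_case(k, q):
    """Two halves, each with sqrt(k) patterns of dimension q / 2 (best case)."""
    per_half = math.isqrt(k)
    if per_half * per_half != k or q % 2:
        raise ValueError("sketch assumes k is a perfect square and q is even")
    return 2 * per_half * (q // 2)


k, q = 256, 16
ratio = Fraction(divided_cost_best_case(k, q), split_method_cost(k, q))
```

For these values the split method needs 4096 element operations per frame and the divided version 256, so `ratio` is 1/16. The ratio does not depend on `q`, only on `k`.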

[Brief description of drawings]

FIG. 1 is a basic configuration diagram of a speech recognition device; FIG. 2 shows an example of speech analysis; FIG. 3 is a mapping model for time normalization; FIG. 4 illustrates time normalization by a warping function; FIG. 5 is a lattice plane diagram for performing time normalization; FIG. 6 is a block diagram of a processor that performs DP matching; FIG. 7 is a block diagram illustrating an example of the split method; and FIG. 8 is a block diagram illustrating an embodiment of the speech recognition apparatus according to the present invention.

20: input unit; 21: analysis unit; 22, 22a, 22b: inter-vector distance tables; 23, 23a, 23b: pseudo-phoneme standard patterns; 24: word dictionary storage unit; 25, 25a, 25b: DP matching units; 26: word identification unit.

Claims

[Claims]

1. A speech recognition method using vector division quantization, being a speech recognition method based on the split method in which feature vectors are vector-quantized, characterized in that an input speech pattern vector is divided, the distances between the divided input speech pattern vectors and the divided pseudo-phoneme standard patterns are computed to create an inter-vector distance table for each, and DP matching between the standard patterns and the input speech pattern is performed in units of divided vectors.

2. A speech recognition apparatus using vector division quantization, being a speech recognition apparatus based on the split method in which feature vectors are vector-quantized, characterized by comprising: a division unit that divides an input speech pattern vector; inter-vector distance tables created by computing, for each divided vector of the divided input speech pattern, the distance to the pseudo-phoneme standard patterns; a word dictionary consisting of vector number sequences of the divided pseudo-phoneme standard patterns; and a plurality of DP matching units that refer to the plurality of inter-vector distance tables and perform matching with the word dictionary in units of divided vectors.