JPS60237496A - Voice recognition equipment

Info

Publication number: JPS60237496A
Application number: JP59093571A
Authority: JP
Inventors: 中川　聖一
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1984-05-10
Filing date: 1984-05-10
Publication date: 1985-11-26
Anticipated expiration: 2009-05-02
Also published as: JPH0634186B2

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

Technical Field

The present invention relates to a speech recognition device using vector quantization.

Prior Art

FIG. 1 is a basic circuit diagram of a speech recognition device. In the figure, 1 is a microphone, 2 is an analysis section, 3 is a changeover switch, 4 is a standard pattern section, 5 is an input speech pattern section, 6 is a distance calculation section, 7 is a minimum value detection section, and 8 is a recognition result section; the distance calculation section 6 and the minimum value detection section 7 together form a pattern matching section. In FIG. 1, the speech coming from microphone 1 is first analyzed to extract a pattern that characterizes the speech. In a system for a specific speaker, the analysis results for each word to be recognized are registered in advance as standard patterns for that speaker; at recognition time, the parameters of the standard pattern of each word to be recognized are compared with those of the input speech pattern, and the closest word, that is, the one with the smallest distance, is selected. For unspecified speakers, standard patterns that can absorb individual differences are used.

FIG. 2 shows an example of an analysis method using a bank of band-pass filters (BPF). It shows how the spectrum pattern changes over time when the spoken word "3" (/san/) is analyzed with a 16-channel band-pass filter bank (overall band 200 to 6000 Hz) (BPF analysis).

One unit of the time axis is 18 ms, and a cross section taken at a given time is the spectrum at that time. The actual recognition processing is entirely digital. The spectrum intensity values in one horizontal row at time i form the feature vector a_i (= a_i1 a_i2 a_i3 ... a_i8 ... a_i16), and the input speech pattern (here, the speech pattern for "3") is A = a_1 a_2 ... a_i ... a_I (I = 32).

The speech pattern is therefore expressed as follows.

A = a_1 a_2 ... a_i ... a_I   (1)

Here a_i is a quantity representing the features of the speech at time i and is generally vector-valued. A is the time series of these feature vectors a_i (i = 1 to 32 when I = 32), and I corresponds to the length of speech pattern A.

The vector a_i is called a feature vector and is written

a_i = (a_i1, a_i2, ..., a_iq, ..., a_iQ)   (2)

Q is the order of the vector; in the example of FIG. 2 it corresponds to 16, the number of channels in the band-pass filter bank.

Similarly, let B^n denote the standard pattern of word n:

B^n = b_1^n b_2^n ... b_j^n ... b_{J^n}^n   (3)

Here b_j^n is the feature vector at time j of the standard pattern of word n and has the same order as the feature vector a_i of input pattern A. J^n is the length of the standard pattern of word n, and n is a serial number identifying the word. Denoting a recognition word set of N words by Σ,

Σ = { n | n = 1, 2, ..., N }   (4)

When there is no need to specify a particular word, the index n is omitted:

B = b_1 b_2 ... b_j ... b_J   (5)

b_j = (b_j1, b_j2, ..., b_jq, ..., b_jQ)   (6)

In speech recognition processing, the standard patterns B^n of all words in the recognition word set are pattern-matched against input pattern A with time normalization, and the word n closest to input pattern A is found among the N words.
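The word selection step described above can be sketched as follows. This is a minimal illustration, not the device's implementation: `recognize`, `toy_distance`, and the sample templates are hypothetical names and data, and `toy_distance` is only a placeholder for the time-normalized distance developed later in the description.

```python
def recognize(A, templates, distance):
    """Return the word whose standard pattern is closest to input pattern A."""
    return min(templates, key=lambda word: distance(A, templates[word]))


def toy_distance(A, B):
    # Placeholder only: frame-by-frame city-block distance without any
    # time normalization (patterns are assumed to have equal length here).
    return sum(abs(x - y) for a, b in zip(A, B) for x, y in zip(a, b))


# Hypothetical two-word vocabulary with two frames of two channels each.
templates = {
    "ichi": [[0.0, 0.0], [1.0, 0.0]],
    "san": [[3.0, 2.0], [3.0, 3.0]],
}
best = recognize([[2.9, 2.1], [3.0, 2.8]], templates, toy_distance)
```

Passing the distance as a parameter keeps the selection rule (smallest distance wins) separate from how the distance itself is computed.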

FIG. 3 shows a mapping model for time normalization. In the example above, the standard pattern B of the word "3" is aligned with the time axis of the input pattern by a mapping function. This mapping function is usually written

j = j(i)   (7)

and is called the distortion function.

If this distortion function were known, the time axis of standard pattern B could be converted by equation (7) and aligned with the time axis i of input pattern A. In practice the distortion function is unknown. One pattern is therefore artificially distorted so that it becomes most similar to the other, that is, so that the distance is minimized, and the optimal distortion function is determined in this way.

FIG. 4 illustrates an example of the DP matching method that implements the above principle. Taking the distortion function j(i) as the function that distorts the time axis of standard pattern B, pattern B is converted by j(i) into the following pattern B'.

B' = b_{j(1)} b_{j(2)} ... b_{j(i)} ... b_{j(I)}   (8)

In view of the time distortion that occurs in actual speech patterns, the distortion function is subject to conditions such as the following.

(a) j(i) is (approximately) a monotonically increasing function.

(b) j(i) is (approximately) a continuous function.

(c) j(i) takes a value in the neighborhood of i.

Almost infinitely many distortion functions satisfy these conditions. Among them, the distortion function j(i) is chosen for which B' is most similar to input pattern A, that is, for which the distance is smallest. To do this, the time axis of standard pattern B is first mapped onto the i-axis of input pattern A by the distortion function j(i) to obtain pattern B'; the distortion function j(i) that minimizes the distance between pattern A and pattern B' is the optimal distortion function. The distance between input pattern A and the mapped pattern B' is expressed as

sum_{i=1}^{I} || a_i - b_{j(i)} ||   (9)

where || || denotes the distance between two vectors. The distance minimization problem for equation (9) is defined by

D(A, B) = min_{j=j(i)} [ sum_{i=1}^{I} d(i, j(i)) ]   (10)

d(i, j) = || a_i - b_j ||, evaluated as a sum over the vector components q = 1, ..., Q   (11)

In general, D(A, B) is called the time-normalized distance or inter-pattern distance, and d(i, j) is the distance between vectors a_i and b_j, usually called the inter-vector distance.
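As a small worked illustration of the summed distance in equation (9), the sketch below evaluates it for two candidate distortion functions. The city-block inter-vector distance, the 0-based indexing, and the sample data are assumptions made for the example; the source does not preserve the exact form of the inter-vector distance.

```python
def vector_distance(a, b):
    """Inter-vector distance d(i, j); the city-block form is an assumption."""
    return sum(abs(x - y) for x, y in zip(a, b))


def warped_distance(A, B, j_of_i):
    """Sum of d(i, j(i)) over all input frames, as in equation (9).

    j_of_i[i] is the 0-based frame of B assigned to input frame i.
    """
    return sum(vector_distance(A[i], B[j_of_i[i]]) for i in range(len(A)))


# Input pattern with 4 frames, standard pattern with 3 frames (toy data).
A = [[0.0, 1.0], [0.0, 1.0], [2.0, 3.0], [4.0, 4.0]]
B = [[0.0, 1.0], [2.0, 3.0], [4.0, 4.0]]

good = warped_distance(A, B, [0, 0, 1, 2])  # repeats the first frame of B
poor = warped_distance(A, B, [0, 1, 2, 2])  # repeats the last frame of B
```

Both mappings are monotonic and continuous, yet they give different sums, which is why the minimum over all admissible j(i) has to be searched for.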

FIG. 5 abstracts the (i, j) plane of FIG. 4 into a lattice plane, in which the inter-vector distance d(i, j) is obtained for the coordinates (i, j) of each lattice point. Considered on this plane, equation (10) amounts to searching for the optimal path from (1, 1) to (I, J); the paths from state i-1 to state i are often restricted to the three shown in the figure. The matching window serves to prevent extreme time distortion, and with it the three time-normalization conditions (a) to (c) above are satisfied. Now consider optimally controlling, at each i = 1, 2, ..., I, which state j to move to next so as to minimize the evaluation function of equation (10). The initial condition is

g(1, 1) = d(1, 1)   (12)

the recurrence formula is

g(i, j) = d(i, j) + min{ g(i-1, j), g(i-1, j-1), g(i-1, j-2) }   (13)

and the time-normalized distance is

D(A, B) = g(I, J)   (14)

Equation (13) is evaluated by tracing the lattice points of FIG. 5 in the direction of increasing (i, j). That is, g(i, j) is the minimum sum of distances from point (1, 1) to point (i, j), and equation (13) obtains g(i, j) for state j of stage i from the values g(i-1, j), g(i-1, j-1), and g(i-1, j-2) already obtained for j, (j-1), and (j-2) of stage (i-1).
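The initial condition (12), the three-predecessor recurrence (13), and the final value (14) can be sketched as below. This is an illustrative sketch under assumptions: no matching window is applied, the inter-vector distance is taken to be city-block, indices are 0-based, and the function names are hypothetical.

```python
import math


def d(a, b):
    """Inter-vector distance; the city-block form is an assumption."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dp_match(A, B):
    """Time-normalized distance D(A, B) using the recurrence of equation (13)."""
    I, J = len(A), len(B)
    g = [[math.inf] * J for _ in range(I)]
    g[0][0] = d(A[0], B[0])  # initial condition (12)
    for i in range(1, I):
        for j in range(J):
            # Predecessors at stage i-1: same j, j-1, and j-2.
            best = g[i - 1][j]
            if j >= 1:
                best = min(best, g[i - 1][j - 1])
            if j >= 2:
                best = min(best, g[i - 1][j - 2])
            g[i][j] = d(A[i], B[j]) + best  # recurrence (13)
    return g[I - 1][J - 1]  # equation (14)


A = [[0.0, 1.0], [0.0, 1.0], [2.0, 3.0], [4.0, 4.0]]
B = [[0.0, 1.0], [2.0, 3.0], [4.0, 4.0]]
distance_ab = dp_match(A, B)
distance_short = dp_match([[0.0], [1.0], [2.0]], [[0.0], [2.0]])
```

Lattice points that cannot be reached from (1, 1) keep the value infinity, so a standard pattern that is too long for the input yields an infinite distance rather than a wrong finite one.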

FIG. 6 is a block diagram of a processor that executes the DP matching process described above. In the figure, 11 is an A memory, 12 is a B memory, 13 is a d(i, j) calculation unit, 14 is a g(i, j) calculation unit, 15 is a G(j) memory, and 16 is a control unit. The d(i, j) calculation unit 13 calculates the inter-vector distance between a_i and b_j, the g(i, j) calculation unit 14 calculates the shortest distance g(i, j) to (i, j), and the two operate in parallel. While g(i, j), j = 1 to J, is being calculated, the G(j) memory 15 holds g(i-1, j), j = 1 to J. In addition, min detects the smaller of g1 and g2 and stores the smaller value in g.
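The role of the G(j) memory, which holds only the previous stage g(i-1, j) while stage i is computed, can be sketched with a single array. Walking j downward so that the stage i-1 values are still intact when they are read is an assumption of this sketch, as are the city-block distance and the function names; the source does not state the update order.

```python
import math


def d(a, b):
    """Inter-vector distance; the city-block form is an assumption."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dp_match_rolling(A, B):
    """DP matching that keeps one column G(j) instead of the full lattice."""
    J = len(B)
    G = [math.inf] * J
    G[0] = d(A[0], B[0])
    for i in range(1, len(A)):
        # Descending j: G[j], G[j-1], G[j-2] still hold stage i-1 values.
        for j in range(J - 1, -1, -1):
            best = G[j]
            if j >= 1:
                best = min(best, G[j - 1])
            if j >= 2:
                best = min(best, G[j - 2])
            G[j] = d(A[i], B[j]) + best
    return G[J - 1]


A = [[0.0, 1.0], [0.0, 1.0], [2.0, 3.0], [4.0, 4.0]]
B = [[0.0, 1.0], [2.0, 3.0], [4.0, 4.0]]
rolling_ab = dp_match_rolling(A, B)
rolling_short = dp_match_rolling([[0.0], [1.0], [2.0]], [[0.0], [2.0]])
```

Memory use is proportional to J rather than to I × J, at the cost of not being able to trace back the optimal path afterwards.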

Thus, with the DP matching method described above, as is clear from the first term of equation (13), at least I × J × N calculations are required (where N is the number of registered words) if no matching window is provided.
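To give a sense of scale for the I × J × N figure, the short calculation below uses I = 32 from the example above; J = 32 and N = 100 are assumed values chosen only for illustration.

```python
def dp_distance_evaluations(I, J, N):
    """Number of inter-vector distance evaluations without a matching window."""
    return I * J * N


# I = 32 comes from the example pattern; J and N are assumed values.
count = dp_distance_evaluations(I=32, J=32, N=100)
```

With these values, 102,400 inter-vector distances are computed for a single utterance, each of them over all Q components of the feature vectors.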

To reduce the amount of distance calculation in the DP method, the split method, which uses pseudo-phoneme units, has been proposed. In the split method, the distance calculation for each frame of the input speech is performed in advance only against a finite number (K) of pseudo-phonemes (the codebook), and the results are stored in the form of a matrix; during DP matching the matrix is simply looked up, which reduces the amount of distance calculation. In the split method, vector quantization is applied only to the word standard patterns, not to the input speech. The split method thus creates a distance matrix between the analysis frames of the input speech and the pseudo-phonemes (vectors) stored in advance. The horizontal axis of this distance matrix is the frame number of the input speech and the vertical axis is the pseudo-phoneme (vector) number. DP matching between the input speech and the standard patterns, which are stored as vector number sequences, is performed by referring to this distance matrix.
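The split method described above can be sketched as follows: the frame-to-codebook distance matrix is built once per utterance, and DP matching then only looks values up. The city-block distance, the three-predecessor path constraint, the 0-based vector numbers, and all names are assumptions of this sketch.

```python
import math


def d(a, b):
    """Inter-vector distance; the city-block form is an assumption."""
    return sum(abs(x - y) for x, y in zip(a, b))


def split_match(A, codebook, ref_numbers):
    """DP matching of input frames A against a standard pattern stored as a
    sequence of codebook (pseudo-phoneme) numbers."""
    # Rows: pseudo-phoneme number; columns: input frame number.
    table = [[d(frame, vector) for frame in A] for vector in codebook]
    J = len(ref_numbers)
    G = [math.inf] * J
    G[0] = table[ref_numbers[0]][0]
    for i in range(1, len(A)):
        for j in range(J - 1, -1, -1):
            # Minimum over stage i-1 values at j, j-1, j-2 (where they exist).
            best = min(G[max(j - 2, 0): j + 1])
            G[j] = table[ref_numbers[j]][i] + best  # lookup, no new distance
    return G[-1]


codebook = [[0.0, 1.0], [2.0, 3.0], [4.0, 4.0]]
A = [[0.0, 1.0], [0.0, 1.0], [2.0, 3.0], [4.0, 4.0]]
matching = split_match(A, codebook, [0, 1, 2])
reversed_order = split_match(A, codebook, [2, 1, 0])
```

The table costs I × K distance evaluations regardless of how many words are registered, so the saving grows with the vocabulary size N.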

As a further improvement of the split method, the double split method has been proposed. The double split method vector-quantizes not only the standard patterns but also the input speech.

FIG. 7 is a block diagram for explaining an example of the double split method. In the figure, 20 is an input section, 21 is an analysis section, 22 is a vector quantization section, 23 is a vector number generation section, 24 is a standard vector storage section, 25 is a vector distance matrix table, and 26 is a DP matching section. In the vector quantization section 22, the input speech is quantized by converting it into the standard feature vectors of the standard vector storage section 24; it is then converted into a vector number sequence in the vector number generation section 23 and sent to the DP matching section 26. The DP matching section 26 performs DP matching between the vector number sequence of the input speech sent in this way and the vector number sequences of the word standard patterns stored in advance in the vector storage section 24, referring to the inter-vector distance matrix table 25, which holds the distances between vectors.

In more detail, the feature vector a_i of the i-th frame of input speech A is expressed as

a_i = (a_i1, a_i2, ..., a_iP), where i = 1, 2, ..., I (I: number of input frames; P: number of dimensions of the feature parameters)

while the k-th feature vector b_k of the standard vector pattern B of the speech is expressed as

b_k = (b_k1, b_k2, ..., b_kP), where k = 1, 2, ..., K (K: number of quantization standard vectors)

(this feature vector b_k has been vector-quantized).

Here, to vector-quantize the feature vector a_i of the input speech, it is compared with the standard patterns in the standard vector storage section and converted into the feature vector b_î, where

î = argmin_k || a_i - b_k ||

and argmin_k || a_i - b_k || denotes the value of k that minimizes || a_i - b_k ||. That is, the distances between a_i and all b_k are calculated, and the feature vector b_î closest to a_i is used in place of a_i, so that a'_i = b_î. If the inter-frame distances between b_k and b_j (k ≠ j) are calculated in advance and stored in the vector distance matrix table 25, the inter-frame distance between the quantized feature vector of each input frame and the vector-quantized word standard pattern b_k can be obtained by referring to table 25. The standard pattern of each registered word is given by a sequence of vector numbers of the standard vectors.
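The two ingredients of the double split method described above, nearest-vector quantization of the input frames and a distance table computed once between codebook vectors, can be sketched as follows. The city-block distance, the 0-based vector numbers, and the sample codebook are assumptions of this sketch.

```python
def d(a, b):
    """Inter-vector distance; the city-block form is an assumption."""
    return sum(abs(x - y) for x, y in zip(a, b))


def quantize(A, codebook):
    """Replace each input frame a_i by the number of its nearest standard vector."""
    return [
        min(range(len(codebook)), key=lambda k: d(frame, codebook[k]))
        for frame in A
    ]


def codebook_distance_table(codebook):
    """Distances between all pairs of standard vectors, computed in advance."""
    return [[d(bk, bj) for bj in codebook] for bk in codebook]


codebook = [[0.0, 1.0], [2.0, 3.0], [4.0, 4.0]]
A = [[0.1, 0.9], [1.8, 3.1], [4.2, 3.9]]

numbers = quantize(A, codebook)
table = codebook_distance_table(codebook)
# During DP matching, d(i, j) becomes the lookup table[numbers[i]][ref_numbers[j]].
```

Because the table depends only on the codebook, it does not have to be rebuilt for each utterance, unlike the frame-to-codebook matrix of the split method.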

The double split method described above has advantages such as the following, and is particularly useful for specific-speaker recognition.

(1) The amount of distance calculation can be reduced below that of the split method.

(2) Because the distance matrix can be set in advance, it can be designed carefully.

(3) By using a simple measure for the input speech, the calculation of feature parameters in LPC analysis can be omitted.

However, even feature vectors corresponding to the same phoneme in the same word differ greatly between speakers. For this reason, speech recognition for unspecified speakers often uses a plurality of standard patterns (multi-template method, KNN method, etc.).

However, the amount of calculation and memory then grows in proportion to the number of standard patterns, which is a problem in practice.

Purpose

The present invention was made in view of the circumstances described above. In particular, it aims to improve recognition accuracy and recognition speed in a speech recognition device that uses the double split method.

Configuration

The configuration of the present invention is described below on the basis of an embodiment.

FIG. 8 is a block diagram for explaining one embodiment of the speech recognition device according to the present invention. In the figure, 27 is a vector number sequence and class number sequence section, and 28 is a set of inter-vector distance matrix tables divided by class; parts that function in the same way as in FIG. 7 carry the same reference numbers as in FIG. 7. As shown in FIG. 9, a standard pattern consists of a vector number sequence and a class number sequence in which each frame is assigned to a class. As shown in FIG. 10, the inter-vector distance matrix table 28 holds M tables, one per class. Element d_ij of the matrix of class C represents the inter-vector distance between an input vector that is pseudo-phoneme i and a class M standard pattern vector that is pseudo-phoneme j. In the DP matching section 26, the distance matrix table of the class corresponding to the standard pattern is selected from 28 according to the class number sequence in 27, and DP matching with the input pattern is performed by referring to the vector number sequence of that pattern.
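The class-dependent table selection described above can be sketched as follows. This is only an illustration under assumptions: the class is taken per frame of the standard pattern, the tables are indexed as tables[class][input number][standard number], the path constraint is the three-predecessor recurrence used earlier, and all names and values are hypothetical.

```python
import math


def class_dp_match(input_numbers, ref_numbers, ref_classes, tables):
    """DP matching in which each standard-pattern frame selects the distance
    matrix of its own class."""
    J = len(ref_numbers)

    def local(i, j):
        # Table of the class of frame j; row = input pseudo-phoneme,
        # column = standard-pattern pseudo-phoneme.
        return tables[ref_classes[j]][input_numbers[i]][ref_numbers[j]]

    G = [math.inf] * J
    G[0] = local(0, 0)
    for i in range(1, len(input_numbers)):
        for j in range(J - 1, -1, -1):
            G[j] = local(i, j) + min(G[max(j - 2, 0): j + 1])
    return G[-1]


# Two classes (M = 2) over two pseudo-phonemes (K = 2); values are made up.
tables = [
    [[0, 5], [5, 0]],  # class 0: confusing the two pseudo-phonemes is costly
    [[0, 1], [1, 0]],  # class 1: the same confusion is tolerated
]
tolerant = class_dp_match([0, 0, 0], [0, 1], [0, 1], tables)
strict = class_dp_match([0, 0, 0], [0, 1], [0, 0], tables)
```

The same input and the same vector number sequence give different distances depending on the class number sequence, which is the extra degree of freedom the class-divided tables provide.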

Effects

According to the present invention, therefore, the input speech is encoded as a vector number sequence, the inter-vector distance matrix tables are divided into classes in advance according to the type of sound, and the desired matrix table is selected for DP matching on the basis of the class index (given by the class number sequence). Speech recognition can therefore be performed more quickly and more accurately.

[Brief Description of the Drawings]

FIG. 1 is a basic configuration diagram of a speech recognition device; FIG. 2 shows an example of speech analysis; FIG. 3 shows a mapping model for time normalization; FIG. 4 illustrates time normalization by a distortion function; FIG. 5 is a lattice plane diagram for time normalization; FIG. 6 is a block diagram of a processor that performs DP matching; FIG. 7 is a block diagram for explaining an example of the double split method; FIG. 8 is a diagram for explaining one embodiment of the speech recognition device according to the present invention; FIG. 9 shows an example of the structure of a standard pattern used in implementing the present invention; and FIG. 10 shows an example of the inter-vector distance matrix table used in implementing the present invention.

20: input section; 21: analysis section; 22: vector quantization section; 23: template (vector) section; 24: vector sequencer; 25: inter-vector distance matrix table; 26: DP matching section; 27: vector number sequence and class number sequence section; 28: inter-vector distance matrix table divided by class.

Procedural Amendment

June 27, 1984 (Showa 59)

1. Indication of the case: 1984 (Showa 59) Patent Application No. 93571

2. Title of the invention: Speech recognition device

3. Person making the amendment
Relationship to the case: Patent applicant
Address: 1-3-6 Nakamagome, Ota-ku, Tokyo
Name: (674) Ricoh Co., Ltd.
Representative: Hiroshi Hamada (and one other)

4. Agent
Address: Chatelet Inn Yokohama No. 807, 1-2-7 Furo-cho, Naka-ku, Yokohama 231

7. Contents of the amendment

(1) The claims of the specification are amended as shown in the attached sheet.

(2) On page 8, line 7 of the specification, "tracing increasing" is amended to "tracing in the increasing direction".

(3) On page 9, lines 7 to 8, "from the first term of equation (13)" is amended to "from the first term on the right-hand side of equation (13)".

(4) On page 10, line 19, "26 is a DP matching section," is amended to "26 is a DP matching section, 29 is a minimum-distance word identification section, and 30 is a recognition result output section,".

(5) On page 11, line 9, "is executed while referring to" is amended to "is executed while referring to, the word having the minimum distance is determined by the word identification section 29, and the result is output by the recognition result output section 30."

(6) On page 12, line 7, "denotes the value of k that minimizes |a_i - b_k|." is amended to "denotes the value of k that minimizes |(a_i - b_k)|."

(7) On page 14, lines 10 to 11, "As shown in FIG. 9, a vector number sequence and" is amended to "As shown in FIG. 9, the vector number sequence b of word n and".

(8) On the same page, lines 12 to 13, "consists of a class number sequence divided into classes." is amended to "consists of the class number sequence C of word n divided into classes."

(9) On the same page, lines 15 to 16, "element d of the matrix of class C, when the input vector" is amended to "element d of the matrix of class m, when the input vector".

(10) On the same page, line 17, "class M" is amended to "class m".

(11) On page 15, lines 3 to 4, "DP matching is performed." is amended to "DP matching is performed, the word having the minimum distance is determined by the word identification section 29, and the result is output by the recognition result output section 30."

(12) On the same page, lines 8 to 9, "type of sound" is amended to "type of phoneme".

(13) On page 16, lines 13 to 14, "28 ... table divided by class." is amended to "28: inter-vector distance matrix table divided by class; 29: word identification section; 30: recognition result output section."

(14) FIGS. 5, 7, 8, and 10 of the drawings are amended as shown in the attached sheet.

Claims (as amended)

A speech recognition device characterized by having a quantized standard pattern, quantization means for quantizing input speech, means for indexing the quantized input speech, and a class-divided vector [...].

Claims

[Claims]

A speech recognition device characterized by comprising a quantized standard pattern, quantization means for quantizing input speech, means for converting the quantized input speech into an index, and an inter-vector distance matrix table, and by performing matching using the index.