JPS62245294A

JPS62245294A - Voice recognition system

Info

Publication number: JPS62245294A
Application number: JP61089139A
Authority: JP
Inventors: 沢井　秀文
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1986-04-17
Filing date: 1986-04-17
Publication date: 1987-10-26

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Technical field]

The present invention relates to a speech recognition method suitable for use in technical fields such as speech recognition devices, pattern matching, and signal coding.

[Prior art]

A speech recognition method based on vector quantization, which converts both the unknown input speech pattern and the standard speech patterns into code strings of representative vectors, is described in, for example, Japanese Examined Patent Publication (Tokko-sho) No. 59-99500. The invention described in that publication devises improvements such as the distance measure and a tree structuring of the representative vectors, but it does not address application to speaker-independent speech recognition. Speaker-independent speech recognition requires an efficient system configuration covering the dictionary construction method, the structuring of the representative vectors, the method of creating the distance table, and the vector quantization method for input speech.

[Object]

The present invention was made in view of the circumstances described above. Its object is to improve both recognition speed and recognition accuracy in a speech recognition method that uses vector quantization, in which a speech signal is expressed as a sequence of feature vectors and then converted into a code sequence of representative vectors in order to recognize an unknown input pattern.

[Configuration]

To achieve the above object, the present invention comprises: a speech input unit that inputs a speech signal; a feature analysis unit that analyzes the input speech and converts it into feature parameters; a vector quantization unit that vector-quantizes the feature parameters and converts them into a code string; a codebook storage unit that stores the representative code vectors required for vector quantization; a standard pattern storage unit that stores the standard patterns to be recognized as code strings; a recognition processing unit that matches the input speech against the standard patterns; a distance table storage unit from which the distances between patterns required during recognition processing are looked up as distances between codes; and a terminal that outputs the recognition result. It is characterized in that the code vectors are clustered, and distances between the code vectors belonging to each cluster are calculated using the Mahalanobis generalized distance and stored in a table. The invention is described below on the basis of an embodiment.

FIG. 1 is an electrical block diagram for explaining one embodiment of the present invention. In the figure, 1 is a speech input terminal, 2 is a feature vector extraction unit, 3 is a vector quantization unit, 4 is a codebook storage unit, 5 is a recognition processing unit, 6 is a standard pattern storage unit, 7 is a distance table, and 8 is a recognition result storage unit.

Speech input from the speech input terminal 1 is converted by the feature vector extraction unit 2 into feature vectors representing the characteristics of the speech. Each feature vector is then mapped to one of the representative vectors stored in the codebook storage unit 4, and the code of that representative vector is registered in the vector quantization unit 3. The recognition processing unit 5 performs pattern matching between the code sequence registered in the vector quantization unit 3 and the code sequences of the standard patterns registered in advance in the standard pattern storage unit 6. During matching, the distance between representative vectors is obtained at high speed by looking up, from the pair of codes, the value calculated in advance and stored in the distance table 7. Using processing such as DP matching (pattern matching by dynamic programming), the recognition processing unit 5 outputs the name of the standard pattern with the minimum distance to the recognition result storage unit 8 as the recognition result.
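The matching step described for FIG. 1 can be illustrated with a minimal sketch. It assumes a plain symmetric DP (dynamic time warping) recursion with no slope constraints, which the patent does not specify; the function names `dtw_code_distance` and `recognize` and the toy table are hypothetical. The point it shows is that every local distance is a table lookup on a pair of codes rather than a vector computation.

```python
def dtw_code_distance(input_codes, template_codes, table):
    """Accumulated DP-matching distance between two code sequences.

    table[a][b] is the precomputed distance between codes a and b
    (the role of distance table 7). The step pattern is an assumption.
    """
    inf = float("inf")
    n, m = len(input_codes), len(template_codes)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local distance is a lookup, not a vector calculation.
            d = table[input_codes[i - 1]][template_codes[j - 1]]
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(input_codes, templates, table):
    """Return the name of the standard pattern with the minimum distance."""
    return min(
        templates,
        key=lambda name: dtw_code_distance(input_codes, templates[name], table),
    )


# Toy data: three codes whose pairwise distance is |a - b|.
table = [[abs(a - b) for b in range(3)] for a in range(3)]
templates = {"a": [0, 0, 1], "b": [2, 2, 2]}
result = recognize([0, 1, 1], templates, table)
```

In this sketch the cost of one DP cell is independent of the feature-vector dimension, which is the source of the speed-up the text describes.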

FIG. 2 shows the structure of the codebook 4 in the present invention. In the figure, 41, 42, 43, and 44 are clusters; 41a, 42a, 43a, and 44a are representative vectors (code vectors) belonging to the respective clusters; and 41b, 42b, 43b, and 44b are the mean vectors of the respective clusters. A method for creating the code vectors is described in, for example, Y. Linde et al., "An Algorithm for Vector Quantizer Design", IEEE Trans. Commun., Vol. COM-28, No. 1, Jan. 1980. In brief, feature vectors covering a variety of phoneme and syllable patterns are used as training samples, and the clusters are determined by iterative calculation so that the overall average distortion is minimized; the representative vectors are then obtained from the clusters. FIG. 2 is the result of taking the representative vectors 41a, 42a, 43a, 44a, and so on obtained in this way as training samples in turn and clustering them with the same algorithm.
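The iterative design outlined for FIG. 2 can be sketched as follows. This is a simplified Lloyd-style iteration (assign each sample to its nearest centroid, then recompute the means) and not the full procedure of the cited paper: the codebook-splitting initialization is omitted, squared Euclidean distortion is assumed, and the names `lloyd`, `assign`, and `mean` are hypothetical. Running the same routine a second time on the code vectors themselves gives the two-level structure of FIG. 2.

```python
def sq_dist(a, b):
    """Squared Euclidean distance (assumed distortion measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def assign(samples, centroids):
    """Group samples by index of the nearest centroid."""
    groups = [[] for _ in centroids]
    for s in samples:
        k = min(range(len(centroids)), key=lambda c: sq_dist(s, centroids[c]))
        groups[k].append(s)
    return groups


def lloyd(samples, init_centroids, iterations=20):
    """Alternate assignment and mean update for a fixed number of rounds."""
    centroids = [list(c) for c in init_centroids]
    for _ in range(iterations):
        groups = assign(samples, centroids)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, assign(samples, centroids)


# Level 1: training feature vectors -> code vectors (41a, 42a, ...).
features = [[0, 0], [0, 2], [4, 0], [4, 2], [20, 20], [20, 22], [24, 20], [24, 22]]
code_vectors, _ = lloyd(features, [[0, 0], [4, 0], [20, 20], [24, 20]])
# Level 2: the code vectors become the samples -> cluster means (41b, 42b, ...).
cluster_means, clusters = lloyd(code_vectors, [[0, 0], [20, 20]])
```

A fixed iteration count is used here for brevity; a distortion-based stopping rule would be the more usual choice.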

FIG. 3 shows the distance tables between the code vectors in each of the clusters 41 to 44 of FIG. 2; 71 to 74 correspond to 41 to 44 of FIG. 2. 71a to 74a are the portions of the distance table that do not belong to the within-cluster distance tables 71 to 74. When the distances between code vectors within each cluster are calculated and stored in 71 to 74, using the Mahalanobis generalized distance or a distance based on Bayes' rule instead of the usual Euclidean or city-block distance yields a distance table particularly suited to a speaker-independent recognition system. If the usual Euclidean or city-block distances are stored in 71a to 74a, the distances between code vectors in different clusters of FIG. 2 can be represented simply. Alternatively, storing a representative distance in 71a to 74a (for example, the mean of the distances between code vectors of the two clusters) reduces the memory needed for this portion.

In FIG. 3, i and j are code numbers, N is the quantization level (number of code vectors), and d^1(i, j), ..., d^K(i, j) denote the distance tables within the clusters 41 to 44, respectively.
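The split table of FIG. 3 can be sketched as below. Several things here are assumptions rather than details from the patent: the Mahalanobis distance is computed with a diagonal covariance (per-dimension variances) to keep the example free of matrix inversion, the variances are simply passed in as `variances[c]`, and the between-cluster portion stores one mean Euclidean distance per ordered cluster pair. The names `build_tables` and `lookup` are hypothetical.

```python
import math


def mahalanobis_diag(a, b, var):
    """Mahalanobis distance under an assumed diagonal covariance."""
    return math.sqrt(sum((x - y) ** 2 / v for x, y, v in zip(a, b, var)))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_tables(codebook, cluster_of, variances):
    """Build within-cluster tables (71-74) and between-cluster constants (71a-74a).

    codebook[i]   : code vector with code number i
    cluster_of[i] : cluster index of code i
    variances[c]  : per-dimension variances for cluster c (assumed given)
    """
    within, sums, counts = {}, {}, {}
    n = len(codebook)
    for i in range(n):
        for j in range(n):
            ci, cj = cluster_of[i], cluster_of[j]
            if ci == cj:
                within[(i, j)] = mahalanobis_diag(codebook[i], codebook[j], variances[ci])
            else:
                key = (ci, cj)
                sums[key] = sums.get(key, 0.0) + euclidean(codebook[i], codebook[j])
                counts[key] = counts.get(key, 0) + 1
    between = {key: sums[key] / counts[key] for key in sums}
    return within, between


def lookup(i, j, cluster_of, within, between):
    """Distance between codes i and j as the recognizer would look it up."""
    ci, cj = cluster_of[i], cluster_of[j]
    return within[(i, j)] if ci == cj else between[(ci, cj)]


codebook = [[0, 0], [2, 0], [10, 0], [10, 4]]
cluster_of = [0, 0, 1, 1]
variances = [[4.0, 1.0], [1.0, 4.0]]
within, between = build_tables(codebook, cluster_of, variances)
```

With a full covariance matrix the within-cluster distance would use the inverse covariance instead of per-dimension division; the table layout stays the same.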

FIG. 4 shows an example of a tree structure of code vectors used when the vector quantization unit 3 of FIG. 1 vector-quantizes the input speech into a code sequence; the tree is designed to be consistent with the clustering of code vectors in FIG. 2. In the figure, the tree has a depth of two levels, and the number of nodes K at each level is set approximately equal to the square root √N of the quantization level N. For example, when N = 512, K = √512 ≈ 23. Accordingly, the most efficient encoding is obtained when the number of clusters in FIG. 2 and the average number of code vectors per cluster are both set to about 23.
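A two-level search of the kind FIG. 4 describes might look like the following sketch. It assumes squared Euclidean distance for both levels and a hypothetical layout in which `clusters[c]` is a list of `(code, vector)` pairs; neither detail is given in the patent.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode_tree(x, cluster_means, clusters):
    """Two-level tree search: pick a cluster, then a code vector inside it.

    cluster_means[c] : mean vector of cluster c (first tree level)
    clusters[c]      : list of (code, vector) pairs in cluster c (second level)
    """
    c = min(range(len(cluster_means)), key=lambda k: sq_dist(x, cluster_means[k]))
    code, _ = min(clusters[c], key=lambda cv: sq_dist(x, cv[1]))
    return code


cluster_means = [[0, 0], [10, 10]]
clusters = [
    [(0, [0, 1]), (1, [1, 0])],
    [(2, [9, 10]), (3, [11, 10])],
]
codes = [encode_tree(x, cluster_means, clusters) for x in ([0.9, 0.1], [10.8, 10.0])]
```

With K clusters of about N/K code vectors each, one input vector needs roughly K + N/K distance computations, which is smallest near K = √N. For N = 512 and K = 23 that is about 45 computations instead of 512 for an exhaustive search. A tree search is not guaranteed to return the globally nearest code vector, since the nearest code vector may lie in a cluster whose mean is not the nearest.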

[Effects]

As is clear from the above description, according to the present invention, clustering the code vectors, which represent the typical vectors among the speech feature vectors, allows the distance table between code vectors to be divided as well. As a result, the table can be made smaller and the amount of memory reduced. Using a statistical distance measure (the Mahalanobis generalized distance or a distance based on Bayes' rule) between code vectors within a cluster makes it possible to configure a recognition system suited to speaker-independent use and to improve recognition accuracy. Structuring the code vectors as a tree based on their clustering also allows the input speech to be encoded at high speed.
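The memory reduction can be checked with a small count. The sketch assumes full (non-symmetric) storage, one constant per ordered pair of distinct clusters for the between-cluster portion, and a hypothetical split of 512 code vectors into 23 clusters of 22 or 23 vectors each; the function name `table_entries` is illustrative.

```python
def table_entries(cluster_sizes):
    """Return (full table entries, split table entries) for given cluster sizes."""
    n = sum(cluster_sizes)
    k = len(cluster_sizes)
    full = n * n                                # one N x N table
    within = sum(m * m for m in cluster_sizes)  # one small table per cluster
    between = k * (k - 1)                       # one constant per ordered cluster pair
    return full, within + between


# 6 clusters of 23 vectors and 17 clusters of 22 vectors: 512 code vectors in total.
sizes = [23] * 6 + [22] * 17
full, split = table_entries(sizes)
```

Under these assumptions the full table has 262,144 entries, while the split table has 11,402 within-cluster entries plus 506 between-cluster constants, about a 22-fold reduction.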

[Brief explanation of drawings]

FIG. 1 is an electrical block diagram for explaining one embodiment of the present invention; FIG. 2 shows the structure of the codebook; FIG. 3 shows the distance tables between code vectors; and FIG. 4 shows the tree structure of the code vectors.

1: speech input terminal, 2: feature vector extraction unit, 3: vector quantization unit, 4: codebook storage unit, 5: recognition processing unit, 6: standard pattern storage unit, 7: distance table, 8: recognition result storage unit.

Patent applicant: Ricoh Co., Ltd.

Claims

[Claims]

(1) A speech recognition method comprising: a speech input unit that inputs a speech signal; a feature analysis unit that analyzes the input speech and converts it into feature parameters; a vector quantization unit that vector-quantizes the feature parameters and converts them into a code string; a codebook storage unit that stores the representative code vectors required for vector quantization; a standard pattern storage unit that stores the standard patterns to be recognized as code strings; a recognition processing unit that matches the input speech against the standard patterns; a distance table storage unit from which the distances between patterns required during recognition processing are looked up as distances between codes; and a terminal that outputs the recognition result, characterized in that the code vectors are clustered, and distances between the code vectors belonging to each cluster are calculated using the Mahalanobis generalized distance and stored in a table.

(2) The speech recognition method according to claim (1), wherein the distance calculation is performed using a distance measure based on Bayes decision.

(3) The speech recognition method according to claim (1) or (2), characterized in that, in the distance table, the distance measure between clusters is the Euclidean distance or the city-block distance, and the distance measure within each cluster is the statistical distance measure according to claim (1) or (2).

(4) The speech recognition method according to claim (1), characterized in that the distance table is reduced by setting the distances between code vectors of different clusters to an appropriate constant value.

(5) A speech recognition method characterized in that the number of clusters and the number of code vectors belonging to each cluster are the same and are equal to the square root of the total number of code vectors.