JPS6172297A

JPS6172297A - Standard pattern generation system for voice recognition

Info

Publication number: JPS6172297A
Application number: JP59195238A
Authority: JP
Inventors: 船橋　賢一
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1984-09-17
Filing date: 1984-09-17
Publication date: 1986-04-14

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION <Technical Field> The present invention concerns a method that is particularly useful for continuous speech recognition of a specific speaker by pattern matching, and that creates a plurality of standard patterns for each category while taking into account pattern variation caused by coarticulation and by the way an utterance is spoken.

In speech recognition for unspecified speakers, a conventional technique is to obtain several clusters from the speech patterns of a large number of people by clustering analysis and to use a pattern representing each cluster as a standard pattern. Because it relies on clustering analysis, this technique is intended for cases where a very large number of speech patterns is available.

The present invention, by contrast, is mainly concerned with creating standard patterns for a specific speaker, that is, with selecting a plurality of standard patterns from a relatively small number of registered speech patterns. Clustering analysis is unsuitable for this purpose, and its algorithm is too complex to build into a recognition device.

<Objective of the Invention> In speech recognition for a specific speaker, it is desirable to hold a plurality of standard patterns for each category, taking into account variation in utterance and the effects of coarticulation. As an example, consider recognition in units of monosyllables /CV/.

In this case, when creating standard patterns for the registered speech category of the monosyllable /ka/, for example, one may wish to utter and register not only the isolated syllable but also /Vka/ (V = a, i, u, e, o) and similar utterances, and then finally register, say, three representative patterns among them as the standard patterns.

The present invention is a method for selecting a specified number of representative speech patterns from such a relatively small number of speech patterns in a registered speech category. The selection is carried out on the basis of a fixed quantitative decision logic, and the method can be used effectively in a specific-speaker device equipped with a speech registration function.

<Example> The present invention is described in detail below on the basis of an embodiment.

A speech pattern is represented by a time series of feature parameters, and a distance $d(x, y)$ is assumed to be given between pattern $x$ and pattern $y$.

Possible feature parameters include autocorrelation coefficients and cepstral coefficients, and one possible distance is the one obtained by dynamic programming using the Euclidean distance. The explanation below, however, is general and does not depend on these choices.
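As an illustration of the kind of distance mentioned above, the following is a minimal sketch of a dynamic-programming (DTW-style) distance between two feature-vector sequences using the Euclidean distance per frame. The function names `euclidean` and `dtw_distance`, the symmetric step pattern, and the absence of path-length normalization are all assumptions made for this example; the source does not specify these details.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def dtw_distance(x, y, frame_dist=euclidean):
    """Accumulated frame distance along the best time alignment of x and y.

    x and y are sequences of feature vectors (for example cepstral frames).
    The allowed steps (horizontal, vertical, diagonal) are an assumption.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j]: minimal accumulated distance aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = frame_dist(x[i - 1], y[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # x advances, y frame repeated
                cost[i][j - 1],      # y advances, x frame repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Any other distance could be substituted for this one, since the selection procedure only needs the values $d(x, y)$.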

To make the algorithm easier to understand, the case of a single standard pattern is also described below for reference.

1) Case of a single standard pattern. Figure 3 shows the conceptual diagram.

Let the speech patterns of a category C be $\{x_1, x_2, \ldots, x_N\}$. To select one standard pattern $x_i$ from among them, compute for each $x_i$

$$D_i = \sum_{j=1}^{N} d(x_j, x_i)$$

and take the speech pattern $x_i$ that gives the smallest value among $\{D_i\ (i = 1, \ldots, N)\}$ as the standard pattern of category C.

This is based on the idea of the least-squares principle: the standard pattern $x_i$ is chosen, so to speak, as the pattern closest to the "center of gravity" of the speech patterns.
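The single-pattern rule can be sketched as follows, assuming the pairwise distances are already available as a square list of lists `dist` with `dist[j][i]` equal to $d(x_j, x_i)$. The function name `select_single_standard` and the data layout are illustrative assumptions, not part of the source.

```python
def select_single_standard(dist):
    """Return (i, D_i) for the pattern minimizing D_i = sum over j of d(x_j, x_i).

    dist is an N x N matrix with dist[j][i] = d(x_j, x_i).
    Ties are resolved in favor of the lowest index (an assumption).
    """
    n = len(dist)
    totals = [sum(dist[j][i] for j in range(n)) for i in range(n)]
    best = min(range(n), key=totals.__getitem__)
    return best, totals[best]


# Toy usage with scalar "patterns" and absolute difference as the distance.
points = [0.0, 1.0, 2.0, 3.0, 10.0]
toy_dist = [[abs(a - b) for b in points] for a in points]
chosen, total = select_single_standard(toy_dist)  # chosen is index 2
```

In the toy data the outlier at 10.0 has the largest total distance, so it is never chosen as the representative.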

The above is the basic algorithm. When a plurality of standard patterns is set, as in the present invention, the procedure is as follows.

2) Case of two or more standard patterns. The general case of selecting $m$ standard patterns $\{x_{i(1)}, \ldots, x_{i(m)}\}$ from the speech patterns $\{x_1, \ldots, x_N\}$ of category C is described here. Figure 1 shows a conceptual diagram for $m = 2$.

From $\{x_1, \ldots, x_N\}$, take out a set of $m$ standard pattern candidates $R_i = \{x_{i(1)}, \ldots, x_{i(m)}\}$.

The "distance" $D(x_j, R_i)$ between a speech pattern $x_j$ and the candidate set $R_i$ is defined as

$$D(x_j, R_i) = \min\,[\,d(x_j, x_{i(k)}),\ k = 1, \ldots, m\,]$$

where min denotes the minimum value. Using this minimum "distance", compute for each candidate set $R_i = \{x_{i(1)}, \ldots, x_{i(m)}\}$

$$D_i = \sum_{j=1}^{N} D(x_j, R_i)$$

and select the $R_i$ for which $D_i$ is smallest as the $m$ standard patterns.
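The selection of $m$ standard patterns can be sketched as an exhaustive search over all combinations of $m$ indices, which is practical here because the number of registered patterns per category is small. The function name `select_standard_patterns`, the use of `itertools.combinations` to enumerate candidate sets, and the tie-breaking by enumeration order are assumptions made for this example.

```python
from itertools import combinations


def select_standard_patterns(dist, m):
    """Return (indices, D) of the m-pattern combination with the smallest D.

    dist is an N x N matrix with dist[j][k] = d(x_j, x_k).
    For each candidate set R, D(x_j, R) is the minimum distance from x_j to
    any member of R, and the evaluation value is the sum of D(x_j, R) over j.
    """
    n = len(dist)
    best_combo, best_total = None, float("inf")
    for combo in combinations(range(n), m):
        total = sum(min(dist[j][k] for k in combo) for j in range(n))
        if total < best_total:  # first combination found wins ties
            best_combo, best_total = combo, total
    return best_combo, best_total


# Toy usage: two groups of scalar "patterns", two standard patterns wanted.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
toy_dist = [[abs(a - b) for b in points] for a in points]
combo, value = select_standard_patterns(toy_dist, 2)  # combo is (1, 4)
```

With $m = 1$ the search reduces to the single-pattern rule, since each candidate set then contains exactly one pattern.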

In this method, as described above, once the array of distances $\{d(x_i, x_j)\}$ between the speech patterns of a category has been computed in advance, the required combination of standard patterns can be obtained with simple operations only (taking minimum values and adding).
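A minimal sketch of the precomputation step is shown below. It assumes that the distance is symmetric and that the distance from a pattern to itself is zero, so that the matching computation runs only once per unordered pair; the source does not state these properties, and the function name `distance_matrix` is hypothetical.

```python
def distance_matrix(patterns, d):
    """Compute the N x N matrix of pairwise distances d(x_i, x_j).

    Assumes d is symmetric and d(x, x) = 0, so only the upper triangle
    is evaluated: N * (N - 1) / 2 calls to d in total.
    """
    n = len(patterns)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = d(patterns[i], patterns[j])
    return dist
```

Once this matrix is stored, evaluating a candidate combination needs no further matching computation, only lookups, minima, and additions.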

Figure 2 shows an example configuration of a speech recognition device using this method.

This speech recognition device has a registration mode and a recognition mode.

In registration mode, the input speech passes through microphone 1, amplifier 2, and A/D converter 3, is converted into a parameter sequence by feature parameter extraction unit 4, and is stored in speech pattern memory 5. If registration is performed one category at a time, speech pattern memory 5 only needs to be large enough to hold the speech patterns of a single category.

The mutual distances between the patterns in speech pattern memory 5 are computed by matching calculation unit 6 and stored in distance matrix memory 7. Logic decision unit (I) 8 generates the combinations R of standard pattern candidates, computes the above-mentioned $D_i$ from the contents of distance matrix memory 7 by minimum-value operations and additions, determines the combination for which it is smallest, and transfers the speech patterns of that combination from speech pattern memory 5 to standard pattern memory 9.

In recognition mode, the input speech is converted into a time series of parameters by feature parameter extraction unit 4, its distances to the patterns in standard pattern memory 9 are computed by matching calculation unit 6, and logic decision unit (II) 10 determines the recognition result.
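The recognition mode can be sketched as below. The source does not state the decision rule applied by the logic decision unit, so choosing the category of the nearest standard pattern is an assumption of this example, as are the function name `recognize` and the dictionary layout of `standards`.

```python
def recognize(input_pattern, standards, d):
    """Return (category, distance) of the standard pattern nearest the input.

    standards maps each category label to its list of standard patterns.
    Nearest-pattern decision is an assumed rule, not taken from the source.
    """
    best_label, best_dist = None, float("inf")
    for label, templates in standards.items():
        for template in templates:
            dist = d(input_pattern, template)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label, best_dist


# Toy usage with scalar "patterns" and absolute difference as the distance.
toy_standards = {"ka": [1.0, 5.0], "sa": [10.0]}
label, _ = recognize(4.2, toy_standards, lambda a, b: abs(a - b))  # "ka"
```

Holding several standard patterns per category, as in the toy dictionary, lets an input match whichever variant of the category it most resembles.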

<Effects of the Invention> As described above, according to the present invention, a predetermined number of standard patterns can be created from a relatively small number of speech patterns, and this can be done on the basis of a fixed quantitative decision logic. The invention thus provides a method that is very useful, particularly in specific-speaker recognition devices.

[Brief explanation of drawings]

Figure 1 is a conceptual diagram for the case of two standard patterns, showing one embodiment of the present invention. Figure 2 is a block diagram showing an example configuration as a speech recognition device. Figure 3 is, for reference, a conceptual diagram for the case of one standard pattern. C: category; $x_1, x_2, \ldots, x_n$: speech patterns.

Claims

[Claims]

1. A standard pattern creation method for speech recognition, in a pattern-matching speech recognition method that creates a specified number of standard patterns for each registered speech category, wherein, for every combination of the specified number of registered speech patterns in each category, an evaluation value indicating the suitability of the combination as standard patterns is obtained by summing, over all speech patterns in the category, the minimum distance between each speech pattern in the category and the speech patterns in the combination, and the combination of speech patterns having the smallest evaluation value is used as the standard patterns for that category.