JPS61102698A

JPS61102698A - Standard pattern adaption system

Info

Publication number: JPS61102698A
Application number: JP59224632A
Authority: JP
Inventors: 迫江　博昭 (Hiroaki Sakoe)
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1984-10-25
Filing date: 1984-10-25
Publication date: 1986-05-21
Also published as: JPH0570838B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] (Field of Industrial Application) The present invention relates to a method for adapting the standard patterns of a speech recognition device to the voice quality of the user.

(Prior Art and Its Problems) Speech recognition technology has improved rapidly in recent years. It is now widely used for entering data into computers and for giving control commands to various machine systems. However, its recognition performance is far from ideal. Problems remain, such as the need for users to register standard patterns with their own voice, and the occurrence of misrecognition.

All of these existing systems are designed to operate on the principle of pattern matching. A standard pattern is prepared for each word. When an unknown input pattern is given, it is compared with these standard patterns, and the decision is made by searching for the most similar one. The standard patterns are usually registered by the user with his or her own pronunciation. This deals with the problem of individual differences: the same word can have markedly different speech characteristics from one speaker to another.
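The matching principle just described can be sketched as a nearest-template search. This is a minimal illustration under stated assumptions, not the patent's device. The word labels, the two-dimensional feature vectors and the frame-by-frame Euclidean distance are all hypothetical. The distance assumes equal-length patterns and therefore ignores time warping.

```python
import math

def pattern_distance(a, b):
    """Sum of frame-wise Euclidean distances.

    Assumption: both patterns have the same number of frames, so no
    time-axis alignment (DP matching) is performed.
    """
    if len(a) != len(b):
        raise ValueError("this naive distance needs equal-length patterns")
    return sum(math.dist(x, y) for x, y in zip(a, b))

def recognize(unknown, standard_patterns):
    """Return the word whose standard pattern is closest to the unknown input."""
    return min(standard_patterns,
               key=lambda word: pattern_distance(unknown, standard_patterns[word]))

# Hypothetical standard patterns registered by the user, one per word.
standard_patterns = {
    "hachi": [(1.0, 0.0), (0.0, 1.0)],
    "ichi": [(0.0, 0.0), (1.0, 1.0)],
}
print(recognize([(0.9, 0.1), (0.1, 0.9)], standard_patterns))  # hachi
```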

However, even when the same person utters the same word, the speech pattern varies in many ways. Expansion and contraction along the time axis are handled by a pattern matching method based on dynamic programming, called DP matching. This method is described in the specification of Japanese Patent Application No. 45-53896 and is used effectively.

There are also variations that may be described as variations along the frequency axis, such as vowel devoicing and vowel nasalization. No countermeasure as effective as DP matching has been found for them.

One countermeasure against this kind of variation (hereinafter simply called "variation") is to prepare several standard patterns for the same word, so that the actual input pattern matches at least one of them. The user utters and registers each word several times. These utterances are expected to be representative examples of the frequency-axis variations that can actually occur, which reduces misrecognition caused by pattern variation.

This approach has drawbacks. Preparing many standard patterns requires a correspondingly large memory, which makes the speech recognition device expensive. Moreover, even if the user registers many utterances, there is no guarantee that they cover every pattern variation that can occur. Ordinary users, in fact, tend to become nervous and repeat the same stiff, formal utterance, so the various variation patterns often cannot be registered at all.
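The multiple-registration countermeasure amounts to keeping several standard patterns per word and selecting the closest one overall. The minimal sketch below uses hypothetical names and data, and assumes equal-length patterns so that no time warping is needed. It also counts the stored vectors, which shows how memory grows with every extra registration.

```python
import math

def pattern_distance(a, b):
    # Assumption: equal-length patterns; no time-axis alignment is performed.
    return sum(math.dist(x, y) for x, y in zip(a, b))

def recognize_multi(unknown, registry):
    """registry maps each word to a list of registered utterances (patterns)."""
    best_word, best_d = None, math.inf
    for word, utterances in registry.items():
        for pattern in utterances:
            d = pattern_distance(unknown, pattern)
            if d < best_d:
                best_word, best_d = word, d
    return best_word

def stored_vectors(registry):
    """Memory cost: grows linearly with the number of registered utterances."""
    return sum(len(p) for patterns in registry.values() for p in patterns)

# Hypothetical registrations: "hachi" was registered twice, "ichi" once.
registry = {
    "hachi": [[(1.0, 0.0), (0.0, 1.0)], [(1.0, 0.0), (0.0, 0.0)]],
    "ichi": [[(0.0, 0.0), (1.0, 1.0)]],
}
print(recognize_multi([(1.0, 0.0), (0.0, 0.1)], registry))  # hachi
print(stored_vectors(registry))  # 6
```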

(Object of the Invention) The object of the present invention is to remedy the above drawbacks of conventional standard pattern adaptation methods. To this end, it realizes a standard pattern adaptation method that uses standard-pattern memory efficiently and that recognizes, with high accuracy, even speech patterns containing variations that were never registered. The result is a high-performance, inexpensive speech recognition device.

(Constitution of the Invention) The present invention provides a standard pattern adaptation method with three characteristics. A standard pattern is represented as a network of time series of features. This standard pattern is matched with a speech pattern uttered by the user to determine an optimal path in the network. The features on this optimal path are then modified by the speech pattern.

The present invention further provides a standard pattern adaptation method in which a standard pattern, represented as a network of time series of features, is matched with a speech pattern uttered by the user to determine an optimal path in the network. Each standard-pattern feature on the optimal path is modified by the corresponding speech-pattern feature only when the two are similar to at least a predetermined criterion.

The present invention also provides a standard pattern adaptation method in which a standard pattern, represented as a network of time series of features, is matched with a speech pattern uttered by the user to determine an optimal path in the network. The method then treats each portion of the optimal path in one of two ways:
- Where the features on the optimal path and the corresponding features in the speech pattern are similar to at least a predetermined criterion, the standard-pattern features are modified by the speech-pattern features.
- Where they are not, a new path parallel to that portion is created from the speech-pattern features.

(Operation) The specification of Japanese Patent Application No. 57-156413 discloses a recognition method in which a standard pattern is represented as a network of time series of features and matched against a time-series input pattern. Representing the standard pattern as a network in this way has the advantage that the memory for storing standard patterns is used efficiently. As an example, consider the spoken digit "8" (/hatʃi/).

This word is normally pronounced /hatʃi/. It is known, however, that its first vowel /a/ may be devoiced to /ḁ/ (a small circle below a symbol denotes devoicing), and that its final vowel /i/ may be devoiced to /i̥/. These phenomena occur independently of each other. As a result, a total of four variants arise, shown as (a) to (d) in Fig. 1: /hatʃi/, /hḁtʃi/, /hatʃi̥/ and /hḁtʃi̥/.

Fig. 1 is a so-called automaton representation. Each automaton starts from initial state 0 and accepts the phonemes /h/, /a/ (or /ḁ/), /tʃ/ and /i/ (or /i̥/). It advances through states 1, 2 and 3 in turn, and terminates on reaching final state 4.

Moving from one state to the next upon receiving an input (here, a phoneme) is called a transition. The input may be a symbol denoting a phoneme. More generally, each input (for example, the one corresponding to /a/) may be a time series of quantities representing features, such as vectors.
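As a minimal illustration of states and transitions, automaton (a) of Fig. 1 can be written as a transition table. This is a hypothetical sketch. The ASCII label "tS" stands for /tʃ/, "a0" stands for /ḁ/, and the inputs are phoneme symbols rather than the feature-vector time series the invention actually attaches to each transition.

```python
# Hypothetical transition table for automaton (a) of Fig. 1:
# state -> {input symbol: next state}. State 0 is initial, state 4 is final.
# Assumption: phoneme symbols are used as inputs instead of feature vectors.
TRANSITIONS = {0: {"h": 1}, 1: {"a": 2}, 2: {"tS": 3}, 3: {"i": 4}}
FINAL_STATE = 4

def accepts(inputs, transitions=TRANSITIONS, start=0, final=FINAL_STATE):
    """Follow one transition per input; accept only if we end in the final state."""
    state = start
    for symbol in inputs:
        next_state = transitions.get(state, {}).get(symbol)
        if next_state is None:  # no transition defined for this input
            return False
        state = next_state
    return state == final

print(accepts(["h", "a", "tS", "i"]))   # True
print(accepts(["h", "a0", "tS", "i"]))  # False: "a0" is not accepted by automaton (a)
```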

Examples (a), (b), (c) and (d) of Fig. 1 show the case where an independent standard pattern is prepared for each type (or combination of types) of variation. For this case, a method that recognizes with high accuracy even types of variation the user never registered is described in the patent specification "Standard Pattern Registration Method," filed on August 1, 1984 (Showa 59). However, that method inherently requires a separate standard pattern for each variation, so it needs a large amount of memory.

The present invention therefore presupposes that the standard pattern is represented as a network-like automaton containing branches, as in Fig. 2. This representation greatly reduces the memory needed to store the standard pattern. The memory requirement is roughly proportional to the number of states (nodes) of the automaton and of the transitions (branches) connecting them. Examples (a), (b), (c) and (d) of Fig. 1 contain 16 branches in total, whereas Fig. 2 contains only 6.
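The memory comparison can be checked by counting branches. This hypothetical encoding represents Fig. 1 as four separate chains and Fig. 2 as one network of (from_state, to_state, label) triples. The labels "a0" and "i0" stand for the devoiced vowels and are illustrative only. The sketch also enumerates the paths of the network to confirm that it accepts exactly the same four variants.

```python
# Fig. 1: one independent chain per variant (labels are illustrative ASCII).
VARIANTS = [
    ["h", "a", "tS", "i"],
    ["h", "a0", "tS", "i"],
    ["h", "a", "tS", "i0"],
    ["h", "a0", "tS", "i0"],
]

# Fig. 2: a single network that branches between states 1-2 and 3-4.
NETWORK = [(0, 1, "h"), (1, 2, "a"), (1, 2, "a0"),
           (2, 3, "tS"), (3, 4, "i"), (3, 4, "i0")]

def paths(network, state=0, final=4):
    """Enumerate every label sequence leading from `state` to the final state."""
    if state == final:
        return [[]]
    result = []
    for src, dst, label in network:
        if src == state:
            result += [[label] + rest for rest in paths(network, dst, final)]
    return result

print(sum(len(v) for v in VARIANTS))  # 16 branches stored separately
print(len(NETWORK))                   # 6 branches in the network
print(sorted(paths(NETWORK)) == sorted(VARIANTS))  # True
```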

Following graph-theory terminology, the states of such a branching automaton are called nodes, and the arrows "→" (state transitions) are called branches. Note that each branch of Fig. 2 is not associated with a symbol such as /h/ or /a/. Instead, as in the standard pattern description of Japanese Patent Application No. 57-156413 cited above, each branch carries a time series of speech features.

For example, the features corresponding to the phoneme /h/ are the vector sequence $b_1 b_2 b_3$. In the same way, a vector sequence is associated with each of the other phonemes. The specification of Japanese Patent Application No. 57-156413 expresses such a standard pattern as

$$B = b_1\,b_2\,b_3\,(b_4\,b_5\,b_6\,b_7\,b_8 \;/\; b_9\,b_{10}\,b_{11})\,b_{12}\,b_{13}\,(b_{14}\,b_{15}\,b_{16}\,b_{17} \;/\; b_{18}\,b_{19}\,b_{20}) \qquad (2)$$

Each parenthesized group lists alternative branches separated by "/". The same specification describes a device that matches such a pattern against an input pattern

$$A = a_1\,a_2\,\cdots\,a_i\,\cdots\,a_I \qquad (3)$$

A standard pattern of the form of equation (2) is prepared for each required word and matched using this device. Recognition then needs only a small standard-pattern memory, yet remains stable against pattern variation.

The question here is how to create the standard pattern $B$. The present invention adopts the following two basic policies.

(1) The expected types of variation are built into the network in advance as branches, and an original standard pattern is prepared. This can be done by visually inspecting and manually processing the patterns of a large number of sample utterances.

(2) The original standard pattern alone does not take the user's own speech into account, so the recognition rate is expected to be low. The original standard pattern is therefore modified using the user's own utterances.

Step (1) may be off-line processing with human intervention. The method used for it is outside the scope of the present invention and is not explained further.

The method for carrying out step (2) is described below.

Suppose the network of Fig. 2, that is, $B$ of equation (2), is given as the original standard pattern. Let $A$ of equation (3) be a speech pattern uttered by the user. The following describes how $A$ is used to modify the original standard pattern $B$.

The speech pattern $A$ is matched against the original standard pattern $B$. This determines which feature of the standard pattern corresponds to each feature $a_i$ of the input pattern $A$.

This correspondence could be determined with an improved form of the stack-based DP matching method described in the specification of Japanese Patent Application No. 57-156413. Here, however, an easier-to-follow method is shown as an example.

The network of Fig. 2 is expanded into forms such as (a), (b), (c) and (d) of Fig. 1. In the notation of equation (2), this yields four time series $B_1$, $B_2$, $B_3$ and $B_4$, each obtained by choosing one alternative in every parenthesized group. To treat them uniformly, their elements are re-subscripted and written as

$$C = c_1\,c_2\,\cdots\,c_j\,\cdots\,c_J \qquad (4)$$

For $B_4$, for example, each $c_j$ is identified with one of the vectors $b_k$ of equation (2). This correspondence between the $c_j$ and the $b_k$ is equation (5).
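The expansion step can be sketched by encoding equation (2) segment by segment and taking the Cartesian product of the alternatives. This rests on two assumptions. First, the grouping of $b$ indices into segments is read off equation (2) as printed. Second, the order in which `itertools.product` lists the paths need not match the numbering $B_1$ to $B_4$ used in the text. Each expanded path, written as a list of $b$ indices, is the kind of $c_j$-to-$b_k$ correspondence that equation (5) records.

```python
from itertools import product

# Equation (2) encoded with the patent's 1-based b indices.
# Assumption: segment grouping read off equation (2); "/" separates alternatives.
B_NETWORK = [
    [[1, 2, 3]],                       # /h/
    [[4, 5, 6, 7, 8], [9, 10, 11]],    # two alternatives for the vowel /a/
    [[12, 13]],                        # /tS/
    [[14, 15, 16, 17], [18, 19, 20]],  # two alternatives for the vowel /i/
]

def expand(network):
    """Return every path through the network as a flat list of b indices.

    For a returned path, path[j - 1] is the index k such that c_j = b_k,
    i.e. the correspondence of equation (5) for that expansion.
    """
    return [[k for branch in choice for k in branch]
            for choice in product(*network)]

for m, path in enumerate(expand(B_NETWORK), start=1):
    print(f"path {m}:", " ".join(f"b{k}" for k in path))
```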

The specification of Japanese Patent Application No. 46-62782 describes how to determine, between time series such as $A$ and $C$, a correspondence $j = j(i)$ between the features $a_i$ and $c_j$ that minimizes the total distance between corresponding features. Under the mapping obtained in this way, the vectors $a_i$ and $c_j$ belonging to the phonemes /h/, /a/, /tʃ/ and /i/ are correctly placed in correspondence. To this end, that specification performs so-called DP matching, computing the dynamic programming recurrence

$$g(1,1) = d(1,1) \qquad (6)$$

$$g(i,j) = d(i,j) + \min\bigl[\,g(i-1,j),\; g(i-1,j-1),\; g(i-1,j-2)\,\bigr] \qquad (7)$$

where $d(i,j) = \lVert a_i - c_j \rVert$ is the distance between $a_i$ and $c_j$. While equation (7) is computed, a value $W(i,j) = 0$, $1$ or $2$ is stored in a table. The value records whether the minimum inside the brackets on the right-hand side is the first, second or third term. From this table, the mapping $j(i)$ is determined by the recursive procedure

$$j(I) = J, \qquad j(i-1) = j(i) - W\bigl(i, j(i)\bigr) \qquad (8)$$

It is also known that $g(I,J)$, the result of equations (6) and (7), is the distance $D(A, C)$ between patterns $A$ and $C$.
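A minimal Python sketch of equations (6) to (8) follows. It assumes Euclidean frame distances and 0-based indices, so each `j_of_i` entry is one less than the $j(i)$ in the text. Ties between the bracketed terms go to the first term; the source does not specify tie-breaking. The function name `dp_match` is hypothetical.

```python
import math

def dp_match(A, C):
    """Asymmetric DP matching following equations (6)-(8).

    A, C: sequences of feature vectors (tuples of floats).
    Returns (D, j_of_i): the distance D(A, C) = g(I, J) and the mapping
    j(i) as a list of 0-based frame indices into C.
    """
    I, J = len(A), len(C)
    g = [[math.inf] * J for _ in range(I)]
    W = [[0] * J for _ in range(I)]  # which bracketed term was minimal: 0, 1 or 2
    g[0][0] = math.dist(A[0], C[0])  # equation (6)
    for i in range(1, I):
        for j in range(J):
            best_w, best = 0, math.inf
            for w in (0, 1, 2):  # terms g(i-1, j), g(i-1, j-1), g(i-1, j-2)
                if j - w >= 0 and g[i - 1][j - w] < best:  # assumption: first term wins ties
                    best_w, best = w, g[i - 1][j - w]
            if best < math.inf:
                g[i][j] = math.dist(A[i], C[j]) + best  # equation (7)
                W[i][j] = best_w
    D = g[I - 1][J - 1]
    if D == math.inf:
        raise ValueError("C is too long to be reached from A under these slope constraints")
    # Equation (8): j(I) = J, then j(i-1) = j(i) - W(i, j(i)).
    j_of_i = [0] * I
    j_of_i[I - 1] = J - 1
    for i in range(I - 1, 0, -1):
        j_of_i[i - 1] = j_of_i[i] - W[i][j_of_i[i]]
    return D, j_of_i

A = [(0.0,), (1.0,), (2.0,), (2.0,)]
C = [(0.0,), (1.0,), (2.0,)]
print(dp_match(A, C))  # (0.0, [0, 1, 2, 2])
```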

Each of the patterns $B_1$, $B_2$, $B_3$, $B_4$ obtained by expanding $B$ is now substituted for $C$, and the DP matching of equations (6) and (7) is executed repeatedly. This yields the distances $D(A, B_1)$, $D(A, B_2)$, $D(A, B_3)$ and $D(A, B_4)$. If $B_{m'}$ is the pattern with the smallest distance, the speech pattern $A$ has the same tendency of variation as $B_{m'}$. The features on the branch sequence of $B$ that corresponds to $B_{m'}$ (called the optimal path) are therefore modified with the features of $A$. As long as the variation has the same tendency, the user's own features are then used in matching, so accurate recognition becomes possible.
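Selecting the optimal path can be sketched as follows. The network is expanded, $D(A, B_m)$ is computed for every expanded pattern with the recurrence of equations (6) and (7), and the index $m'$ with the smallest distance is kept. The nested-list network layout, the one-dimensional toy features and the function names are hypothetical.

```python
import math
from itertools import product

def dp_distance(A, C):
    """D(A, C) = g(I, J) using the recurrence of equations (6) and (7)."""
    I, J = len(A), len(C)
    g = [[math.inf] * J for _ in range(I)]
    g[0][0] = math.dist(A[0], C[0])
    for i in range(1, I):
        for j in range(J):
            prev = min(g[i - 1][j - w] for w in (0, 1, 2) if j - w >= 0)
            g[i][j] = math.dist(A[i], C[j]) + prev
    return g[I - 1][J - 1]

def optimal_path(A, network):
    """Return (m', choice): 0-based index of the best expanded pattern and the
    branch chosen in each segment, i.e. the optimal path through the network.

    Assumption: network = list of segments, each a list of alternative
    branches, each branch a list of feature vectors.
    """
    choices = list(product(*[range(len(segment)) for segment in network]))

    def expanded(choice):
        return [v for seg, b in zip(network, choice) for v in seg[b]]

    dists = [dp_distance(A, expanded(choice)) for choice in choices]
    m_best = min(range(len(choices)), key=dists.__getitem__)
    return m_best, choices[m_best]

network = [[[(0.0,)]], [[(5.0,)], [(1.0,)]], [[(9.0,)]]]
A = [(0.0,), (1.0,), (9.0,)]
print(optimal_path(A, network))  # (1, (0, 1, 0))
```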

The modification of the features on the optimal path is illustrated below with a concrete example. Suppose the expanded pattern that minimizes $D(A, B_m)$ is $B_4$. The features on its path, that is, the $b_k$ appearing in equation (5), are then modified. The mapping $j = j(i)$ obtained by equation (8) gives, for each $a_i$, the corresponding element $c_j$ of the time series $C$. The feature $b_k$ associated with that $c_j$ by equation (5) is replaced by $a_i$. For example, when $j(3) = 4$, $j(4) = 4$, $j(5) = 6$, and so on, the modification is made by the corresponding replacement of equation (9).
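The replacement step can be sketched as below. It assumes that the mapping $j(i)$ from DP matching and the $c_j$-to-$b_k$ correspondence of equation (5) are already available, both as 0-based lists. The source does not say how several $a_i$ mapped onto the same $c_j$ are combined. This sketch simply lets the later frame overwrite the earlier one, which is an assumption. The function name is hypothetical.

```python
def replace_on_path(b, k_of_j, A, j_of_i):
    """Replace each b_k on the optimal path by the input feature mapped to it.

    b:      dict from the patent's b index k to its feature vector
    k_of_j: k_of_j[j] is the index k with c_j = b_k (equation (5), 0-based j)
    A:      input feature vectors a_i
    j_of_i: j_of_i[i] is the frame of C matched to a_i (equation (8), 0-based)
    """
    updated = dict(b)  # leave the caller's standard pattern untouched
    for i, j in enumerate(j_of_i):
        # Assumption: when several a_i share one c_j, the last one wins.
        updated[k_of_j[j]] = A[i]
    return updated

b = {1: (0.0, 0.0), 2: (1.0, 1.0), 3: (2.0, 2.0)}
k_of_j = [1, 2, 3]
A = [(0.0, 1.0), (1.0, 2.0), (1.0, 3.0), (2.0, 3.0)]
print(replace_on_path(b, k_of_j, A, [0, 1, 1, 2]))
```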

This modification is preferably carried out by applying several speech patterns (naturally, all of the same word) repeatedly to one original standard pattern. This completes the adaptation of the standard pattern to the user.

The standard pattern $B$ obtained in this way needs little memory because it is represented as a network. It is fully adapted to the types of variation that appear in the user's own utterances. For types of variation that happen not to appear in those utterances, the branches of the original standard pattern remain. This adaptation is performed for each word, and the resulting standard patterns $B$ are matched against the input pattern by the means described in Japanese Patent Application No. 57-156413. Compact and highly accurate speech recognition then becomes possible.

(Embodiments) Fig. 3 is a block diagram illustrating a first embodiment of the present invention. A control unit 10 generates control signals $n$ and $m$ that control the overall operation. The device operates in two modes: an adaptation mode, which adapts the standard patterns to the characteristics of the user, and a recognition mode, which performs the actual recognition. The adaptation mode, with which the present invention is directly concerned, is described first.

Let the words to be recognized be the digits "0", "1", ..., "n", ..., "9". For each word $n$, an original standard pattern $B^n$ is stored in storage unit 14 in the format of equation (2). The control signal $n$ from control unit 10 designates which pattern is read out; that pattern is sent to buffer 15 for adaptation. As an example, consider the case $n = 8$, in which the standard pattern $B^8$ of the digit "8" is adapted. Below, the superscript 8 is omitted and the pattern is written in the form $B$ of equation (2), which corresponds to Fig. 2.

The user's utterance of "8" enters through microphone 11 and is analyzed by preprocessing unit 12. It is converted into the speech pattern $A$ of equation (3) and stored in buffer 13. From this point on, control signal $m$ from control unit 10 takes the values 1, 2, 3 and 4 in turn. For each value, buffer 15 generates the corresponding expanded pattern $B_1$, $B_2$, $B_3$ or $B_4$ of the original pattern. Each expanded pattern is sent to buffer 16 as the pattern $C$ of equation (4), and first matching unit 17 matches it against the input pattern $A$. First matching unit 17 is constructed like Fig. 5 of the specification of Japanese Patent Application No. 46-62782. Through the processing of equations (6), (7) and (8), it outputs the distance $D(A, C)$ and the mapping $j(i)$.

Minimum value detection unit 18 determines the value $m = m'$ for which the distance $D(A, C)$ computed for each $C = B_m$ is smallest. The mapping $j(i)$ corresponding to $m'$ is held in mapping buffer 19. As in the explanation of the principle above, suppose $m' = 4$. Conversion storage unit 20 stores the correspondence between the $c_j$ and the $b_k$ used to generate $B_4$, that is, a correspondence such as equation (5). Correction unit 21 uses this correspondence and the mapping held in mapping buffer 19 to find the vector $b_k$ corresponding to $c_{j(i)}$, and replaces it with $a_i$ as illustrated in equation (9). The standard pattern modified in this way is stored in storage unit 14 as the standard pattern $B^8$ of the digit "8".

The adaptation mode ends once this processing has been performed for each of the digits 0 to 9.

Next, the recognition mode is briefly described. In this mode, the pattern $A$ entered into buffer 13 is an unknown input pattern. It is passed to second matching unit 22 and compared there with the adapted standard patterns $B^n$ ($n = 0, 1, \ldots, 9$) held in storage unit 14. This matching unit is constructed like Fig. 7 of the specification of Japanese Patent Application No. 57-156413. That document, however, illustrates an application to online character recognition. For speech recognition, which is the subject of the present invention, distance calculation unit 26 of that Fig. 7 is changed to compute the distance between the features $a_i$ and $b_j$.

This matching unit computes the distance $D(A, B^n)$ between the input pattern $A$ and each standard pattern $B^n$. Minimum value detection unit 23 compares these distances and determines the $n = n'$ that gives the minimum. This $n'$ is output as the recognition result.
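The recognition mode can be sketched as a minimum search over the adapted standard patterns. One substitution is made for brevity. Instead of the stack-based network matching of Application No. 57-156413, each network is expanded into its paths and the smallest DP distance over those paths is taken. This is only practical for small networks. Names and data are hypothetical.

```python
import math
from itertools import product

def dp_distance(A, C):
    """D(A, C) via the recurrence of equations (6) and (7)."""
    I, J = len(A), len(C)
    g = [[math.inf] * J for _ in range(I)]
    g[0][0] = math.dist(A[0], C[0])
    for i in range(1, I):
        for j in range(J):
            prev = min(g[i - 1][j - w] for w in (0, 1, 2) if j - w >= 0)
            g[i][j] = math.dist(A[i], C[j]) + prev
    return g[I - 1][J - 1]

def network_distance(A, network):
    """D(A, B^n) as the smallest distance over all network paths.

    Assumption: brute-force expansion replaces the stack-based network DP.
    """
    return min(dp_distance(A, [v for branch in choice for v in branch])
               for choice in product(*network))

def recognize(A, standard_patterns):
    """Return the word n' whose adapted standard pattern B^n is closest to A."""
    return min(standard_patterns,
               key=lambda n: network_distance(A, standard_patterns[n]))

standard_patterns = {
    "8": [[[(0.0,)]], [[(5.0,)], [(1.0,)]], [[(9.0,)]]],
    "1": [[[(4.0,)]], [[(4.0,)]]],
}
print(recognize([(0.0,), (1.0,), (9.0,)], standard_patterns))  # 8
```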

The first embodiment has been described, but the following problem remains. In the example of Fig. 2, the original standard pattern $B$ provides two variants of the vowel /a/: the normal utterance /a/ and the devoiced utterance /ḁ/. Vowels are also known to undergo nasalization, so a variant /ã/ (the tilde denotes nasalization) can also be uttered. Suppose the speech pattern $A$ contains the variant /ã/ and is used for adaptation. One of the branches /a/ or /ḁ/ (here, /a/) is then replaced by the features of /ã/. The branch for the normal vowel /a/ disappears, and recognition of utterances such as /hatʃi/ or /hatʃi̥/ deteriorates.

The second embodiment, described below, is a standard pattern adaptation method that does not degrade the original standard pattern. This holds even when adaptation uses a speech pattern $A$ containing a type of variation absent from the original standard pattern $B$, as in the case above.

In the second embodiment of the present invention, a standard pattern represented as a network of time series of features is matched with a speech pattern uttered by the user to determine an optimal path in the network. A standard-pattern feature on the optimal path is modified by the corresponding speech-pattern feature only when the two are similar to at least a predetermined criterion.

With this kind of adaptation, the nasalized vowel /ã/ differs by more than a certain degree from both the normal vowel /a/ and the devoiced vowel /ḁ/, so its similarity falls below the criterion. The branches /a/ and /ḁ/ are therefore not lost through modification, and the original standard pattern does not deteriorate.

In Fig. 3, first matching unit 17 computes the distance $d(i, j)$ between $a_i$ and $c_j$ in order to evaluate equations (6) and (7). When correction unit 21 modifies the vector $b_k$ corresponding to $c_{j(i)}$, it skips the replacement of $b_k$ by $a_i$ if $d(i, j(i))$ exceeds a predetermined reference value, that is, if the similarity falls below the criterion.
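The gated modification of the second embodiment can be sketched as below. Distances are Euclidean, indices are 0-based, and the threshold value is arbitrary. Because $d(i, j)$ comes from the matching step in Fig. 3, each distance is measured against the original, unmodified $b_k$. The function name is hypothetical.

```python
import math

def adapt_gated(b, k_of_j, A, j_of_i, threshold):
    """Second embodiment sketch: replace b_k by a_i only if d(i, j(i)) <= threshold.

    Distances use the original features, as produced by the matching step,
    so earlier replacements do not affect later decisions.
    """
    updated = dict(b)
    for i, j in enumerate(j_of_i):
        k = k_of_j[j]
        if math.dist(A[i], b[k]) <= threshold:  # similar enough: adapt
            updated[k] = A[i]
        # otherwise keep b_k, so an unseen variant cannot overwrite the branch
    return updated

b = {4: (1.0, 0.0), 5: (0.0, 1.0)}
A = [(1.1, 0.0), (5.0, 5.0)]
print(adapt_gated(b, [4, 5], A, [0, 1], threshold=0.5))
# {4: (1.1, 0.0), 5: (0.0, 1.0)}
```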

As a result, the standard pattern does not deteriorate even when adaptation uses a speech pattern whose variation is not contained in the original standard pattern.

The second embodiment thus solves a problem left by the first: degradation of the standard pattern when adaptation uses a speech pattern whose type of variation is not in the original standard pattern. A drawback still remains, however. A type of variation not built into the original standard pattern network as a branch is never incorporated into the standard pattern, however often adaptation is performed with patterns $A$ that contain it.

For example, the original standard pattern $B$ of Fig. 2 has no branch /ã/ for the nasalized /a/. Adaptation with the speech pattern $A$ = /hãtʃi/ therefore generates no branch carrying the features of /ã/ under the second embodiment. Speech patterns such as /hãtʃi̥/ or /hãtʃi/ are consequently not recognized well.

The third embodiment, described below, remedies this drawback as well. It can learn and adapt to types of variation that were not built into the original standard pattern network.

In the third embodiment of the present invention, a standard pattern represented as a network of time series of features is matched with a speech pattern uttered by the user to determine an optimal path in the network. Each portion of the optimal path is then treated in one of two ways:
- Where the features on the optimal path and the corresponding speech-pattern features are similar to at least a predetermined criterion, the standard-pattern features are modified by the speech-pattern features.
- Where they are not, a new path parallel to that portion is created from the speech-pattern features.

Consider the original standard pattern $B$ shown in Fig. 2, or equivalently in equation (2), and suppose adaptation is performed with a speech pattern $A$ of /hãtʃi/.

The correction unit 21 refers to the mapping j(i) and the distances d(i, j) obtained by the first matching unit in Fig. 3. For the vectors a_i belonging to /ã/, the correction unit can detect that the distance d(i, j(i)) to the corresponding b_j(i) is large, because /ã/ does not match /a/ or the other vowel alternative in pattern B. In these portions, as shown in Fig. 4, the features of /ã/ are taken from the speech pattern, arranged in order, and added as a new branch. Equation (2) then becomes, for example:

$$B = b_1\,b_2\,b_3\,(b_4\,b_5\,b_6\,b_7\,b_8 \;/\; b_9\,b_{10}\,b_{11} \;/\; a_3\,a_4\,a_5)\,b_{12}\,b_{13}\,(b_{14}\,b_{15}\,b_{16}\,b_{17} \;/\; b_{18}\,b_{19}\,b_{20})$$

Here a_3 a_4 a_5 are the features extracted from the speech pattern A as corresponding to the nasalized vowel /ã/. In this way the system can automatically learn and adapt to kinds of variation that were not built into the original standard pattern B in advance.
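The effect of adding the branch a_3 a_4 a_5 can be illustrated with a small Python sketch. It rests on two assumptions:

- Equation (2) is taken to be the equation above without the "/ a_3 a_4 a_5" alternative. That reading is inferred from the garbled source, not confirmed.
- The list-of-alternatives encoding and the helper names `add_parallel_branch` and `paths` are hypothetical.

The sketch adds a parallel branch to one group of alternatives and enumerates every label sequence the network accepts.

```python
from itertools import product

# Hypothetical encoding: each item is either a single feature label or a
# list of alternative branches, each branch being a list of labels.
B_eq2 = [
    "b1", "b2", "b3",
    [["b4", "b5", "b6", "b7", "b8"], ["b9", "b10", "b11"]],
    "b12", "b13",
    [["b14", "b15", "b16", "b17"], ["b18", "b19", "b20"]],
]

def add_parallel_branch(network, position, branch):
    """Return a copy of `network` with `branch` added as a new alternative
    in the group of alternatives at index `position`."""
    new = [[list(b) for b in item] if isinstance(item, list) else item
           for item in network]
    if not isinstance(new[position], list):
        raise ValueError("position must refer to a group of alternative branches")
    new[position].append(list(branch))
    return new

def paths(network):
    """Yield every label sequence through the network."""
    choices = [item if isinstance(item, list) else [[item]] for item in network]
    for combo in product(*choices):
        yield [label for part in combo for label in part]

# Features a3 a4 a5 taken from the input become a third alternative.
B_eq3 = add_parallel_branch(B_eq2, 3, ["a3", "a4", "a5"])
```

Adding one alternative to the first group raises the number of accepted sequences from 2 × 2 = 4 to 3 × 2 = 6. The original branches stay in place, so the pattern keeps working for speakers who do not nasalize the vowel.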

The principle of the present invention has been explained above on the basis of embodiments, but these descriptions do not limit the scope of the invention.

In particular, correcting a standard pattern feature b_j with a_i is not limited to replacing b_j by a_i. The feature b_j can also be corrected by another formula that combines b_j and a_i.

The network standard pattern may also be represented and handled by another method, for example the method described in the specification of Japanese Patent Application No. Sho 58-131438. Furthermore, the speech unit need not be a word such as a digit. It may be another unit, such as a consonant-vowel syllable or a consonant-vowel-consonant unit.
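The exact alternative formula for correcting b_j with a_i cannot be recovered from the source text. As a hedged illustration only, the sketch below uses a common choice that is not necessarily the one in the patent: a weighted average b_j ← (1 − w) b_j + w a_i. With w = 1 this reduces to plain replacement. The function names and the weight `w` are hypothetical.

```python
import numpy as np

def correct_feature(b_j, a_i, w=0.5):
    """Assumed weighted-average correction: move b_j toward a_i by weight w.
    w = 1 reproduces replacement of b_j by a_i; w = 0 leaves b_j unchanged."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    b_j = np.asarray(b_j, dtype=float)
    a_i = np.asarray(a_i, dtype=float)
    return (1.0 - w) * b_j + w * a_i

def running_mean_update(b_j, a_i, n):
    """Weighted correction with w = 1 / (n + 1), where n is the number of
    samples already averaged into b_j; the result is their running mean."""
    return correct_feature(b_j, a_i, w=1.0 / (n + 1))
```

The running-mean variant lets several adaptation utterances contribute equally to b_j, rather than letting the last utterance overwrite the others.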

(Effects of the Invention) The adaptation method described above enables speaker adaptation with three advantages. It requires little standard pattern storage, it adapts well to the individual speaker, and it correctly recognizes even kinds of variation that were not input in the adaptation mode.

[Brief Description of the Drawings]

Figs. 1, 2 and 4 are diagrams explaining the principle, and Fig. 3 is a block diagram of an embodiment. In the figures:
10 control unit
11 microphone
12 preprocessing unit
13 buffer
14 storage unit
15 buffer
16 buffer
17 first matching unit
18 minimum value detection unit
19 mapping buffer
20 conversion storage unit
21 correction unit
22 second matching unit
23 minimum value detection unit

Claims


(1) A standard pattern adaptation method, characterized in that:
- a standard pattern expressed as a network of time series of features is matched with a speech pattern uttered by a user to determine an optimal path in the network; and
- the features on the optimal path are corrected with the speech pattern.

(2) A standard pattern adaptation method, characterized in that:
- a standard pattern expressed as a network of time series of features is matched with a speech pattern uttered by a user to determine an optimal path in the network; and
- a standard pattern feature on the optimal path is corrected with the corresponding speech pattern feature only where the two features are similar by at least a predetermined criterion.

(3) A standard pattern adaptation method, characterized in that:
- a standard pattern expressed as a network of time series of features is matched with a speech pattern uttered by a user to determine an optimal path in the network;
- where a feature on the optimal path and the corresponding feature in the speech pattern are similar by at least a predetermined criterion, the standard pattern feature is corrected with the speech pattern feature; and
- where they are not, a new path parallel to that part is created from the speech pattern features.