
JPS616694A - Voice registration system

Info

Publication number: JPS616694A
Application number: JP59126800A
Authority: JP
Inventors: 石塚　久夫
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1984-06-20
Filing date: 1984-06-20
Publication date: 1986-01-13

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] (Technical Field) The present invention relates to a voice registration method for a speech recognition device.

(Prior Art) In speech recognition, the common approach is to perform pattern matching between standard patterns registered in advance and an input unknown speech pattern, and to take the most similar standard pattern as the recognition result. In this approach, the widely used way of registering speech in advance is to analyze and detect the speech from a single utterance, create a registration pattern from it, and register that pattern.
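The recognition scheme described above (pick the registered standard pattern closest to the unknown input) can be sketched roughly as follows. This is only an illustration: the names `recognize` and `euclidean`, the flat feature-vector representation, and the Euclidean distance are assumptions made for the example, not details given in the publication.

```python
from typing import Callable, Dict, Sequence

# Assumption: a speech pattern is represented as a flat feature vector.
Pattern = Sequence[float]


def euclidean(a: Pattern, b: Pattern) -> float:
    """Placeholder matching distance (a real system would also align in time)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def recognize(unknown: Pattern,
              standards: Dict[str, Pattern],
              distance: Callable[[Pattern, Pattern], float] = euclidean) -> str:
    """Return the label of the registered standard pattern closest to `unknown`."""
    return min(standards, key=lambda label: distance(unknown, standards[label]))


# Hypothetical standard patterns, one per registered word.
standards = {"A": [0.0, 1.0, 0.0], "B": [1.0, 0.0, 1.0]}
result = recognize([0.1, 0.9, 0.2], standards)
```

With one standard pattern per word, a single poor registration utterance directly degrades every later recognition of that word, which is the weakness the following paragraphs discuss.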

In this case, if the utterance made for registration fails, speech recognition will not work well. Moreover, even when the same speaker utters the same word, the utterances are not exactly identical each time. And if the registration utterance is quite different from usual, for example because of a cold, recognition becomes difficult just as it does when the utterance fails.

However, it is difficult to determine that a failed utterance is the cause of such recognition difficulty. In addition, during recognition a speech different from the one that should be recognized is sometimes recognized by mistake. This may be caused by a poor utterance at recognition time, but it can also occur when the quality of the registered standard pattern is poor.

If voice registration is performed after practicing in advance, so as to eliminate registration failures caused by such utterances, it should be possible to create good-quality standard patterns. Furthermore, because the practice done at registration can also be expected to benefit utterances at recognition time, misrecognition becomes less likely and the recognition rate should improve. However, if speech recognition is to be simple to use, requiring utterance practice at registration time is undesirable from the standpoint of device operation.

(Object of the Invention) The object of the present invention is to solve these problems and to provide a voice registration method that requires no utterance practice to create good-quality standard patterns at registration time, and that can eliminate misrecognition caused by poor utterances at registration.

(Configuration of the Invention) The present invention is a voice registration method for a speech recognition device that performs matching between registered standard speech patterns and an input speech pattern, characterized in that: speech to be recognized as the same word as a first speech (A) to be registered is uttered multiple times to obtain a plurality of first speech patterns (A1, A2, ..., An); a predetermined matching distance is calculated among these first speech patterns to obtain, for each of the first speech patterns (A1 to An), the average same-word matching distance; a matching distance is also calculated between the first speech patterns (A1, A2, ..., An) and second speech patterns (B1, B2, ..., Bn), obtained by uttering speech to be recognized as a word different from the first speech A, to obtain the average different-word matching distance; and, from among the first speech patterns (A1, A2, ..., An), one with a large difference between the average same-word matching distance and the average different-word matching distance is registered as the standard pattern of the first speech (A).
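The selection rule stated above can be written compactly as follows. The notation is an assumption made for illustration: d is the matching distance, n is the number of utterances of the word being registered, B_1 to B_m are all patterns of words that should be recognized as different, and X and Y follow the symbols used later in the embodiment. Whether a pattern's zero distance to itself is counted in the same-word average is not stated in the text; this sketch leaves it out. The text asks for a "large" difference, and the embodiment picks the largest.

```latex
X_i = \frac{1}{n-1} \sum_{\substack{j=1 \\ j \neq i}}^{n} d(A_i, A_j), \qquad
Y_i = \frac{1}{m} \sum_{k=1}^{m} d(A_i, B_k), \qquad
i^{*} = \arg\max_{1 \le i \le n} \,(Y_i - X_i)
```

The pattern A_{i*} is then registered as the standard pattern for the word.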

(Embodiment) Next, the present invention is described in detail with reference to the drawings.

FIG. 1 is a block diagram of one embodiment of the present invention, and FIG. 2 shows the matching distances for an example set of speech patterns in FIG. 1. In the figures, 1 is a speech analysis and detection unit, 2 is a speech registration unit, 3 and 5 are storage units, 4 is a matching processing unit, 6 is an averaging unit, 7 is a subtractor, and 8 is a comparison unit.

Suppose there are three speech items (A, B, C) to be registered. Speech A is uttered three times, giving utterance patterns A1, A2, and A3. Speech B is uttered four times (B1, B2, B3, B4) and speech C once (C1). The following describes how to select just one of these utterances for each speech and register it as the standard pattern. In FIG. 1, the speech analysis and detection unit 1 first processes the uttered speech patterns A1, A2, A3, B1, B2, B3, B4, and C1, which are stored and held in storage unit 3.

At this time, word information indicating which patterns belong to the same word is also stored in storage unit 3. Next, matching processing unit 4 reads out the speech patterns stored in storage unit 3, calculates the matching distance for each pair, and stores the results in storage unit 5.

The matching-distance column in FIG. 2 was obtained in this way. This matching process is described, for example, on page 107 of "Speech Recognition" by Yasunaga Niimi, Information Science Course E19.3, published by Kyoritsu Shuppan Co., Ltd. If two utterances are exactly equal and their speech patterns are identical, the matching distance is zero. The smaller the matching distance between two speech patterns, the more similar the two utterances are; the larger the distance, the more they differ.
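The publication does not spell out the matching-distance computation and only points to the cited textbook. As one plausible stand-in, the sketch below uses a basic dynamic time warping (DTW) distance over one-dimensional feature sequences. The function name `dtw_distance`, the absolute-difference local cost, and the unconstrained step pattern are all assumptions for illustration, not the device's actual procedure.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic-time-warping distance between two non-empty 1-D feature sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # only a advances
                                     cost[i][j - 1],      # only b advances
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

This distance has the properties the paragraph relies on: identical patterns give zero, and the value grows as the patterns diverge. It also tolerates differences in speaking rate, since a frame in one sequence may align with several frames in the other.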

Next, for each of the patterns A1, A2, and A3, which should be recognized as the same word, averaging unit 6 obtains the average same-word matching distance X and the average different-word matching distance Y, and subtractor 7 obtains their difference (Y - X). Comparison unit 8 compares the Y - X values obtained for patterns A1, A2, and A3, and registration unit 2 registers the pattern giving the largest value as the standard pattern. In this example A1 gives the largest difference Y - X, so utterance A1 is registered as the standard pattern for speech A. Likewise, for B1, B2, B3, and B4, the average same-word and different-word matching distances are obtained and their differences computed, and pattern B4, which gives the largest difference, is registered as the standard pattern. C is uttered only once, so C1 is registered as is.
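The selection procedure of the embodiment can be sketched as below. The values in FIG. 2 are not reproduced in the text, so the patterns here are invented numbers, chosen only so that the outcome matches the one described (A1, B4, and C1 selected). The names `select_standards` and `l1_distance`, the list-of-floats pattern format, and the simple sum-of-absolute-differences distance are likewise assumptions.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

Pattern = Sequence[float]


def l1_distance(a: Pattern, b: Pattern) -> float:
    """Placeholder matching distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def select_standards(utterances: Dict[str, List[Pattern]],
                     distance: Callable[[Pattern, Pattern], float] = l1_distance
                     ) -> Dict[str, int]:
    """For each word, return the index of the utterance with the largest Y - X."""
    chosen: Dict[str, int] = {}
    for word, patterns in utterances.items():
        if len(patterns) == 1:
            chosen[word] = 0  # a single utterance is registered as is
            continue
        # All patterns belonging to other words (the "different-word" set).
        others = [p for w, ps in utterances.items() if w != word for p in ps]
        best_index, best_score = 0, float("-inf")
        for i, p in enumerate(patterns):
            # X: average distance to the other utterances of the same word.
            x = mean(distance(p, q) for j, q in enumerate(patterns) if j != i)
            # Y: average distance to utterances of different words
            # (assumption: treated as 0 when no other word exists).
            y = mean(distance(p, q) for q in others) if others else 0.0
            if y - x > best_score:
                best_index, best_score = i, y - x
        chosen[word] = best_index
    return chosen


# Hypothetical data: three utterances of A, four of B, one of C.
utterances = {
    "A": [[0.0, 0.0], [0.2, 0.0], [1.0, 0.0]],
    "B": [[3.0, 0.0], [3.2, 0.0], [2.0, 0.0], [3.4, 0.0]],
    "C": [[0.0, 5.0]],
}
chosen = select_standards(utterances)
```

In this invented data the third utterance of A and the third utterance of B drift toward the other word, so they score lowest on Y - X, while the indices returned are 0 for A, 3 for B, and 0 for C.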

The matching distance between similar words is small, and if it becomes smaller than the matching distance to the speech pattern that should be recognized, misrecognition results. For speech that should be recognized as different words, it is therefore advantageous to use as standard patterns those patterns whose mutual matching distances are large. Also, among patterns that should be recognized as the same word, patterns from failed utterances should be excluded, and the same-word matching distance for such a failed utterance is relatively large. In the embodiment, the pattern with the largest value obtained by subtracting the average same-word matching distance from the average different-word matching distance is taken as the standard pattern. The selected pattern is therefore not similar to speech that should be recognized as another word, and failed-utterance patterns are excluded. A speech pattern selected in this way is suitable as a standard pattern and can reduce misrecognition.

(Effects of the Invention) As described above, the present invention removes the burden of re-registration after a failed utterance, makes utterance practice for good registration unnecessary, avoids misrecognition of different speech, and makes it possible to create registered speech data that further improves the recognition rate.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing one embodiment of the present invention, and FIG. 2 is a diagram showing the relationship between the patterns used to explain this embodiment and their matching distances. In the figures, 1 is the speech analysis and detection unit, 2 is the speech registration unit, 3 and 5 are the storage units, 4 is the matching processing unit, 6 is the averaging unit, 7 is the subtractor, 8 is the comparison unit, and 10 is the input terminal.

Claims

[Claims] A voice registration method for a speech recognition device that performs matching between registered standard speech patterns and an input speech pattern, characterized in that: a predetermined matching distance is calculated among a plurality of first speech patterns, obtained by uttering multiple times speech to be recognized as the same word as a first speech to be registered, and the average same-word matching distance is obtained for each of these speech patterns; a matching distance is calculated between the first speech patterns and second speech patterns, obtained by uttering speech to be recognized as a word different from the first speech, and the average different-word matching distance is obtained; and, from among the first speech patterns, one with a large difference between the average same-word matching distance and the average different-word matching distance is registered as the standard pattern of the first speech.