JPS6324297A

JPS6324297A - Voice dictionary generation system for specified speaker's voice recognition equipment

Info

Publication number: JPS6324297A
Application number: JP61168528A
Authority: JP
Inventors: 笹沼　三郎; 小牧　光弘
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1986-07-17
Filing date: 1986-07-17
Publication date: 1988-02-01

Abstract

(57) [Abstract] This publication contains application data that predates electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] [Summary] In a specific-speaker speech recognition device, the present invention builds the speech recognition dictionary from two parts: a common dictionary created in advance from utterances of the words to be used by a plurality of people, and a specific dictionary created exclusively for the input user, containing only the words that are misrecognized when the input user's utterances are recognized against the common dictionary. This reduces both the number of utterances needed for registration and the dictionary area.

[Industrial application field]

The present invention relates to a speech dictionary creation method for a specific-speaker speech recognition device.

Before use, a specific-speaker speech recognition device registers the input user's voice in advance. At recognition time, it compares and collates an utterance against the registered voice and outputs the recognition result. It is desirable for the user that fewer utterances be needed, both for this registration and for learning, that is, the further utterances used to correct the registered speech parameters in order to raise the recognition rate.

As for the dictionary area, the smaller the area needed for one person's dictionary, the more users' dictionaries can be held in the same device. Alternatively, if there are few users, the device can be made smaller, which is also desirable.

[Prior Art] FIG. 4 is a device configuration diagram showing the speech recognition dictionary creation method of a conventional specific-speaker speech recognition device.

Reference numeral 5 denotes a host device that is connected to the speech recognition device 1, controls it, and uses its speech recognition results.

Reference numeral 10 denotes the speech recognition dictionary provided in the speech recognition unit.

Reference numeral 30 denotes an input line, 31 an internal interface line, and 32 an external interface line.

Conventionally, as shown in FIG. 4, the speech recognition dictionary 10 consisted solely of a specific dictionary 20 dedicated to the input user.

That is, every person who performs voice input utters every word to be input, and the resulting speech parameters are stored as a speech recognition dictionary dedicated to that person. This task is called registration.

Afterwards, to improve the recognition rate, the user utters the words several more times to correct the speech recognition dictionary. This task is called learning.

[Problems to Be Solved by the Invention] As described above, the voice input dictionary of a conventional specific-speaker voice input device requires many utterances from each user for registration and learning. In addition, because it consists only of a specific dictionary, when the input user changes, the entire specific dictionary must be recreated or, if one has already been created, swapped in completely using an external storage device or the like.

Furthermore, when the voice input dictionaries of several people are built in, a large voice input dictionary holding the complete dictionary of every user is required.

The present invention aims to provide a new voice input dictionary creation method for a specific-speaker voice input device that eliminates these conventional problems.

[Means for Solving the Problems] FIG. 1 is a block diagram of the principle of the speech recognition dictionary creation method of the specific-speaker speech recognition device of the present invention.

In FIG. 1, reference numeral 21 denotes a common dictionary, created in advance from the speech parameters of several or several dozen people uttering the words to be used.

Reference numeral 41 denotes the speech parameter extraction function provided in the speech recognition unit.

Reference numeral 42 denotes the comparison and collation function provided in the speech recognition unit.

The input user's utterance is converted into speech parameters by the speech parameter extraction function 41 and compared against the common dictionary 21 by the comparison and collation function 42. Words that are not recognized correctly are registered in the specific dictionary 22 for that user. Words that are recognized correctly are not registered in the specific dictionary 22.

In this way, the speech recognition dictionary 10 consists of the common dictionary 21 and the specific dictionary 22. At recognition time, the user's speech parameters are compared and collated against the common dictionary 21 and that user's specific dictionary 22.
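The registration rule described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that speech parameters are fixed-length feature vectors, that similarity is Euclidean distance, and that a single `threshold` stands in for the "fixed value". The names `nearest_word` and `register_if_misrecognized` are hypothetical.

```python
from math import dist  # Euclidean distance between equal-length vectors


def nearest_word(params, dictionary):
    """Return (word, distance) for the closest template in the dictionary.

    Assumed layout: {word: [template_vector, ...]}.
    """
    return min(
        ((word, dist(params, template))
         for word, templates in dictionary.items()
         for template in templates),
        key=lambda pair: pair[1],
    )


def register_if_misrecognized(word, params, common, specific, threshold):
    """Store params in the user's specific dictionary only when the common
    dictionary does not recognize the intended word closely enough."""
    best_word, best_distance = nearest_word(params, common)
    if best_word == word and best_distance <= threshold:
        return False  # recognized correctly: nothing is registered
    specific[word] = params  # misrecognized: keep a user-specific entry
    return True


# Hypothetical two-word vocabulary with one common pattern per word.
common = {"start": [(0.0, 0.0)], "stop": [(10.0, 10.0)]}
mr_a = {}
print(register_if_misrecognized("start", (0.5, 0.5), common, mr_a, 2.0))  # False
print(register_if_misrecognized("stop", (4.0, 4.0), common, mr_a, 2.0))   # True
print(mr_a)  # {'stop': (4.0, 4.0)}
```

Only the second word lands in the user's dictionary, so the user's storage grows with the number of misrecognized words rather than with the size of the vocabulary.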

[Operation] The speech recognition dictionary 10 is divided into the common dictionary 21 and the specific dictionary 22. Because the common dictionary 21 has already been created, the input user no longer has to perform voice registration as a separate task.

In addition, storing the voice patterns peculiar to the input user in the specific dictionary 22 yields a high recognition rate.

[Embodiment] The present invention is described more concretely below with reference to the embodiment shown in FIGS. 2 and 3.

FIG. 2 is a device configuration diagram of an embodiment of the present invention.

FIG. 2 differs from the conventional example shown in FIG. 4 in that the speech recognition dictionary 10 is divided into the common dictionary 21 and the specific dictionary 22.

FIG. 3 is a diagram showing the flow of processing in the embodiment of the present invention.

In FIG. 3, reference numeral 21 denotes the common dictionary. For each word to be used, the speech parameters uttered in advance by several or several dozen people are consolidated into a few patterns and stored in the speech recognition dictionary 10.
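The text does not say how the utterances are consolidated into a few patterns. As one possible illustration only, the sketch below uses a simple greedy grouping with running means, which is a stand-in technique and not taken from the patent. The function name `aggregate_patterns`, the `radius` parameter, and the use of fixed-length vectors are all assumptions.

```python
from math import dist


def aggregate_patterns(utterances, radius):
    """Greedy grouping: each utterance joins the first pattern whose mean lies
    within `radius`; otherwise it starts a new pattern. Returns pattern means."""
    patterns = []  # list of (mean_vector, utterance_count)
    for vec in utterances:
        for i, (mean, count) in enumerate(patterns):
            if dist(vec, mean) <= radius:
                # Update the running mean of the matching pattern.
                new_mean = tuple(
                    (m * count + v) / (count + 1) for m, v in zip(mean, vec)
                )
                patterns[i] = (new_mean, count + 1)
                break
        else:
            patterns.append((tuple(vec), 1))  # no pattern was close enough
    return [mean for mean, _ in patterns]


# Four hypothetical speakers uttering the same word fall into two patterns.
print(aggregate_patterns([(0, 0), (2, 0), (10, 10), (12, 10)], radius=3))
# [(1.0, 0.0), (11.0, 10.0)]
```

Keeping several patterns per word lets the common dictionary cover distinct speaking styles, which is what allows many users to skip registration for most words.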

Reference numeral 22 denotes the specific dictionary dedicated to each input user.

Reference numeral 23 denotes the speech parameters extracted from the input user's utterance.

The operation of the device of this embodiment is described below with reference to FIGS. 2 and 3.

(1) When the input user utters a word to be used into the microphone 4, the voice is input via the input line 30 to the speech recognition unit 2 of the specific-speaker speech recognition device 1.

(2) The speech recognition unit 2 creates speech parameters 23 from the input voice.

(3) The created speech parameters 23 are collated with the common dictionary 21. If they differ by a fixed value or more, the speech parameters are stored separately in the specific dictionary 22.

For example, the speech parameters 231 of Mr. A's first word are collated with every word in the common dictionary 21. The most similar word is the intended No. 1 word parameter 211 and the difference is within the fixed value, so no parameter is created in the specific dictionary 22.

Next, the speech parameters 232 of the second word are likewise collated with every word. Either the most similar word is the intended No. 2 word parameter 212 but the difference is the fixed value or more, or the most similar word is not the intended No. 2 word parameter 212. In either case, the speech parameters are stored in the specific dictionary 22 as the No. 2 word Mr. A specific parameter 221.

(4) Likewise for Mr. B, when an uttered word differs from the common dictionary 21 by the fixed value or more, a dictionary dedicated to Mr. B is created inside the specific dictionary 22, as with the No. n word Mr. B specific parameter 222.

(5) At speech recognition time, the host device 5 and the voice input control unit 3 designate the input user, which restricts collation to that person's dictionary within the specific dictionary 22 created as described above. Speech recognition is therefore performed by collating against the common dictionary 21 and the designated personal dictionary in the specific dictionary 22.
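Step (5) can be sketched as a nearest-match search over the common dictionary plus the designated user's entries. This is a hedged illustration under assumed conditions: speech parameters are fixed-length vectors, Euclidean distance is the similarity measure, and the `recognize` function and the dictionary layouts are hypothetical.

```python
from math import dist


def recognize(params, common, specific_by_user, user):
    """Return the word whose template is closest to params, searching the
    common dictionary and only the designated user's personal dictionary."""
    candidates = [
        (word, template)
        for word, templates in common.items()
        for template in templates
    ]
    # Entries belonging to other users are never consulted.
    candidates += list(specific_by_user.get(user, {}).items())
    best_word, _ = min(candidates, key=lambda c: dist(params, c[1]))
    return best_word


# Hypothetical data: Mr. A has a personal entry for "stop", Mr. B has none.
common = {"start": [(0.0, 0.0)], "stop": [(10.0, 10.0)]}
specific_by_user = {"A": {"stop": (4.0, 4.0)}, "B": {}}
print(recognize((4.2, 3.9), common, specific_by_user, "A"))  # stop
print(recognize((4.2, 3.9), common, specific_by_user, "B"))  # start
```

The same input yields different results for the two users, which shows why the host device must designate the input user before collation.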

[Effects of the Invention] As described above, according to the present invention, only utterances that do not match the common dictionary created in advance are placed in the specific dictionary. A dictionary with a high recognition rate can thus be created from a small number of utterances, and the dictionaries of several people can be held in a small dictionary area. The invention therefore contributes greatly to improving the performance and operability of speech recognition devices.
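The saving in dictionary area can be illustrated with a simple entry count. The figures below are hypothetical and chosen only to show the arithmetic; the patent gives no numbers. The function names and the assumption that every stored pattern occupies one equal-sized entry are also illustrative.

```python
def conventional_entries(num_users, num_words):
    """Conventional scheme: every user stores a pattern for every word."""
    return num_users * num_words


def proposed_entries(num_words, patterns_per_word, misrecognized_per_user):
    """Proposed scheme: one shared common dictionary plus, per user, only the
    words that the common dictionary misrecognized."""
    return num_words * patterns_per_word + sum(misrecognized_per_user)


# Hypothetical case: 10 users, 100 words, 3 common patterns per word, and a
# handful of misrecognized words per user.
print(conventional_entries(10, 100))                                # 1000
print(proposed_entries(100, 3, [5, 12, 0, 8, 3, 7, 10, 2, 6, 4]))   # 357
```

The shared part is a fixed cost, so the advantage grows as more users are added and shrinks if the common dictionary fits the users poorly.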

[Brief explanation of the drawing]

FIG. 1 is a block diagram of the principle of the present invention, FIG. 2 is a device configuration diagram of an embodiment of the present invention, FIG. 3 is a diagram showing the flow of processing in the embodiment, and FIG. 4 is a device configuration diagram of a conventional example.

In the drawings, 1 denotes the specific-speaker speech recognition device, 2 the speech recognition unit, 3 the voice input control unit, 4 the microphone, 5 the host device, 10 the speech recognition dictionary, 20 the specific dictionary (conventional), 21 the common dictionary, 22 the specific dictionary (present invention), 23 the speech parameters, 30 the input line, 31 the internal interface line, 32 the external interface line, 41 the speech parameter extraction function, 42 the comparison and collation function, 211 to 214 the per-word parameters in the common dictionary, 221 and 222 the parameters in the specific dictionary 22, and 231 to 233 the individual speech parameters.

Claims

[Claims]

(1) A speech dictionary creation method for a specific-speaker speech recognition device, characterized in that the speech recognition dictionary (10) provided in the specific-speaker speech recognition device consists of a common dictionary (21), created from speech parameters of the words to be used as uttered by a plurality of speakers, and a specific dictionary (22), created from speech parameters of the words to be used as uttered by the input user, and in that an uttered voice is recognized by comparing and collating it against the common dictionary (21) and the dictionary for that input user in the specific dictionary (22).

(2) The speech dictionary creation method for a specific-speaker speech recognition device according to claim 1, wherein the specific dictionary (22) is created only from the speech parameters of those words uttered by the user that could not be recognized through comparison with the common dictionary (21).