JPH0256680B2


Info

Publication number: JPH0256680B2
Application number: JP58121377A
Authority: JP
Inventors: Ryoichi Ogasawara; Takashi Yoshida; Hideyuki Koike
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1983-07-04
Filing date: 1983-07-04
Publication date: 1990-11-30
Also published as: JPS6012599A

Description

DETAILED DESCRIPTION OF THE INVENTION

Technical Field of the Invention

The present invention relates to a method for managing and editing voice pattern files in a system that uses a speech recognition device.

Background of the Technology

To use a conventional system of this kind, each user must register in advance his or her own voice in association with the information to be output when that voice is correctly recognized. In addition, the information to be output must be managed separately for each user even when it is common to many users (see, for example, Shuzo Saito, Fundamentals of Speech Information Processing, Ohmsha, November 1981).

Prior Art and Its Problems

In the conventional method, as the amount of registered data grows, the amount each user must register personally grows and the number of registration operations increases. This increases both the burden on users and the volume of files the system must manage.

Furthermore, when speech recognition based on already registered voice patterns begins to produce frequent misrecognitions, for example because the speaker's voice has changed over time, the user must re-register the voice patterns personally.

Object of the Invention

To resolve these drawbacks, the present invention has the system provide shareable information together with its voice patterns, so that users can retrieve that information without registering it in advance. In addition, by managing recognition confidence and manipulating voice patterns, it makes it possible to omit the voice re-registration that would otherwise follow an increase in misrecognitions. The invention is described in detail below with reference to the drawing.

Embodiment of the Invention

The figure shows a specific configuration example of one embodiment illustrating the voice pattern editing method according to the present invention. In the figure, 1 is a voice input terminal, 2 is a voice analysis unit, 3 is an input voice pattern memory, 4 is a recognition processing unit, 5 is a control unit, 6 is an input/output interface, and 7 is a voice pattern file. The voice pattern file 7 consists of 71 and 72, where 71 is a common voice pattern file and 72 is an individual voice pattern file. 8 is an output information file registered in correspondence with the voice patterns. The voice analysis unit 2 can be a speech analyzer of the kind that is now implemented as an LSI and generally available, and ordinary microprocessors can be used for the recognition processing unit 4 and the control unit 5.
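As a rough illustration of how the files 7, 71, 72 and 8 relate to one another, the following Python sketch models them as in-memory dictionaries. The patent does not specify any data layout, so the class name `VoicePatternFile`, the `patterns_for` helper, the feature-vector representation and the sample entries are all assumptions made for this example.

```python
from dataclasses import dataclass, field

# Hypothetical representation: a voice pattern is a short feature vector,
# as might be produced by the voice analysis unit 2.
Pattern = list[float]


@dataclass
class VoicePatternFile:
    """Voice pattern file 7: common part 71 plus per-speaker part 72."""
    # 71: identification number -> pattern shared by all users
    common: dict[int, Pattern] = field(default_factory=dict)
    # 72: speaker name -> (identification number -> that speaker's pattern)
    individual: dict[str, dict[int, Pattern]] = field(default_factory=dict)

    def patterns_for(self, speaker: str) -> dict[int, Pattern]:
        """Return the section of file 72 for one speaker, creating it if absent."""
        return self.individual.setdefault(speaker, {})


# Output information file 8: identification number -> information to output.
OutputInfoFile = dict[int, str]

# Invented sample data for illustration only.
store = VoicePatternFile(common={1: [0.1, 0.9], 2: [0.8, 0.2]})
output_info: OutputInfoFile = {1: "weather forecast", 2: "train timetable"}
store.patterns_for("alice")[1] = [0.12, 0.88]
```

Keying both pattern files and the output information file by the same identification number mirrors the description, in which the recognition result is an identification number that is then used to search file 8.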

When speech is input through the voice input terminal 1, the voice analysis unit 2 converts it into an input voice pattern, which is stored in the input voice pattern memory 3. As soon as storage is complete, the control unit 5 instructs the recognition processing unit 4 to compare the contents of the input voice pattern memory 3 with the portion of the individual voice pattern file 72 that corresponds to the speaker. It then receives the recognition result from the recognition processing unit 4: an identification number indicating which voice pattern was matched, and the recognition confidence, that is, the similarity obtained in recognition. The control unit 5 either asks the speaker, through the input/output interface 6, to confirm the recognition result received from the recognition processing unit 4, or checks the validity of the result against a criterion determined in advance for the system. If the result is judged correct, the control unit searches the output information file 8 using the identification number and outputs the retrieved information to the outside through the input/output interface 6.
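The recognition flow above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: cosine similarity stands in for the unspecified matching measure, `ACCEPT_THRESHOLD` stands in for the criterion determined in advance for the system, and the function names and sample data are hypothetical.

```python
import math


def similarity(a, b):
    """Cosine similarity, an assumed stand-in for the patent's similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recognize(input_pattern, patterns):
    """Return (identification number, confidence) of the best match, or (None, 0.0)."""
    best_id, best_score = None, 0.0
    for pattern_id, stored in patterns.items():
        score = similarity(input_pattern, stored)
        if score > best_score:
            best_id, best_score = pattern_id, score
    return best_id, best_score


ACCEPT_THRESHOLD = 0.95  # assumed value for the system's predetermined criterion


def lookup_output(input_pattern, speaker_patterns, output_info):
    """Recognize, check the result against the criterion, then search file 8."""
    pattern_id, confidence = recognize(input_pattern, speaker_patterns)
    if pattern_id is None or confidence < ACCEPT_THRESHOLD:
        return None  # no valid match among this speaker's patterns
    return output_info[pattern_id]


# Invented sample data: the speaker's section of file 72 and file 8.
speaker_patterns = {1: [0.1, 0.9], 2: [0.8, 0.2]}
output_info = {1: "weather forecast", 2: "train timetable"}
print(lookup_output([0.12, 0.88], speaker_patterns, output_info))
```

The sketch covers only the automatic validity check; the alternative described in the text, asking the speaker to confirm the result through the input/output interface 6, would replace the threshold comparison with an interactive prompt.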

In the voice pattern file management and editing operation described above, if speech recognition finds no matching entry in the individual voice pattern file 72, the control unit 5 instructs the recognition processing unit 4 to compare the contents of the input voice pattern memory 3 with the common voice pattern file 71, and receives the comparison result. If the recognition result is judged correct by the same confirmation means as in the operation above, the control unit 5 outputs to the outside the result of searching the output information file 8 by the identification number. It also assigns that identification number to the contents of the input voice pattern memory 3 and additionally registers them in the speaker's portion of the individual voice pattern file 72. If no matching entry is found in the common voice pattern file 71 either, the speaker is notified of this and processing ends.
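The fallback to the common file and the automatic registration that follows can be sketched as below. The matching measure (closeness derived from Euclidean distance), the threshold value, and the names `best_match` and `recognize_with_fallback` are assumptions for illustration; the patent specifies neither the measure nor the confirmation criterion.

```python
def best_match(input_pattern, patterns, threshold=0.9):
    """Return the id of the closest stored pattern, or None if none is close enough.

    Closeness is 1 / (1 + Euclidean distance), an assumed stand-in measure.
    """
    best_id, best_score = None, threshold
    for pattern_id, stored in patterns.items():
        dist = sum((x - y) ** 2 for x, y in zip(input_pattern, stored)) ** 0.5
        score = 1.0 / (1.0 + dist)
        if score >= best_score:
            best_id, best_score = pattern_id, score
    return best_id


def recognize_with_fallback(speaker, input_pattern, individual, common, output_info):
    """Try the speaker's section of file 72 first, then common file 71."""
    own = individual.setdefault(speaker, {})
    pattern_id = best_match(input_pattern, own)
    if pattern_id is None:
        pattern_id = best_match(input_pattern, common)
        if pattern_id is None:
            return None  # no match in either file: the speaker would be notified
        # Register the speaker's own utterance under the matched number,
        # so the next recognition can succeed from file 72 directly.
        own[pattern_id] = list(input_pattern)
    return output_info[pattern_id]


# Invented sample data: file 71, an initially empty file 72, and file 8.
common = {1: [0.1, 0.9], 2: [0.8, 0.2]}
individual = {}
output_info = {1: "weather forecast", 2: "train timetable"}
result = recognize_with_fallback("alice", [0.12, 0.88], individual, common, output_info)
```

After this call the speaker has an entry in the individual file without ever having performed an explicit registration, which is the effect the description attributes to the common voice pattern file.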

In the voice pattern file management and editing operation described above, when a correct answer is obtained from the individual voice pattern file 72 and the result has been reported to the speaker, the control unit 5 statistically processes the recognition confidence values output by the recognition processing unit 4 as recognition results, going back over past recognitions. If the result of this processing falls below a criterion determined in advance for the system, the control unit manipulates the voice pattern that was selected as the correct answer (for example, by replacing it with the input voice pattern) and updates the individual voice pattern file 72.

That is, suppose the input voice pattern of the speaker (the user) has a match in the individual voice pattern file 72. When the statistical value of the past variation in recognition confidence (the similarity obtained in recognition) for the corresponding voice pattern in the individual voice pattern file 72, together with the similarity value obtained for the current match, crosses a predetermined threshold, the voice pattern selected as the correct answer is manipulated, for example by replacing the corresponding individual voice pattern in the individual voice pattern file 72 with the input voice pattern, and the individual voice pattern file 72 is updated.
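One possible reading of this update rule is sketched below in Python. The patent does not say which statistic is computed or how the threshold is chosen, so the rolling mean over the last `WINDOW` recognitions, the `UPDATE_CRITERION` value, the clearing of the history after a replacement, and all names here are assumptions.

```python
from collections import deque
from statistics import mean

WINDOW = 5              # assumed number of past recognitions to average over
UPDATE_CRITERION = 0.9  # assumed criterion for the averaged confidence


class ConfidenceTracker:
    """Keeps recent recognition confidences per (speaker, identification number)."""

    def __init__(self):
        self.history = {}

    def record(self, speaker, pattern_id, confidence):
        """Store one confidence value; return True if the pattern should be updated."""
        window = self.history.setdefault((speaker, pattern_id), deque(maxlen=WINDOW))
        window.append(confidence)
        # Only decide once a full window of past results is available.
        return len(window) == WINDOW and mean(window) < UPDATE_CRITERION


def on_correct_recognition(tracker, individual, speaker, pattern_id,
                           input_pattern, confidence):
    """After a confirmed match in file 72, replace the stored pattern if needed."""
    if tracker.record(speaker, pattern_id, confidence):
        individual[speaker][pattern_id] = list(input_pattern)
        tracker.history[(speaker, pattern_id)].clear()  # restart statistics (assumed)
        return True
    return False


# Invented example: confidence drifts downward as the speaker's voice changes.
individual = {"alice": {1: [0.1, 0.9]}}
tracker = ConfidenceTracker()
results = [
    on_correct_recognition(tracker, individual, "alice", 1, [0.2, 0.8], c)
    for c in [0.97, 0.93, 0.88, 0.85, 0.82]
]
```

In this run the fifth recognition brings the windowed mean below the assumed criterion, so the stored pattern is replaced with the latest input pattern and the speaker never has to re-register explicitly.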

In the configuration of this embodiment, the input voice pattern memory serves as the means for extracting the input voice pattern. Other embodiments are possible. In one, means for storing and replaying the input speech is provided, the speech recognition device is switched to registration mode after recognition ends, and the input voice pattern is created and extracted from the replayed sound. In another, two speech recognition devices are provided, one set to recognition mode and the other to registration mode, and the input voice pattern is extracted in that way.

Effects of the Invention

As explained above, the system manages information common to many users collectively and holds a file of the corresponding voice patterns. Users can therefore use the common information without registering it in advance, which has the advantage of improving usability.

In addition, because the system manages the common information collectively, duplicated information can be eliminated, which has the advantage of reducing the size of the files.

In addition, by managing the recognition confidence obtained whenever a correct answer is found, the system can automatically detect a decline in the recognition rate achieved with the existing voice patterns. By further comparing the recognition confidence with a reference value and manipulating the existing voice patterns accordingly, it has the advantage that the user need not re-register his or her voice.

In addition, providing means for extracting the input voice pattern has the advantage that the user need not repeat the utterance, both when an individual voice pattern is generated automatically and when an existing individual voice pattern is replaced with the input voice pattern as the manipulation performed in response to changes over time.

[Brief explanation of drawings]

The figure is a configuration diagram of an embodiment of the present invention.

Description of reference numerals: 1: voice input terminal; 2: voice analysis unit; 3: input voice pattern memory; 4: recognition processing unit; 5: control unit; 6: input/output interface; 7: voice pattern file; 71: common voice pattern file; 72: individual voice pattern file; 8: output information file.

Claims

1. A voice pattern editing method in a speech recognition system that patterns in advance the speech to be recognized, patterns the speech uttered by a user, compares its similarity with each pattern in the group of voice patterns patterned in advance, and outputs the pattern with the highest similarity as the recognition result, the system comprising: a voice pattern file consisting of a common voice pattern file, which holds a pattern group common to all speech to be recognized, and an individual voice pattern file, which holds patterns of speech uttered by individual users; and input voice pattern storage means that analyzes the speech uttered by the user, converts the input speech to be recognized into a pattern, and temporarily stores it; the method comprising: extracting the patterned input voice pattern uttered by the user from the input voice pattern storage means; comparing the extracted input voice pattern with that user's individual voice pattern file in the voice pattern file; and, when the extracted input voice pattern has no match in that user's individual voice pattern file but has a match in the common voice pattern file, associating the information to be output with the input voice pattern and adding the associated input voice pattern to that user's individual voice pattern file.
2. A voice pattern editing method in a speech recognition system that patterns in advance the speech to be recognized, patterns the speech uttered by a user, compares its similarity with each pattern in the group of voice patterns patterned in advance, and outputs the pattern with the highest similarity as the recognition result, the system comprising: a voice pattern file consisting of a common voice pattern file, which holds a pattern group common to all speech to be recognized, and an individual voice pattern file, which holds patterns of speech uttered by individual users; and input voice pattern storage means that analyzes the speech uttered by the user, converts the input speech to be recognized into a pattern, and temporarily stores it; the method comprising: extracting the patterned input voice pattern uttered by the user from the input voice pattern storage means; comparing the extracted input voice pattern with that user's individual voice pattern file in the voice pattern file; when the extracted input voice pattern has no match in that user's individual voice pattern file but has a match in the common voice pattern file, associating the information to be output with the input voice pattern and adding the associated input voice pattern to that user's individual voice pattern file; and, when the extracted input voice pattern has a match in that user's individual voice pattern file, and the statistical value of the past variation in recognition similarity for the corresponding voice pattern in the individual voice pattern file and the similarity value obtained for the match exceed a predetermined threshold, replacing the corresponding individual voice pattern in the individual voice pattern file with the input voice pattern.