JPS5917595A

JPS5917595A - Voice recognition system

Info

Publication number: JPS5917595A
Application number: JP57125834A
Authority: JP
Inventors: 徳子松井; 俊宏木村
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1982-07-21
Filing date: 1982-07-21
Publication date: 1984-01-28

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION: The present invention relates to a speech recognition method for a speech recognition device that stores a plurality of sets of standard speech patterns for each word to be recognized. In particular, it relates to a method that improves the efficiency of correction processing when a misrecognition occurs.

In the conventional speech recognition method for this type of device, recognition is performed using all of the stored sets of standard speech patterns, for example until a series of services is completed. Consequently, a particular word spoken by a particular speaker, or by speakers in general, may be prone to misrecognition against one particular standard speech pattern. In that case, misrecognitions may concentrate on that word and recur throughout a series of recognition operations.

Furthermore, when a misrecognition occurred, the speaker was asked to repeat the same utterance, and exactly the same recognition processing as in the failed attempt was performed again.

However, a misrecognition means that the uttered speech pattern was closer to the standard speech pattern that produced the wrong result than to the standard speech pattern that truly should have been judged closest to it.

Therefore, even if the same recognition processing is repeated as described above, the same misrecognition is likely to recur. A considerable number of repeated utterances may be needed before a correct result is obtained, so recognition takes longer and the burden on the speaker increases.
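The cost of simply repeating the same recognition can be made concrete with a simple model. Assume, as an illustrative simplification not stated in the patent, that each repeated utterance of a troublesome word is recognized correctly with the same probability p, independently of earlier attempts. The number of utterances N then follows a geometric distribution:

```latex
P(N = k) = (1 - p)^{k-1}\,p, \qquad
\mathbb{E}[N] = \sum_{k=1}^{\infty} k\,(1 - p)^{k-1}\,p = \frac{1}{p}
```

For example, p = 0.25 gives an expected 4 utterances. A misrecognition indicates that the utterance lies closer to the wrong standard pattern, so p for such a word can be small, and the expected number of repetitions grows as 1/p.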

SUMMARY OF THE INVENTION: An object of the present invention is to provide a speech recognition method that eliminates the above drawbacks of the prior art and, in particular, improves the efficiency of correction processing when a misrecognition occurs.

The speech recognition method according to the present invention applies to a speech recognition device with the following functions:

- It stores a plurality of sets of standard speech pattern data for each word to be recognized.
- It extracts features from the input speech.
- It calculates the similarity between those feature data and all of the standard speech pattern data.
- It determines and outputs the standard speech pattern with the highest similarity as the recognition result.

In this device, a table is prepared in advance that maps each word the input speech is likely to be misrecognized as to the corresponding original word. When the input speech is misrecognized, the device is controlled so that it outputs a message asking the speaker to confirm the original word, based on this correspondence table.
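The recognition rule and the correspondence table described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. Feature vectors are assumed to be fixed-length lists of floats, and similarity is assumed to be negative Euclidean distance. The names `recognize`, `confirmation_candidate` and `CORRESPONDENCE_TABLE` are hypothetical. The table entries follow the easily confused digit pairs given in this description.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical table: word the device output by mistake -> word the speaker
# most likely said. Entries follow the confusable digit pairs 1/8 and 1/2.
CORRESPONDENCE_TABLE: Dict[str, str] = {"8": "1", "2": "1"}


def similarity(features: Sequence[float], pattern: Sequence[float]) -> float:
    """Assumed similarity measure: larger means more similar."""
    return -math.dist(features, pattern)


def recognize(features: Sequence[float],
              patterns: Dict[str, List[Sequence[float]]]) -> Tuple[str, float]:
    """Compare the input against every stored pattern set and return the
    word whose pattern has the highest similarity, with that similarity."""
    best_word, best_score = "", -math.inf
    for word, pattern_sets in patterns.items():
        for pattern in pattern_sets:
            score = similarity(features, pattern)
            if score > best_score:
                best_word, best_score = word, score
    return best_word, best_score


def confirmation_candidate(misrecognized: str,
                           table: Dict[str, str] = CORRESPONDENCE_TABLE) -> Optional[str]:
    """After the speaker rejects a result, return the original word to ask
    about, or None if the table has no entry for the rejected word."""
    return table.get(misrecognized)


patterns = {"1": [[1.0, 0.0], [0.9, 0.1]], "8": [[0.0, 1.0]]}
print(recognize([0.95, 0.05], patterns)[0])   # 1
print(confirmation_candidate("8"))            # 1
```

When the table has no entry for the rejected word, the sketch returns None. A device would then presumably fall back to ordinary re-prompting.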

As a supplementary explanation, some pairs of words are easily mistaken for each other. These include:

- words whose acoustic properties are identical (homophones),
- words that sound very similar or alike, and
- words that become identical or very similar through acoustic variation and the like.

These correspondences are determined experimentally and statistically in advance. When a misrecognition occurs, the corresponding word can then be output for confirmation instead of running a new recognition pass, which improves the efficiency of correction processing. Typical easily confused pairs are 1 (ichi) and 8 (hachi), and 1 (ichi) and 2 (ni).
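The patent says only that these correspondences are determined experimentally and statistically. One plausible way to do this, sketched here as an assumption, uses counts from a recognition experiment of how often each spoken word was recognized as each output word. For every output word, the table keeps the different spoken word that most often produced it. The function name `build_correspondence_table` and the `min_count` cut-off are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple


def build_correspondence_table(counts: Dict[Tuple[str, str], int],
                               min_count: int = 1) -> Dict[str, str]:
    """counts maps (spoken word, recognized word) -> occurrences observed in
    an experiment. Returns recognized word -> most frequent *different*
    spoken word, ignoring confusions seen fewer than min_count times."""
    sources: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (spoken, recognized), n in counts.items():
        # Correct recognitions (spoken == recognized) are not confusions.
        if spoken != recognized and n >= min_count:
            sources[recognized][spoken] = n
    # For each output word, keep the spoken word that most often produced it.
    return {rec: max(cands, key=cands.__getitem__) for rec, cands in sources.items()}
```

A word may appear as the value for several outputs, as 1 does in the patent's examples. Raising `min_count` keeps rare, accidental confusions out of the table.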

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a system configuration diagram of an embodiment of the speech recognition method according to the present invention, and FIG. 2 is a processing flowchart of that embodiment.

The components in FIG. 1 are as follows:

- 1 is a standard speech pattern memory that stores multiple sets of standard speech pattern data for each word to be recognized.
- 2 is a standard speech pattern selection unit that controls selection from that memory.
- 3 is a microphone for speech input.
- 4 is an analysis unit that extracts features of the input speech.
- 5 is a speech recognition unit that calculates the similarity between the feature data and the standard speech pattern data (pattern matching).
- 6 is a determination unit that determines, from those results, the set of standard speech patterns most similar to the input speech.
- 7 is a speech synthesis unit for presenting the recognition results.
- 8 is a loudspeaker.
- 9 is a console unit for confirming recognition results and instructing repeated speech input.
- 10 is a control unit that controls the above units and performs other necessary processing.
- 11 is a host device that performs the desired processing based on the recognition results.

First, before recognizing input speech from the microphone 3, the control unit 10 instructs the analysis unit 4 and the speech recognition unit 5 to prepare for speech input. It also instructs the standard speech pattern selection unit 2 to select, from the standard speech pattern memory 1, the standard speech pattern data of the words to be recognized at that point (process 21 in FIG. 2).

When these preparations are complete, the control unit instructs the speech synthesis unit 7 to output an input prompt message asking the speaker to speak, and the message is played from the loudspeaker 8 (process 22).

When the speaker then speaks into the microphone 3 (process 23), the analysis unit 4 analyzes the input speech and extracts its features (process 24).

Under the control of the control unit 10 described above, the speech recognition unit 5 calculates the similarity (pattern matching) between the feature data of the input speech and every set of standard speech pattern data selected by the standard speech pattern selection unit 2. It then passes all recognition results and similarities to the determination unit 6 (process 25).

The determination unit 6 determines the set of standard speech patterns with the highest similarity and reports the result to the control unit 10 (process 26).
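Processes 25 and 26 score the input against every selected pattern set and keep the best match. The patent only describes this step as similarity calculation or pattern matching. The sketch below assumes that features are sequences of frame vectors and uses dynamic time warping (DTW), a common matching technique for isolated-word recognizers. It converts a DTW distance d into a similarity 1/(1 + d). All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]
Template = Sequence[Frame]


def dtw_distance(a: Template, b: Template) -> float:
    """Minimum cumulative frame distance over all monotonic alignments of a and b."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def score_all(features: Template,
              selected: Dict[str, List[Template]]) -> List[Tuple[str, int, float]]:
    """Process 25: (word, set index, similarity) for every selected pattern set."""
    return [(word, k, 1.0 / (1.0 + dtw_distance(features, tpl)))
            for word, sets in selected.items()
            for k, tpl in enumerate(sets)]


def best_match(results: List[Tuple[str, int, float]]) -> Tuple[str, int, float]:
    """Process 26: the entry with the highest similarity."""
    return max(results, key=lambda r: r[2])


selected = {"1": [[[0.0], [1.0]]], "2": [[[1.0], [0.0]]]}
print(best_match(score_all([[0.0], [1.0], [1.0]], selected)))  # ('1', 0, 1.0)
```

Returning every scored entry, not only the winner, mirrors the patent's description of passing all recognition results and similarities to the determination unit.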

The control unit 10 judges whether the similarity of the recognition result is below a predetermined constant (the reject constant) (judgment 27). A result below this constant is too doubtful to output and is rejected.

If the result is rejected, the control unit instructs the standard speech pattern selection unit 2 to select the same standard speech patterns (process 32). It also instructs the speech synthesis unit to output the same input prompt message again, which is played from the loudspeaker 8 (process 33). Re-recognition (repeated recognition) is then performed in the same way as from process 23 onward.
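Judgment 27 and its reject branch (processes 32 and 33) can be sketched as a small decision function. The patent does not specify a value for the reject constant. The `Action` names are hypothetical.

```python
from enum import Enum
from typing import Optional, Tuple


class Action(Enum):
    REPROMPT = "reprompt"   # processes 32-33: same patterns, same prompt again
    CONFIRM = "confirm"     # process 28: ask the speaker to confirm the candidate


def after_matching(word: str, similarity: float,
                   reject_constant: float) -> Tuple[Action, Optional[str]]:
    """Judgment 27: reject a result whose best similarity is below the constant."""
    if similarity < reject_constant:
        return Action.REPROMPT, None
    return Action.CONFIRM, word
```

On a reject, the same patterns and the same prompt are reused. The correspondence table comes into play only after the speaker has confirmed a misrecognition.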

If the result is not rejected, a confirmation request message is played from the loudspeaker 8 via the speech synthesis unit 7. The message asks the speaker to confirm whether the candidate recognition result is correct (process 28).

The speaker listens to this message and learns whether the input speech was recognized correctly or misrecognized. The speaker then enters this information through the console unit 9 to the control unit 10 (process 29).

This confirmation need not be entered by operating the console unit 9. It may instead be given by speaking a confirmation into the microphone 3. In that case, the confirmation utterance should preferably be simple and hard to misrecognize so that it is recognized reliably.

Based on this confirmation, the control unit 10 judges whether the recognition was correct (judgment 30). If it was correct, the control unit sends the recognized speech pattern information to the host device 11 as needed. It then judges whether the series of service operations has finished (judgment 31).

- If the series has not finished, processing returns to process 21 to recognize the next input speech.
- If the series has finished, all recognition results are sent to the host device 11. Recognition processing for the input then ends, and the device waits for the next input.

If the confirmation indicates a misrecognition, the control unit looks up the original word in the correspondence table of easily misrecognized words. This table may, for example, be prepared in advance within the control unit 10 itself. The original word is immediately played as the recognition result from the loudspeaker 8 via the speech synthesis unit 7 (process 34).

The speaker listens to it and tells the control unit 10 whether this result is correct (process 35).

Based on this confirmation, the control unit 10 judges whether the recognition is now correct (judgment 36). If it is still incorrect, processes 32 and 33 are performed as in the rejection case. If it is correct, processing continues in the same way as from judgment 31 onward.
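Processes 28 to 36 combine into the correction step that distinguishes this method. The speaker confirms the candidate and, on a "no", confirms the word looked up in the correspondence table before any fallback to re-prompting. In this sketch, the `confirm` callback is a hypothetical stand-in for the spoken confirmation request and the speaker's answer from the console unit or microphone.

```python
from typing import Callable, Dict, Optional


def confirm_with_correction(candidate: str,
                            confirm: Callable[[str], bool],
                            table: Dict[str, str]) -> Optional[str]:
    """Return the word the speaker accepted, or None to re-prompt
    (processes 32-33, the same path as a rejection)."""
    if confirm(candidate):                          # processes 28-30
        return candidate
    original = table.get(candidate)                 # process 34: table lookup
    if original is not None and confirm(original):  # processes 35-36
        return original
    return None                                     # assumed: no entry or "no" -> re-prompt


table = {"8": "1", "2": "1"}
print(confirm_with_correction("8", lambda w: w == "1", table))  # 1
```

In the common case, the speaker answers two short confirmations instead of repeating the whole utterance. This is where the efficiency gain comes from.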

In this way, a misrecognition does not require restarting recognition from the beginning. In many cases, the error can be confirmed and corrected simply by outputting the word it is likely to have been confused with, which improves the efficiency of correction processing.

As described in detail above, the present invention improves the efficiency of correction processing when a misrecognition occurs and, in turn, the overall recognition rate. This markedly improves the reliability, serviceability and efficiency of speech recognition systems of this kind.

[Brief explanation of drawings]

FIG. 1 is a system configuration diagram of an embodiment of the speech recognition method according to the present invention, and FIG. 2 is a processing flowchart of that embodiment.

1: standard speech pattern memory; 2: standard speech pattern selection unit; 3: microphone; 4: analysis unit; 5: speech recognition unit; 6: determination unit; 7: speech synthesis unit; 8: loudspeaker; 9: console unit; 10: control unit; 11: host device.

Agent: patent attorney 福田幸作 (and one other)

Claims

[Claims]

1. A speech recognition method for a speech recognition device that stores a plurality of sets of standard speech pattern data for each word to be recognized, extracts features of input speech, calculates the similarity between the feature data and all of the standard speech pattern data, and determines and outputs the standard speech pattern with the highest similarity as the recognition result. The method is characterized in that:

- a table is prepared of the correspondence between words that the input speech is likely to be misrecognized as and the original words; and
- when the input speech is misrecognized, control and processing are performed so that a message requesting confirmation of the original word is output based on the correspondence table.