JPS59111698A

JPS59111698A - Voice recognition system

Info

Publication number: JPS59111698A
Application number: JP57220321A
Authority: JP
Inventors: 徳子松井; 俊宏木村
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1982-12-17
Filing date: 1982-12-17
Publication date: 1984-06-27

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed description of the invention]

[Field of application of the invention]

The present invention relates to a method for improving the speech recognition rate. In particular, it relates to a speech recognition method in which input speech is recognized correctly by confirming in advance, by means of input speech, whether the speech to be recognized is discrete or continuous.

[Prior art]

Until now, speech recognition devices have generally been divided into those that handle discrete input speech and those that handle continuous input speech. Recently, however, devices that can recognize input speech whether it is discrete or continuous have begun to appear. Such devices hold several kinds of standard speech patterns for discrete speech and several kinds for continuous speech, yet input speech has often been misrecognized. This is because the speech recognition device cannot easily determine whether the input speech is discrete or continuous, and the speaker does not take this distinction into account when speaking either.

As a result, discrete input speech has been recognized as continuous and, conversely, continuous input speech has been recognized as discrete.

[Object of the invention]

An object of the present invention is therefore to provide a speech recognition method that can correctly recognize input speech whether it is discrete or continuous.

[Summary of the invention]

To this end, the present invention focuses on the fact that speech can be recognized more accurately when it is known in advance whether the input speech is discrete or continuous. From the speech input at the start of a service, it confirms in advance whether the speech to be input thereafter is discrete or continuous, and it recognizes the subsequent input speech using the standard speech patterns selected on the basis of that recognition result.

[Embodiments of the invention]

The present invention is described below with reference to FIGS. 1 and 2.

First, a speech recognition device according to the present invention is described. FIG. 1 shows its general configuration.

In this device, speech is input in accordance with instructions emitted from the loudspeaker 2. The control unit 8 emits the necessary instructions from the loudspeaker 2 via the speech synthesis unit 7, and the speaker, prompted by these instructions, utters speech that corresponds to them. The uttered speech passes through the microphone 1 to the speech input/analysis unit 11, where it is first A/D converted and its features are then extracted. The speech recognition unit 6 pattern-matches the extracted feature pattern against the standard speech patterns, and the resulting degree of similarity is obtained as the recognition result.

The standard speech patterns to be matched are selected in advance from the standard speech pattern memory 4 by the standard speech pattern selection unit 5, which operates under the control of the control unit 8, and pattern matching is performed against the patterns selected in this way. The standard speech pattern memory 4 stores several kinds of standard speech patterns for continuous speech and several kinds for discrete speech. Conventionally, pattern matching was performed against all of them. In the present invention, however, whether the input speech is discrete or continuous is confirmed in advance, so only one of the two families of standard speech patterns is selected. If the input speech is discrete, pattern matching against the discrete standard speech patterns allows the input speech to be recognized more accurately.
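The selection and matching steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the layout of `PATTERN_MEMORY`, the three-element feature vectors, and the use of cosine similarity are all hypothetical, since the description does not specify the feature representation or the similarity measure.

```python
import math

# Hypothetical contents of standard speech pattern memory 4: for each mode,
# several pattern sets, each mapping a word to a reference feature vector.
PATTERN_MEMORY = {
    "discrete": [
        {"zero": [0.9, 0.1, 0.0], "ichi": [0.1, 0.9, 0.0]},
        {"zero": [0.8, 0.2, 0.1], "ichi": [0.2, 0.8, 0.1]},
    ],
    "continuous": [
        {"zero": [0.6, 0.3, 0.4], "ichi": [0.3, 0.6, 0.4]},
    ],
}


def similarity(a, b):
    """Cosine similarity, used here only as a stand-in measure (an assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_patterns(mode):
    """Role of selection unit 5: return only the pattern sets for one mode."""
    return PATTERN_MEMORY[mode]


def match(features, pattern_sets):
    """Role of recognition unit 6: score the input against every word pattern."""
    return [
        {word: similarity(features, ref) for word, ref in pattern_set.items()}
        for pattern_set in pattern_sets
    ]


# Only the discrete sets are searched; the continuous set is never touched.
scores = match([1.0, 0.0, 0.0], select_patterns("discrete"))
```

Restricting the search to one family both reduces the number of comparisons and removes the wrong-family patterns that could otherwise win the match.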

Each standard speech pattern contains speech patterns for various words, so pattern matching is performed against these per-word speech patterns. From the matching results, the word with the highest similarity is selected as a candidate recognition result, and the same applies to pattern matching against the other standard speech patterns. Therefore, by the time the end of pattern matching is reported to the control unit 8, one candidate recognition result is available for each selected standard speech pattern. The determination unit 10 then sends the candidate with the highest similarity among them to the control unit 8 as the final recognition result. How the control unit 8 handles this result is described later.
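The two-stage choice described above (one candidate per standard speech pattern, then one final result) can be illustrated with a short sketch. The function names and the similarity values are hypothetical and chosen only for the example.

```python
def best_candidate(scores):
    """Return the (word, similarity) pair with the highest similarity in one set."""
    word = max(scores, key=scores.get)
    return word, scores[word]


def decide(per_set_scores):
    """Role of determination unit 10: pick the strongest candidate overall."""
    candidates = [best_candidate(scores) for scores in per_set_scores]
    return max(candidates, key=lambda candidate: candidate[1])


# Made-up matching results for three selected standard speech patterns.
per_set_scores = [
    {"zero": 0.62, "ichi": 0.91},
    {"zero": 0.55, "ichi": 0.84},
    {"zero": 0.88, "ichi": 0.40},
]

final_word, final_similarity = decide(per_set_scores)  # ("ichi", 0.91)
```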

The speech recognition device shown in FIG. 1 is intended for use, for example, as part of a cash dispenser in a bank. FIG. 2 shows the speech recognition processing flow when the device is used for such a purpose.

In this flow, before speech recognition processing begins, the control unit 8 first instructs the speech input/analysis unit 11 and the speech recognition unit 6 to prepare for speech input. It also has the standard speech patterns for the first input speech read out of the standard speech pattern memory 4 via the standard speech pattern selection unit 5. In this example the discrete standard speech patterns are selected, because the first speech input, which specifies whether the subsequent input speech is discrete or continuous, is itself input discretely. The continuous standard speech patterns may of course be selected instead if recognition is easy with them.

When these preparations are complete, the control unit 8 emits an input prompt message from the loudspeaker 2 via the speech synthesis unit 7, asking the speaker to choose one of the two families of standard speech patterns. In response to this message the speaker inputs speech through the microphone 1. The discrete speech input at this point is, for example, "ichi" (1) if the speech to be input thereafter is continuous and "zero" (0) if it is discrete. Because the input speech is this simple, it can easily be recognized with the discrete standard speech patterns.

The input speech from the microphone 1 is then converted to digital form in the speech input/analysis unit 11, and its features are extracted. The speech recognition unit 6 pattern-matches the extracted feature pattern against all of the discrete standard speech patterns selected in advance. Pattern matching is performed against the speech pattern of each word contained in a standard speech pattern, and a matching result is obtained for each word. From these matching results, the word with the highest similarity is selected as a candidate recognition result, and pattern matching is performed in turn against the other standard speech patterns selected in advance. Therefore, by the time the end of pattern matching is reported to the control unit 8, one candidate recognition result is available for each selected standard speech pattern. The determination unit 10 then sends the candidate with the highest similarity among them to the control unit 8 as the final recognition result.
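As a small illustration of the opening exchange, the mapping from the spoken word to the mode it announces might be written as below. The names `MODE_BY_WORD`, `INITIAL_MODE`, and `declared_mode` are hypothetical; only the two example words and their meanings come from the description.

```python
# Mode words from the description: "ichi" (1) announces continuous speech,
# "zero" (0) announces discrete speech.
MODE_BY_WORD = {"ichi": "continuous", "zero": "discrete"}

# In this example the announcement itself is matched against discrete patterns.
INITIAL_MODE = "discrete"


def declared_mode(word):
    """Return the mode announced by the first utterance, or None if unknown."""
    return MODE_BY_WORD.get(word)
```

Returning `None` for an unexpected word is an assumption made for the sketch; the description only covers the two expected words.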

The control unit 8 judges whether the final recognition result from the determination unit 10 is valid by comparing its similarity with a fixed value (the reject constant). If the similarity is at or below this value, the result is judged invalid. The control unit 8 then instructs the standard speech pattern selection unit 5 to select the same standard speech patterns as before, that is, all of the discrete standard speech patterns, and prompts the speaker to input the same speech again. If the result is judged valid, the control unit 8 has the speaker confirm whether it is correct by emitting from the loudspeaker 2, as speech, the word corresponding to the recognition result together with a confirmation request message. In response, the speaker inputs the confirmation result by voice through the microphone 1. The confirmation result does not necessarily have to be input by voice.

This is because, if a console unit 3 is provided, the confirmation result may be input from the console unit 3 instead. The control unit 8 thereby learns whether the recognition result is correct. If it was a misrecognition, processing proceeds in the same way as when the final recognition result is judged invalid. If the recognition result was correct, the control unit 8 issues an instruction to select the standard speech patterns that correspond to it: the continuous patterns if the recognition result is "ichi", and the discrete patterns if it is "zero". Recognition of the speech input thereafter is performed using the selected standard speech patterns. The recognition result is sent to the host device 9, which ends speech recognition processing for the first input speech. Until the entire service has finished, however, the device continues to prompt for the next speech input and to recognize the input speech.
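The validity check and confirmation handling described above can be sketched as a small decision function. This is a hedged illustration: the value of `REJECT_CONSTANT`, the step names, and the scripted attempts are hypothetical, since the description gives no numeric threshold and no implementation details.

```python
REJECT_CONSTANT = 0.7  # hypothetical value; the description gives no number


def next_step(similarity_value, confirmed=None, reject_constant=REJECT_CONSTANT):
    """Decide what control unit 8 does with the final recognition result.

    confirmed is None before the speaker has answered, otherwise True or False.
    """
    if similarity_value <= reject_constant:
        return "reprompt"          # not valid: ask for the same word again
    if confirmed is None:
        return "ask_confirmation"  # valid: read the result back to the speaker
    return "accept" if confirmed else "reprompt"


def first_exchange(attempts, reject_constant=REJECT_CONSTANT):
    """Walk scripted (word, similarity, speaker_confirms) attempts until accepted."""
    for word, score, speaker_confirms in attempts:
        if next_step(score, reject_constant=reject_constant) == "reprompt":
            continue
        if next_step(score, speaker_confirms, reject_constant) == "accept":
            return word
    return None


accepted = first_exchange([
    ("zero", 0.55, True),   # similarity too low: rejected before confirmation
    ("zero", 0.90, False),  # valid score, but the speaker says it is wrong
    ("ichi", 0.93, True),   # valid and confirmed
])
```

A rejected score and a denied confirmation lead to the same re-prompt, which mirrors the statement that a misrecognition is handled like an invalid result.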

[Effect of the invention]

As described above, with the present invention it is known in advance, from the first input speech, whether the speech input thereafter will be continuous or discrete. The subsequent speech can then be recognized using only the continuous or only the discrete standard speech patterns, so the input speech can be identified more reliably.

[Brief explanation of the drawing]

FIG. 1 is a diagram showing the schematic configuration of an example of a speech recognition device according to the present invention, and FIG. 2 is a diagram showing the flow of an example of speech recognition processing in that device.

Description of symbols: 1: microphone, 4: standard speech pattern memory, 5: standard speech pattern selection unit, 6: speech recognition unit, 8: control unit, 10: determination unit, 11: speech input/analysis unit.

Claims

[Claims]

In a speech recognition method in which input speech is analyzed to extract its feature pattern, pattern matching is performed between that feature pattern and standard speech patterns, and the input speech is recognized from the matching results, a speech recognition method characterized in that, at the start of a speech recognition service, it is confirmed in advance from the input speech whether the speech input thereafter is continuous or discrete, either the continuous or the discrete standard speech patterns are selected according to the confirmation result, and the speech input after the selection is recognized using the selected patterns until the end of the service.