JPS5988798A

JPS5988798A - Voice recognition processing system

Info

Publication number: JPS5988798A
Application number: JP57198951A
Authority: JP
Inventors: 徳子松井; 俊宏木村
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1982-11-15
Filing date: 1982-11-15
Publication date: 1984-05-22

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed description of the invention]

[Field of application of the invention]

The present invention relates to a speech recognition processing method for a speech recognition device that determines, among the multiple sets of standard speech patterns prepared for each word to be recognized, the one with the greatest similarity to the input speech and outputs it as the recognition result. The method performs speech recognition processing only after a reliable speech analysis of the input speech has been carried out, and thereby improves the recognition rate of the device.

[Prior art]

In this type of speech recognition device, the conventional speech recognition processing method has the speaker confirm only the recognition result. It does not have the speaker confirm the result of the speech analysis of the input speech in any way.

However, if the speech analysis of the input speech is not performed reliably, then no matter how thoroughly and reliably the subsequent recognition processing is performed, and even if its result is confirmed, a good recognition rate ultimately cannot be obtained. Serviceability also deteriorates because of repeated input and the like.

[Purpose of the invention]

An object of the present invention is to eliminate the above drawbacks of the prior art and to provide a speech recognition processing method that performs speech recognition processing after a reliable speech analysis of the input speech, thereby improving the recognition rate and offering good serviceability.

[Summary of the invention]

The speech recognition processing method according to the present invention applies to a speech recognition device that stores multiple sets of standard speech pattern data for each word to be recognized, performs speech analysis on the input speech to extract its feature data, performs pattern matching between that feature data and each set of standard speech pattern data, and determines and outputs as the recognition result the pattern with the highest similarity. In such a device, speech corresponding to the input speech is conversely synthesized and output by the same method, based on the result of the speech analysis of the input speech. Then, depending on the result of the confirmation of that synthesized speech, control is performed so that either the subsequent speech recognition processing is carried out based on the speech analysis result or the speech is input again.

In short, the predetermined speech recognition processing is performed on the analysis result only after it has been confirmed that the speech analysis of the input speech was appropriate, which improves the recognition rate.

[Embodiments of the invention]

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a block diagram of an embodiment of a speech recognition device using the speech recognition processing method according to the present invention, and FIG. 2 is its flowchart.

Here, 1 is a microphone for speech input; 2 is an input unit that performs gain adjustment, band limiting, and other required preprocessing on the input speech signal and then converts it to digital form; 3 is an analysis unit that performs speech analysis of the input speech based on the digital speech signal and extracts its feature data; 4 is a speech section detection unit that detects the speech section of the input speech according to a predetermined threshold for speech section detection; 5 is a speech recognition unit that performs pattern matching (similarity calculation) between the input speech and each standard speech pattern; 6 is a determination unit that ranks the similarities to the input speech based on the result of that processing; 7 is a standard speech pattern memory that stores multiple sets of standard speech pattern data for each word to be recognized; 8 is a standard speech pattern selection unit that controls selection from that memory; 9 is a speech synthesis unit used for presenting and confirming the input speech analysis result and the recognition result, for prompting speech input, and for other required presentations and instructions; 10 is its speaker; 11 is a console unit for presenting and confirming the input speech analysis result and the recognition result, for prompting speech input, and for other required displays and operations; 12 is a control unit that controls the above units and performs other required processing; and 13 is a host device that performs the desired service processing based on the speech recognition result.
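
The threshold-based detection performed by the speech section detection unit 4 might be sketched as follows. The patent only states that a predetermined threshold is used; comparing mean frame energy against it, the function name `detect_speech_section`, and the default values are assumptions made for illustration.

```python
def detect_speech_section(samples, frame_len=4, threshold=0.1):
    """Return (start, end) sample indices of the detected speech section, or None.

    Assumption: a frame counts as speech when its mean energy exceeds
    `threshold`. The patent does not say which measure is thresholded.
    """
    active = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy > threshold:
            active.append(start)
    if not active:
        return None
    # The section runs from the first active frame to the end of the last one.
    return active[0], min(active[-1] + frame_len, len(samples))
```

A practical detector would usually add hangover time so that short pauses inside a word do not split the section; that refinement is omitted here.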

First, prior to speech recognition processing, the control unit 12 instructs the input unit 2, the analysis unit 3, and the speech section detection unit 4 to prepare for speech input. It also instructs the standard speech pattern selection unit 8 to select, from the standard speech pattern memory 7, all sets of standard speech patterns for the category of words to be recognized at that time (for example, digits, service type names, article names, or place names) (process 21 in FIG. 2).
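
The category-based selection in process 21 can be pictured with a small sketch. The memory layout, the sample words, and the two-element templates below are hypothetical; the patent only says that each word has multiple sets of standard speech pattern data and that all sets for the current word category are selected.

```python
# Hypothetical contents of the standard speech pattern memory (unit 7):
# category -> word -> list of templates (several per word).
PATTERN_MEMORY = {
    "digits": {
        "ichi": [[0.9, 0.1], [0.8, 0.2]],
        "ni": [[0.1, 0.9], [0.2, 0.8]],
    },
    "place_names": {
        "tokyo": [[0.5, 0.5], [0.6, 0.4]],
    },
}


def select_patterns(memory, category):
    """Return every (word, template) pair for one word category (unit 8)."""
    return [
        (word, template)
        for word, templates in memory[category].items()
        for template in templates
    ]
```

Restricting the search to one category keeps the later pattern matching from comparing the input against words that cannot occur at that point in the dialogue.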

When these preparations are complete, an input prompt message urging the speaker to input speech is emitted from the speaker 10 via the speech synthesis unit 9 (process 22).

When the speaker then inputs speech through the microphone 1 (process 23), the input unit 2 performs the digital conversion and related processing, and the analysis unit 3 performs speech analysis on the digital speech signal and extracts the feature data and the like (process 24).

So that the speaker can personally confirm whether the result of the speech analysis of the input speech was appropriate, the control unit 12 has confirmation speech corresponding to the input speech synthesized, conversely and by the same method, from the result of the speech analysis in the analysis unit 3 (feature data and the like). For example, PARCOR speech synthesis is performed based on the result of PARCOR speech analysis. This synthesized speech is emitted from the speaker 10 (process 25).
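
As a rough illustration of analysis followed by resynthesis "by the same method", the sketch below derives PARCOR (reflection) coefficients with the Levinson-Durbin recursion and drives an all-pole lattice filter with the prediction residual. It is a simplification under stated assumptions: a PARCOR vocoder would normally use a pulse or noise excitation derived from pitch and voicing rather than the residual, some texts define PARCOR coefficients with the opposite sign, and all function names are hypothetical.

```python
def autocorrelation(frame, order):
    """Autocorrelation of one frame for lags 0..order."""
    return [sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
            for lag in range(order + 1)]


def parcor_analysis(frame, order):
    """Levinson-Durbin recursion: return the reflection coefficients k_1..k_order."""
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order      # A(z) = 1 + a_1 z^-1 + ... + a_p z^-p
    err = r[0]
    ks = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        ks.append(k)
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return ks


def parcor_residual(ks, frame):
    """Analysis lattice: inverse-filter the frame to get the prediction residual."""
    p = len(ks)
    b_prev = [0.0] * (p + 1)       # backward errors b_i[n-1]
    out = []
    for x in frame:
        f = x
        b_new = [0.0] * (p + 1)
        b_new[0] = x
        for i in range(1, p + 1):
            f_next = f + ks[i - 1] * b_prev[i - 1]
            b_new[i] = b_prev[i - 1] + ks[i - 1] * f
            f = f_next
        out.append(f)
        b_prev = b_new
    return out


def parcor_synthesis(ks, excitation):
    """Synthesis lattice: all-pole filter driven by the given excitation."""
    p = len(ks)
    b_prev = [0.0] * (p + 1)
    out = []
    for e in excitation:
        f = e
        b_new = [0.0] * (p + 1)
        for i in range(p, 0, -1):
            f = f - ks[i - 1] * b_prev[i - 1]
            b_new[i] = ks[i - 1] * f + b_prev[i - 1]
        b_new[0] = f
        out.append(f)
        b_prev = b_new
    return out
```

Because the synthesis lattice is the exact inverse of the analysis lattice, feeding it the residual reproduces the frame. With a simplified excitation the output instead reflects only what the analysis captured, which is what makes the playback useful as a check on the analysis.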

The speaker listens to this and judges whether it is appropriate for his or her own speech input. When it is not appropriate, for example when the speaker input "ichi" but the output speech has lost the leading "i" and is only "chi", the speaker enters that judgment and confirmation result from the console unit 11 (process 26). This input may instead be made by a predetermined speech input from the microphone 1, which is then recognized.

If an input indicating that the analysis was not appropriate is received, the control unit 12 has a prompt for re-inputting the speech emitted (process 27), and the same processing as above is repeated.

Upon an input indicating that the result of the above speech analysis was appropriate (or upon no input within a predetermined time), the speech recognition unit 5 performs pattern matching between the feature data of the input speech and the selected standard speech pattern data, and passes the similarity of each standard speech pattern to the input speech to the determination unit 6 (process 28).

The determination unit 6 passes the pattern with the highest similarity (the most likely one) to the control unit 12 as the recognition result (process 29).
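
Processes 28 and 29 amount to scoring the input against every selected template and ranking the words. The patent does not name the similarity measure, so the cosine similarity over fixed-length feature vectors used below is an assumption, as are the function names; a real system of the period might well use time-aligned (DP) matching instead.

```python
import math


def similarity(a, b):
    """Cosine similarity between two equal-length feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(features, patterns):
    """Return (word, similarity) pairs, best first.

    `patterns` is a list of (word, template) pairs. Each word keeps the
    score of its best-matching template, since every word has several.
    """
    best = {}
    for word, template in patterns:
        score = similarity(features, template)
        if word not in best or score > best[word]:
            best[word] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

The first element of the returned list corresponds to what the determination unit 6 hands to the control unit 12.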

If even the most likely similarity value for the input speech is low, so that outputting it as the recognition result would be doubtful and it should be rejected, the control unit 12 instructs the standard speech pattern selection unit 8 to select the same patterns as before (process 32) and has a re-input prompt message emitted from the speaker 10 via the speech synthesis unit 9.
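
The reject decision can be sketched as a threshold test on the best similarity. The threshold value and the names below are hypothetical; the patent gives no number and only says that a low best similarity should lead to a reject and a re-input prompt.

```python
REJECT_THRESHOLD = 0.6  # hypothetical value; the patent does not specify one


def decide(ranked, threshold=REJECT_THRESHOLD):
    """Return ("reject", None) or ("candidate", word) from a best-first ranking.

    `ranked` is a list of (word, similarity) pairs sorted best first.
    """
    if not ranked or ranked[0][1] < threshold:
        return ("reject", None)
    return ("candidate", ranked[0][0])
```

For instance, `decide([("ichi", 0.9), ("ni", 0.2)])` yields the candidate "ichi", while a best score of 0.4 falls under the assumed threshold and is rejected.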

If the result is not rejected, the control unit 12 has a confirmation request message emitted from the speaker 10 via the speech synthesis unit 9, as a presentation that lets the speaker confirm whether the recognition result is correct (process 30).

The above presentation may instead be made by a lamp display or the like on the console unit 11.

The speaker listens to this, learns whether his or her input speech was recognized correctly or incorrectly, and enters the confirmation result from the console unit 11 to the control unit 12 (process 31).

The confirmation result need not be entered into the control unit 12 by an operation on the console unit 11; it may instead be given as confirmation speech input from the microphone 1. In that case, the confirmation utterance should preferably be simple and hard to misrecognize so that its speech recognition is performed reliably. The same applies to process 26 described above.

When the confirmation information indicates that the above recognition candidate is correct, the control unit 12 sends it to the host device 13 as the recognition result, ends the processing for that input speech, and prepares for the next input.

On the other hand, when confirmation information indicating a misrecognition is received, processes 32 and 33 are performed as in the reject case described above, and this is repeated until a correct recognition is obtained. Once the recognition is correct, the recognition result is sent to the host device 13 as described above and the series of processes ends.
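
The overall flow of FIG. 2 can be summarized as a retry loop with two confirmation points, one after analysis and one after recognition. In the sketch below the four callables are hypothetical stand-ins for the hardware units, and `max_attempts` is an added assumption; the patent simply repeats until a correct recognition is obtained.

```python
def recognition_session(get_features, confirm_analysis, recognize, confirm_result,
                        max_attempts=5):
    """Run the prompt/analyze/confirm/recognize loop until a word is accepted.

    Hypothetical stand-ins:
      get_features()       -> feature data for one utterance (processes 22-24)
      confirm_analysis(f)  -> True if the resynthesized speech was judged
                              appropriate (processes 25-26)
      recognize(f)         -> recognized word, or None on reject (processes 28-29)
      confirm_result(word) -> True if the speaker accepts the word (processes 30-31)
    Returns (word or None, log of outcomes per attempt).
    """
    log = []
    for _ in range(max_attempts):
        features = get_features()
        if not confirm_analysis(features):
            log.append("analysis_rejected")      # process 27: prompt re-input
            continue
        word = recognize(features)
        if word is None:
            log.append("recognition_rejected")   # processes 32-33
            continue
        if not confirm_result(word):
            log.append("misrecognized")          # processes 32-33
            continue
        log.append("accepted")                   # result goes to host device 13
        return word, log
    return None, log
```

The point of the first check is visible in the control flow: an utterance whose analysis was judged inappropriate never reaches the recognizer, so it cannot produce a misrecognition.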

In this way, the speaker personally confirms synthesized speech that corresponds to the input speech and is generated from the speech analysis result, and only then is the subsequent speech recognition processing performed based on that analysis result. Misrecognition caused by inappropriate speech analysis can therefore be prevented.

[Effects of the invention]

As described in detail above, according to the present invention, speech recognition processing is performed after a reliable speech analysis of the input speech. The recognition rate of this type of speech recognition device can therefore be greatly improved, with a marked effect on improving its reliability and serviceability.

[Brief explanation of the drawing]

FIG. 1 is a block diagram of an embodiment of a speech recognition device using the speech recognition processing method according to the present invention, and FIG. 2 is its flowchart.
1 ... microphone, 2 ... input unit, 3 ... analysis unit, 4 ... speech section detection unit, 5 ... speech recognition unit, 6 ... determination unit, 7 ... standard speech pattern memory, 8 ... standard speech pattern selection unit, 9 ... speech synthesis unit, 10 ... speaker, 11 ... console unit, 12 ... control unit, 13 ... host device.

Claims

[Claims]

1. A speech recognition processing method for a speech recognition device having functions of storing multiple sets of standard speech pattern data for each word to be recognized, performing speech analysis on input speech to extract its feature data, performing pattern matching between the feature data and each set of standard speech pattern data, and determining and outputting as the recognition result the pattern with the highest similarity, characterized in that, based on the result of the speech analysis of the input speech, speech corresponding to the input speech is conversely synthesized and output by the same method, and then, depending on the result of the confirmation of that speech, control is performed so that either the subsequent speech recognition processing is carried out based on the result of the speech analysis or the speech is input again.