JPS59124395A - Voice recognition equipment

Info

Publication number: JPS59124395A
Application number: JP57232796A
Authority: JP
Inventors: 宇佐美　隆一; 松本　正至; 横溝　信一; 新家　修; 三郎安藤
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1982-12-29
Filing date: 1982-12-29
Publication date: 1984-07-18

Abstract

(57) [Summary] This publication consists of application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

[Technical Field of the Invention]

The present invention relates to a speech recognition device that checks, based on the parameters of the input speech, whether the level of the input speech is appropriate and, if the level is outside the appropriate range, alerts the speaker about the input level.

[Prior art and problems]

FIG. 1 is a diagram showing an example of the conventional configuration of a speech recognition device.

In FIG. 1, 1 is a speech input section, 2 is an input parameter extraction section, 3 is a level detector, 4 is a matching section, 5 is a dictionary, and 6 is a host processor.

In FIG. 1, the input parameter extraction section 2 extracts the feature values, the so-called parameters, of the input speech from the speech input section 1, and the dictionary 5 stores parameters.

The matching section 4 compares the parameters of the input speech extracted by the input parameter extraction section 2 with the parameters stored in the dictionary 5 and reports the best match to the host processor 6 as the recognized character string. The level detector 3 detects and displays the peak value of the input speech from the speech input section 1. In a conventional speech recognition device of this kind, the analog quantity entered as speech must be converted into a digital quantity for internal processing. Because that digital quantity is also represented by a finite value, there is always a risk of clipping or overflow.
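The clipping risk mentioned above can be illustrated with a minimal sketch. The 8-bit word length, the nominal input range of -1.0 to 1.0, and the function name `quantize` are assumptions made for illustration only; the patent does not specify the converter.

```python
def quantize(sample: float, bits: int = 8) -> tuple[int, bool]:
    """Convert an analog sample (nominal range -1.0..1.0) to a signed integer code.

    Returns the code and a flag telling whether the sample had to be clipped
    because it did not fit into the finite digital range.
    The 8-bit default is an assumption, not a value from the patent.
    """
    full_scale = 2 ** (bits - 1)           # 128 for 8 bits
    raw = round(sample * full_scale)       # ideal code before range limiting
    clipped = raw > full_scale - 1 or raw < -full_scale
    code = max(-full_scale, min(full_scale - 1, raw))
    return code, clipped


if __name__ == "__main__":
    for s in (0.5, 1.5, -2.0):
        print(s, quantize(s))
```

A sample inside the nominal range is represented faithfully, while a sample that is too loud is pinned to the largest or smallest code, which is the distortion the level notification is meant to help the speaker avoid.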

For this reason the level detector 3 is provided, and the speaker is normally notified of the level so that the volume of the input speech can be kept appropriate.

However, when speech is input from a location away from the speech recognition device, for example, the speaker can pay attention to the level during registration, such as when the dictionary is created, but during actual work the speaker cannot tell from the level detector 3 whether the level of the input speech is appropriate, and has had no means of obtaining information about the input speech level.

[Purpose of the invention]

The present invention is based on the above considerations, and its object is to provide a speech recognition device that can notify the speaker of the level even when speech is input at a location away from the speech recognition device.

[Structure of the invention]

To this end, the speech recognition device of the present invention has a speech input section, an input parameter extraction section that extracts the parameters of the input speech from the speech input section, a dictionary that stores the parameters, and a matching section that compares the parameters of the input speech extracted by the input parameter extraction section with the parameters stored in the dictionary and sends the closest recognized character string to a host processor. The device is characterized in that notification means is provided; the dictionary stores, as part of the parameters, the time during which the power of the input speech exceeds a certain fixed value; and the matching section obtains the power of the input speech from the parameters extracted by the input parameter extraction section, counts the time during which that power exceeds the fixed value, compares the counted time with the time stored in the dictionary to check whether the power level of the input speech is appropriate, and, when the level is judged not to be appropriate, gives notification through the notification means.
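The level check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `DictionaryEntry`, `over_threshold_time`, `level_is_appropriate`, the symmetric `tolerance`, and the wording of the notification are all assumptions, since the text only states that the counted time is compared with the stored time.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DictionaryEntry:
    """One registered word (hypothetical layout)."""
    word: str
    parameters: List[List[float]]   # stored feature frames used for matching
    over_threshold_time: int        # time (in frames) the power exceeded the fixed value at registration


def level_is_appropriate(
    measured_time: int,
    entry: DictionaryEntry,
    tolerance: int,
    notify: Callable[[str], None],
) -> bool:
    """Compare the counted time with the stored time for the recognized word.

    The symmetric tolerance is an assumption; the source only says the counted
    time is checked against the time stored in the dictionary.
    """
    if abs(measured_time - entry.over_threshold_time) <= tolerance:
        return True
    # Stand-in for the notification means.
    notify(f"Input level out of range for '{entry.word}'")
    return False
```

Passing the notification means in as a callable keeps the check independent of how the speaker is actually informed.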

[Embodiments of the invention]

Embodiments of the present invention will be described below with reference to the drawings.

FIG. 2 is a diagram showing the configuration of one embodiment of the present invention, FIG. 3 is a diagram showing an example of the relationship between time and speech power, and FIG. 4 is a diagram showing an example of the configuration of the parameter section of a dictionary.

In FIG. 2, reference numerals 1 to 6 denote the same components as in FIG. 1, 7 denotes a speech output device, and 8 denotes a loudspeaker.

In FIG. 2, the dictionary 5 stores, as part of the parameters, the time T_OV during which a speech power parameter such as that shown in FIG. 3 exceeds a certain fixed value P_TH. If the parameter section of the conventional dictionary 5 has the configuration shown in FIG. 4(A), then in the dictionary 5 of FIG. 2, to which the present invention is applied, the input time T_OV during which the power exceeds the fixed value P_TH is stored in the parameter section as part of the parameters, as shown in FIG. 4(B).

The matching section 4 starts counting time at the point where the power of the input speech exceeds the fixed value P_TH and continues counting until the point where the power falls below P_TH. The input speech parameters F_i extracted by the input parameter extraction section 2 are, for example, the output values obtained by frequency analysis of the input speech sampled at fixed time intervals, so the power is given by their sum; for parameters consisting of 16 channels, for example, the power is ΣF_i (i = 1 to 16). Accordingly, the matching section 4 takes the sum of the values of the speech input parameters extracted by the input parameter extraction section 2 as the power and counts the time during which that power exceeds the fixed value P_TH. In the recognition operation, the count value T_OV is compared with the T_OV stored in the dictionary for the recognized word. If the counted T_OV is within a certain range, the speech input level is judged to be normal; if it is outside that range, a comment is sent to the speaker from the loudspeaker 8 through the speech output device 7.

The speech output device 7 stores messages internally and outputs speech when a value associated with a message, called a voice number, is specified.

A speech synthesizer is normally used as the speech output device.
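The power computation and time count performed by the matching section can be sketched as below. This is an illustration under stated assumptions: time is measured in analysis frames, each frame is a list of 16 channel outputs, and the names `frame_power`, `time_over_threshold`, `p_th`, `MESSAGES`, and `message_for` are hypothetical. The message texts and voice numbers are invented placeholders; the source does not list any.

```python
from typing import Dict, Sequence


def frame_power(channels: Sequence[float]) -> float:
    """Power of one frame: the sum of the channel outputs of the frequency analysis."""
    return sum(channels)


def time_over_threshold(frames: Sequence[Sequence[float]], p_th: float) -> int:
    """Count frames from the point the power first exceeds p_th until it drops back.

    Counting starts when the power exceeds the fixed value and stops at the
    first later frame whose power no longer exceeds it.
    """
    count = 0
    counting = False
    for channels in frames:
        if frame_power(channels) > p_th:
            counting = True
            count += 1
        elif counting:
            break  # power fell back below the fixed value: stop counting
    return count


# Placeholder message table keyed by voice number (contents are assumptions).
MESSAGES: Dict[int, str] = {
    1: "Please check your input level.",
}


def message_for(voice_number: int) -> str:
    """Look up the stored message that a speech output device would speak."""
    return MESSAGES[voice_number]
```

The returned count is the value that would be compared with the time stored in the dictionary for the recognized word; a real device would hand the selected voice number to a speech synthesizer rather than return text.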

[Effect of the invention]

As is clear from the above description, the present invention does not rely on the usual level detector, which gives a display based on the peak value of the input speech. Instead, it judges whether the level of the input speech is appropriate from the output of the input parameter extraction section and can send an appropriate comment to the speaker through the speech output device. Even when speech is input at a location away from the speech recognition device, recognition can always be performed on input at an appropriate level, and the recognition rate can be improved.

[Brief explanation of drawings]

FIG. 1 is a diagram showing an example of the conventional configuration of a speech recognition device, FIG. 2 is a diagram showing the configuration of one embodiment of the present invention, FIG. 3 is a diagram showing an example of the relationship between time and speech power, and FIG. 4 is a diagram showing an example of the configuration of the parameter section of a dictionary.

1: speech input section, 2: input parameter extraction section, 3: level detector, 4: matching section, 5: dictionary, 6: host processor, 7: speech output device, 8: loudspeaker.

Patent applicant: Fujitsu Ltd
Agent: patent attorney 京谷 四部

Claims

[Claims]

A speech recognition device having a speech input section, an input parameter extraction section that extracts the parameters of the input speech from the speech input section, a dictionary that stores the parameters, and a matching section that compares the parameters of the input speech extracted by the input parameter extraction section with the parameters stored in the dictionary and sends the closest recognized character string to a host processor, characterized in that: notification means is provided; the dictionary stores, as part of the parameters, the time during which the power of the input speech exceeds a certain fixed value; and the matching section obtains the power of the input speech from the parameters extracted by the input parameter extraction section, counts the time during which that power exceeds the fixed value, compares the counted time with the time stored in the dictionary to check whether the power level of the input speech is appropriate, and, when the level is judged not to be appropriate, gives notification through the notification means.