JPH06337697A

JPH06337697A - Voice recognition device

Info

Publication number: JPH06337697A
Application number: JP5129454A
Authority: JP
Inventors: Koichi Ide; 廣一井出; Toshiyuki Watanabe; 俊幸渡辺
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1993-05-31
Filing date: 1993-05-31
Publication date: 1994-12-06
Anticipated expiration: 2014-11-10
Also published as: JP2975808B2

Abstract

PURPOSE: To improve recognition accuracy by adjusting the input section using the voice recognition result. CONSTITUTION: Sound input from a microphone 1 is amplified by a preamplifier 2A and a variable gain amplifier 2B, then converted into a digital signal by an A/D converter 3 and supplied to a digital signal processor (DSP) 4. After voice recognition processing in the DSP 4, the result is output from an interface circuit 5 in an arbitrary format or through an arbitrary port. When recognition fails, the DSP 4 outputs a signal for adjusting the amplifier gain, and this signal switches a gain switching circuit 2C of the variable gain amplifier 2B. That is, when recognition fails, the circuit 2C is switched so as to increase the gain of the variable gain amplifier 2B. Further, since the success or failure of recognition is a probabilistic matter, methods such as having a human feed back the result or having the DSP 4 side learn probabilistically may also be used.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a voice recognition device that can be used in information equipment and consumer equipment in general that use voice as an input means.

[0002]

[Prior Art] Human speech differs from speaker to speaker, and it also varies from one utterance to the next even for the same speaker. Early speech recognition technology studied how to align these time-domain variations so that pattern matching could be performed. Later, the introduction of mathematical statistical methods and neural networks enabled modeling in both the frequency and spectral domains, and the word-level recognition rate for unspecified speakers has improved markedly.

[0003] With these technical developments, many boards for installation in personal computers that make voice recognition easy to perform have now been commercialized.

[0004] Meanwhile, in the consumer products field as well, operation by voice input has been attempted as one approach to simplifying the operation of increasingly complex AV equipment. Some products have been commercialized; for example, a remote control device using voice recognition technology is introduced in the magazine "Television Technology" (テレビ技術), May 1991 issue, pages 38 to 44.

[0005] FIG. 2 shows an example of a circuit configuration used for general voice recognition. Because digital signal processing is generally used, the configuration is divided into an input microphone 1 and amplifier system 2; an AD (analog-to-digital) conversion processing unit 3; a recognition processing unit 4 composed of a computer, a DSP (digital signal processor), or a dedicated voice recognition IC; and an external interface unit 5.

[0006]

[Problems to be Solved by the Invention] The technology currently commercialized falls within the category of the word-unit recognition described above, or sequential processing of such units. In this case only predetermined words need to be recognized, so the recognition rate is determined by how well the extracted speech is matched against the prepared database.

[0007] Here, in addition to the recognition algorithm itself, how well the peripheral circuits capture the required speech is also an important factor.

[0008] For simplicity, assume that the input voice signal has a single frequency as shown in FIG. 3. When the input level is excessive, the amplifier system ahead of the recognition computation unit goes over range, and as shown in FIG. 4 the peaks of the waveform are clipped, so that the waveform becomes trapezoidal and harmonic components are superimposed on it. At this excessive input level the waveform itself has changed. That is, the frequency pattern, which is an important parameter for recognition, changes, and recognition becomes impossible.
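The effect described in paragraph [0008] can be illustrated with a minimal sketch. It assumes an ideal amplifier that hard-clips at a full-scale level of 1.0 and uses a single-bin DFT to measure one harmonic; the function names, tone amplitude, and gain values are illustrative assumptions, not taken from the patent.

```python
import math

def harmonic_magnitude(samples, cycles, harmonic):
    """Amplitude of one harmonic of a periodic signal (single-bin DFT)."""
    n = len(samples)
    k = cycles * harmonic  # DFT bin of the requested harmonic
    re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

def amplify_and_clip(samples, gain, limit=1.0):
    """Hypothetical amplifier model: linear gain followed by hard clipping."""
    return [max(-limit, min(limit, gain * s)) for s in samples]

N = 1024      # samples in the analysis window
CYCLES = 8    # whole cycles of the tone inside the window
tone = [0.5 * math.sin(2 * math.pi * CYCLES * i / N) for i in range(N)]

clean = amplify_and_clip(tone, gain=1.0)    # peak 0.5, stays within range
clipped = amplify_and_clip(tone, gain=4.0)  # peak 2.0, clipped at 1.0

clean_h3 = harmonic_magnitude(clean, CYCLES, 3)
clipped_h3 = harmonic_magnitude(clipped, CYCLES, 3)
print(f"3rd harmonic, within range: {clean_h3:.6f}")
print(f"3rd harmonic, over range:   {clipped_h3:.6f}")
```

In this model the unclipped tone has essentially no third harmonic, while the clipped tone gains a clearly measurable one, which is the change in frequency pattern the paragraph refers to.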

[0009] One conceivable countermeasure against such excessive input is to lower the amplifier gain in advance so that the range is not exceeded, but this greatly reduces the recognition rate for low-level input. Another method is to vary the gain according to the input level. However, unlike a video signal, whose input level is fixed in advance to some extent and which has references such as a synchronization signal, an audio signal inherently has no reference in either time or level and fluctuates quite dynamically, so this method is risky.

[0010] The present invention therefore aims to improve recognition accuracy by adjusting the input section using the voice recognition result.

[0011]

[Means for Solving the Problems] When the same person speaks into a microphone or the like and utters predetermined words within a certain period of time, that person usually holds the microphone in a relatively similar position and speaks at a similar sound pressure. Consequently, if the voice input is distorted by an over-level condition or the like, the distortion can be expected to occur repeatedly over successive utterances.

[0012] Therefore, when voice recognition is unsuccessful, the gain is adjusted. In doing so, a method that has a learning effect is used; for example, the data from the past several attempts is averaged and stored.
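The text does not specify which data is averaged. As one possible reading, the sketch below keeps the outcomes of the last few recognition attempts and averages them into a success rate, which then decides whether the gain should be adjusted. The class name, window size, and threshold are hypothetical.

```python
from collections import deque

class RecognitionHistory:
    """Stores the last few recognition outcomes and averages them."""

    def __init__(self, window=5):
        # Only the most recent `window` outcomes are kept.
        self.results = deque(maxlen=window)

    def record(self, success):
        self.results.append(1.0 if success else 0.0)

    def success_rate(self):
        if not self.results:
            return 1.0  # no evidence of a problem yet
        return sum(self.results) / len(self.results)

    def needs_adjustment(self, threshold=0.5):
        # Assumed policy: adjust only when the averaged rate drops below threshold.
        return self.success_rate() < threshold

history = RecognitionHistory(window=4)
for outcome in [True, False, False, False]:
    history.record(outcome)
print(history.success_rate(), history.needs_adjustment())
```

Averaging over several attempts means that a single failed utterance does not immediately change the gain, which fits the observation that recognition success is probabilistic.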

[0013]

[Operation] With this arrangement, the recognition rate improves progressively, owing to the utterance characteristics described above.

[0014]

[Embodiment] FIG. 1 shows an embodiment of the present invention.

[0015] Voice input from the microphone 1 is amplified by the preamplifier 2A and the variable gain amplifier 2B, then converted into a digital signal by the AD converter 3 and supplied to the DSP 4. After the DSP 4 performs voice recognition processing, the result is output from the interface circuit 5 to an external device in an arbitrary format or through an arbitrary port.

[0016] When recognition is unsuccessful, the DSP 4 outputs a signal for adjusting the amplifier gain, and this signal switches the gain switching circuit 2C of the variable gain amplifier 2B. That is, when voice recognition is unsuccessful, the circuit is switched so as to increase the gain of the variable gain amplifier 2B.
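The feedback loop of paragraph [0016] can be sketched as follows, under stated assumptions: the gain switching circuit 2C is modeled as a small table of discrete gain settings, and the DSP 4 is replaced by a toy recognizer that succeeds only when the amplified level reaches a minimum. The gain values, `min_level`, and all function names are hypothetical; the patent gives no numeric values.

```python
# Hypothetical settings selectable by the gain switching circuit 2C.
GAIN_STEPS = (1.0, 2.0, 4.0, 8.0)

def next_gain_index(index, recognized):
    """On failure, move to the next higher gain; on success, keep the setting."""
    if recognized:
        return index
    return min(index + 1, len(GAIN_STEPS) - 1)

def toy_recognizer(input_level, gain, min_level=0.3):
    """Stand-in for DSP 4: 'recognizes' only if the amplified level is high enough."""
    return input_level * gain >= min_level

def run_utterances(input_level, attempts):
    """Simulate repeated utterances at the same level with recognition feedback."""
    index = 0
    log = []
    for _ in range(attempts):
        ok = toy_recognizer(input_level, GAIN_STEPS[index])
        log.append((GAIN_STEPS[index], ok))
        index = next_gain_index(index, ok)
    return log

log = run_utterances(input_level=0.1, attempts=4)
for gain, ok in log:
    print(f"gain={gain}: {'recognized' if ok else 'failed'}")
```

With a quiet input the simulated gain steps up after each failure and then stays put once recognition succeeds. The sketch covers only the gain increase that the embodiment describes.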

[0017] Since the success or failure of recognition is a probabilistic matter, the gain adjustment may instead use a method in which a human feeds back the result, a method in which the DSP side learns probabilistically, or the like.

[0018] In voice recognition, as described above, the frequency components are an important parameter. A characteristic distortion such as that caused by an over-range condition can in some cases be predicted to a certain degree by frequency analysis. If the program ROM for the algorithm has spare capacity, adding this information to the decision factors used during adjustment will further improve accuracy.
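One way to add the distortion information to the adjustment decision is sketched below. It assumes that frequency analysis has already produced a ratio of harmonic energy to fundamental energy for the failed utterance. Lowering the gain when distortion is detected is an inference from the over-range discussion in paragraph [0008], not something the text states explicitly; the function name and threshold are hypothetical.

```python
def choose_gain_step(recognized, harmonic_ratio, distortion_threshold=0.1):
    """Return -1 (lower gain), 0 (keep), or +1 (raise gain).

    harmonic_ratio: assumed measure of over-range distortion, e.g. the
    amplitude of a harmonic divided by that of the fundamental.
    """
    if recognized:
        return 0   # recognition succeeded, leave the gain alone
    if harmonic_ratio > distortion_threshold:
        return -1  # failure with clipping-like distortion: assume gain is too high
    return 1       # failure without distortion: raise gain as in the embodiment

print(choose_gain_step(False, 0.5))   # distorted failure
print(choose_gain_step(False, 0.01))  # clean but failed
```

Without the distortion input this decision reduces to the behavior of paragraph [0016], in which every failure raises the gain.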

[0019]

[Effects of the Invention] According to the present invention, waveform distortion in the microphone amplifier section, which is an important factor in the success rate of voice recognition, is reduced by feeding back the recognition result, so the recognition rate can be improved.

[Brief description of drawings]

FIG. 1 is a block diagram of a voice recognition device embodying the present invention.

FIG. 2 is a block diagram of a conventional voice recognition system.

FIG. 3 is a waveform diagram for explaining the operation.

FIG. 4 is a waveform diagram for explaining the operation.

[Explanation of symbols]

1 Microphone
2A Preamplifier
2B Variable gain amplifier
2C Gain switching circuit
3 AD converter
4 Digital signal processor
5 Interface circuit

Claims

1. A voice recognition device comprising: a microphone; a variable gain amplifier that amplifies a voice signal obtained from the microphone; an analog-to-digital converter that receives the output of the amplifier as its input; and a voice recognition processing circuit that processes the output of the converter, wherein the gain of the variable gain amplifier is adjusted based on the voice recognition rate in the voice recognition processing circuit.