JPS6226480B2

Info

Publication number: JPS6226480B2
Application number: JP54091913A
Authority: JP
Inventors: Masaru Nishimura; Yoshinobu Nishikawa; Tetsuo Shimizu; Yoji Sugiura
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1979-07-18
Filing date: 1979-07-18
Publication date: 1987-06-09
Also published as: JPS55151700A

Description

[Detailed description of the invention]

The present invention relates to a voice control device for a television receiver or the like. In particular, when a further command is issued to a receiver that is already operating in order to control it by voice, the invention aims to set a larger recognition tolerance for frequently used command words and to use the resulting identification (recognition) output to mute the audio circuit of the receiver, thereby preventing malfunctions caused by the sound (acoustic) output emitted by the receiver itself.

It has been proposed that the content (voice signal) of a standard voice command (or instruction) from a commander [an operator; hereinafter, a person who remotely controls a controlled device (for example, a television receiver) using the natural voice as source information] be normalized and quantized and stored in advance as a standard (digital signal) pattern. A voice instruction (voice signal) uttered later is normalized and then quantized, its time axis is adjusted relative to the standard voice signal if necessary, and the digitized result is temporarily stored in a memory such as RAM. When it matches the standard pattern within a fixed tolerance, on/off control is performed.
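The store-then-compare scheme described above can be sketched as follows. This is only an illustration in Python, not the circuit of the invention: the function names, the 16-level quantizer and the sum-of-absolute-differences distance are assumptions made for the example.

```python
def normalize(samples):
    """Scale samples so the largest magnitude becomes 1 (amplitude normalization)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return [0.0 for _ in samples]
    return [s / peak for s in samples]


def quantize(samples, levels=16):
    """Map values in [-1, 1] to integer codes 0 .. levels-1 (uniform quantizer)."""
    return [min(levels - 1, int((s + 1.0) / 2.0 * levels)) for s in samples]


def matches(pattern, standard, tolerance):
    """True when the pattern agrees with the standard pattern within the tolerance."""
    if len(pattern) != len(standard):
        return False
    distance = sum(abs(a - b) for a, b in zip(pattern, standard))
    return distance <= tolerance


# Register a standard pattern, then compare later utterances against it.
standard = quantize(normalize([0.0, 0.5, 1.0, 0.5]))
louder = quantize(normalize([0.0, 1.0, 2.0, 1.0]))    # same shape, twice as loud
different = quantize(normalize([1.0, 0.0, -1.0, 0.0]))
```

Because both utterances are normalized before quantization, the louder repetition yields the same code sequence as the registered pattern, while the differently shaped one falls outside a small tolerance.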

As shown in FIG. 1, for example, such a voice recognition means has the following main components: an input unit 1 including an acoustic-to-electric signal transducer (for example, a microphone) that converts the input voice into an electric signal; a feature extraction unit 2 that extracts features of the voice signal; a standard pattern storage unit 3 that stores standard patterns of voice features registered in advance; a recognition processing unit 4 that compares the feature pattern extracted from the input voice with the standard patterns and identifies the input voice; and an output control unit 5 that, based on the recognition result, controls, for example, the power, channel and volume of a television receiver. To these are added an input signal amplitude normalization circuit 6 and a time axis adjustment unit 7 for improving the recognition rate, and a registration control unit 8 for registering the standard patterns of voice features in advance.

Many parameters can be used to extract voice features, such as the frequency spectrum distribution, correlation functions, the zero-crossing count, formant frequencies, and linear prediction coefficients. Among these, the so-called filter bank method, in which the frequency spectrum of the voice is separated and extracted by a plurality of frequency filters and its correlation with the standard patterns is examined, is widely used because it achieves a high recognition rate with a relatively simple configuration.
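The idea of the filter bank method can be illustrated in software. The sketch below is an assumption-laden stand-in: instead of analog band-pass filters it probes the signal at a few center frequencies with a single-frequency discrete Fourier sum, and the sample rate, frame length and center frequencies are hypothetical values chosen for the example.

```python
import math


def band_energy(samples, sample_rate, freq):
    """Energy of the signal at one frequency (single-bin DFT, stands in for one filter)."""
    re = sum(s * math.cos(2 * math.pi * freq * n / sample_rate)
             for n, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * n / sample_rate)
             for n, s in enumerate(samples))
    return (re * re + im * im) / len(samples)


def filter_bank_features(samples, sample_rate, centers):
    """One energy value per filter: the feature vector for this frame."""
    return [band_energy(samples, sample_rate, f) for f in centers]


# A 20 ms frame of a 500 Hz tone sampled at 8 kHz (hypothetical test signal).
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 500 * n / SAMPLE_RATE) for n in range(160)]
features = filter_bank_features(frame, SAMPLE_RATE, [250, 500, 1000, 2000])
```

For this test tone, nearly all of the energy lands in the second band, which is the kind of per-band profile that is later compared against the standard patterns.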

The items controlled by voice in such a control device include switching the power on and off, designating a change of channel or station number, and changing the volume. The power, for example, can be controlled by uttering "Dengen iri (kiri)" (power on (off)), but the volume is inherently an analog, continuously variable control and has therefore been difficult to control by voice. For a method of controlling such an analog quantity by voice command, the present applicants have already filed Japanese Patent Application No. Sho 54-59235. In outline, the analog quantity is specified numerically by voice, the utterance is identified by a voice recognition device and extracted as, for example, a decimal value, and the combination of a plurality of variable attenuators, variable gain amplifiers or the like is changed accordingly, so that the level of the analog quantity is controlled and that level is displayed.
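As a rough illustration of turning a recognized decimal value into a combination of attenuator stages, the sketch below assumes binary-weighted stages (8, 4, 2 and 1 units). The text does not specify the stage values of the earlier application, so the weighting and the function name are hypothetical.

```python
def attenuator_switches(level, stages=(8, 4, 2, 1)):
    """Return, per stage, whether it is switched in so that the values sum to `level`.

    The greedy selection below is exact only for binary-weighted stages,
    which is the assumption made in this sketch.
    """
    if not 0 <= level <= sum(stages):
        raise ValueError("level out of range")
    switches = []
    remaining = level
    for value in stages:
        use = remaining >= value
        switches.append(use)
        if use:
            remaining -= value
    return tuple(switches)


# The spoken number "ten", recognized as the decimal value 10.
combo = attenuator_switches(10)
```

Under these assumptions the value 10 selects the 8-unit and 2-unit stages and leaves the other two bypassed.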

However, when the television receiver is already operating, as when the volume is adjusted or the channel is switched during reception, numerals such as "ichi" (one) and "ni" (two) in the sound emitted by the receiver itself may cause malfunctions.

Such consideration is indispensable particularly when the recognition tolerance for important, frequently used words is set large, that is, when the specific recognition rate is lowered and the tolerance of the voice recognition is widened in order to reduce the variation in recognition caused by the commander's vocal condition.

In view of this, the present invention proposes a method of muting the audio circuit of the receiver, in particular the circuit that drives the sound emitting means such as a speaker, when a voice command is given. The invention is described in detail below with reference to FIG. 2, which shows one embodiment.

This embodiment is described using, as an example, a voice-operated control device for a television receiver that incorporates a voice recognition device employing the filter bank method (frequency spectrum method) for the feature parameters. The controlled device may, however, be any device provided with a sound emitting means.

The input unit 1, which is normally mounted on the front of the equipment, consists of two microphones, a directional microphone 10 and an omnidirectional microphone 11, differentially connected as shown, and an amplifier 12. That is, the omnidirectional microphone 11 is connected in opposite phase to the directional microphone 10, so that voice signals arriving from outside the directional range, i.e. signals other than the control command voice, are cancelled, and the S/N ratio of the control command voice within the directional range is thereby raised.

To prevent malfunctions caused by command-like words in the sound coming from the speaker of a TV receiver or the like when a voice command is received, the present invention provides a so-called initial muting circuit, disclosed separately in FIG. 2 as a circuit diagram of the main part. When a command consisting of an important word (most frequently used word) such as "Dengen" or "Power", "Channel", or "Onryo" or "Volume" is identified (note: the identification tolerance for these words is set large), the speaker output is temporarily cut off or greatly attenuated.

The feature extraction unit 2, which also has an amplitude normalization function, consists of a plurality of filters 13-1, 13-2, ..., 13-N; a level detection circuit 14 that detects the overall amplitude of the input signal; an A-D (analog-to-digital) converter 15 that converts the output of each filter into a digital signal; an amplitude normalization circuit 16, placed ahead of the A-D converter and built from an analog divider or the like, that normalizes the filter output amplitude by taking the ratio of each filter output to the output of the level detection circuit 14; and a multiplexer 17, inserted between the amplitude normalization circuit and the filter group, that switches between the filter outputs. With this configuration, each filter component of the voice signal entering through the input unit 1 is sampled in turn at a suitable time interval (in most cases around 10 milliseconds), each sampled value is quantized and thereby converted into a digital code, and the code is stored in a memory 19 (normally RAM: random access memory) via an I/O port (not shown) of a microcomputer or central processing unit (CPU) 18.
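The amplitude normalization and A-D conversion steps of the feature extraction unit can be mimicked in a few lines. This is a software sketch of what the analog divider and converter accomplish, under assumptions not stated in the text: a 4-bit converter and filter outputs that never exceed the detected overall level.

```python
def normalize_frame(filter_outputs, level):
    """Divide each filter output by the overall level (the role of the analog divider)."""
    if level <= 0:
        return [0.0] * len(filter_outputs)
    return [out / level for out in filter_outputs]


def digitize(normalized, bits=4):
    """Convert normalized values in [0, 1] to integer codes (the role of the A-D converter)."""
    top = (1 << bits) - 1
    return [max(0, min(top, round(v * top))) for v in normalized]


# The same utterance spoken quietly and loudly gives the same normalized frame.
quiet = normalize_frame([2.0, 4.0], 8.0)
loud = normalize_frame([4.0, 8.0], 16.0)
codes = digitize([0.0, 0.2, 1.0, 2.0])   # values above 1.0 are clipped to the top code
```

Taking the ratio to the overall level makes the stored codes depend on the spectral shape of the utterance rather than on how loudly it was spoken.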

In the A-D conversion process, the sampled values may be quantized uniformly. Where a separate manual adjustment means is provided, however, they may instead be quantized non-linearly in steps, to match the relationship between the control setting of that adjuster (for example, the rotation angle of a volume control) and the control level.

A separate standard pattern memory 3 is connected to the CPU (central processing unit) 18, which consists of a microcomputer or the like. The commander's voice commands (control commands) are stored there in advance in sampled and quantized form, together with codes designating the control content. Control command voices (commands by voice) are registered in the standard pattern memory as follows, taking the control of a television receiver as an example.

FIG. 3 shows an example of the control panel of a television receiver. It carries an input microphone 20; a registration mode switch 21; commander (speaker) number designation switches 22-1, 22-2, ... selected by the commander; control command designation switches corresponding to power on/off switching, volume change and channel switching, namely a "power" designation switch 23, a "volume" designation switch 24 and a "channel" designation switch 25; and numeric buttons 26-1, 26-2, 26-3, ..., 26-11, 26-12 for designating the volume and channel, arranged together with their corresponding indicator lamps 27-1, 27-2, ..., 27-12. The "OK" indicator lamp 28 at the bottom lights when recognition or registration has completed successfully, and the "REPEAT" indicator lamp 29 lights when it has failed.

To register standard patterns using this registration control unit 30, the registration switch 21 is first pressed to enter the registration mode, and the speaker number is designated with designation switch 22-1 or 22-2, .... The "power" switch 23 is then pressed and, for example, "Dengen" or "Power" is uttered; next the "volume" switch 24 is pressed and "Onryo" or "Volume" is uttered. When the "channel" switch 25 is pressed, the registration control circuit (not shown) outputs a mode switching signal and switches the output of the channel push-button switch circuit 31 to the registration control circuit side via the switching circuit 32. When the numeric designation buttons 26-1, 26-2, ... (FIG. 3) included in the switch circuit 31 are then pressed and "ichi", "ni", ... are uttered, each voice passes through the input unit 1 and the feature extraction unit 2 and is stored in the standard pattern memory 3 together with the code corresponding to its control content (power, volume, channel 1, 2, 3, ...).
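The registration procedure amounts to storing each digitized utterance under the speaker number and the control code selected on the panel. A minimal sketch of such a store follows; the class name, the string control codes and the example patterns are hypothetical and only illustrate the bookkeeping.

```python
class StandardPatternMemory:
    """Keeps one standard pattern per (speaker number, control code) pair."""

    def __init__(self):
        self._patterns = {}

    def register(self, speaker, code, pattern):
        """Store (or overwrite) the pattern uttered while `code` was selected."""
        self._patterns[(speaker, code)] = list(pattern)

    def patterns_for(self, speaker):
        """Return {control code: pattern} for one registered speaker."""
        return {code: pattern
                for (spk, code), pattern in self._patterns.items()
                if spk == speaker}


memory = StandardPatternMemory()
# Speaker 1 presses "power" and says "Dengen", then presses "channel 1" and says "ichi".
memory.register(1, "power", [3, 9, 12, 4])
memory.register(1, "channel-1", [10, 2, 2, 7])
# A second speaker registers a separate pattern for the same control code.
memory.register(2, "power", [4, 8, 13, 5])
```

Keying the store on the speaker number as well as the control code reflects the speaker designation switches on the panel: each commander's patterns are kept apart.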

In the normal recognition mode, the control voice described above is input, and the signal sequence extracted by the feature extraction filters 13-1, 13-2, ..., 13-N and digitized is stored in the memory 19 such as RAM. The CPU 18 then calculates the difference between this stored pattern and every standard pattern, and identifies the input voice by determining the standard pattern with the smallest difference. Since the time course of human speech is generally not identical from one utterance to the next even when the same words are spoken, some form of time axis adjustment circuit such as that shown in FIG. 1 must be added. In FIG. 2 this time axis adjustment circuit is omitted for convenience of explanation.
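The recognition calculation (smallest difference over all standard patterns) can be sketched together with a very simple time axis adjustment. The text does not say how the time axis is adjusted, so the fixed-length resampling used here is an assumed stand-in, and the function names and patterns are hypothetical.

```python
def time_normalize(sequence, length):
    """Resample a sequence to a fixed length by picking evenly spaced elements.

    This linear stretch is an assumption; the time axis adjustment unit
    of FIG. 1 is not specified in detail.
    """
    if not sequence:
        raise ValueError("empty sequence")
    return [sequence[min(len(sequence) - 1, i * len(sequence) // length)]
            for i in range(length)]


def recognize(pattern, standards, tolerance):
    """Return the code of the closest standard pattern, or None if none is close enough."""
    best_code, best_distance = None, None
    for code, standard in standards.items():
        distance = sum(abs(a - b) for a, b in zip(pattern, standard))
        if best_distance is None or distance < best_distance:
            best_code, best_distance = code, distance
    if best_distance is not None and best_distance <= tolerance:
        return best_code
    return None


standards = {"power": [1, 5, 9, 5], "channel": [9, 9, 1, 1]}
slow_utterance = [1, 1, 5, 5, 9, 9, 5, 5]       # "power" spoken at half speed
adjusted = time_normalize(slow_utterance, 4)
result = recognize(adjusted, standards, tolerance=2)
```

After the time axis is adjusted, the slowly spoken word lines up with its standard pattern; an input that is not close to any stored pattern is rejected rather than forced onto the nearest one.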

In the recognition mode, voice is captured continuously. When the input voice breaks off, that is, during a pause period, the recognition calculation described above is executed and the preceding input voice is identified by pattern matching. When the input voice can be identified, that is, when it matches one of the standard patterns within the allowable error, the CPU 18 instructs the output control circuit 33 to control the corresponding control element of the television receiver. For example, when the input voice "Dengen iri (kiri)" is recognized, the output control circuit 33 switches the power supply circuit 34 of the television receiver on or off. When the input voice "Channel **" (** being a number from 1 to 12) is recognized, the output control circuit 33 drives the channel switching circuit 35, which switches the tuner.

As described above, in the voice command recognition mode the voice signal is captured continuously while a command is in progress. When the input voice breaks off, that is, in the interval between command units (the pause period), the CPU 18 executes the recognition calculation and identifies the voice command input up to that point by pattern matching.
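Splitting the continuously captured input into command units at pauses can be sketched as below. The level threshold and the minimum pause length are hypothetical parameters; the text only states that recognition runs when the input voice breaks off.

```python
def split_on_pauses(levels, threshold, min_pause):
    """Return (first_frame, last_frame) index pairs for each voiced segment.

    A segment ends once the level stays below `threshold` for
    `min_pause` consecutive frames; shorter dips stay inside the segment.
    """
    segments = []
    current = []        # indices of voiced frames in the segment being built
    quiet = 0           # consecutive quiet frames seen since the last voiced one
    for index, level in enumerate(levels):
        if level >= threshold:
            current.append(index)
            quiet = 0
        elif current:
            quiet += 1
            if quiet >= min_pause:
                segments.append((current[0], current[-1]))
                current, quiet = [], 0
    if current:
        segments.append((current[0], current[-1]))
    return segments


# Two words separated by a two-frame pause (levels per roughly 10 ms frame).
units = split_on_pauses([0, 5, 6, 0, 0, 7, 8, 8, 0], threshold=3, min_pause=2)
```

Each returned pair marks one command unit, on which the recognition calculation would then be run during the pause that follows it.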

As described above, to avoid malfunctions caused by the speaker output of the TV receiver being controlled, or by similar sounds from sources other than the commander, the comparison tolerance of the pattern matching is set somewhat larger for the most frequently used command words. When the input voice can be identified, that is, when it matches one of the standard patterns within the allowable error, the CPU 18 controls the output control circuit 33 so as to mute the audio output of the television receiver for a fixed time. In the case of FIG. 2, the output control circuit 33 lowers the voltage Vc of the bias circuit 61 of the output amplifier transistor 57 in the audio demodulation and amplification circuit 56 of the television receiver, and thereby stops the audio output of the speaker 59, which is connected to the collector of that transistor through a capacitor 58. Muting is not needed for the earphone circuit 60 connected to the output side of the audio circuit 56.

A control command normally consists of a sequence of words, for example "Dengen" + "Iri", "Dengen" + "Kiri", "Channel" + "Ichi" or "Channel" + "Ni". When, for example, the input voice "Channel" is recognized, the speaker audio is muted, so that the following voice input "Ichi" or "Ni" arrives with no sound being produced by the television receiver; the S/N ratio is greatly improved and so is the recognition rate. Whether the control concerns switching the power on or off, changing the channel, or changing the volume, this muting of the audio entails no functional drawback.
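The muting behavior can be summarized as a small state machine: recognizing a first, frequently used command word starts a fixed muting period during which the second word is expected. The sketch below is an illustration only; the keyword set, the frame-based timer and the class name are assumptions, whereas in FIG. 2 the muting is done by lowering the bias voltage of the output transistor.

```python
# First words of a command, recognized with a wider tolerance (assumed set).
KEYWORDS = {"dengen", "power", "channel", "onryo", "volume"}


class MutingController:
    """Mutes the speaker for a fixed number of frames after a keyword is recognized."""

    def __init__(self, mute_frames):
        self.mute_frames = mute_frames
        self.remaining = 0

    def on_word(self, word):
        """Call with each recognized word; keywords (re)start the muting period."""
        if word in KEYWORDS:
            self.remaining = self.mute_frames

    def tick(self):
        """Advance one frame and report whether the speaker is muted in it."""
        muted = self.remaining > 0
        if muted:
            self.remaining -= 1
        return muted


controller = MutingController(mute_frames=2)
before = controller.tick()                  # nothing recognized yet
controller.on_word("channel")               # first word of "Channel" + "Ichi"
during = [controller.tick(), controller.tick()]
after = controller.tick()                   # muting period has expired
```

While the controller reports muted frames, the receiver's own sound cannot reach the microphone, so the second word ("Ichi", "Ni", ...) is captured against a quiet background.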

In the above embodiment the bias voltage of the audio output transistor is grounded. Alternatively, the gain of the drive stage may be changed, or an attenuation circuit may be inserted, only during muting, at a point that does not affect the earphone circuit. In the latter case, it goes without saying that the attenuation circuit may take the form of a filter whose characteristics are the inverse of those of the filters 13-1, 13-2, ..., 13-N of the feature extraction unit 2, connected in series with the signal path of the audio circuit, temporarily and only during muting, through a switching element that is switched on and off by the CPU output. If a digital filter is used for this filter, it can also be implemented as an IC.

With the configuration described above, the present invention controls the audio output circuit of audio equipment, such as a television receiver equipped with a voice recognition device, with the output of the output control circuit of the voice recognition device. When an input voice is detected, the output sound of the equipment is attenuated to a suitable level, which increases the S/N ratio of the subsequent input voice. The invention is therefore highly effective in improving the recognition rate of this kind of voice recognition device.

[Brief description of the drawings]

FIG. 1 is a block diagram of the main part of a voice recognition device, FIG. 2 shows a circuit implementing the main part of the present invention, and FIG. 3 is a front view of the operation panel of the controlled device.

1: input unit; 2: feature extraction unit; 3: standard pattern memory; 18: CPU; 37, 41, 45, 54, 68: AND gates; 44, 46: OR gates; 42, 50: latch circuits; 53: analog switch; 60: voltage dividing resistor; 43, 69: A-D conversion circuits; 57: audio output transistor; 59: speaker; 61: bias circuit.

Claims

[Claims]

1. A voice-operated control device comprising: an input unit that converts a first input voice and a second input voice following the first input voice into electrical signals; a feature extraction unit that extracts features of the electrical signals; a standard pattern storage unit that stores voice features as standard patterns; a recognition processing unit that compares the feature pattern extracted by the feature extraction unit with the standard patterns to identify the first and second input voices; an output control unit that controls a television receiver or the like having a sound emitting means based on the recognition result of the recognition processing unit; and a muting circuit that, in response to a signal from the output control unit, mutes the audio output from the sound emitting means during the period in which the second input voice is input to the input unit.