JPS6259927B2

Info

Publication number: JPS6259927B2
Application number: JP54059235A
Authority: JP
Inventors: Masaru Nishimura; Yoshinobu Nishikawa; Tetsuo Shimizu; Yoji Sugiura
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1979-05-14
Filing date: 1979-05-14
Publication date: 1987-12-14
Also published as: JPS55151234A

Description

[Detailed description of the invention]

The present invention relates to a voice remote control device for a television receiver or the like. In particular, its object is to provide a voice remote control device that can perform stepwise control of an analog quantity, and display that quantity, in response to spoken commands.

It has been proposed that the content (voice signal) of a command (or instruction) from a commander (or operator; hereinafter, a person who remotely controls a controlled device such as a television receiver using his or her natural voice as the source information) be sampled and quantized, and stored in advance as a standard (digital signal) pattern. A voice instruction uttered later is likewise sampled and quantized, digitized with time-axis adjustment as necessary, and stored temporarily in a memory such as RAM. On/off control is then performed when comparison with the standard pattern yields a match within a certain tolerance.

As shown in Fig. 1, for example, such a voice recognition means has the following main components: an input unit 1 including an acoustic-to-electrical signal converter (for example, a microphone) that converts input voice into an electrical signal; a feature extraction unit 2 that extracts the features of the voice signal; a standard pattern storage unit 3 that stores standard patterns of voice features registered in advance; a recognition processing unit 4 that compares the feature pattern extracted from the input voice with the standard patterns and identifies the input voice; and an output control unit 5 that, based on the recognition result, controls for example the power, channel, and volume of a television receiver. To these are added an input signal amplitude normalization circuit 6 and a time-axis adjustment unit 7, which improve the recognition rate, and a registration control unit 8 for registering the standard patterns of voice features in advance.

Many parameters can be considered for extracting voice features, such as the frequency spectrum distribution, correlation functions, the zero-crossing count, formant frequencies, and linear prediction coefficients. Among these, the so-called filter bank method, in which the frequency spectrum of the voice is separated and extracted by a plurality of frequency filters and its correlation with the standard patterns is examined, is widely used because it achieves a high recognition rate with a relatively simple configuration. The control functions of such a voice control device include turning the power on and off, specifying a change of channel number, and changing the volume. The power, for example, can be controlled by uttering "dengen iri (kiri)" [power on (off)]. Volume, however, is inherently an analog, continuously variable control, and no suitable method of controlling it by voice has been proposed.

In the present invention, such an analog quantity is specified numerically by voice. The content of the instruction is identified by a voice recognition device and taken out as, for example, a decimal value. The level of the analog quantity is then controlled, and that level displayed, by changing the combination of a plurality of variable attenuators, variable-gain amplifiers, or the like.

The invention is described in detail below with reference to Figs. 2 and 3, which are block diagrams of the principal circuits.

This embodiment is described taking as an example a voice control device for a television receiver that incorporates a voice recognition device using the filter bank method (frequency spectrum method) for its feature parameters. The invention is not limited to any particular controlled device, however, so long as the controlled quantity is an analog quantity.

The input unit 1, normally mounted on the front of the equipment, consists of two microphones, a directional microphone 10 and an omnidirectional microphone 11, differentially connected as shown, and an amplifier 12. That is, the omnidirectional microphone 11 is connected in antiphase to the directional microphone 10, so that voice signals arriving from outside the directional range, i.e., signals other than control command voice, cancel out, and the S/N ratio of control command voice within the directional range is raised accordingly. In addition, to prevent malfunction caused by command-like words in the sound coming from the TV receiver's speaker, a so-called initial muting circuit is provided, disclosed separately in the circuit diagram of Fig. 3. When a key (most frequently used) command word such as "dengen" or "power", "channel", or "onryo" or "volume" is identified (note: the matching tolerance for these words is set large), this circuit temporarily cuts or greatly attenuates the speaker output. This point is described in detail later.

The feature extraction unit 2, which also provides the amplitude normalization function, consists of: a plurality of filters 13-1, 13-2, ..., 13-N; a level detection circuit 14 that detects the overall amplitude of the input signal; an A-D (analog-to-digital) converter 15 that converts each filter output to a digital signal; an amplitude normalization circuit 16, placed ahead of the A-D converter and built from an analog divider or the like, which normalizes the filter output amplitude by taking the ratio of each filter output to the output of the level detection circuit 14; and a multiplexer 17, inserted between the amplitude normalization circuit and the filter group, which switches among the filter outputs. With this configuration, each filter component of the voice signal entering from the input unit 1 is sampled in turn at a suitable time interval (in most cases about 10 ms), each sample is quantized into a digital code, and the result is stored in memory 19 (usually RAM: random access memory) via an I/O port (not shown) of a microcomputer or central processing unit (CPU) 18.
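The normalization and quantization step described above can be sketched in software as follows. This is only an illustrative model of what the analog divider 16 and A-D converter 15 do; the function name `normalize_frame`, the `bits` parameter, and the clamping behaviour are assumptions, not details given in the specification.

```python
def normalize_frame(filter_outputs, level, bits=4):
    """Model of one sampling instant (hypothetical helper).

    filter_outputs: amplitudes from filters 13-1 ... 13-N
    level: output of the level detection circuit 14 (overall amplitude)
    bits: assumed A-D resolution; the specification does not state one
    """
    max_code = (1 << bits) - 1
    if level <= 0:
        # Silence: nothing to normalize against.
        return [0] * len(filter_outputs)
    codes = []
    for x in filter_outputs:
        ratio = x / level                      # analog divider 16
        code = round(ratio * max_code)         # A-D converter 15
        codes.append(max(0, min(max_code, code)))
    return codes
```

Because each code depends only on the ratio of a filter output to the overall level, speaking the same word louder or softer gives the same codes, which is the purpose of the normalization.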

In the A-D conversion described above, the sampled values may be quantized uniformly. When a separate manual adjustment means is provided, however, they may instead be quantized nonlinearly in steps, matched to the relationship between the control setting of that adjuster (for example, the rotation angle of the volume control) and the control level.
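The difference between uniform and nonlinear quantization can be illustrated with a small sketch. The threshold values below are invented for illustration only; the specification gives no numbers, and `make_quantizer` is a hypothetical helper.

```python
from bisect import bisect_right


def make_quantizer(thresholds):
    """Return a function mapping an analog value to a step index.

    thresholds: ascending boundaries between steps (assumed values).
    A value below the first threshold maps to step 0, and so on.
    """
    boundaries = list(thresholds)

    def quantize(x):
        return bisect_right(boundaries, x)

    return quantize


# Uniform steps versus steps crowded toward the low end, as might suit
# a control whose level does not change linearly with rotation angle.
uniform = make_quantizer([0.25, 0.50, 0.75])
nonlinear = make_quantizer([0.10, 0.20, 0.40])
```

With these example thresholds, an input of 0.3 falls in step 1 of the uniform quantizer but in step 2 of the nonlinear one.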

A separate standard pattern memory 3 is connected to the CPU (central processing unit) 18. The commander's voice commands (control commands) are stored in it in advance in sampled and quantized form, together with codes designating their control content. Control command voices (spoken commands) are registered in the standard pattern memory as follows, taking control of a television receiver as an example. Fig. 4 shows one example of the control panel of a television receiver. It carries an input microphone 20; a registration mode switch 21; commander (speaker) number designation switches 22-1 (speaker 1), 22-2 (speaker 2), ... for selecting the commander; control command designation switches corresponding to power on/off switching, volume change, and channel switching, namely a "power" designation switch 23, a "volume" designation switch 24, and a "channel" designation switch 25; and numeric buttons 26-1, 26-2, 26-3, ..., 26-11, 26-12 for designating volume and channel, arranged together with their corresponding indicator lamps 27-1, 27-2, ..., 27-12. At the bottom, an "OK" indicator lamp 28 lights when recognition or registration has completed successfully, and a "REPEAT" indicator lamp 29 lights when it has failed.

To register standard patterns using this registration control unit 30, the user first presses the registration switch 21 to enter registration mode, then designates the speaker number with designation switch 22-1, 22-2, .... The user then proceeds in order: pressing the "power" switch 23 and saying, for example, "dengen" or "power", then pressing the "volume" switch 24 and saying "onryo" or "volume". When the "channel" switch 25 is pressed, the registration control circuit 8 of Fig. 2 outputs a mode switching signal a, which switches the output of the channel push-button switch circuit 31 over to the registration control circuit side via the switching circuit 32. When the numeric buttons 26-1, 26-2, ... (Fig. 4) included in switch circuit 31 are then pressed and "ichi", "ni", ... are spoken, each utterance passes through the input unit 1 and the feature extraction unit 2 and is stored in the standard pattern memory 3 together with the code corresponding to its control content (power, volume, channel 1, 2, 3, ...).
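The registration procedure amounts to filing each digitized utterance under a speaker number and a control code. A minimal sketch of such a store follows; the class name `TemplateStore`, its methods, and the string codes are hypothetical, chosen only to illustrate the idea of the standard pattern memory 3.

```python
class TemplateStore:
    """Toy model of the standard pattern memory 3 (names are assumptions)."""

    def __init__(self):
        # speaker number -> {control code -> stored pattern}
        self._patterns = {}

    def register(self, speaker, code, pattern):
        """Store a digitized utterance with the code of its control content."""
        self._patterns.setdefault(speaker, {})[code] = list(pattern)

    def templates_for(self, speaker):
        """Return a copy of all patterns registered for one speaker."""
        return dict(self._patterns.get(speaker, {}))
```

Registering the same code again for the same speaker simply overwrites the earlier pattern in this sketch; the specification does not say how re-registration is handled.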

In the normal recognition mode, when such a control voice is input, the signal sequence extracted by the feature extraction filters 13-1, 13-2, ..., 13-N and digitized is stored in memory 19 such as RAM. The CPU 18 then computes the difference between this stored pattern and each of the standard patterns, and identifies the input voice by determining the standard pattern for which the difference is smallest. Because the time course of human speech is generally not the same from one utterance to the next even when the same words are spoken, it is well known that some form of time-axis adjustment circuit, as shown in Fig. 1, must be added. For convenience of explanation, this time-axis adjustment circuit is omitted from Fig. 2.

In recognition mode, voice is captured continuously. When the input voice breaks off, i.e., during a pause, the recognition computation described above is executed and the preceding input voice is identified by pattern matching. When the input voice can be identified, i.e., when it matches some standard pattern within the permissible error, the CPU 18 instructs the output control circuit 33 to control the corresponding control element of the television receiver. For example, when the input voice "dengen iri (kiri)" is recognized, the output control circuit 33 switches the power supply circuit 34 of the television receiver on (or off). When the input voice "channel N" (where N is a number from 1 to 12) is recognized, the output control circuit 33 outputs to the channel switching circuit 35, which in turn switches the tuner 36.
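The recognition rule described above (pick the standard pattern with the smallest difference, and accept it only if that difference is within the permissible error) can be sketched as below. The specification does not define the difference measure; the sum of absolute differences used here is an assumption, as are the name `recognize` and the requirement that patterns already have equal length (time-axis adjustment is left out, as in Fig. 2).

```python
def recognize(pattern, templates, tolerance):
    """Return the code of the best-matching template, or None.

    pattern: digitized input pattern (sequence of numbers)
    templates: {control code: standard pattern of the same length}
    tolerance: largest difference still accepted as a match (assumed scale)
    """
    best_code = None
    best_diff = None
    for code, reference in templates.items():
        # Assumed measure: sum of absolute differences per element.
        diff = sum(abs(a - b) for a, b in zip(pattern, reference))
        if best_diff is None or diff < best_diff:
            best_code, best_diff = code, diff
    if best_diff is not None and best_diff <= tolerance:
        return best_code
    return None  # no standard pattern is close enough
```

A `None` result would correspond to the case in which the "REPEAT" lamp is lit rather than a control output being issued.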

For volume adjustment, the controlled audio equipment of the present device must be provided in advance with a visual indicator that displays the volume numerically. A suitable number of steps for converting the continuously variable volume to a stepwise numerical (i.e., digital) display would be about 10. Possible indicators include, for example, a 7-segment numeric display element, or a one-dimensional array of as many light-emitting elements (LEDs or the like) as there are steps. Here, taking a television receiver as the example, the channel indicator lamps 27-1, 27-2, ..., 27-12 shown in Fig. 4 double as the volume display.

The numeric display LEDs 27-1, 27-2, ..., 27-12 in the figure are the ones shown in Fig. 4 as channel indicator lamps. Normally, according to the output of the channel switching circuit 35, one of the LEDs is lit via AND gate 37-1, 37-2, ..., or 37-12, OR gate 38-1, 38-2, ..., or 38-12, and resistor 39-1, 39-2, ..., or 39-12, thereby displaying the channel. That is, the volume level display control signal b of the output control circuit (described in detail later) is normally at digital "0" level, so the output of inverter 40, which controls AND gates 37-1, 37-2, ..., 37-12, is at digital "1" level.

Next, when the voice recognition device recognizes, as described above, a control command voice corresponding to, for example, "volume", the output control circuit 33 sets the volume level display control signal b to digital "1" level (hereinafter H level) for a fixed time (several seconds). This disables gates 37-1, 37-2, ..., 37-12 and at the same time enables AND gates 41-1, 41-2, ..., 41-12, so that one of the numeric display LEDs 27-1, 27-2, ..., 27-12 is lit according to the output of the B-D (binary-to-decimal) conversion circuit 43, which decodes the Q outputs of the latch circuit formed by D-FFs (D-type flip-flops) 42-1, 42-2, ..., 42-4. As described later, latch circuits 42-1, 42-2, ..., 42-4 hold the current volume level in binary. LEDs 27-1, 27-2, ..., 27-12 therefore now indicate the volume level numerically. Needless to say, a 7-segment display could be used instead.

When the voice recognition device subsequently recognizes a command voice specifying the volume level as a step from 1 to 12, the output control circuit 33 drives the volume level latch control signal C, normally at digital "0" level (hereinafter L level), to H level for a short time (long enough for the D-FFs to latch), clocking D-FFs 42-1, 42-2, ..., 42-4 via OR gates 44-1, 44-2, ..., 44-4. At the same time, the output control circuit 33 outputs the volume level recognized by the CPU 18 as a binary code on volume level signals d₁, d₂, ..., d₄. These are applied to the D inputs of D-FFs 42-1, 42-2, ..., 42-4 via AND gates 45-1, 45-2, ..., 45-4, which are enabled by the volume level latch control signal; OR gates 46-1, 46-2, ..., 46-4; and inverters 47-1, 47-2, ..., 47-4 and 48-1, 48-2, ..., 48-4, which ensure that latch circuits 42 and 50 reliably latch the D input at the falling edge of the clock. The Q outputs of the latch circuits (D-FFs) 42-1, 42-2, ..., 42-4, now holding the state of signals d₁, d₂, ..., d₄, cause the recognized volume level to be displayed on the numeric display LEDs 27-1, 27-2, ..., 27-12 as described above.

The volume level latch control signal C likewise clocks, via OR gate 49, the D-FF 50 that serves as a state-holding circuit. If the output of OR gate 51 is at L level at this time, the output of one-shot circuit 52 is also at L level, so the D input terminal, reached through inverters 47-5 and 48-5, is at L level, and the Q output of this D-FF becomes 0 (Q̄ = 1). As a result, analog switch 53 is turned off, and because the Q̄ output of D-FF 50 is 1, AND gates 54-1, 54-2, ..., 54-12 turn on one of analog switches 55-1, 55-2, ..., 55-12 according to the volume level output by the conversion circuit 43.

Meanwhile, the output of the television receiver's audio FM demodulation circuit 56 passes through the first-stage audio amplifier circuit 57 and variable resistor 58. The wiper output of the latter then passes through either the volume control 59 or the voltage divider made up of resistors 60-1, 60-2, ..., 60-11 (an attenuator or variable-gain circuit could be used instead), and through analog switch 53 or one of analog switches 55-1, 55-2, ..., 55-12, before being amplified by the audio output amplifier 61 and output from an audio output device 62 such as a speaker. Thus, when one of analog switches 55-1, 55-2, ..., 55-12 is on, the volume is set to the level tapped from the divider 60-1, 60-2, ..., 60-11 at that switch. The volume level control signal b returns to L level after a few seconds (once enough time has passed for the commander to confirm that the control has completed), and the numeric display LEDs 27-1, 27-2, ..., 27-12 revert from volume display to channel display. The volume itself, however, remains at the level corresponding to the output state of latch circuits 42-1, 42-2, ..., 42-4. Variable resistor 58 sets the variable range of the volume.

Next, when the volume control 59 is operated manually, the resulting change in the output of variable resistor 63, which is ganged with it and connected between a constant-voltage supply V_R and ground, is detected by resistor 64, capacitor 66, and comparator 66, whose two input terminals are connected across the resistor as shown. The manual operation detection circuit 67 that follows outputs a manual operation signal e at H level for as long as the volume adjustment continues. The rising edge of this signal triggers one-shot circuit 52 via OR gate 51, producing a pulse, which latch circuit 50 captures, giving Q = 1 (Q̄ = 0). Analog switch 53 is turned on, and the audio output amplifier 61 drives speaker 62 at the volume level set by volume control 59. Clearly, with Q̄ = 0 from D-FF 50, AND gates 54-1, 54-2, ..., 54-12 are all disabled, so analog switches 55-1, 55-2, ..., 55-12 are all off.

The manual operation signal e also enables AND gates 68-1, 68-2, ..., 68-4 as shown, and the volume level (binary code) output by A-D converter 69, which is biased to the constant voltage V_R, is stored in latch circuits 42-1, 42-2, ..., 42-4 at the falling edge of signal e. The volume level set with the volume control is thus digitized and held in the latch circuits. One input of OR gate 51 is connected to the power-on signal f output by the television receiver's power supply circuit 34. The pulse signal f, generated for a fixed time after power-on, likewise causes the volume level to be latched and the gain of the audio amplifier circuit to be set through analog switch 53.
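The behaviour of the volume circuit can be summarized in a small state model: a 4-bit latch holds the level, a one-of-12 decoder drives both the display and the divider tap, and a single flag (D-FF 50) chooses between the manual control and the divider. The sketch below is an assumption-laden illustration, not the circuit itself: the class name `VolumeStage`, the linear tap values (level/12), and the rounding used for A-D converter 69 are all invented for the example.

```python
class VolumeStage:
    """Behavioural model of the volume path (hypothetical names and values)."""

    STEPS = 12

    def __init__(self, pot_position=0.0):
        # Power-on pulse f acts like a manual operation: latch the level
        # derived from the manual control and select the manual path.
        self.manual_set(pot_position)

    def voice_set(self, level):
        """Signal C: latch a recognized level (1..12), select the divider."""
        if not 1 <= level <= self.STEPS:
            raise ValueError("level must be between 1 and 12")
        self.level = level      # latch 42-1 ... 42-4
        self.manual = False     # D-FF 50: Q = 0, switches 55 in use

    def manual_set(self, position):
        """Signal e: manual control moved to `position` in 0.0..1.0."""
        self.pot = position
        self.manual = True      # D-FF 50: Q = 1, switch 53 in use
        # A-D converter 69 (assumed linear mapping onto 1..12).
        self.level = max(1, min(self.STEPS, round(position * self.STEPS)))

    def gain(self):
        """Relative gain; divider taps are assumed to be linear."""
        return self.pot if self.manual else self.level / self.STEPS

    def display(self, show_volume, channel):
        """Lamps 27-1 ... 27-12: signal b selects volume or channel."""
        lit = self.level if show_volume else channel
        return [i == lit for i in range(1, self.STEPS + 1)]
```

In this model the latched level survives the display reverting to the channel, and a manual adjustment both takes over the gain and updates the latched level, matching the two properties the description emphasizes.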

Next, the initial muting circuit mentioned above is described in detail with reference to Fig. 3. In Fig. 3, parts common to Fig. 2 are given the same reference numerals, and their description is omitted.

As described above, in voice command recognition mode the voice signal is captured continuously while a command is in progress. When the input voice breaks off, i.e., in the interval between command units (the pause period), the CPU 13 executes the recognition computation, and the voice command input up to that point is identified by pattern matching.

As noted above, to avoid malfunction caused by the speaker output of the controlled TV receiver, or by other similar sounds not produced by the commander, the pattern-matching tolerance is set somewhat larger for the most frequently used command words.

When the input voice can be identified, i.e., when it matches some standard pattern within the permissible error, the CPU 20 controls the output control circuit 33 so as to mute the audio output of the television receiver for a fixed time. In the case of Fig. 3, the output control circuit 33 lowers the bias of the output amplifier transistor 57 of the receiver's audio demodulation and amplification circuit 56, thereby silencing the speaker 59, which is connected to the collector of that transistor via capacitor 58. Muting is unnecessary for the earphone 60 circuit connected to the output side of audio circuit 56.

Control commands normally consist of a sequence of words, such as "dengen" + "iri", "dengen" + "kiri", "channel" + "ichi", or "channel" + "ni". Thus, for example, once the input voice "channel" is recognized, the speaker sound is muted, and the following "ichi" or "ni" is input with no sound coming from the television receiver, so the S/N ratio, and with it the recognition rate, improves greatly. Whether the command is power on/off, channel change, or volume change, this muting of the sound is not a functional drawback. In the example of Fig. 3, the volume can also be specified in 12 steps like the channel, as in "onryo" + "san" (volume 3), so that the channel display can double temporarily as the volume display. Needless to say, the output control circuit 33 issues control outputs to the receiver's power supply circuit 34, channel switching circuit 35, or audio circuit 56 according to the result of input voice recognition.
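The timing of the initial muting can be sketched as a simple timer: recognition of a command word starts a mute interval, and the speaker stays silent until it expires. The hold time is not specified in the text beyond "a fixed time", so the `hold_s` default below is a placeholder, and the class `MutingController` is a hypothetical name.

```python
class MutingController:
    """Timer model of the initial muting circuit (names are assumptions)."""

    def __init__(self, hold_s=3.0):
        self.hold_s = hold_s        # assumed mute duration in seconds
        self.muted_until = None     # no mute interval active yet

    def on_word_recognized(self, now):
        """Start (or restart) the mute interval at time `now` (seconds)."""
        self.muted_until = now + self.hold_s

    def is_muted(self, now):
        """True while the speaker output should be cut or attenuated."""
        return self.muted_until is not None and now < self.muted_until
```

For a two-word command such as "channel" followed by "ni", calling `on_word_recognized` when "channel" is identified keeps the speaker quiet while the number word is spoken.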

When the embodiment of Fig. 3 is used in combination, the output control circuit of the voice recognition device acts on the audio output circuit of the audio equipment, such as a television receiver, in which the device is installed. When input voice is sensed, the equipment's output sound is attenuated to a suitable level, which increases the S/N ratio of subsequent input voice. This is therefore highly effective in improving the recognition rate of this kind of voice recognition device.

According to the present invention, an analog quantity can be controlled in steps by voice command, and it can be displayed visibly in digital form. Moreover, even after switching to manual control, the corresponding stepwise control and display are carried out automatically.

[Brief description of the drawings]

Fig. 1 is a block diagram of the main parts of a voice recognition device; Fig. 2 is an example circuit embodying the main parts of the present invention; Fig. 3 is one embodiment of the initial muting circuit; and Fig. 4 is a front view of the operation panel of the controlled device.

1: input unit; 2: feature extraction unit; 3: standard pattern memory; 18: CPU; 37, 41, 45, 54, 68: AND gates; 44, 46: OR gates; 42, 50: latch circuits; 53: analog switch; 60: voltage-divider resistors; 43, 69: A-D conversion circuits.

Claims

[Claims]

1. A voice control device for a television receiver or the like, comprising: an input unit that receives channel command voice and volume command voice commanding the volume in steps, and converts the input voice into an electrical signal; a feature extraction unit that extracts features of the electrical signal; a standard pattern storage unit that stores voice features as standard patterns; a recognition processing unit that compares the feature pattern extracted by the feature extraction unit with the standard patterns and identifies the channel command voice and the volume command voice; and an output control unit that controls channel and volume based on the recognition result of the recognition processing unit; characterized in that the binary output of the output control unit for volume control is converted to decimal and the volume is controlled in steps by the resulting decimal signal, the input circuit of a channel display means is switched in response to the recognition output for the volume command voice, and the volume is displayed on the channel display means according to the decimal signal.