JPS60146399A

JPS60146399A - Voice remote controller

Info

Publication number: JPS60146399A
Application number: JP59002843A
Authority: JP
Inventors: 藤恵　英樹; 明寿山田
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1984-01-11
Filing date: 1984-01-11
Publication date: 1985-08-02

Abstract

(57) [Abstract] This publication contains application data that predates electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

Industrial Field of Application

The present invention relates to a voice remote control device that uses a telephone line. It can be connected to the tape recorder of an answering machine, to household heating and cooling controls, or to equipment, recently used in condominiums and similar buildings, that centrally manages crime and disaster prevention through sensors installed throughout the home. It can also serve as a monitoring device that lets the current state of disaster prevention and the like be checked from a telephone at a remote location.

Configuration of the Conventional Example and Its Problems

Conventionally, remote control devices of this kind have mainly taken one of two forms. In the first, as described in Japanese Utility Model Laid-Open No. Sho 61-41915 and No. Sho 61-101716, the user carries a dedicated oscillator and controls the device by sending its tone through the transmitter of the telephone handset. In the second, as described in Japanese Patent Laid-Open No. Sho 51-4086, a speech recognition device identifies the voice of the owner of the answering machine or similar equipment as a specific speaker and controls the tape recorder of the answering machine accordingly.

However, with the conventional configurations above, a user who relies on a dedicated oscillator must always carry it. Even when a speech recognition device is used, current speech recognition technology has not necessarily reached practical use: even for a registered specific speaker, the registered voice and the voice presented at recognition time differ, because the telephone handset and the telephone line introduce variations in frequency characteristics and in level. In addition, the number of words to be recognized has to be kept small. Speaker discrimination can also fail, so that the tape recorder or similar equipment operates readily even when someone other than the owner uses it, or malfunctions because of crosstalk from the audio recorded on the tape.

The causes of these malfunctions are explained in detail below with reference to the drawings.

FIG. 1 is a block diagram of a conventional speech recognition device. In FIG. 1, 1 is an input terminal for the speech signal, 2 is a speech analysis means, 3 is a data normalization means, 4 is a switching means that switches between speech registration mode and speech recognition mode (shown in registration mode in FIG. 1), 5 is a standard speech pattern recording means, 6 is a speech pattern matching means, 7 is a determination means, and 8 is a device control signal generating means.

The operation of the conventional device configured as above is as follows. First, the switching means 4 is set to speech registration mode as shown in FIG. 1. For the reference speech entered at input terminal 1, the speech analysis means 2 determines whether speech is present, and the data normalization means 3 compresses the data in each unit-time frame to a few bits. This processing is carried out throughout the speech interval, and the resulting time-series frequency characteristics (hereinafter, the speech pattern) are stored in the standard speech pattern recording means 5 as a standard speech pattern. After the speech patterns of all words to be recognized have been stored in the standard speech pattern recording means 5, the device is switched to speech recognition mode. An unknown input utterance is then processed in the same way to extract its speech pattern, which the speech pattern matching means 6 matches against the data in the standard speech pattern recording means 5 to extract similar speech patterns. The determination means 7 judges whether the similarity is sufficient, and when the similarity is high the device control signal generating means 8 is activated.
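The register-then-match flow described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how similarity is computed, so the position-by-position agreement score, the `Recognizer` class, and the 0.75 threshold are all hypothetical.

```python
def similarity(a, b):
    """Fraction of positions where two equal-length symbol sequences agree."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


class Recognizer:
    """Toy stand-in for blocks 5 (pattern store), 6 (matching) and 7 (decision)."""

    def __init__(self, threshold=0.75):
        self.templates = {}         # word -> registered speech pattern
        self.threshold = threshold  # assumed similarity threshold

    def register(self, word, pattern):
        """Registration mode: store the pattern as the standard for `word`."""
        self.templates[word] = list(pattern)

    def recognize(self, pattern):
        """Recognition mode: return the best-matching word, or None if too dissimilar."""
        best_word, best_score = None, 0.0
        for word, template in self.templates.items():
            score = similarity(template, pattern)
            if score > best_score:
                best_word, best_score = word, score
        return best_word if best_score >= self.threshold else None


recognizer = Recognizer()
recognizer.register("rewind", [1, 2, 3, 2, 1])
recognizer.register("fast forward", [3, 3, 1, 0, 0])
print(recognizer.recognize([1, 2, 3, 2, 0]))  # close to "rewind"
print(recognizer.recognize([0, 0, 0, 0, 0]))  # too far from every template
```

Only a result at or above the threshold would be passed on to the control signal generator; anything else is dropped.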

However, using the conventional speech recognition device above over a telephone line has the following drawbacks.

(1) Including line loss and exchange loss, a telephone line exhibits level fluctuations of up to 30 dB depending on how the call is connected, so the speech analysis means 2 cannot detect speech intervals accurately.

(2) Line loss, exchange loss, and similar effects change the voice passband, so the speech pattern produced by the data normalization means 3 is distorted and the similarity drops.

(3) Because the telephone line is a public line, anyone who knows the telephone number could control the equipment by phone, so a speaker identification function is required for safety and confidentiality.
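To put drawback (1) in numbers: a 30 dB level drop corresponds to an amplitude ratio of about 0.0316, or a factor of 1000 in energy. The sketch below shows how a fixed-threshold speech detector tuned for the unattenuated signal then misses every voiced frame. The test signal, the frame length, and the threshold value are assumptions chosen for illustration only.

```python
import math


def db_to_amplitude_ratio(db):
    """Convert a level change in decibels to a linear amplitude ratio."""
    return 10 ** (db / 20)


def frame_energy(samples):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in samples) / len(samples)


def detect_speech(frames, threshold):
    """Mark a frame as speech when its energy exceeds a fixed threshold."""
    return [frame_energy(f) > threshold for f in frames]


# Hypothetical signal: near-silent frames surrounding two voiced frames.
quiet = [0.001 * math.sin(0.3 * n) for n in range(80)]
voiced = [0.5 * math.sin(0.3 * n) for n in range(80)]
frames = [quiet, voiced, voiced, quiet]

threshold = 0.01                          # assumed, tuned for the direct signal
attenuation = db_to_amplitude_ratio(-30)  # about 0.0316
attenuated = [[s * attenuation for s in f] for f in frames]

print(detect_speech(frames, threshold))      # voiced frames are found
print(detect_speech(attenuated, threshold))  # nothing is detected after a 30 dB loss
```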

Object of the Invention

An object of the present invention is to provide a voice remote control device that corrects the transfer characteristics of a communication line such as a telephone line by using the characteristics of the voice, applies hierarchical processing with voice passwords to maintain confidentiality, performs speech recognition over the telephone line, and can control equipment remotely by voice regardless of variations in frequency characteristics and level.

Constitution of the Invention

To achieve the above object, the voice remote control device of the present invention comprises a speech recognition device having: speech analysis means for analyzing an input voice-band signal; transfer characteristic correction means that takes as input the correction amount obtained by transfer characteristic difference extraction means; data normalization means for normalizing the data obtained by the speech analysis means; a standard speech pattern memory in which data is registered at registration time; pattern matching means for matching, at recognition time, the input speech pattern against the standard speech patterns in the standard speech pattern memory; and determination means for judging, from the matching output of the pattern matching means, the degree of similarity between the speech input at registration and at recognition. The device controls equipment by means of voice-band signals, and this configuration makes it possible to correct the transfer characteristics of a communication line such as a telephone line.

Description of the Embodiment

FIG. 2 is a block diagram of a voice remote control device according to one embodiment of the present invention.

In FIG. 2, 9 is a speech analysis means that analyzes the features of the speech by converting changes in its amplitude into spectral information such as frequency, using a bank of band-pass filters or an orthogonal transform such as the Walsh, Hadamard, or Fourier transform. 10 is a data normalization means that normalizes the time-series data of the analyzed speech signal, keeps the amplitude information of each spectral component within a small range so as to suppress variations in how loudly the speaker talks, and compresses the data to a small number of bits to turn the speech pattern into a symbol string. 11 is a standard speech pattern memory in which, at registration time, the symbol strings normalized by the data normalization means 10 are registered for the passwords, for spoken words such as "fast forward" and "rewind", and for short vocal sounds. 12 is a pattern matching means that, at recognition time, matches the input speech pattern against the standard speech patterns recorded in the standard speech pattern memory 11 at registration. 13 is a determination means that judges the degree of similarity from the result of the pattern matching means 12. A similarity threshold is set in the determination means 13, which is connected to a device control signal generating means 14. When the similarity exceeds the threshold (that is, when the input is regarded as identical to a standard speech pattern), the device control signal generating means 14 generates signals for controlling the tape recorder and the synthesized speech output. When the input is a different word, no control signal is passed to the device control signal generating means 14 and the equipment is not controlled.

15 is an energy extraction means that obtains the maximum or average energy of a voice-band signal such as the first password at voice registration. 16 is an average power spectrum extraction means that averages the first password along the time axis for each of the analysis channels of the speech analysis means 9 to obtain its frequency characteristic. 17 is a recording means that records the physical quantities extracted by the energy extraction means 15 and the average power spectrum extraction means 16. 18 is a correction amount calculation means that computes the difference between the recorded quantities and newly input quantities. 19 is a determination means that judges, from the size of the correction amount, the similarity between the physical quantities in the recording means 17 and the newly input physical quantities. 19a is a switch that connects the pattern matching means 12 to the determination means 13 when the similarity judged by the determination means 19 is high. 19b is a switch that operates so that the speech analysis means 9 is no longer connected to the energy extraction means 15 and related blocks once the first password has been input, the similarity judged by the determination means 19 is high, and the first password has been recognized.

20 is a transfer characteristic correction means connected to the input terminal 21. It has a level adjustment amplifier and a variable band characteristic amplifier, and when the similarity to the energy and frequency characteristic of the first password at registration is low, it changes the transfer characteristic using the respective correction amounts so as to raise the similarity. 22 is a time processing means: the speech recognition device holds two or more passwords, and if the next recognition does not take place within a fixed time after the previous recognition ends, the previous recognition result is invalidated.
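As a rough picture of what the data normalization means 10 does, the sketch below scales each frame of channel amplitudes by its own peak and quantizes the result to a few levels, so that a quieter copy of the same utterance yields the same symbol string. The peak-based scaling, the four quantization levels, and the function names are assumptions; the source does not specify the normalization method.

```python
def normalize_frame(channels, levels=4):
    """Scale one frame of channel amplitudes by its peak, then quantize to `levels` steps."""
    peak = max(channels)
    if peak <= 0:
        return tuple(0 for _ in channels)
    # The peak channel maps to levels - 1; smaller channels map proportionally lower.
    return tuple(min(levels - 1, int(c / peak * levels)) for c in channels)


def to_symbol_string(frames, levels=4):
    """Turn a sequence of frames into a sequence of small-integer symbols."""
    return [normalize_frame(f, levels) for f in frames]


# Hypothetical three-channel analysis output for a short utterance.
utterance = [[1.0, 2.0, 4.0], [4.0, 4.0, 1.0], [0.0, 0.0, 0.0]]
# The same utterance about 30 dB quieter (amplitude scaled by 1/32).
quieter = [[c / 32 for c in frame] for frame in utterance]

print(to_symbol_string(utterance))
print(to_symbol_string(quieter) == to_symbol_string(utterance))
```

Per-frame scaling removes an overall level change, but it does not undo a change in the shape of the passband, which is why the embodiment adds a separate transfer characteristic correction stage.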

The operation of this embodiment configured as above is described below.

First, at voice registration, the level and frequency characteristic of the first password are extracted in advance by the energy extraction means 15 and the average power spectrum extraction means 16 and recorded in the recording means 17.

Next, when the first password is spoken at recognition time, its level and frequency characteristic are extracted by the energy extraction means 15 and the average power spectrum extraction means 16. The correction amount calculation means 18 compares this data with the data in the recording means 17 to obtain the correction amount, and the determination means 19 judges its size. If the correction amount is large, the analysis data from the speech analysis means 9 is fed to the transfer characteristic correction means 20 placed ahead of the speech analysis means 9. The device then requests voice input again and controls the transfer characteristic correction means 20 by the same procedure until the correction amount becomes small (until it falls to or below the threshold of the determination means 19).

If the correction amount is small, the analysis data is passed to the data normalization means 10.
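The correction step can be sketched as below, under several assumptions: power is held per analysis channel, the correction amount is expressed as a per-channel gain in dB, and the decision threshold is 3 dB. None of these specifics come from the source, and the function names are hypothetical. An overall level difference simply appears as the same offset in every channel.

```python
import math


def average_power_spectrum(frames):
    """Average each analysis channel's power over time (frames x channels)."""
    n = len(frames)
    return [sum(f[ch] for f in frames) / n for ch in range(len(frames[0]))]


def correction_db(registered, observed):
    """Per-channel gain in dB that would bring `observed` back to `registered`."""
    return [10 * math.log10(r / o) for r, o in zip(registered, observed)]


def needs_correction(corr_db, threshold_db=3.0):
    """True when any channel is off by more than the assumed threshold."""
    return max(abs(c) for c in corr_db) > threshold_db


def apply_correction(frames, corr_db):
    """Apply the per-channel power gains, as the correction amplifiers would."""
    gains = [10 ** (c / 10) for c in corr_db]
    return [[p * g for p, g in zip(f, gains)] for f in frames]


# Hypothetical first-password power frames recorded at registration.
registered_frames = [[4.0, 2.0, 1.0], [4.0, 2.0, 1.0]]
# The same password heard over a line that attenuates the lower channels.
line_gain = [0.01, 0.1, 1.0]
observed_frames = [[p * g for p, g in zip(f, line_gain)] for f in registered_frames]

reference = average_power_spectrum(registered_frames)
corr = correction_db(reference, average_power_spectrum(observed_frames))
print([round(c, 1) for c in corr], needs_correction(corr))

corrected = apply_correction(observed_frames, corr)
residual = correction_db(reference, average_power_spectrum(corrected))
print(needs_correction(residual))
```

In the embodiment this is a loop: the caller is asked to repeat the password until the residual correction falls below the threshold. The sketch shows a single pass.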

Next, when the first password is accepted, the determination means 13 outputs a signal to the time processing means 22, the tape recorder is stopped, and the device waits for the second password. The time processing means 22 sets a timer on the data normalization means 10 so that subsequent recognition starts only if the second or a later password is entered within a fixed time. If the second password (or third password) is not recognized within this time, the voice probably belongs to someone other than the owner, so a message requesting the first password again is issued through synthesized speech output means or the like. Furthermore, because what was taken for the first password was quite possibly noise rather than a password, the tape recorder is started, the owner's outgoing message is played, and when it ends, recording of the caller's message begins. At that point the device does not enter speech recognition mode.

That is, the owner has no need to listen to the outgoing message, and a password can be recognized while the message is being played.

Furthermore, after several such passwords have been recognized (that is, after the caller has been confirmed to be the owner), spoken tape recorder control words such as "rewind" and "fast forward" are recognized, and a control signal is output from the device control signal generating means 14 to control the tape recorder.
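The password hierarchy with its timer can be modelled as a small state machine. The sketch below is an assumption-laden illustration: the `PasswordGate` class, the placeholder password words, the 10-second timeout, and the returned action strings are all hypothetical, and recognized words are assumed to arrive as plain strings with timestamps.

```python
class PasswordGate:
    """Accepts control words only after every password is heard in order, each within `timeout`."""

    def __init__(self, passwords, timeout=10.0):
        self.passwords = list(passwords)
        self.timeout = timeout   # assumed timer length in seconds
        self.stage = 0           # number of passwords accepted so far
        self.last_time = None    # time at which the last password was accepted

    def hear(self, word, now):
        """Process one recognized word at time `now` and return the resulting action."""
        total = len(self.passwords)
        if self.stage == total:
            return "command:" + word  # owner confirmed: pass control words through
        if self.stage > 0 and now - self.last_time > self.timeout:
            self.stage = 0            # timer expired: earlier result is invalidated
            return "request first password"
        if word == self.passwords[self.stage]:
            self.stage += 1
            self.last_time = now
            return "unlocked" if self.stage == total else "next password"
        if self.stage == 0:
            return "ignore"           # ordinary caller: keep recording the message
        self.stage = 0
        return "request first password"


gate = PasswordGate(["alpha", "bravo", "charlie"])
for t, word in [(0, "hello"), (1, "alpha"), (5, "bravo"), (8, "charlie"), (9, "rewind")]:
    print(t, word, "->", gate.hear(word, t))
```

A word at stage 0 that is not the first password is treated as ordinary caller speech, which matches the embodiment's behaviour of continuing to record the message.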

FIGS. 3 to 5 are flowcharts of the operation of this embodiment. Starting from the beginning of FIG. 3, the telephone line is connected, the tape message is played, and the caller's message is then recorded by the answering machine. If the owner's spoken first password is input during this time, level and frequency correction is performed and the first password is evaluated. If it is recognized as the first password, the tape recorder is stopped and a timer is started. The device then checks for the second password, and if it is not recognized within the timer period, the process returns to the beginning and the tape is played.

Once the third password has been recognized in the same way, the tape recorder control words "fast forward" and "rewind" are recognized and the tape recorder is controlled. In the case of fast forward, when the end of the recorded portion of the tape is reached, that is, the end of the messages recorded by callers, a timer is started and the device determines whether "record" or "rewind" is spoken. For "record", the owner's voice is recorded. If no voice is input while the timer is running, the line is disconnected. Furthermore, if the first password and then "erase" are input within the time limit, the messages on the tape recorder are erased from the beginning.

This allows control from outside when the tape has been filled with callers' messages and no more can be recorded, and the three-step sequence guards against erroneous operation.

FIG. 6 shows the hardware configuration of this embodiment. At the receiving terminal 21a of a telephone line converted to four wires through a two-wire/four-wire hybrid transformer, a programmable amplifier 23 and a frequency correction amplifier 24 serve as the transfer characteristic correction means 20. Through them the signal reaches the speech analysis means 9 and the recognition and tape recorder control unit 25 shown by the broken line, which is connected to the tape recorder 26, the programmable amplifier 23, and the frequency correction amplifier 24. In addition, a spoken description of the operation of the tape recorder 26 is produced by the speech synthesis means 27 and sent, together with the tape playback output signal, to the transmitting terminal 21b, so that messages and the operation of the tape recorder 26 can be checked from a remote telephone. 28 is an amplifier for the loudspeaker 29.

When the series of operations in this embodiment is carried out with confirmation by synthesized speech, it proceeds as follows.

(A) In response to input of the first password, the synthesized voice says "Second password, please."

(B) The second and third passwords are recognized in the same way. If any of the first to third passwords cannot be recognized, the device says "First password, please" and the procedure starts over from (A).

(C) When all passwords up to the third have been recognized, the device says "Passwords complete." After that, each time an operation such as "fast forward" or "rewind" is performed, the synthesized voice repeats the same word.

(D) When the end of the tape messages is reached, the device sends a synthesized phrase such as "Message" or a "pip-pip" tone to signal the message-end state, and a "record" or "erase" operation is performed within the timer period.

For recording, the owner simply says "record"; the synthesized voice responds "rokuon" ("record") and the tape recorder operates in recording mode.

(E) For "erase", the device is first put into "record" mode and the first password is entered within the time limit. Only then, on the spoken word "erase", does the synthesized voice say "shokyo" ("erase"), and the messages on the tape recorder are erased.
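The guarded erase sequence in (E) can be expressed as a simple check over the recognized words. This is a minimal sketch: the `erase_allowed` function, the placeholder password, and the 10-second timer are assumptions, and applying the same timer to both gaps in the sequence is a simplification not stated in the source.

```python
def erase_allowed(events, first_password, timeout=10.0):
    """Return True only for record -> first password -> erase, each step within `timeout`.

    `events` is a list of (time_in_seconds, recognized_word) pairs.
    """
    expected = ["record", first_password, "erase"]
    if [word for _, word in events] != expected:
        return False
    times = [t for t, _ in events]
    return all(later - earlier <= timeout for earlier, later in zip(times, times[1:]))


print(erase_allowed([(0, "record"), (3, "alpha"), (5, "erase")], "alpha"))
print(erase_allowed([(0, "erase")], "alpha"))
print(erase_allowed([(0, "record"), (30, "alpha"), (31, "erase")], "alpha"))
```

Requiring all three steps means a single misrecognized word cannot wipe the tape.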

Note that the password hierarchy of this embodiment and the means used for tape recorder control can also be applied to the control of household heating and cooling equipment and to remote monitoring of disaster-prevention monitoring equipment.

In addition, short voice-band signals such as a whistle or the dialing tones of a push-button telephone may be registered as passwords, and they may also be registered in the standard pattern memory 11 in combination with spoken passwords.

Effects of the Invention

As described above, the voice remote control device of the present invention allows an answering machine or household equipment to be controlled from a distance, for example over a telephone line, using easily produced voice-band signals such as speech. Using a password hierarchy also makes it easy to protect confidentiality and avoid trouble caused by others, so the benefit is considerable.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a conventional speech recognition device, FIG. 2 is a block diagram of a voice remote control device according to one embodiment of the present invention, FIGS. 3 to 5 are flowcharts of its operation, and FIG. 6 is its hardware configuration diagram.

9: speech analysis means, 10: data normalization means, 11: standard speech pattern memory, 12: pattern matching means, 13: determination means, 14: device control signal generating means, 15: energy extraction means, 16: average power spectrum extraction means, 17: recording means, 18: correction amount calculation means, 19: determination means, 20: transfer characteristic correction means.

Agent: Patent Attorney Toshio Nakao and one other.

Claims

[Scope of Claims]

(1) A voice remote control device comprising a speech recognition device having: speech analysis means for analyzing an input voice-band signal; transfer characteristic correction means that takes as input a correction amount obtained by transfer characteristic difference extraction means; data normalization means for normalizing the data obtained by the speech analysis means; a standard speech pattern memory in which data is registered at registration time; pattern matching means for matching, at recognition time, the input speech pattern against the standard speech patterns in the standard speech pattern memory; and determination means for judging, from the matching output of the pattern matching means, the degree of similarity between the speech input at registration and at recognition; wherein equipment is controlled by the voice-band signal.

(2) The voice remote control device according to claim 1, having a hierarchical structure of two or more voice passwords and controlling operation of the equipment after all of the voice passwords have been recognized.

(3) The voice remote control device according to claim 1, wherein the transfer characteristic difference extraction means comprises: energy extraction means for extracting, in first voice password processing, the maximum or average energy of the registered speech at voice registration; average power spectrum extraction means for extracting the average power spectrum for each of the frequency bands analyzed by the speech analysis means; recording means for recording the physical quantities extracted by the energy extraction means and the average power spectrum extraction means; and correction amount calculation means for deriving a correction amount from the physical quantities extracted from the input signal by the energy extraction means and the average power spectrum extraction means at recognition time and the physical quantities in the recording means extracted at voice registration; and wherein the transfer characteristic correction means corrects the transfer function of the signal transmission system using the correction amount.

(4) The voice remote control device according to claim 1, wherein the speech recognition device performs voice password processing using time processing means that requests the first voice password again if the next voice password is not recognized within a fixed time after one voice password has been recognized.

(5) The voice remote control device according to claim 2, wherein the password hierarchy is applied to operation controls of high importance among the device operation processing of the speech recognition device, and the operation command is recognized after the voice passwords have been passed.

(6) The voice remote control device according to claim 1, wherein voice-band signals other than the owner's speech patterns, such as a whistle, the sound of clicking one's lips, or the dialing tones of a telephone, are registered among the password registration patterns of the speech recognition device and used in the recognition operation.