JPH09257970A

JPH09257970A - Time adjusting method and device through voice recognition

Info

Publication number: JPH09257970A
Application number: JP7186696A
Authority: JP
Inventors: Hiroshi Hasegawa; 浩長谷川; Isanaka Edatsune; 伊佐央枝常; Yasunaga Miyazawa; 康永宮沢; Mitsuhiro Inazumi; 満広稲積; Osamu Urano; 治浦野; Sunao Aizawa; 直相澤
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 1996-03-27
Filing date: 1996-03-27
Publication date: 1997-10-03

Abstract

PROBLEM TO BE SOLVED: To set the time of an apparatus with a built-in clock simply and accurately by taking in the voice of a time information service, such as the one provided by NTT over a telephone line, recognizing the voice using voice recognition technology, and automatically adjusting the time with the result. SOLUTION: Continuous voice recognition means 5 performs voice recognition by keyword spotting based on a dynamic recurrent neural network. A word detection unit 52 compares voice feature data for the input voice, obtained through voice analysis by an input signal analysis unit 3, with standard voice feature data for each word registered in a standard voice feature data storage unit 51. For each registered word, it detects where on the time axis of the input voice the word is present and with what degree of certainty, expressed as a numerical value indicating the probability that the word is present. The input voice is recognized on the basis of these values, and according to the result the clock is set at a specified timing.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a time adjustment method and apparatus using voice recognition, in which voice recognition technology is used to set the current time of a clock built into home appliances, AV (audio and visual) equipment, facsimile machines, telephones with answering-machine functions, and the like.

[0002]

[Prior Art] Many recent home appliances and AV devices have a built-in clock.

[0003] In general, the current time of such a clock must be set not only when the device is purchased and first used, but also after power is restored following a power failure that occurs after installation, after the power plug has been pulled out, and so on.

[0004] To set an accurate time, taking the purchase and installation of a VCR as an example, the user connects the VCR to a television, presses the menu button on the supplied remote control, and selects "Current time setting" from the menus shown on the television screen. If the current time is, for example, a little before 8:00 a.m., the user sets the display to 8:00 a.m. with the number buttons on the remote control. Then, while listening to the NTT time service (dial 117) by telephone, the user presses the time setting button or the like at the moment of the final "pohn" tone in "This is to announce exactly 8:00 a.m. Pip, pip, pip, pohn." The procedure may differ depending on the type of device and the manufacturer, but the above is roughly the general procedure. By setting the time in this way, the current time can be set almost accurately, to the minute.

[0005] As described above, many recent devices allow almost all operations to be performed by remote control, and in such devices the current time is normally also set with the remote control. Moreover, for many devices of this type, keeping the built-in clock accurate at all times is an extremely important requirement for having the device's various functions performed precisely and as scheduled.

[0006]

[Problems to Be Solved by the Invention] However, this series of operations makes accurate time setting tedious even for people accustomed to handling this type of equipment, and all the more so for those who are not. In particular, elderly people and others who are said to be poor at handling equipment currently set the time by repeated trial and error while consulting a hard-to-understand manual. Furthermore, with the method described above it is difficult to set the time to the second; setting it to the minute is the limit.

[0007] Meanwhile, VCRs that automatically set the time from the time signal of a television broadcast have recently been developed. At a time when the VCR is not normally used much for recording (for example, shortly before 6:00 a.m.), the VCR automatically powers on, monitors a specific channel, and sets the time on, for example, the "pohn" of the 6:00 a.m. time signal "pip, pip, pip, pohn".

[0008] However, to make use of such a function, the user must at least have set the approximate current time beforehand, and the time set by the user must be within a predetermined range of the accurate current time. This is because the interval from when the VCR automatically powers on and starts monitoring the specific channel until it sets the time on the "pohn" of, for example, the 6:00 a.m. time signal must be kept to about five minutes at most. Furthermore, if a power failure occurs in the meantime, or the power plug comes out for some reason, the time adjustment function cannot be carried out.

[0009] To solve these problems, an object of the present invention is to realize a time adjustment method and apparatus using voice recognition that take in time information consisting of voice and tones, such as the NTT telephone time service, recognize the voice using voice recognition technology, and automatically set the time using the recognition result, so that the time of a device with a built-in clock can be set simply and accurately.

[0010]

[Means for Solving the Problems] In the time adjustment method using voice recognition according to the present invention, when the current time of a clock built into a device is to be set, a time announcement including voice is input, the voice portion of the time announcement is recognized, and on the basis of the recognition result the clock is set, at a predetermined timing, to the time being announced at that moment. The method is characterized in that the voice recognition is performed by voice recognition means that uses keyword spotting based on a dynamic recurrent neural network. Voice feature data for the input voice, obtained by analyzing the voice contained in the time announcement input through sound input means, is compared with standard voice feature data for each registered word. For each registered word, a numerical value indicating certainty is used to detect in which part of the time axis of the input voice, and with what degree of certainty, the registered word is present. The input voice is recognized on the basis of these certainty values, and on the basis of the recognition result the clock is set, at a predetermined timing, to the time being announced at that moment.

[0011] In this time adjustment method, the time to which the built-in clock is set on the basis of the recognition result is obtained as follows: voice recognition is performed on each time-announcement voice as time passes, and the time adopted is the time at the point when the certainty of the recognition result for an announcement is judged to be sufficient.

[0012] Alternatively, the method is characterized in that voice recognition is performed a preset number of times on the successive time-announcement voices, and the time to which the built-in clock is set is determined from the recognition results for the announcement at each time.
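The two acceptance rules in [0011] and [0012] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the 0.8 threshold, the count of three announcements, and the consistency check (successive announcements exactly 10 seconds apart) are all introduced here for illustration.

```python
DAY = 24 * 60 * 60  # seconds in a day

def first_confident(results, threshold=0.8):
    """[0011]-style rule: return the first announced time whose recognition
    certainty reaches the (assumed) threshold, or None if none does.

    results: list of (seconds_since_midnight, certainty) pairs, oldest first.
    """
    for announced, certainty in results:
        if certainty >= threshold:
            return announced
    return None

def agreed_after_n(results, n=3, step=10):
    """[0012]-style rule: recognize a preset number n of announcements and
    accept the last one only if each is `step` seconds after the previous one.

    The consistency check is an assumption; the source only says the time is
    determined from the recognition results.
    """
    if len(results) < n:
        return None
    times = [announced for announced, _ in results[:n]]
    for earlier, later in zip(times, times[1:]):
        if (later - earlier) % DAY != step:  # modulo handles midnight wrap
            return None
    return times[-1]
```

For example, with announcements for 7:20:50, 7:21:00, and 7:21:10 (26450, 26460, and 26470 seconds after midnight), the first rule accepts the first result that clears the threshold, while the second waits for all three and accepts the last only if they advance in 10-second steps.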

[0013] By adopting this method, an accurate current time can be set automatically simply by inputting a time announcement that includes voice. Even users unaccustomed to handling this type of device can therefore set the time easily. In particular, if the NTT "117" time service is used as the time announcement, the user need only telephone "117" at any time and let the announcement from the handset be input, and the accurate current time is set automatically, to the second. Moreover, because continuous voice recognition means based on keyword spotting with a dynamic recurrent neural network is used as the voice recognition technology, recognition is resistant to noise: even if noise is present on the line, it does not greatly affect the word recognition output, so highly accurate voice recognition is possible.

[0014] The time adjustment apparatus using voice recognition according to the present invention is an apparatus that, when the current time of a clock built into a device is to be set, inputs a time announcement including voice, recognizes the voice portion of the announcement, and on the basis of the recognition result sets the clock, at a predetermined timing, to the time being announced at that moment. The apparatus is characterized by comprising: sound input means for inputting the time announcement including voice; voice recognition means that uses keyword spotting based on a dynamic recurrent neural network to compare voice feature data for the input voice, obtained by analyzing the voice input to the sound input means that gives advance notice of the current time, with standard voice feature data for each registered word, and to detect, for each registered word, as a numerical certainty value, in which part of the time axis of the input voice and with what degree of certainty the registered word is present; and time setting means for setting the clock, on the basis of the recognition result from the voice recognition means and at a predetermined timing, to the time being announced at that moment.

[0015] In this time adjustment apparatus using voice recognition, the time to which the built-in clock is set on the basis of the recognition result is obtained as follows: voice recognition is performed on each time-announcement voice as time passes, and the time adopted is the time at the point when the certainty of the recognition result for an announcement is judged to be sufficient.

[0016] Alternatively, the apparatus is characterized in that voice recognition is performed a preset number of times on the successive time-announcement voices, and the time to which the built-in clock is set is determined from the recognition results for the announcement at each time.

[0017] By using this apparatus, an accurate current time can be set automatically simply by inputting a time announcement that includes voice. Even users unaccustomed to handling this type of device can therefore set the time easily. In particular, if the NTT "117" time service is used as the time announcement, the user need only telephone "117" at any time and let the announcement from the handset be input, and the accurate current time is set automatically, to the second. Moreover, because continuous voice recognition means based on keyword spotting with a dynamic recurrent neural network is used as the voice recognition technology, recognition is resistant to noise: even if noise is present on the line, it does not greatly affect the word recognition output, so highly accurate voice recognition is possible.

[0018]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. In this embodiment, a VCR is taken as an example of the device, and the current time of the clock built into this VCR is set using the NTT "117" time announcement service.

[0019] (First Embodiment) FIG. 1 is a block diagram showing the configuration of the time adjustment processing unit that performs time setting in this embodiment of the present invention. It comprises: a sound input unit 1 that inputs sound; an input signal processing unit 2 that passes the "117" time announcement signal input to the sound input unit 1 through an amplifier, a low-pass filter, and the like to obtain a suitable voice waveform and then converts it to a digital signal with an A/D converter; an input signal analysis unit 3 that performs frequency analysis on the processed signal at short time intervals, extracts a feature vector of several dimensions representing the frequency characteristics, and outputs the time series of these feature vectors (the feature vector sequence); a tone detection unit 4 (described later) that, on the basis of the output of the input signal analysis unit 3, detects the tones contained in the time announcement and their timing; a voice recognition unit 5 (described in detail later) that recognizes each of the words (described later) making up the voice contained in the time announcement; a time display unit 6 that displays the recognized result as a time; and a time setting unit 7 that, in accordance with the correctly recognized voice, sets the built-in clock at a predetermined timing to the time being announced at that moment.

[0020] In this embodiment, as stated above, the NTT "117" time announcement service is used as the reference time. The "117" service consists of: a tone emitted at one-second intervals (the first tone); a time-announcement voice, superimposed on the first tone, that gives advance notice of the current time; a tone announcing that the pre-announced time has been reached (the second tone); and, at every "hh:mm:30" and every "hh:mm exactly", a tone output at one-second intervals just before the second tone (the third tone). These tones and the voice are output continuously as time passes.

[0021] That is, as an example, the time is announced every 10 seconds as follows: "This is to announce 7:20:50 a.m. ... pohn ...", "This is to announce exactly 7:21 a.m. ... pip pip pip pohn ...", "This is to announce 7:21:10 a.m. ... pohn ...". The announcement is made up of the first tone, which ticks every second (the portions shown as "..." in "This is to announce 7:20:50 a.m. ... pohn ..."; denoted tone p1); the second tone, emitted every 10 seconds (the "pohn" portion of the same example; denoted tone p2); and the human announcer's voice (the portion "This is to announce 7:20:50 a.m."; denoted voice signal v1). Noise on the line (denoted n) is also included. At every hh:mm:30 and hh:mm exactly, the third tone "pip pip pip" always precedes the "pohn" (denoted tone p3). The "pohn" tone p2 is 880 Hz and the "pip pip pip" tone p3 is 440 Hz. Tone p1 is emitted at one-second intervals, and tone p3, which precedes hh:mm:30 and hh:mm exactly, is likewise emitted once per second.
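Because tone p2 (880 Hz) and tone p3 (440 Hz) differ only in frequency, a single-frequency detector is enough to tell them apart. The sketch below uses the Goertzel recurrence for this; it is an illustration only, since the source does not say how the tone detection unit 4 is implemented. The 8000 Hz sample rate, the 400-sample frame, the 0.25 energy-share threshold, and all function names are assumptions.

```python
import math

def goertzel_power(samples, sample_rate, target_hz):
    """Squared DFT magnitude at the bin nearest target_hz (Goertzel recurrence)."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)        # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def classify_tone(samples, sample_rate=8000, min_share=0.25):
    """Return "p2" for an 880 Hz frame, "p3" for a 440 Hz frame, else None.

    A pure tone centred on a bin gives a normalised share of about 0.5,
    so the assumed threshold of 0.25 rejects frames dominated by other content.
    """
    energy = sum(x * x for x in samples)
    if energy == 0.0:
        return None                               # silent or empty frame
    n = len(samples)
    share_880 = goertzel_power(samples, sample_rate, 880.0) / (n * energy)
    share_440 = goertzel_power(samples, sample_rate, 440.0) / (n * energy)
    if max(share_880, share_440) < min_share:
        return None
    return "p2" if share_880 > share_440 else "p3"

def synth_tone(freq_hz, n=400, sample_rate=8000):
    """Generate a test sine wave (helper for trying out the classifier)."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
```

With a 400-sample frame at 8000 Hz the bin spacing is 20 Hz, so both 440 Hz and 880 Hz fall exactly on a bin. The frequency of tone p1 is not given in the text, so this sketch does not try to classify it.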

[0022] FIG. 2(a) shows the time announcement signal for "This is to announce exactly 7:21 a.m. ... pip pip pip pohn ...", and FIG. 2(b) shows an example of the time announcement signal for "This is to announce 7:21:10 a.m. ... pohn ...". In the case of FIG. 2(a), after the "pohn" (tone p2) marking the previous time, tone p1 is emitted at one-second intervals; after p1 has been emitted six times, tone p3 ("pip pip pip") is emitted at one-second intervals, and one second after that the "pohn" tone p2 is emitted. In the case of FIG. 2(b), after the "pohn" (tone p2) marking the previous time, tone p1 is emitted nine times at one-second intervals, and one second later the "pohn" tone p2 is emitted. The voice signal v1 of the human announcement is sent superimposed on tone p1. FIG. 2(c) shows part of the feature vectors obtained when the time announcement signal of FIG. 2(b) is analyzed by the input signal analysis unit 3; the tone feature vector storage unit 41 shown in FIG. 1 stores the feature vectors Vp1, Vp2, and Vp3 of tones p1, p2, and p3 shown in FIG. 2(c). Vv1 denotes the feature vector of the voice portion. FIG. 2(d) shows the timing of the tones, here specifically the timing signal of tone p2.
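The two tone sequences described for FIG. 2(a) and FIG. 2(b) can be written down as a small lookup. This is only a restatement of the pattern in the text; the function name and the list representation (one entry per second, ending with the p2 tone at the announced time) are assumptions made for illustration.

```python
def tone_pattern(announced_second):
    """Return the one-per-second tone sequence that leads up to an announced time.

    announced_second: the seconds value of the announced time (0, 10, ..., 50).
    At :00 and :30 the sequence is six p1 tones, three p3 tones, then p2;
    otherwise it is nine p1 tones, then p2.
    """
    if announced_second not in range(0, 60, 10):
        raise ValueError("announcements are made every 10 seconds")
    if announced_second % 30 == 0:
        return ["p1"] * 6 + ["p3"] * 3 + ["p2"]
    return ["p1"] * 9 + ["p2"]
```

Both sequences contain ten entries, matching the 10-second spacing between announcements, and in both the final p2 marks the instant at which the announced time is reached.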

[0023] In the present invention, the voice recognition unit 5 uses continuous voice recognition by keyword spotting based on the dynamic recurrent neural network (Dynamic Recurrent Neural Networks; hereinafter DRNN) method developed by the present applicant. Patent applications have already been filed for this DRNN-based voice recognition technology (Japanese Unexamined Patent Publications No. H4-161075, No. H6-4097, No. H6-119476, etc.). An outline of voice recognition using the DRNN is given below.

[0024] The voice recognition unit 5 using this DRNN method mainly comprises a standard voice feature data storage unit 51, a word detection unit 52, and a voice recognition processing unit 53. In this case, the standard voice feature data storage unit 51 stores standard voice feature data for the words needed to specify a time. Because DRNN-based voice recognition enables speaker-independent recognition, the standard voice feature data can be feature data obtained from utterances of the recognition target words by a large number of speakers (for example, about 200). However, when the source of the time announcement is limited, feature data from a specific speaker may be used instead.

[0025] The word detection unit 52 compares the voice feature data for the input voice with the standard voice feature data for each registered word and, for each registered word, detects as a numerical certainty value in which part of the time axis of the input voice, and with what degree of certainty, that word is present. The specific processing of the word detection unit 52 is described below with reference to FIG. 3.

[0026] Suppose the input voice is "This is to announce 7:21:10 a.m." and a voice signal like that in FIG. 3(a) is output. To simplify the explanation, only the portion "7 a.m." is considered here.

[0027] Of this input voice "7 a.m.", "a.m." (午前) and "7 o'clock" (７時) are the keywords, and their feature data are registered in advance in the standard voice feature data storage unit 51 as recognition candidate words. The word detection unit 52 has a detector for each word registered in advance in the standard voice feature data storage unit 51 (the words needed to specify a time, as noted above), and each detector detects with what degree of certainty its word is present in the input voice. That is, when the word "a.m." is present in the input voice as word 1 and the detector waiting for "a.m." detects it, the detector's signal rises in the "a.m." portion of the input voice, as in FIG. 3(b). Similarly, when the word "7 o'clock" is present as word 2 and the detector waiting for "7 o'clock" detects it, the signal rises in the "7 o'clock" portion of the input voice, as in FIG. 3(c).

[0028] In FIGS. 3(b) and 3(c), the values 0.9 and 0.8 indicate certainty and are hereinafter called the degree of approximation. A high value such as 0.9 or 0.8 means high certainty, so "a.m." is judged to be a recognition candidate for word 1 and, likewise, "7 o'clock" for word 2. In other words, the registered word "a.m." is present in portion w1 of the time axis of the input voice with a degree of approximation of 0.9, and the registered word "7 o'clock" is present in portion w2 with a degree of approximation of 0.8.

[0029] In the example of FIG. 3, the detector waiting for word 3 (assume word 3 is "8 o'clock") also rises with some certainty (a degree of approximation of about 0.6) in response to the word "7 o'clock", as shown in FIG. 3(d). When two or more registered words are thus present as recognition candidates at the same time in the input voice signal (here, at time w2), the word detection unit 52 outputs several recognition candidates in descending order of degree of approximation, starting from the first place.
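The ranking of competing candidates in the same stretch of input can be sketched as follows. This is an illustration under stated assumptions rather than the word detection unit's actual logic: the `Detection` record, the interval-overlap test, and the 0.5 minimum score are hypothetical, and the start and end times are invented to mirror the w1 and w2 portions of FIG. 3.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    word: str     # registered word name
    score: float  # degree of approximation (certainty), 0.0 to 1.0
    start: float  # start point on the time axis, in seconds
    end: float    # end point on the time axis, in seconds

def overlaps(a, b):
    """True if the two detections share part of the time axis."""
    return a.start < b.end and b.start < a.end

def rank_candidates(detections, min_score=0.5):
    """Group detections that overlap in time and sort each group by score.

    Detections below the assumed min_score are dropped as "hardly risen".
    Returns a list of groups ordered by time; each group lists candidates
    from the highest degree of approximation down.
    """
    kept = sorted((d for d in detections if d.score >= min_score),
                  key=lambda d: d.start)
    groups = []
    for d in kept:
        if groups and any(overlaps(d, g) for g in groups[-1]):
            groups[-1].append(d)
        else:
            groups.append([d])
    return [sorted(g, key=lambda d: d.score, reverse=True) for g in groups]
```

Fed the FIG. 3 values (0.9 for "a.m." in w1, 0.8 for "7 o'clock" and 0.6 for "8 o'clock" in w2, and a near-zero output for "10 seconds"), this yields one group for w1 and one for w2, with "7 o'clock" ranked ahead of "8 o'clock".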

[0030] FIG. 3(e) shows the output of the detector waiting for word 4, for example "10 seconds". Within the time shown, the input voice contains neither "10 seconds" nor any voice with a similar feature vector, so the signal hardly rises at all.

[0031] The voice recognition processing unit 53 mainly comprises a processor (CPU) and a ROM storing a processing program, and interprets the meaning of the input voice on the basis of the recognized word output (degrees of approximation) from the word detection unit 52 described above. For example, when detection data such as that shown in FIGS. 3(b) to 3(d) is input from the word detection unit 52 (this data is called a word lattice and includes the registered word name, the degree of approximation, signals indicating the start point s and end point e of the word, and so on), the unit first determines, from the word lattice, one or more words as keywords in the input voice. In this example the input voice is "7 a.m.", so "a.m." and "7 o'clock" are detected. Although the input voice was taken to be "7 a.m." to simplify the explanation, the actual voice is something like "This is to announce 7:21:10 a.m." In that case too, "a.m.", "7 o'clock", "21 minutes", and "10 seconds" become the keywords, and from them the content of the continuous input voice "This is to announce 7:21:10 a.m." is understood.
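Turning the detected keywords into a clock time might look like the sketch below. It is a hypothetical illustration, not the processing program stored in the ROM: the function name, the token format (a number followed by 時, 分, or 秒), the handling of 午後 by adding 12 hours, and the treatment of ちょうど ("exactly") as zero seconds are assumptions.

```python
import re

def keywords_to_time(keywords):
    """Convert an ordered list of spotted keywords into (hour, minute, second).

    Returns None when the hour or the minute is missing, so that an
    incomplete recognition result is never used to set the clock.
    """
    hour = minute = None
    second = 0            # "ちょうど" (exactly) and a missing seconds word both mean 0
    afternoon = False
    for kw in keywords:
        if kw == "午前":
            afternoon = False
        elif kw == "午後":
            afternoon = True
        elif kw == "ちょうど":
            second = 0
        else:
            m = re.fullmatch(r"(\d+)(時|分|秒)", kw)
            if m is None:
                continue  # ignore words that do not carry time information
            value, unit = int(m.group(1)), m.group(2)
            if unit == "時":
                hour = value
            elif unit == "分":
                minute = value
            else:
                second = value
    if hour is None or minute is None:
        return None
    if afternoon and hour < 12:
        hour += 12        # assumed 12-hour announcement format
    return (hour, minute, second)
```

For the example in the text, the keywords 午前, 7時, 21分, 10秒 map to (7, 21, 10), which the time setting unit would then apply at the moment the p2 tone is detected.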

[0032] By using the DRNN-based voice recognition described above, recognition is less affected by sudden noise than with conventional techniques such as DP matching, and highly accurate voice recognition is possible without being affected by noise n such as that shown in FIG. 2. In DP matching, for example, when the feature vectors obtained by voice analysis are compared with the feature vectors stored in the standard voice feature data storage unit, the feature vectors of the individual elements are put into correspondence in time-series order before being compared. If feature vectors caused by noise appear partway through, it becomes difficult to establish this element-by-element correspondence. Voice recognition by DP matching therefore requires that noise and tones on the line be separated from the voice beforehand so that their influence is minimized.
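The sensitivity of DP matching to an inserted noise frame can be seen in a small dynamic-programming alignment. This sketch covers only the conventional technique being contrasted here, not the DRNN method, and it simplifies heavily: each frame is a single number rather than a feature vector, the local cost is the absolute difference, and the function name is hypothetical.

```python
def dp_matching_distance(reference, observed):
    """Accumulated cost of the best in-order alignment of two frame sequences.

    cost[i][j] holds the minimum cost of aligning the first i reference
    frames with the first j observed frames; every frame must be matched.
    """
    inf = float("inf")
    n, m = len(reference), len(observed)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(reference[i - 1] - observed[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # reference frame stretched
                                     cost[i][j - 1],      # observed frame stretched
                                     cost[i - 1][j - 1])  # one-to-one step
    return cost[n][m]
```

A slower utterance of the same word aligns at zero cost, for example `[1, 2, 3]` against `[1, 2, 2, 3]`. A single outlier frame, as in `[1, 2, 9, 3]`, must still be matched to some reference frame and raises the total, which is why DP matching needs noise and tones removed beforehand.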

【００３３】これに対して、本発明で用いるＤＲＮＮに
よる音声認識技術は、たとえば、「午前」という信号に
ノイズが存在していても、その音声に対する全体的な近
似度には大きな影響を与えることがない。したがって、
回線中に突発的に乗るノイズ、あるいは、音声以外の発
信音の影響を受けにくく、音声から発信音を分離する必
要も無くなる。なお、このＤＲＮＮによる音声認識技術
が突発的なノイズの影響を受けにくいということは、本
出願人が既に特許出願した前記特開平４−１６１０７５
に詳細に記載されている。In contrast, with the DRNN-based voice recognition technique used in the present invention, noise within the signal for a word such as "am" has no great effect on the overall degree of approximation for that word. Recognition is therefore hardly affected by noise that suddenly appears on the line or by non-voice signal tones, and the signal tones no longer need to be separated from the voice. The robustness of DRNN-based recognition to sudden noise is described in detail in Japanese Unexamined Patent Publication No. H4-161075 (特開平４−１６１０７５), cited above, for which the present applicant has already filed an application.

【００３４】図４はビデオデッキに付属しているリモコ
ンの外観表面部を示す図であり、このリモコンにより、
ビデオデッキの持つ機能を殆どを実行することができる
ようになっている。また、このリモコン内には通常のリ
モコンが有する回路部の他に、図１で示した現在時刻合
わせを行うための音声認識による時刻合わせ処理部が内
蔵されている。FIG. 4 shows the external front face of the remote controller supplied with a video deck (VCR); almost all of the VCR's functions can be operated from this remote controller. In addition to the circuitry found in an ordinary remote controller, it incorporates the voice-recognition time adjustment processing unit shown in FIG. 1 for setting the current time.

【００３５】このリモコンの外観表面部には、通常のリ
モコンと同様、電源ボタン３１、現在時刻設定や番組予
約設定などの種々の設定を行うためのメニューボタン３
２、ビデオ／テレビ切替ボタン３３、「０」〜「９」の
数字ボタン３４、録画、再生、巻き戻し、早送りなどの
ボタン３５などが設けられている。なお、これら以外に
もビデオデッキを操作するための種々のボタンも設けら
れるが、これらについては図示を省略する。そして、こ
れらのボタンの他に、前記時刻合わせ処理部における音
入力部（以下、マイクロホンという）１、時刻表示部７
（この時刻表示部７に表示される時刻については後述す
る）、さらに、このリモコンにセットされた時刻データ
をビデオデッキ本体に転送するための時刻データ転送ボ
タン３６が設けられている。Like an ordinary remote controller, the front face of this remote controller has a power button 31, a menu button 32 for making various settings such as the current time and program reservations, a video/television switch button 33, number buttons 34 for "0" to "9", and buttons 35 for recording, playback, rewind, fast-forward, and so on. Various other buttons for operating the VCR are also provided but are not shown. In addition to these buttons, the front face has the sound input unit (hereinafter, microphone) 1 and the time display unit 7 of the time adjustment processing unit (the time shown on the time display unit 7 is described later), and a time data transfer button 36 for transferring the time data set in the remote controller to the VCR main body.

【００３６】このような構成において、現在時刻設定を
行う動作について具体例を用いて説明する。The operation for setting the current time in such a configuration will be described using a specific example.

【００３７】まず、現在時刻が午前７時２０分５０秒少
し前であるとする。この時点で、ユーザが「１１７番」
に電話をかける。そして、図４で示したリモコンのマイ
クロホン１を電話機（この電話機はコードレスホン、そ
の親機、あるいは携帯電話などどれでもよい）の受話器
に近づけて、受話器から発せられる現在時刻案内を聞か
せる。Assume first that the current time is shortly before 7:20:50 am. At this point the user dials "117". The microphone 1 of the remote controller shown in FIG. 4 is then held close to the receiver of the telephone (a cordless handset, its base unit, a mobile phone, or any other telephone will do) so that the remote controller can pick up the current-time announcement coming from the receiver.

【００３８】今、入力開始時刻が、午前７時２１分ちょ
うどを知らせる「ポーン」が発せられる少し前であると
する。この入力開始後の時刻案内は、午前７時２１分ち
ょうどを知らせる「ポーン」のあとに続いて、「午前７
時２１分１０秒をお知らせします。・・・ポーン・・
・」、その次に、「午前７時２１分２０秒をお知らせし
ます。・・・ポーン・・・」、さらに、「午前７時２１
分３０秒をお知らせします。・・・ピッピッピポーン・
・・」というように、１０秒間隔で時刻案内される。な
お、この時刻案内は、前記したように、１秒毎に秒を刻
む発信音ｐ１、１０秒ごとの時刻を知らせる発信音ｐ
２、たとえば「ただいまより、午前７時２０分５０秒を
お知らせします。」というよな人間のアナウンスする音
声信号ｖ１などにより構成され、その他に回線に乗るノ
イズｎも含まれる。なお、３０秒毎および６０秒（何時
何分ちょうど）には、「ポーン」の前に「ピッピッピ」
という発信音ｐ３が必ず入る。Assume now that input starts shortly before the "pohn" tone that marks exactly 7:21 am. After input starts, the announcements follow that tone at 10-second intervals: "I will now announce 7:21:10 am. ... pohn ...", then "I will now announce 7:21:20 am. ... pohn ...", then "I will now announce 7:21:30 am. ... pip-pip-pip-pohn ...", and so on. As described earlier, this time announcement consists of a signal tone p1 that ticks every second, a signal tone p2 that marks the time every 10 seconds, and a spoken voice signal v1 such as "I will now announce 7:20:50 am", and it also contains noise n picked up on the line. At every 30 seconds and 60 seconds (the exact minute), a "pip-pip-pip" signal tone p3 always precedes the "pohn".

【００３９】ここで、図２（ｂ）で示した時刻案内信号
（この時刻案内信号は、「午前７時２１分１０秒をお知
らせします。・・・ポーン・・・」の信号）がリモコン
のマイクロホン１に入力されると、図５のフローチャー
トで示される処理が行われる。図５において、まず、マ
イクロホン１から取り込まれた信号のレベルが規定値以
上あるか否かの判定を行い（ステップｓ１）、規定値以
上であれば、入力信号の特徴分析を行う（ステップｓ
２）。次に、発信音が存在するか否かの判定を行い（ス
テップｓ３）、発信音が存在すれば、発信音の検出及び
そのタイミングを検出する（ステップｓ４）。When the time announcement signal shown in FIG. 2(b) (the signal for "I will now announce 7:21:10 am. ... pohn ...") is input to the microphone 1 of the remote controller, the processing shown in the flowchart of FIG. 5 is performed. In FIG. 5, it is first determined whether the level of the signal picked up by the microphone 1 is at or above a specified value (step s1); if it is, feature analysis of the input signal is performed (step s2). Next, it is determined whether a signal tone is present (step s3); if so, the signal tone and its timing are detected (step s4).
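Steps s1 to s4 can be sketched as a small control flow. This is only an outline under stated assumptions: the level is measured as an RMS value, a block below the threshold is simply skipped, and `analyze` and `find_tone` are caller-supplied stand-ins for the feature analysis and the tone detector, whose internals are not given here. All names are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square level of one block of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def front_end(samples, level_threshold, analyze, find_tone):
    """Run steps s1 to s4 on one block of microphone input."""
    if rms(samples) < level_threshold:   # s1: level below the specified value
        return None                      # assumption: skip this block
    features = analyze(samples)          # s2: feature analysis
    tone_index = find_tone(features)     # s3/s4: None if no tone, else its position
    return features, tone_index

# Toy stand-ins so the sketch can be exercised.
toy_analyze = lambda block: [abs(x) for x in block]
toy_find_tone = lambda features: 3

quiet = [0.001] * 8
loud = [0.5, -0.5] * 4
skipped = front_end(quiet, 0.1, toy_analyze, toy_find_tone)
processed = front_end(loud, 0.1, toy_analyze, toy_find_tone)
```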

【００４０】なお、この発明では、音声認識技術とし
て、ＤＲＮＮによる音声認識を用いているので、音声認
識を行うに際して、音声位置を特定する必要もなく、ま
た、前記したように、音声の特徴ベクトルから発信音の
特徴ベクトルを分離する必要もない。つまり、この発信
音の特定は、時刻を設定する場合のタイミングとしての
発信音が必要なだけであるため、必ずしもｐ１，ｐ２，
ｐ３の全ての発信音を検出する必要がなく、時刻を知ら
せる発信音ｐ２を検出してそのタイミングがわかればよ
い。Because the present invention uses DRNN-based voice recognition, there is no need to locate the voice segment when performing recognition, nor, as noted above, to separate the feature vectors of the signal tones from those of the voice. The signal tone is needed only as the timing reference for setting the time, so it is not necessary to detect all of the tones p1, p2, and p3; it is enough to detect the tone p2 that marks the time and to know its timing.

【００４１】また、音声位置を特定する必要がないの
は、ＤＲＮＮによる音声認識が前記したように、入力信
号の中に認識候補単語（この場合、時刻を特定する単
語）が存在すると、それを検出する信号が出力される方
式であるため、前の発信音ｐ２と次の発信音ｐ２との間
の音声を入力し、その音声中に認識候補単語が存在すれ
ば、位置を特定しなくても、それに対応する認識出力を
取り出すことができるからである。The voice position need not be located because, as described above, DRNN-based recognition outputs a detection signal whenever a recognition candidate word (here, a word specifying the time) is present in the input signal. If the voice between one tone p2 and the next tone p2 is input and a candidate word is present in it, the corresponding recognition output can be obtained without locating the word's position.

【００４２】したがって、ステップｓ４における発信音
検出およびそのタイミング検出処理は、発信音ｐ２を検
出してそのタイミングを知ればよい。この発信音ｐ２の
検出は、入力信号分析部３にて分析された特徴ベクトル
を基に容易に特定することができ、図２（ｃ）に示すよ
うに、そのタイミングを検出することができる。このよ
うに、発信音ｐ２が検出され、そのタイミングが検出さ
れると、次に、音声認識処理を行う（ステップｓ５）。Accordingly, the signal tone detection and timing detection in step s4 need only detect the tone p2 and determine its timing. The tone p2 can easily be identified from the feature vectors produced by the input signal analysis unit 3, and its timing can be detected as shown in FIG. 2(c). Once the tone p2 and its timing have been detected, voice recognition is performed (step s5).

【００４３】今、ここで、「午前７時２１分１０秒をお
知らせします。・・・ポーン・・・」という音声がマイ
クロホン１を通して入力され、その認識候補として、図
６に示すような認識候補が出力されたとする。図６
（ａ）は、入力音声に含まれる「午前」、「７時」、
「２１分」、「１０秒」というそれぞれの単語に対し
て、「午前」に対する第１位の認識候補として「午前」
が選択され、「７時」に対する第１位の認識候補として
「７時」が選択され、「２１分」に対する第１位の認識
候補として「１１分」が選択され、「１０秒」に対する
第１位の認識候補として「１０秒」が選択され、また、
第２位の認識候補としては、「午前」に対しては「午
後」が選択され、「７時」に対しては「８時」が選択さ
れ、「２１分」に対しては「２１分」が選択され、「１
０秒」に対しては「２０秒」が選択され、さらに、時
間、分、秒に対し、第３位の認識候補として、「７時」
に対しては「１時」が選択され、「２１分」に対しては
「２０分」が選択され、「１０秒」に対しては「３０
秒」が選択された例である。Suppose that the voice "I will now announce 7:21:10 am. ... pohn ..." is input through the microphone 1 and that the recognition candidates shown in FIG. 6 are output. FIG. 6(a) is an example in which, for the words "am", "7 o'clock", "21 minutes", and "10 seconds" contained in the input voice, the first-ranked candidates are "am" for "am", "7 o'clock" for "7 o'clock", "11 minutes" for "21 minutes", and "10 seconds" for "10 seconds"; the second-ranked candidates are "pm" for "am", "8 o'clock" for "7 o'clock", "21 minutes" for "21 minutes", and "20 seconds" for "10 seconds"; and the third-ranked candidates for the hour, minute, and second are "1 o'clock" for "7 o'clock", "20 minutes" for "21 minutes", and "30 seconds" for "10 seconds".

【００４４】このようにして、それぞれの単語に対し
て、それぞれの単語毎に、１位から順に幾つかの認識候
補単語が選択される。この認識候補の出力は、前記単語
検出部５２により得られる確からしさを表す数値（近似
度）をもとに出力されるもので、たとえば、「７時」と
いう入力に対して、単語２（７時）が近似度0.8で第１
位の認識候補として選択され、単語３（８時）が近似度
0.6で第２位の認識候補として選択されるというよう
に、近似度に基づいて認識候補が選択される。In this way, several recognition candidate words are selected for each word, in order from the first rank. The candidates are output on the basis of the numerical value (degree of approximation) representing the likelihood obtained by the word detection unit 52. For example, for the input "7 o'clock", word 2 (7 o'clock) is selected as the first-ranked candidate with a degree of approximation of 0.8, and word 3 (8 o'clock) is selected as the second-ranked candidate with a degree of approximation of 0.6.
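Ranking candidates by degree of approximation amounts to a sort. The sketch below assumes the detector's output for one spoken word is available as a mapping from registered word to score; the function name `rank_candidates` is hypothetical, and the 0.4 score for "1 o'clock" is a made-up value (only 0.8 and 0.6 appear in the text).

```python
def rank_candidates(scores, top_n=3):
    """Return (word, score) pairs in descending order of approximation degree."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ordered[:top_n]

hour_scores = {"8 o'clock": 0.6, "1 o'clock": 0.4, "7 o'clock": 0.8}
ranked = rank_candidates(hour_scores)
first_word, first_score = ranked[0]
```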

【００４５】そして、第１位の認識候補単語で構成され
る内容「午後７時１１分１０秒」がリモコンの時刻表示
部６に表示される（ステップｓ６）。なお、この時刻表
示部６の表示は、「午後７時１１分１０秒」が表示され
たのち直ちに、１秒刻みの時刻が刻々と表示される。ユ
ーザは、受話器から発せられる刻々と変化する時刻案内
を聞きながら、リモコンの時刻表示部６の表示内容を見
て、その表示内容が受話器から発せられる内容に一致し
ているか否かを判断し（ステップｓ７）、一致していれ
ば、正しく認識されたとして、その時点で音声認識動作
を終了させ、そのときの認識結果をもとにリモコン内の
時計の時刻設定処理を行う（ステップｓ８）。The content formed from the first-ranked candidate words, "7:11:10 am", is then displayed on the time display unit 6 of the remote controller (step s6). Immediately after "7:11:10 am" is displayed, the time display unit 6 continues to show the time advancing second by second. While listening to the continually changing time announcement from the receiver, the user looks at the time display unit 6 and judges whether the displayed content matches what the receiver is announcing (step s7). If they match, recognition is taken to be correct, the voice recognition operation is ended at that point, and the clock in the remote controller is set on the basis of that recognition result (step s8).

【００４６】しかし、この例では、受話器からの内容
は、「午前７時２１分１０秒」から始まる時刻案内であ
り、「午前７時２１分１０秒」に対する第１位の認識結
果は「午前７時１１分１０秒」であって、時刻表示部７
の表示内容は「午前７時１１分１０秒」から１秒刻み
で、「午前７時１１分１１秒、１２秒、１３秒、・・
・」といういような表示であるので、時刻表示部６に刻
々と表示される時刻と、受話器から発せられる時刻案内
とは内容が一致しない（この場合、分単位の時刻が異な
って認識されている）。このような場合には、一致する
まで、認識動作を続ける。In this example, however, the receiver gives the time announcement beginning with "7:21:10 am", whereas the first-ranked recognition result for "7:21:10 am" is "7:11:10 am". The time display unit 7 therefore counts up from "7:11:10 am" in one-second steps ("7:11:11 am, 12 seconds, 13 seconds, ..."), so the time shown on the time display unit 6 does not match the time announcement from the receiver (here the minutes have been misrecognized). In such a case, the recognition operation continues until the two match.

【００４７】図６（ｂ）は引き続き認識動作を行って得
られた認識結果である。つまり、「午前７時２１分１０
秒をお知らせします。・・・ポーン・・・」の後の、
「午前７時２１分２０秒をお知らせします。・・・ポー
ン・・・」という音声がマイクロホン１を通して入力さ
れ、その認識候補として、図６（ｂ）に示すような認識
候補が出力されたとする。この場合、「午前」、「７
時」、「２１分」、「２０秒」というそれぞれの単語に
対して、「午前」に対する第１位の認識候補として「午
前」が選択され、「７時」に対する第１位の認識候補と
して「７時」が選択され、「２１分」に対する第１位の
認識候補として「２１分」が選択され、「２０秒」に対
する第１位の認識候補として「２０秒」が選択されたこ
とになる。また、第２位の認識候補としては、「午前」
に対しては「午後」が選択され、「７時」に対しては
「８時」が選択され、「２１分」に対しては「１１分」
が選択され、「２０秒」に対しては「１０秒」が選択さ
れたことになる。このようにして、引き続き行われた認
識動作（第２回目の認識処理動作という）においても、
それぞれの単語に対して、それぞれの単語毎に、１位か
ら順に幾つかの認識候補単語が選択される。FIG. 6(b) shows the recognition result obtained by continuing the recognition operation. That is, suppose that after "I will now announce 7:21:10 am. ... pohn ...", the voice "I will now announce 7:21:20 am. ... pohn ..." is input through the microphone 1 and the recognition candidates shown in FIG. 6(b) are output. In this case, for the words "am", "7 o'clock", "21 minutes", and "20 seconds", the first-ranked candidates are "am" for "am", "7 o'clock" for "7 o'clock", "21 minutes" for "21 minutes", and "20 seconds" for "20 seconds". The second-ranked candidates are "pm" for "am", "8 o'clock" for "7 o'clock", "11 minutes" for "21 minutes", and "10 seconds" for "20 seconds". Thus in this continued recognition operation (referred to as the second recognition processing operation) as well, several candidate words are selected for each word in order from the first rank.

【００４８】そして、この第２回目の認識動作において
も、第１位の認識候補単語で構成される内容「午前７時
２１分２０秒」がリモコンの時刻表示部６に表示された
のち、直ちに、１秒刻みの時刻が刻々と表示される。ユ
ーザは、受話器から発せられる刻々と変化する時刻案内
を聞きながら、リモコンの時刻表示部６の表示内容を見
て、その表示内容が受話器から発せられる内容に一致し
ているか否かを判断する。この場合、受話器からの内容
は、「午前７時２１分２０秒」からの時刻案内であり、
これに対する第１位の認識結果は「午前７時２１分２０
秒」であって、時刻表示部６の表示内容は「午前７時２
１分２０秒」から１秒刻みで「「午前７時２１分２１
秒、２２秒、２３秒、・・・」といういような表示であ
るので、時刻表示部６に刻々と表示される時刻と、受話
器から発せられる時刻案内と一致し、正しく認識された
として、その時点で音声認識動作を終了する。In this second recognition operation as well, the content formed from the first-ranked candidate words, "7:21:20 am", is displayed on the time display unit 6 of the remote controller, after which the time is immediately shown advancing second by second. While listening to the continually changing time announcement from the receiver, the user looks at the time display unit 6 and judges whether the displayed content matches what the receiver is announcing. Here the receiver gives the time announcement from "7:21:20 am", the first-ranked recognition result for it is "7:21:20 am", and the time display unit 6 counts up from "7:21:20 am" in one-second steps ("7:21:21 am, 22 seconds, 23 seconds, ..."). The time shown on the time display unit 6 therefore matches the time announcement from the receiver, recognition is taken to be correct, and the voice recognition operation ends at that point.

【００４９】これにより、リモコン内の時計は現在時刻
に正しく時刻合わせされた状態となり、以降、正しい時
刻で計時動作する。なお、前記音声認識結果により、リ
モコン内の時計の時刻合わせを行う処理は、時刻設定手
段７により行われる。この時刻設定処理は、音声認識の
結果を基に、それぞれの単語の認識結果に対応した時刻
データをメモリから読み出し、その時刻を知らせる発信
音ｐ２、つまり、「ただいまより、何時何分何秒をお知
らせします。・・・ポーン・・・」の「ポーン」のタイ
ミングに同期して時刻合わせされる。As a result, the clock in the remote controller is correctly set to the current time and keeps correct time from then on. The setting of the clock in the remote controller from the voice recognition result is performed by the time setting means 7. In this time setting process, the time data corresponding to the recognition result for each word is read from memory, and the clock is set in synchronization with the signal tone p2 that marks that time, that is, with the "pohn" in "I will now announce such-and-such hours, minutes, and seconds. ... pohn ...".
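The synchronization with the tone can be pictured as follows: the announced time becomes valid at the instant the tone p2 sounds, and the clock then advances from that instant. This is a minimal sketch, assuming the tone instant and the current instant are available as timestamps in seconds from some monotonic counter; the function name and the calendar date are arbitrary choices made for illustration.

```python
from datetime import datetime, timedelta

def clock_reading(announced, tone_instant, now):
    """Clock value at `now`, given that `announced` was valid at `tone_instant`.

    tone_instant and now are timestamps in seconds on the same counter.
    """
    return announced + timedelta(seconds=now - tone_instant)

announced = datetime(1996, 3, 27, 7, 21, 20)   # "7:21:20 am"; the date is arbitrary
tone_instant = 100.0                           # counter value when the tone sounded
reading = clock_reading(announced, tone_instant, now=103.0)
```

Three seconds after the tone, `reading` is 7:21:23 am, matching the second-by-second display described for the first embodiment.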

【００５０】このようにしてリモコンの時刻合わせがな
されると、ユーザは、リモコンに設けられた時刻データ
転送ボタン３６を押して、ビデオデッキ本体側に時刻デ
ータを転送する。これにより、ビデオデッキの時計の時
刻を正しい現在時刻に合わせることができる。When the time of the remote controller is adjusted in this manner, the user pushes the time data transfer button 36 provided on the remote controller to transfer the time data to the VCR main body side. This makes it possible to set the clock time of the VCR to the correct current time.

【００５１】以上説明したように、この第１の実施の形
態では、電話の受話器から発せられる時刻案内をリモコ
ンのマイクロホン１を通して入力し、音声認識技術を用
いて、時刻案内のアナウンスを正しく認識するまで音声
認識動作を行い、正しく音声認識されたことをユーザが
判断すると、音声認識処理を終了させる。これにより、
リモコン内の時計はその正しく認識された時刻から刻々
と時刻を刻み始める。As described above, in the first embodiment the time announcement from the telephone receiver is input through the microphone 1 of the remote controller, and voice recognition is repeated until the announcement is recognized correctly; when the user judges that it has been recognized correctly, the voice recognition process is ended. The clock in the remote controller then starts keeping time from that correctly recognized time.

【００５２】したがって、ユーザは、時刻合わせのため
に従来行っていたような面倒な手順を踏まずに、「１１
７番」に電話をかけて、受話器から発せられる時刻案内
をリモコンに入力させ、音声認識が正しく行われたか否
かを判断するだけで、自動的に秒単位の正確な現在時刻
をセットできる。また、音声認識技術としてＤＲＮＮに
よる音声認識手段を用いているので、回線にノイズが存
在してもそのノイズによって単語認識出力に大きな影響
を与えることなく、高精度な音声認識を行うことができ
る。The user therefore does not have to go through the tedious conventional procedure for setting the time. Simply by dialing "117", letting the remote controller pick up the time announcement from the receiver, and judging whether the voice was recognized correctly, the user can have the current time set automatically with one-second accuracy. Moreover, because DRNN-based voice recognition means is used, highly accurate recognition is possible even when noise is present on the line, since the noise does not greatly affect the word recognition output.

【００５３】（第２の実施の形態）以上説明した第１の
実施の形態では、音声認識が正しくおこなれたか否かの
判断をユーザが行って、正しく認識された時点で、それ
以降の認識処理を行わないようにしたが、これに限ら
ず、予め設定された回数だけ自動的に認識処理を行い、
それぞれの認識結果を基に最も確からしい認識結果を得
て、時刻を決定するようにしてもよい。(Second Embodiment) In the first embodiment described above, the user judges whether the voice was recognized correctly, and no further recognition is performed once it has been. The invention is not limited to this: recognition may instead be performed automatically a preset number of times, and the time determined by deriving the most probable result from the individual recognition results.

【００５４】図７はこの処理を説明するフローチャート
であり、以下、図７を参照しながら簡単に説明する。な
お、この第２の実施の形態において使用する時刻合わせ
処理部の構成などは第１の実施の形態と同じであるもの
とする（図１参照）。また、この第２の実施の形態にお
いても第１の実施の形態同様、最初に入力される時刻案
内は、「午前７時２１分１０秒をお知らせします。・・
・ポーン・・・」の信号であるとする。FIG. 7 is a flowchart explaining this processing, which is described briefly below with reference to FIG. 7. The configuration of the time adjustment processing unit used in the second embodiment is the same as in the first embodiment (see FIG. 1). As in the first embodiment, the first time announcement input is assumed to be the signal for "I will now announce 7:21:10 am. ... pohn ...".

【００５５】ここでは、認識処理を行う回数を６回に設
定した場合について説明する。まず、第１回目の認識処
理（認識回数をｉとし、ｉ＝１とする）を行うが、この
認識処理は、その時点でリモコンのマイクロホン１に入
力される「午前７時２２分１０秒をお知らせします。・
・・ポーン・・・」に対しての処理である。まず、前記
第１の実施の形態同様、まず、マイクロホン１から取り
込まれた信号のレベルが規定値以上あるか否かの判定を
行い（ステップｓ１１）、規定値以上であれば、入力信
号の特徴分析を行う（ステップｓ１２）。次に、発信音
が存在するか否かの判定を行い（ステップｓ１３）、発
信音が存在すれば、発信音の検出及びそのタイミングを
検出する（ステップｓ１４）。なお、この場合、第１の
実施の形態同様、必ずしもｐ１，ｐ２，ｐ３の全ての発
信音を検出する必要がなく、時刻を知らせる発信音ｐ２
を検出してそのタイミングがわかればよい。このよう
に、発信音ｐ２が検出され、そのタイミングが検出され
ると、次に、音声認識処理を行う（ステップｓ１５）。The case where the number of recognition passes is set to six is described here. First, the first recognition pass is performed (with the pass counter i set to 1). This pass processes the announcement being input to the microphone 1 of the remote controller at that moment, "I will now announce 7:21:10 am. ... pohn ...". As in the first embodiment, it is first determined whether the level of the signal picked up by the microphone 1 is at or above a specified value (step s11); if it is, feature analysis of the input signal is performed (step s12). Next, it is determined whether a signal tone is present (step s13); if so, the signal tone and its timing are detected (step s14). As in the first embodiment, it is not necessary to detect all of the tones p1, p2, and p3; it is enough to detect the tone p2 that marks the time and to know its timing. Once the tone p2 and its timing have been detected, voice recognition is performed (step s15).

【００５６】この音声認識処理は、前記第１の実施の形
態同様、ＤＲＮＮによる音声認識技術を用いた音声認識
を行い、認識候補を出力する。In this voice recognition process, as in the first embodiment, voice recognition is performed using the voice recognition technique by DRNN, and recognition candidates are output.

【００５７】この第１回目の認識処理は、「午前７時２
１分１０秒をお知らせします。・・・ポーン・・・」と
いう時刻案内に対してであり、それぞれの単語に対し
て、幾つかの認識候補文字が上位から順に選択される。
すなわち、「午前」、「７時」、「２１分」、「１０
秒」という入力音声単語のそれぞれの特徴ベクトルと、
標準音声特徴データ記憶部に登録されている特徴ベクト
ルとを比較して、登録単語が入力音声の時間軸上のどの
部分にどの程度の確からしさで存在するかを、それぞれ
の登録単語毎に、確からしさを示す数値（近似度）で出
力する。The first recognition pass is applied to the time announcement "I will now announce 7:21:10 am. ... pohn ...", and several recognition candidates are selected for each word in descending order of rank. That is, the feature vectors of the input voice words "am", "7 o'clock", "21 minutes", and "10 seconds" are compared with the feature vectors registered in the standard voice feature data storage unit, and for each registered word a numerical value (degree of approximation) is output indicating where on the time axis of the input voice the word is present and with what likelihood.

【００５８】次に、ｉ＝ｉ＋１とし（ステップｓ１
６）、認識回数が設定値（ｉ＝６）に達したか否かを判
定し（ステップｓ１７）、設定値に達していなければ、
第２回目の認識処理に入るが、この第２回目の認識処理
は、第１回目の１０秒後の「午前７時２１分２０秒をお
知らせします。・・・ポーン・・・」という時刻案内に
対してである。Next, i is incremented (i = i + 1) (step s16), and it is determined whether the number of passes has reached the set value (i = 6) (step s17). If it has not, the second recognition pass begins; this pass is applied to the announcement made 10 seconds after the first, "I will now announce 7:21:20 am. ... pohn ...".

【００５９】この認識処理も前記同様、ステップｓ１１
〜ｓ１５を行う。この第２回目の認識処理は、「午前７
時２２分２０秒をお知らせします。・・・ポーン・・
・」という時刻案内に対してであるため、「午前」、
「７時」、「２１分」、「２０秒」という入力音声単語
のそれぞれの特徴ベクトルと、標準音声特徴データ記憶
部に登録されている特徴ベクトルとを比較して、登録単
語が入力音声の時間軸上のどの部分にどの程度の確から
しさで存在するかを、それぞれの登録単語毎に、確から
しさを示す数値（近似度）で出力する。This pass also performs steps s11 to s15 as before. Because the second pass is applied to the announcement "I will now announce 7:21:20 am. ... pohn ...", the feature vectors of the input voice words "am", "7 o'clock", "21 minutes", and "20 seconds" are compared with the feature vectors registered in the standard voice feature data storage unit, and for each registered word a numerical value (degree of approximation) is output indicating where on the time axis of the input voice the word is present and with what likelihood.

【００６０】次に、ｉ＝ｉ＋１とし（ステップｓ１
６）、認識回数が設定値（ｉ＝６）に達したか否かを判
定し（ステップｓ１７）、設定値に達していなければ、
第３回目の認識処理に入るが、この第３回目の認識処理
は、第２回目の１０秒後の「午前７時２１分３０秒をお
知らせします。・・・ピッピッピポーン・・・」という
時刻案内に対してである。Next, i is incremented (i = i + 1) (step s16), and it is determined whether the number of passes has reached the set value (i = 6) (step s17). If it has not, the third recognition pass begins; this pass is applied to the announcement made 10 seconds after the second, "I will now announce 7:21:30 am. ... pip-pip-pip-pohn ...".

【００６１】このようにして、ｉ＝６まで６回の認識処
理を行う。そして、６回の認識結果をもとに、最も近似
度の高い認識結果を得て、それをもとに現時点の時刻デ
ータを求めて、リモコンに内蔵された時計の時刻設定を
行う（ステップｓ１８）。In this way, recognition processing is performed 6 times until i = 6. Then, the recognition result with the highest degree of approximation is obtained based on the recognition results of six times, the current time data is obtained based on the recognition result, and the time of the clock built in the remote controller is set (step s18). ).

【００６２】この６回の認識結果をもとに、最も確から
しさの高い認識結果を得て、それをもとに現時点の時刻
データを求める処理について、以下に具体例を用いて説
明する。A process of obtaining the most probable recognition result based on the recognition results of six times and obtaining the current time data based on the recognition result will be described below using a specific example.

【００６３】ここでは、第１回目の認識処理を行ったと
きの実際の時刻（電話機から発せられる時刻）を、「午
前７時２１分１０秒」とする。したがって、第２回目の
認識処理を行ったときの実際の時刻はそれより１０秒後
であるから「午前７時２１分２０秒」であり、第３回目
の認識処理を行ったときの実際の時刻はさらに１０秒後
であるから「午前７時２１分３０秒」であり、第４回目
の認識処理はさらに１０秒後であるから実際の時刻は
「午前７時２１分４０秒」であり、第５回目の認識処理
はさらに１０秒後であるから実際の時刻は「午前７時２
１分５０秒」であり、第６回目の認識処理はさらに１０
秒後であるから実際の時刻は「午前７時２２分ちょう
ど」である。Here, the actual time (the time announced by the telephone) at the first recognition pass is taken to be "7:21:10 am". The actual time at the second pass is 10 seconds later, "7:21:20 am"; at the third pass, a further 10 seconds later, "7:21:30 am"; at the fourth, "7:21:40 am"; at the fifth, "7:21:50 am"; and at the sixth, "exactly 7:22 am".
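The six announcement times follow from simple 10-second arithmetic, which the short sketch below reproduces. The function name `announcement_times` and the calendar date are arbitrary choices made for illustration.

```python
from datetime import datetime, timedelta

def announcement_times(first, passes=6, step_seconds=10):
    """Times announced at each recognition pass, spaced step_seconds apart."""
    return [first + timedelta(seconds=step_seconds * i) for i in range(passes)]

first_pass = datetime(1996, 3, 27, 7, 21, 10)   # 7:21:10 am; the date is arbitrary
times = announcement_times(first_pass)
labels = [t.strftime("%H:%M:%S") for t in times]
```

The sixth entry falls exactly on 7:22:00, which is why the minute is certain to change once within six passes.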

【００６４】今、第１回目の認識処理において、それぞ
れの単語に対する第１位の認識候補が「午前」、「７
時」、「１１分」、「１０秒」であり、第２回目の認識
処理において、それぞれの単語に対する第１位の認識候
補が「午前」、「７時」、「２１分」、「１０秒」であ
り、第３回目の認識処理において、それぞれの単語に対
する第１位の認識候補が「午前」、「７時」、「２１
分」、「４０秒」であり、第４回目の認識処理におい
て、それぞれの単語に対する第１位の認識候補が「午
前」、「７時」、「２１分」、「４０秒」であり、第５
回目の認識処理において、それぞれの単語に対する第１
位の認識候補が「午前」、「７時」、「２１分」、「５
０秒」であり、第６回目の認識処理において、それぞれ
の単語に対する第１位の認識候補が「午前」、「７
時」、「２２分」、「ちょうど」であるとする。Suppose now that the first-ranked candidates for the individual words are "am", "7 o'clock", "11 minutes", "10 seconds" in the first pass; "am", "7 o'clock", "21 minutes", "10 seconds" in the second; "am", "7 o'clock", "21 minutes", "40 seconds" in the third; "am", "7 o'clock", "21 minutes", "40 seconds" in the fourth; "am", "7 o'clock", "21 minutes", "50 seconds" in the fifth; and "am", "7 o'clock", "22 minutes", "exactly" in the sixth.

【００６５】このような認識結果に対して、今、秒単位
の時刻の特定について図８を参照して説明する。なお、
ここで、電話機から発せられる秒単位の時刻が１０秒ご
とであるので、機器（この場合ビデオデッキ）側の取り
うる秒単位の現在時刻は、（１）何時何分１０秒から始まって、以下、２０秒、３
０秒、・・・というよな時刻変化。How the seconds are determined from these recognition results is now described with reference to FIG. 8. Because the telephone announces the time in 10-second steps, the seconds of the current time on the device (here, the VCR) can follow one of the following sequences. (1) A sequence starting at 10 seconds past some hour and minute, followed by 20 seconds, 30 seconds, and so on.

【００６６】（２）何時何分２０秒から始まって、以
下、３０秒、４０秒、・・・というような時刻変化。(2) A sequence starting at 20 seconds past some hour and minute, followed by 30 seconds, 40 seconds, and so on.

【００６７】（３）何時何分３０秒から始まって、以
下、４０秒、５０秒、・・・というような時刻変化。(3) A sequence starting at 30 seconds past some hour and minute, followed by 40 seconds, 50 seconds, and so on.

【００６８】（４）何時何分４０秒から始まって、以
下、５０秒、６０秒（ちょうど）、・・・というような
時刻変化。(4) A sequence starting at 40 seconds past some hour and minute, followed by 50 seconds, 60 seconds (exactly on the minute), and so on.

【００６９】（５）何時何分５０秒から始まって、以
下、６０秒（ちょうど）、１０秒、・・・というような
時刻変化。(5) A sequence starting at 50 seconds past some hour and minute, followed by 60 seconds (exactly on the minute), 10 seconds, and so on.

【００７０】（６）何時何分ちょうどから始まって、以
下、１０秒、２０秒、・・・というような時刻変化。(6) A sequence starting exactly on some hour and minute, followed by 10 seconds, 20 seconds, and so on.

【００７１】の６通りが考えられ、このうちのどれかを
現在時刻の秒単位として設定することになる。These six sequences are possible, and one of them is to be set as the seconds of the current time.

【００７２】図８に示したように、第１回目の認識処理
で得られた秒単位における第１位の認識候補は「１０
秒」であり、第２回目の認識処理で得られた秒単位にお
ける第１位の認識候補は「１０秒」であり、第３回目の
認識処理で得られた秒単位における第１位の認識候補は
「４０秒」であり、第４回目の認識処理で得られた秒単
位における第１位の認識候補は「４０秒」であり、第５
回目の認識処理で得られた秒単位における第１位の認識
候補は「５０秒」であり、第６回目の認識処理で得られ
た秒単位における第１位の認識候補は「ちょうど」であ
る。As shown in FIG. 8, the first-ranked candidate for the seconds is "10 seconds" in the first recognition pass, "10 seconds" in the second, "40 seconds" in the third, "40 seconds" in the fourth, "50 seconds" in the fifth, and "exactly" in the sixth.

【００７３】これら第１回目から第６回目までの認識処
理で得られたそれぞれの第１位の認識候補を、前記した
（１）〜（６）の時刻変化に対応させる。図８におい
て、第１回目から第６回目までの認識処理で得られたそ
れぞれ第１位の認識候補の時刻に一致する部分の時刻に
丸印を付してある。The first-ranked candidates obtained in the first to sixth passes are then matched against the sequences (1) to (6) above. In FIG. 8, each time in a sequence that coincides with the first-ranked candidate from the corresponding pass is circled.

【００７４】このように、第１回目から第６回目までの
認識処理で得られたそれぞれの第１位の認識候補の時刻
（秒単位）と、実際の時刻変化（秒単位）との対応付け
を行い、その結果から現在時刻の秒単位がどのような状
態で変化しているかを判断する。この場合、（１）で示
した時刻変化に対して最も多く対応付けされているの
で、（１）の秒単位変化を現在時刻の秒単位の変化であ
ると決定する。つまり、現在時刻の秒単位は、第１回目
の認識処理の時点では、何時何分１０秒、第２回目の認
識処理の時点では、何時何分２０秒の、第３回目の認識
処理の時点では、何時何分３０秒、・・・、第６回目の
認識処理の時点では、何時何分ちょうどであると決定で
きる。In this way, the seconds of the first-ranked candidates from the first to sixth passes are matched against the actual progression of the seconds, and the result is used to judge how the seconds of the current time are progressing. Here the most matches are with sequence (1), so sequence (1) is taken to be the progression of the seconds of the current time. That is, the seconds of the current time are determined to be 10 seconds at the first pass, 20 seconds at the second, 30 seconds at the third, and so on, up to exactly on the minute at the sixth.
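The matching of recognized seconds against the six possible sequences can be sketched as a vote count. This is an illustrative reading of the FIG. 8 procedure, with "exactly on the minute" represented as 0; the function name `best_seconds_sequence` is hypothetical, and how a tie between sequences would be resolved is not stated in the text (this sketch simply keeps the first of the tied sequences).

```python
def best_seconds_sequence(recognized, step=10):
    """Pick the 10-second progression that agrees with the most passes.

    recognized: first-ranked seconds per pass (0 means exactly on the minute).
    Returns the chosen progression and its number of matches.
    """
    def progression(start):
        return [(start + step * i) % 60 for i in range(len(recognized))]

    def matches(start):
        return sum(r == e for r, e in zip(recognized, progression(start)))

    # Candidate starting points in the order of sequences (1) to (6).
    starts = [10, 20, 30, 40, 50, 0]
    best = max(starts, key=matches)
    return progression(best), matches(best)

recognized_seconds = [10, 10, 40, 40, 50, 0]   # passes 1 to 6 from the example
sequence, votes = best_seconds_sequence(recognized_seconds)
```

For the example values, sequence (1) agrees with four of the six passes (the first, fourth, fifth, and sixth), while sequences (2) and (6) agree with one each and the rest with none.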

【００７５】このようにして、秒単位の時刻が決定され
ると、次に、分単位の特定であるが、６回の認識処理の
うち、必ず分が変わることに着目し、かつ、分が変わる
場合、たとえば、７時５８分から７時５９分、７時５９
分から８時ちょうど、８時ちょうどから８時１分という
ように１分毎に連続的に変化することに着目して、現在
時刻における分を特定する。Once the seconds have been determined in this way, the minute is determined next. This relies on two facts: the minute always changes once during the six recognition passes, and when it changes it advances continuously one minute at a time, for example from 7:58 to 7:59, from 7:59 to exactly 8:00, and from exactly 8:00 to 8:01.

[0076] For example, suppose that the first-ranked recognition candidate for the minute is "11 minutes" in the first recognition process, "21 minutes" in the second, "20 minutes" in the third, "21 minutes" in the fourth, "21 minutes" in the fifth, and "22 minutes" in the sixth.

[0077] When the minute of the current time is identified from the first-ranked minute candidates obtained in the first to sixth recognition processes, attention is paid, as described above, to the fact that the minute always changes during the six recognition processes and that it advances continuously one minute at a time. In this case, based on the minute recognition results from the first to sixth processes and on the information, obtained when the seconds were determined, that the minute changes at the sixth recognition process, the minute is determined to be "21 minutes" at the time of the first to fifth recognition processes and "22 minutes" at the time of the sixth recognition process.
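One possible way to combine the minute candidates with the known rollover point is sketched below. The patent does not spell out the exact procedure, so the voting rule is an assumption, as are the names `resolve_minutes` and `rollover_index` (the zero-based index of the recognition process at which the seconds read exactly zero).

```python
from collections import Counter
from typing import List


def resolve_minutes(recognized: List[int], rollover_index: int) -> List[int]:
    """Estimate the minute at each recognition process.

    Results at or after the rollover are shifted back by one minute so that
    every result votes for the same pre-rollover minute; the most common
    vote wins (assumed rule, not stated in the patent).
    """
    votes = Counter(
        (m - 1) % 60 if i >= rollover_index else m
        for i, m in enumerate(recognized)
    )
    base, _ = votes.most_common(1)[0]
    return [
        (base + 1) % 60 if i >= rollover_index else base
        for i in range(len(recognized))
    ]


# Candidates from the example above; the minute changes at the sixth process.
minutes = resolve_minutes([11, 21, 20, 21, 21, 22], rollover_index=5)
```

With the example values, "21 minutes" receives four votes (including the shifted "22 minutes"), so the misrecognized "11" and "20" are outvoted.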

[0078] Once the minute has been determined in this way, the hour is identified next. Because the hour changes only at the long interval of one hour, it is judged by combining the results obtained in the six recognition processes.

[0079] For example, suppose that the first-ranked recognition candidate for the hour is "7 o'clock" in the first recognition process, "8 o'clock" in the second, "1 o'clock" in the third, and "7 o'clock" in every recognition process from the fourth onward.

[0080] Among the first-ranked recognition results of the first to sixth recognition processes, "7 o'clock" appears four times and "8 o'clock" and "1 o'clock" appear once each. From this, the hour of the current time is determined to be "7 o'clock".
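The hour decision in this example amounts to a majority vote, which can be sketched as follows. The function name `resolve_hour` is hypothetical, and the sketch assumes the hour does not change during the six recognition processes.

```python
from collections import Counter
from typing import List, Tuple


def resolve_hour(recognized: List[int]) -> Tuple[int, int]:
    """Return the most frequent first-ranked hour candidate and its count."""
    hour, count = Counter(recognized).most_common(1)[0]
    return hour, count


# First-ranked hour candidates from the example: 7, 8, 1, 7, 7, 7.
hour, count = resolve_hour([7, 8, 1, 7, 7, 7])
```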

[0081] As a result of the above processing, the current time is judged to be "7:21:10" at the time of the first recognition process, "7:21:20" at the time of the second, "7:21:30" at the time of the third, and so on, and "exactly 7:22" at the time of the sixth. When the clock in the remote controller has been set to the current time on this basis, a guide sound or the like is emitted to inform the user that the time adjustment is complete. The user then presses the time data transfer button 36 on the remote controller to transfer the time data to the VCR main body. The clock of the VCR can thus be set to the correct current time.
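As a small worked example, the separately determined hour, minutes and seconds can be combined into one time value per recognition process. This is a sketch under the assumption that the hour stays fixed across the six processes; `assemble_times` is a hypothetical helper, not part of the patent.

```python
from datetime import time
from typing import List


def assemble_times(hour: int, minutes: List[int], seconds: List[int]) -> List[time]:
    """Pair the per-process minute and second values with the voted hour."""
    return [time(hour, m, s) for m, s in zip(minutes, seconds)]


# Values determined in the example: 7 o'clock, minutes 21 then 22, seconds 10..0.
times = assemble_times(7, [21, 21, 21, 21, 21, 22], [10, 20, 30, 40, 50, 0])
```

The last element corresponds to "exactly 7:22", the time to which the clock is set at the sixth recognition process.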

[0082] In the second embodiment, the number of recognition processes is set to six for the following reasons. Increasing the number of recognition processes makes recognition more reliable, but spending a long time on time adjustment is undesirable for usability, so the number is limited to some extent. Conversely, making the number extremely small causes problems with recognition accuracy. About six is therefore the best balance between recognition processing time and recognition accuracy. In addition, performing recognition six times guarantees that the minute changes, which helps improve recognition accuracy. Needless to say, the number is not limited to six.

[0083] As described above, in the second embodiment the user only has to let the time guidance from the telephone handset enter the microphone of the remote controller. The device then automatically performs voice recognition on the time guidance announcements a predetermined number of times, judges the most probable time, and obtains the current time data. The user therefore does not have to follow the troublesome procedure previously needed for time adjustment; simply by calling "117" and letting the remote controller pick up the time guidance from the handset, an accurate current time, down to the second, is set automatically. In addition, like the first embodiment, the second embodiment uses DRNN-based voice recognition means as its voice recognition technology, so even if there is noise on the line, the noise does not greatly affect the word recognition output and highly accurate voice recognition can be performed.

[0084] (Third Embodiment) In the second embodiment, the number of recognition processes is set in advance, for example to six, and the current time is judged once that number of recognition processes has been performed. In the third embodiment, by contrast, the number of recognition processes is not set; the recognition process ends as soon as sufficient certainty is obtained, and the current time data is obtained from the time whose certainty was judged sufficient. This processing is shown in the flowchart of FIG. 9.

[0085] The description below refers to FIG. 9. The configuration of the time adjustment processing unit used in the third embodiment is the same as in the first embodiment (see FIG. 1). As before, the first time guidance input in the third embodiment is assumed to be the signal "We will announce 7:21:10 a.m. ... (beep) ...".

[0086] First, the first recognition process is performed. It is applied to the announcement "We will announce 7:21:10 a.m. ... (beep) ..." that is input to the microphone 1 of the remote controller at that time. As in the first embodiment, it is first determined whether the level of the signal picked up by the microphone 1 is at or above a specified value (step s21); if it is, the features of the input signal are analyzed (step s22). Next, it is determined whether a tone signal is present (step s23); if one is present, the tone signal and its timing are detected (step s24). As in the first embodiment, it is not necessary to detect all of the tone signals p1, p2 and p3; it is enough to detect the tone signal p2, which marks the time, and obtain its timing. Once the tone signal p2 and its timing have been detected, the voice recognition process is performed (step s25).
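The control flow of steps s21 to s25 can be sketched as follows. The sketch only models the order of the checks; the signal analysis, the detection of the tone p2 and the recognizer are passed in as hypothetical callables (`analyze`, `find_tone_p2`, `recognize`), and using the peak sample value as the "level" is an assumption.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def first_pass(
    samples: Sequence[float],
    level_threshold: float,
    analyze: Callable[[Sequence[float]], object],
    find_tone_p2: Callable[[object], Optional[float]],
    recognize: Callable[[object], List[str]],
) -> Optional[Tuple[float, List[str]]]:
    """Run one recognition process; return None if an early check fails."""
    # Step s21: ignore input whose peak level is below the specified value.
    peak = max((abs(x) for x in samples), default=0.0)
    if peak < level_threshold:
        return None
    # Step s22: feature analysis of the input signal.
    features = analyze(samples)
    # Steps s23 and s24: check for the tone p2 and obtain its timing.
    tone_time = find_tone_p2(features)
    if tone_time is None:
        return None
    # Step s25: voice recognition on the analyzed features.
    return tone_time, recognize(features)


# Stub callables stand in for the real analysis and recognition stages.
result = first_pass(
    [0.0, 0.5, -0.9],
    0.3,
    analyze=list,
    find_tone_p2=lambda f: 1.5,
    recognize=lambda f: ["a.m.", "7 o'clock", "21 minutes", "10 seconds"],
)
```

Returning `None` here simply means that no usable announcement was captured yet; a caller would wait for the next announcement and try again.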

[0087] As before, this voice recognition process compares the feature vectors of the input spoken words "a.m.", "7 o'clock", "21 minutes" and "10 seconds" with the feature vectors registered in the standard voice feature data storage unit, and outputs, for each registered word, a numerical value (degree of approximation) indicating where on the time axis of the input voice the registered word exists and with what degree of certainty.

[0088] This first recognition process is applied to the time guidance "We will announce 7:21:10 a.m. ... (beep) ...", and for each word several recognition candidates are selected in descending order of rank.

[0089] Next, it is determined whether the numerical value (degree of approximation) indicating the certainty of the obtained recognition candidates is sufficient (step s26). This processing is described below.

[0090] For example, suppose that for the spoken word "7 o'clock" the candidates selected are "7 o'clock" as the first-ranked recognition candidate, "8 o'clock" as the second, and "1 o'clock" as the third. These candidates are selected by comparing the feature vector of each input spoken word with the feature vectors registered in the standard voice feature data storage unit and outputting, for each registered word, a numerical value (degree of approximation) indicating where on the time axis of the input voice the registered word exists and with what degree of certainty. The following method is used to decide on one of these candidates as the recognized word.

[0091] For example, a threshold is set in advance for the degree of approximation, and a candidate whose degree of approximation is at or above the threshold is adopted as the recognized word. Specifically, with the threshold set to 0.8, a candidate with a degree of approximation of 0.8 or more is adopted. If no recognition candidate has a degree of approximation at or above the threshold, the recognition process is repeated a second time, a third time, and so on, and any candidate that reaches the threshold in the meantime is adopted as the recognized word. If no candidate reaches the threshold even after a certain number of recognition processes, the candidate that consistently shows a large degree of approximation is adopted as the recognized word.
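The threshold rule and its fallback can be sketched as below. The 0.8 threshold comes from the text; everything else is an assumption made for illustration: the name `decide_word`, the limit of six passes, and in particular the reading of "consistently large degree of approximation" as the highest summed score across passes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def decide_word(
    passes: Iterable[Dict[str, float]],
    threshold: float = 0.8,
    max_passes: int = 6,
) -> Tuple[str, int]:
    """Return the adopted word and the number of passes that were needed.

    Each element of `passes` maps candidate words to their degree of
    approximation for one recognition process.
    """
    totals: Dict[str, float] = defaultdict(float)
    used = 0
    for scores in passes:
        used += 1
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            return best, used  # sufficient certainty: stop immediately
        for word, score in scores.items():
            totals[word] += score
        if used >= max_passes:
            break
    if not totals:
        raise ValueError("no recognition results were supplied")
    # Fallback (assumed rule): adopt the candidate with the largest summed score.
    return max(totals, key=totals.get), used


# Hypothetical scores: the second pass reaches the 0.8 threshold.
word, n = decide_word([
    {"7 o'clock": 0.6, "8 o'clock": 0.5, "1 o'clock": 0.2},
    {"7 o'clock": 0.85, "8 o'clock": 0.4, "1 o'clock": 0.1},
])
```

Passing a generator for `passes` lets the caller run further recognition processes only when the earlier ones were inconclusive.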

[0092] This judgment of whether the certainty is sufficient is made for each word. When the certainty of every word is judged sufficient, the current time data is obtained on that basis and the clock in the remote controller is set (step s27). When the clock in the remote controller has been set to the current time in this way, a guide sound or the like is emitted to inform the user that the time adjustment is complete. The user then presses the time data transfer button 36 on the remote controller to transfer the time data to the VCR main body. The clock of the VCR can thus be set to the correct current time.

[0093] As described above, in the third embodiment the user only has to let the time guidance from the telephone handset enter the microphone of the remote controller. The device then automatically performs voice recognition on the time guidance announcements, ends the recognition process as soon as it judges that recognition was correct, and obtains the current time data. The user therefore does not have to follow the troublesome procedure previously needed for time adjustment; simply by calling "117" and letting the remote controller pick up the time guidance from the handset, an accurate current time, down to the second, is set automatically. In addition, like the first and second embodiments, the third embodiment uses DRNN-based voice recognition means as its voice recognition technology, so even if there is noise on the line, the noise does not greatly affect the word recognition output and highly accurate voice recognition can be performed.

[0094] In the first to third embodiments, the time guidance to be input was NTT's time service delivered over the telephone, but the invention is not limited to this. It can be carried out in the same way with time broadcasts, similar to NTT's time service, that are transmitted on shortwave radio. It can also be realized with one-off time announcements such as the time signals on television or radio.

[0095] The first to third embodiments described an example of setting the time of a VCR, but the invention is not limited to VCRs. It is widely applicable to home appliances, electronic devices, telephones with an answering function, and other devices that contain a clock whose time the user has to set. In each of the above embodiments the user dials "117" to obtain the time guidance, but the operation can be simplified further by incorporating a phone dialer function. That is, a time-signal dial button is provided on the remote controller; when the user lifts the telephone handset, brings the remote controller close to the telephone, and presses the time-signal dial button, the call is automatically connected to "117", after which the voice recognition process described above is performed and the time is set.

[0096] Among the various devices mentioned above, a device that has no remote-controller time adjustment function may be provided with the voice-recognition time adjustment function of the present invention in its main body. When the invention is built into a telephone with a built-in clock (such as a telephone with an answering function), the machine can, for example, connect the line to "117" by itself once a week and automatically set its own clock to the current time. When the invention is built into a radio or television with a built-in clock, the machine can by itself recognize the time broadcasts (for example, at 7 a.m., 7 p.m. or noon) and automatically set its own clock to the current time. In this way the built-in clock always runs at the correct time.

[0097] The programs that perform the processes of the present invention described above can be stored on a storage medium such as a floppy disk, and the present invention also includes storage media on which those programs are stored.

[0098]

[Effects of the Invention] According to the present invention, in a method for setting the current time of a clock in a device with a built-in clock, time guidance including voice is input, the voice portion of the time guidance that gives advance notice of the current time is recognized, and, based on the recognition result, the built-in clock is set, at a predetermined timing, to the time announced by the time guidance. The user can therefore set an accurate current time automatically, simply by inputting time guidance consisting of voice and tone signals, without the troublesome procedure previously needed for time adjustment. Even users unfamiliar with this type of device can thus set the time easily. In particular, if NTT's "117" time service is used as the time guidance consisting of voice and tone signals, the user can call "117" at any time and simply input the time guidance from the handset to set an accurate current time, down to the second, automatically. A device with a built-in clock has to have its time set on every occasion such as purchase, recovery from a power failure, or after being unplugged. This is extremely tedious, and setting the exact time is more tedious still. According to the present invention, a simple operation sets the time accurately to the second, which removes the user's reluctance to set the clock and greatly improves the usability of this type of device.

[0099] Furthermore, because the present invention uses the DRNN-based voice recognition means developed by the present applicant as its voice recognition technology, noise on the line does not greatly affect the word recognition output, and highly accurate voice recognition can be performed. This DRNN-based voice recognition technology also supports speaker-independent recognition, so the source of the time information is not particularly restricted and a wide variety of sources can be used. Moreover, as a speaker-independent voice recognition technology, the DRNN-based technology can be realized at very low cost, so it can be applied widely to everyday devices without greatly affecting the cost of the products that incorporate it.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of the time adjustment processing unit in an embodiment of the present invention.

FIG. 2 is a diagram explaining the time guidance signal of NTT's "117" service.

FIG. 3 is a diagram explaining the DRNN-based voice recognition used in the embodiment of the present invention.

FIG. 4 is an external view of the front of the remote controller of the device used in the embodiment of the present invention.

FIG. 5 is a flowchart explaining the processing of the first embodiment of the present invention.

FIG. 6 is a diagram showing an example of a voice recognition result in the voice recognition unit.

FIG. 7 is a flowchart explaining the processing of the second embodiment of the present invention.

FIG. 8 is a diagram explaining an example of identifying the time in seconds in the second embodiment.

FIG. 9 is a flowchart explaining the processing of the third embodiment of the present invention.

[Explanation of symbols]

1 sound input unit (microphone)
2 input signal processing unit
3 input signal analysis unit
4 tone signal detection unit
5 voice recognition unit
6 time display unit
7 time setting unit
31 power button
32 menu button
33 video/TV switch button
34 numeric buttons "0" to "9"
35 buttons for recording, playback, rewind, fast-forward, etc.
36 time data transfer button
51 standard voice feature data storage unit
52 word detection unit
53 voice recognition processing unit

Continuation of the front page
(51) Int. Cl.⁶: // G06F 15/18, identification symbol 560; FI: G06F 15/18 560G
(72) Inventor: Mitsuhiro Inazumi, c/o Seiko Epson Corporation, 3-3-5 Owa, Suwa-shi, Nagano
(72) Inventor: Osamu Urano, c/o Seiko Epson Corporation, 3-3-5 Owa, Suwa-shi, Nagano
(72) Inventor: Sunao Aizawa, c/o Seiko Epson Corporation, 3-3-5 Owa, Suwa-shi, Nagano

Claims

[Claims]

1. A time adjustment method using voice recognition in which, when the current time of a clock built into a device is adjusted, time guidance including voice is input, the voice portion of the time guidance is recognized, and, based on the recognition result, the time of the clock is set, at a predetermined timing, to the current time announced by the time guidance, the method being characterized in that: voice recognition means based on keyword spotting using a dynamic recurrent neural network (DRNN) method is used as the means for performing the voice recognition; voice feature data for the input voice, obtained by analyzing the voice included in the time guidance input through sound input means, is compared with standard voice feature data for each registered word; for each registered word, where on the time axis of the input voice the registered word exists, and with what degree of certainty, is detected as a numerical value indicating the certainty; the input voice is recognized based on the numerical values indicating the certainty; and, based on the recognition result, the time of the clock is set, at a predetermined timing, to the current time announced by the time guidance.

2. The time adjustment method using voice recognition according to claim 1, wherein voice recognition is performed on each time guidance voice as time passes, and the time to which the built-in clock is set based on the recognition result is a time for which the certainty of the voice recognition result is judged to be sufficient.

3. The time adjustment method using voice recognition according to claim 1, wherein voice recognition is performed a preset number of times on the time guidance voices as time passes, and the time to which the built-in clock is set based on the recognition result is a time determined from the voice recognition results for the respective time guidance voices.

4. A time adjustment device using voice recognition which, when the current time of a clock built into a device is adjusted, receives time guidance including voice, recognizes the voice portion of the time guidance, and, based on the recognition result, sets the time of the clock, at a predetermined timing, to the current time announced by the time guidance, the device comprising: sound input means for inputting the time guidance including voice; voice recognition means based on keyword spotting using a dynamic recurrent neural network method, which compares voice feature data for the input voice, obtained by analyzing the voice announcing the current time that is input to the sound input means, with standard voice feature data for each registered word, and detects, for each registered word, where on the time axis of the input voice the registered word exists, and with what degree of certainty, as a numerical value indicating the certainty; and means for setting the time of the clock, based on the recognition result of the voice recognition means, at a predetermined timing, to the current time announced by the time guidance.

5. The time adjustment device using voice recognition according to claim 4, wherein voice recognition is performed on each time guidance voice as time passes, and the time to which the built-in clock is set based on the recognition result is a time for which the certainty of the voice recognition result is judged to be sufficient.

6. The time adjustment device using voice recognition according to claim 4, wherein voice recognition is performed a preset number of times on the time guidance voices as time passes, and the time to which the built-in clock is set based on the recognition result is a time determined from the voice recognition results for the respective time guidance voices.