JPH06110486A

JPH06110486A - Multimedia device with speech input means

Info

Publication number: JPH06110486A
Application number: JP4256496A
Authority: JP
Inventors: Hiroshi Matsuura; 博松浦
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1992-09-25
Filing date: 1992-09-25
Publication date: 1994-04-22

Abstract

PURPOSE: To improve usability by enabling a user to perform the series of operations for speech input easily and reliably, in coordination with various media. CONSTITUTION: When the user stands directly in front of the display part 2 of the device, sensor 51 turns ON. Sensor 51 is provided near the display screen of the display part 2 and is one of the sensors 51-54 that, together with a hook switch 55, constitute an approach detection part 5. A control part 7 then displays and voices a message asking the user to put the receiver of a handset-type speech input part 1 to the ear, using the display part 2 and a 1st speech output part 3 provided outside the speech input part 1. Once the states of the sensors 52-54 and the hook switch 55, which are provided at respective parts of the speech input part 1, show that the receiver has been put to the ear, the control part 7 displays and voices a message requesting, for example, that the user speak an order, using the display part 2 and a 2nd speech output part 4 incorporated in the speech input part 1.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a multimedia device that has voice input means for inputting speech and that can perform input and output through a plurality of media.

[0002]

[Prior Art] In recent years, handset-type voice input devices have been developed to improve the operability of voice input. One device of this type is shown in Japanese Patent Application No. Sho 63-274692. The voice input device described in that application uses approach detection means (sensors) provided in, for example, the grip of the input device to detect the state of the input device and the actions of the user. It thereby determines the timing at which voice input is accepted and prevents ambient noise and the like from being input.

[0003]

[Problems to Be Solved by the Invention] However, although the conventional handset-type voice input device described above can keep out ambient noise and the like, the user cannot tell when to pick up the voice input device, when to speak, or when to put the device back down. The device is therefore inconvenient to use.

[0004] The present invention has been made in view of the above circumstances. Its object is to provide an easy-to-use multimedia device in which the series of operations a user performs for voice input can be carried out easily and reliably through coordination with various media.

[0005]

[Means for Solving the Problems] The multimedia device of the present invention comprises: handset-type voice input means for inputting speech uttered by a user; detection means for detecting the presence of the user and the user's various actions when the user uses the voice input means; and control means which, based on what the detection means detects, causes display means to display a display message corresponding to the user's action and causes voice output means to output a voice message corresponding to the user's action.

[0006] The present invention is further characterized in that the voice output means consists of first voice output means provided outside the housing of the voice input means and second voice output means built into the housing of the voice input means so as to sit near the user's ear when the user uses the voice input means, and in that the output destination of the voice message can be switched between them.

[0007]

[Operation] In the above configuration, the control means changes the content of the display message shown on the display means, or the content of the voice message output from the voice output means, according to whether the detection means detects a user and, if a user is present, according to the user's current action. Because the display message and the voice message are switched according to the user's actions, the user can follow them to hold the handset-type voice input means properly, speak, check the display screen, and so on, reliably and easily.

[0008] In the configuration that uses two kinds of voice output means, the first and the second, the output destination of the voice message is switched according to the user's action as detected by the detection means. When the user approaches the device, the message is output by voice from the first voice output means, which is provided outside the housing of the voice input means. After the user has picked up the voice input means, the message is output by voice from the second voice output means, which is built into the housing of the voice input means so as to sit at the user's ear. This prevents the voice message from entering the voice input means as noise and from being overheard by others, while ensuring that the necessary information reaches the user.

[0009]

[Embodiment] An embodiment of the present invention will be described below with reference to the drawings, taking as an example its application to a multimedia device used in an ordering machine at a hamburger shop. FIG. 1 is a block diagram schematically showing the configuration of the multimedia device in this embodiment.

[0010] The multimedia device shown in FIG. 1 consists of a voice input unit 1, a display unit 2, a first voice output unit 3, a second voice output unit 4, an approach detection unit 5, a voice recognition unit 6, and a control unit 7.

[0011] The voice input unit 1 is used to input speech uttered by the user. The display unit 2 is used to display menus, messages, and the like, and is, for example, a CRT display or a liquid crystal display.

[0012] The first voice output unit 3 is a speaker (external speaker) used to output voice messages and the like when the user approaches the device, and is provided apart from the housing of the voice input unit 1. The second voice output unit 4 is a speaker (internal speaker) used to output voice messages and the like that correspond to the user's actions while the user is holding (the housing of) the voice input unit 1, and is built into the housing of the voice input unit 1.

[0013] The approach detection unit 5 detects the presence of the user and the user's various actions when the user uses the voice input unit 1. It consists of optical sensors 51, 52, 53, and 54 and a hook switch 55. Sensor 51 is provided near the display screen of the display unit 2 and is used to detect that the user is standing in front of the display unit 2. Sensors 52 to 54 are provided in the housing of the main body (input unit 11) of the voice input unit 1, as described later with reference to FIG. 2. The hook switch 55 is provided on the support stand (12) on which the main body (input unit 11) of the voice input unit 1 rests, as described later with reference to FIG. 3.

[0014] The voice recognition unit 6 recognizes the speech input through the voice input unit 1. Based on what the approach detection unit 5 detects, the control unit 7 causes the display unit 2 to display a display message corresponding to the user's action and causes the first voice output unit 3 or the second voice output unit 4 to output a voice message corresponding to the user's action. The control unit 7 also uses what the approach detection unit 5 detects to determine the timing at which the voice recognition unit 6 may start recognition. The control unit 7 is housed, together with the display unit 2, in the housing of the device body. The voice input unit 1 consists of the input unit 11, which forms its main body and is shown in the external view of FIG. 2, and the support stand 12, on which the input unit 11 rests and which is shown in the external view of FIG. 3.

[0015] As shown in FIG. 2, the housing of the input unit 11 is shaped like a telephone handset and consists of a grip 111 with an earpiece 112 and a mouthpiece 113 formed at its two ends.

[0016] The grip 111 is the part by which the user of the multimedia device shown in FIG. 1 holds the input unit 11. On its back surface, sensor 52 is provided at a position somewhat toward the earpiece 112, and sensor 53 at a position somewhat toward the mouthpiece 113. The grip 111 is long enough that, when the user holds the input unit 11 and places an ear against the earpiece 112, the mouthpiece 113 sits near the user's mouth.

[0017] Sensors 52 and 53 detect that an object or a person has touched or come close to them, for example when the user grasps the grip 111 by hand. As shown in FIG. 4, each of these sensors detects the presence of an object by having light from a light-emitting part 131 reflect off the object or human body (here, a hand or the like) and receiving the reflected light at a light-receiving part 132. The distance at which sensors 52 and 53 can detect an object is freely adjustable and is normally set to 2 to 3 cm.

[0018] The earpiece 112 houses the second voice output unit (speaker) 4 shown in FIG. 1, which converts electrical signals from the control unit 7 in FIG. 1 into sound and emits synthesized speech (voice messages made up of synthesized speech). Sensor 54 is provided at roughly the center of the earpiece 112. Sensor 54 detects that the earpiece 112 has touched or come close to an object, for example the user's ear. Sensor 54 is an optical sensor with the same structure as sensors 52 and 53. Sensor 51 is likewise an optical sensor with the same structure as sensors 52 and 53, except that its detection distance is set to several tens of centimeters.
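The detection distances given for sensors 51 to 54 can be read as per-sensor thresholds. The following is a minimal sketch under that reading, not the patent's circuit: the actual sensors report only on or off from reflected light, so the distance argument, the class name `ProximitySensor`, and the concrete values of 3 cm and 50 cm (picked from within "2 to 3 cm" and "several tens of centimeters") are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProximitySensor:
    """Reflective optical sensor modeled as a distance threshold (assumption)."""
    name: str
    range_cm: float  # farthest distance at which the sensor reports ON

    def is_on(self, distance_cm: Optional[float]) -> bool:
        # None stands for "nothing reflects the emitted light back".
        return distance_cm is not None and distance_cm <= self.range_cm


# Illustrative values only; the text gives ranges, not exact figures.
SENSOR_51 = ProximitySensor("display", 50.0)        # user in front of display
SENSOR_52 = ProximitySensor("grip_ear_side", 3.0)   # hand on grip
SENSOR_54 = ProximitySensor("earpiece", 3.0)        # ear at earpiece
```

With these values, a user standing 40 cm from the screen turns sensor 51 on while leaving the handset sensors off, which matches the start of the interaction described in the embodiment.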

[0019] The mouthpiece 113 houses a microphone (not shown), the core of the input unit 11, as an electroacoustic transducer that converts speech uttered by the user into an electrical signal. The electrical signal produced by this microphone is supplied through a cord 117 to the voice recognition unit 6 shown in FIG. 1. The detection signals of sensors 52 to 54 are supplied through the same cord 117 to the control unit 7 shown in FIG. 1.

[0020] As shown in FIG. 3, the support stand 12 has recesses 121 and 122 into which the earpiece 112 and the mouthpiece 113 fit when the input unit 11 is set down with the earpiece 112 and the mouthpiece 113 facing downward. The recess 121, into which the earpiece 112 fits, is provided with the hook switch 55 for detecting that the input unit 11 has been placed on the support stand 12. The hook switch 55 turns off when the input unit 11 is placed on the support stand 12 and turns on when the input unit 11 is picked up. An optical sensor with the same structure as sensors 51 to 54 may be used in place of the hook switch 55; in that case, its detection distance is set to 2 to 3 cm. Next, the operation of the embodiment will be described.

[0021] First, the on/off state signals, which are the detection signals of sensors 51 to 54 and the hook switch 55 constituting the approach detection unit 5, are sent to the control unit 7. From these on/off states, the control unit 7 can determine whether a user is present and how the user is handling the input unit 11, as summarized in FIG. 5.

[0022] That is, if sensor 51 (provided near the display screen of the display unit 2) is on, the control unit 7 determines state (1), in which a user is present in front of (the display unit of) the multimedia device. If sensor 51 is off, it determines state (2), in which no user is present in front of (the display unit of) the multimedia device.

[0023] If at least one of sensors 52 and 53, which are provided on the back of the input unit 11 toward the earpiece 112 and toward the mouthpiece 113 respectively, is on, sensor 54 on the earpiece 112 is on, and the hook switch 55 is also on, the control unit 7 determines state (3), in which the input unit 11 is correctly held to the user's ear.

[0024] If sensors 52 and 53 are both off, sensor 54 is on, and the hook switch 55 is off, the control unit 7 determines state (4), in which the input unit 11 is correctly placed on the support stand 12.

[0025] If at least one of sensors 52 and 53 is on, sensor 54 is off, and the hook switch 55 is on, the control unit 7 determines either state (5), in which the input unit 11 is lying on a table or the like with the surface shown in FIG. 2(a) (the upper surface) facing down, or state (8), in which the input unit 11 has been picked up but not yet put to the ear.

[0026] If sensors 52 and 53 are both off, sensor 54 is off, and the hook switch 55 is on, the control unit 7 determines state (6), in which the input unit 11 is lying on a table or the like with the surface shown in FIG. 2(b) (one side surface) or the opposite surface (the other side surface) facing down.

[0027] If sensors 52 and 53 are both off, sensor 54 is on, and the hook switch 55 is on, the control unit 7 determines either state (7), in which the input unit 11 is lying on a table or the like with the surface shown in FIG. 2(c) (the lower surface) facing down, or state (9), in which the input unit 11 is held to the ear but gripped poorly.
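The state determination described for FIG. 5 can be sketched as a small decision function. This is an illustrative reading of the text, not the patent's implementation: the function names and the use of sets to represent ambiguous pairs such as (5)/(8) are assumptions, and sensor combinations the description does not mention return an empty set.

```python
def classify_user(s51: bool) -> int:
    """State (1): user in front of the display; state (2): no user."""
    return 1 if s51 else 2


def classify_handset(s52: bool, s53: bool, s54: bool, hook: bool) -> frozenset:
    """Return the candidate handset states (3) to (9) for the sensor readings.

    hook is True when the hook switch 55 is on (handset off the stand).
    """
    gripped = s52 or s53  # "at least one of sensors 52 and 53 is on"
    if not hook:
        # Only one on-stand combination is defined in the description.
        return frozenset({4}) if (not gripped and s54) else frozenset()
    if gripped:
        # (3) correctly at the ear; (5)/(8) upper surface down or not yet at ear
        return frozenset({3}) if s54 else frozenset({5, 8})
    # (7)/(9) lower surface down or poor grip; (6) lying on a side surface
    return frozenset({7, 9}) if s54 else frozenset({6})
```

The pairs (5)/(8) and (7)/(9) stay ambiguous because the four handset signals alone cannot separate them; the description itself treats each pair as a single sensor pattern.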

[0028] Now, when a user stands directly in front of the display unit 2 of the multimedia device, sensor 51 turns on. Sensor 51 is the one element of the approach detection unit 5 (sensors 51 to 54 and the hook switch 55) that is provided near the display screen of the display unit 2. The control unit 7 then determines state (1) shown in FIG. 5 and starts processing according to the flowchart shown in FIG. 6.

[0029] First, the control unit 7 displays on the display unit 2 a guidance screen containing the message "Please put the handset to your ear.", for example as shown in FIG. 7 (step S1). At the same time, the control unit 7 causes the first voice output unit (external speaker) 3 to output a voice message such as "Please pick up the handset." At this point the input unit 11 (handset) of the voice input unit 1 is in state (4) shown in FIG. 5, correctly placed on the support stand 12: sensors 52 and 53 are off, sensor 54 is on, and the hook switch 55 is off.

[0030] Following the displayed content (see FIG. 7) and the voice output, the user standing in front of the display unit 2 first grasps the grip 111 of the input unit 11 and lifts the input unit 11 from the support stand 12. When the user thus puts the input unit 11 into state (8) shown in FIG. 5, at least one of sensors 52 and 53 turns on, sensor 54 turns off, and the hook switch 55 turns on. Having picked up the input unit 11, the user then puts its earpiece 112 to the ear.

[0031] Thus, in this embodiment, as soon as the user stands in front of (the display unit 2 of) the multimedia device configured as shown in FIG. 1, the guidance screen shown in FIG. 7 is displayed and the voice message "Please pick up the handset." is output from the first voice output unit 3, the external speaker. Following this guidance, the user can lift the input unit 11 of the voice input unit 1 from the support stand 12 and put its earpiece 112 to the ear.

[0032] When the user, having picked up the input unit 11, puts the earpiece 112 correctly to the ear, sensor 54 turns on while at least one of sensors 52 and 53 and the hook switch 55 remain on.

[0033] The control unit 7 then determines that the input unit 11 has entered state (3) shown in FIG. 5, that is, that the user is holding the input unit 11 correctly with the earpiece 112 correctly at the ear (steps S2, S3). In this case, the control unit 7 judges that an order can now be placed correctly by voice using the input unit 11 and displays on the display unit 2 a menu screen containing the message "Please say the name of the item you want.", for example as shown in FIG. 8 (step S4). At the same time, the control unit 7 causes the second voice output unit (internal speaker) 4, which is built into (the earpiece 112 of) the input unit 11 of the voice input unit 1, to output a voice message such as "Please speak." Having judged that the user can now order by voice, the control unit 7 also notifies the voice recognition unit 6 that speech input from (the input unit 11 of) the voice input unit 1 from this point on is to be accepted for recognition.
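The guidance chosen in steps S1 to S4 can be summarized as a lookup from the detected situation to a display message, a voice message, an output speaker, and a flag that opens recognition. The sketch below is one possible encoding under stated assumptions: the `Guidance` record, the function name, and the speaker labels "external" (unit 3) and "internal" (unit 4) are hypothetical, while the message strings are translations of those quoted in the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Guidance:
    display: Optional[str]        # message shown on display unit 2
    voice: Optional[str]          # message spoken to the user
    speaker: Optional[str]        # "external" (unit 3) or "internal" (unit 4)
    recognition_enabled: bool     # whether unit 6 should accept speech


def guidance_for(user_present: bool, handset_at_ear: bool) -> Guidance:
    if not user_present:
        # State (2): nobody in front of the display, nothing to announce.
        return Guidance(None, None, None, False)
    if not handset_at_ear:
        # Step S1: invite the user to pick up the handset (external speaker).
        return Guidance("Please put the handset to your ear.",
                        "Please pick up the handset.", "external", False)
    # Steps S2 to S4: state (3) reached, so open recognition and prompt
    # through the speaker inside the earpiece.
    return Guidance("Please say the name of the item you want.",
                    "Please speak.", "internal", True)
```

Here `handset_at_ear` stands for state (3) of FIG. 5; recognition is enabled only in that state, which is what keeps ambient noise out before the user is ready.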

[0034] While looking at the menu screen shown in FIG. 8, with the input unit 11 picked up and the earpiece 112 at the ear as described above, and therefore with the mouthpiece 113 close to the mouth, the user simply says the name of the desired item in response to the voice message "Please speak." For example, if the user says "hamburger" and "one", this speech is converted into an electrical signal by the microphone built into the mouthpiece 113 of the input unit 11, sent to the voice recognition unit 6, and recognized there. If the user then continues with, for example, "grape juice" and "one", this is recognized by the voice recognition unit 6 in the same way.

[0035] Even if the user starts speaking the order without waiting for the voice message "Please speak." to finish, this message is output from the second voice output unit 4 built into the earpiece 112 of the input unit 11, so there is no risk of it being picked up as noise by the microphone built into the mouthpiece 113 of the input unit 11. Of course, if only speech (electrical signals) input from the input unit 11 (the microphone built into its mouthpiece 113) after the "Please speak." message has finished were accepted, there would be no problem in outputting that message from the first voice output unit 3, the external speaker. However, since some users will place an order without waiting for the voice message to finish, it is more convenient to do as described above: accept input speech from the moment state (3) shown in FIG. 5 is determined in steps S2 and S3, and output the "Please speak." message from the second voice output unit 4, the internal speaker.

The recognition result of the voice recognition unit 6, that is, the recognized content of the user's spoken order, is sent to the control unit 7 each time recognition is performed. On receiving a recognition result, the control unit 7 highlights the recognized order on the menu screen displayed on the display unit 2, for example as shown in FIG. 9 (steps S5, S6). Various highlighting methods can be used, such as reverse video, blinking, a different color, or an enclosing frame.
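Steps S5 and S6 amount to folding each recognition result into the current order and redrawing the menu with the ordered items marked. The following is a small sketch of that idea only. The quantity vocabulary, the function names, the bracket notation standing in for reverse video or blinking, and the choice to add quantities when an item is named twice are all assumptions; the description does not specify any of them.

```python
# Hypothetical vocabulary; the description only gives "one" as an example.
QUANTITY_WORDS = {"one": 1, "two": 2, "three": 3}


def add_to_order(order: dict, item: str, quantity_word: str) -> dict:
    """Return a new order with the recognized item and quantity folded in."""
    updated = dict(order)
    updated[item] = updated.get(item, 0) + QUANTITY_WORDS[quantity_word]
    return updated


def render_menu(menu: list, order: dict) -> list:
    """Mark ordered items; brackets stand in for the highlight on screen."""
    return [f"[{name} x{order[name]}]" if name in order else name
            for name in menu]


order = add_to_order({}, "hamburger", "one")
order = add_to_order(order, "grape juice", "one")
lines = render_menu(["hamburger", "cheeseburger", "grape juice"], order)
```

After the two utterances from the example in the description, `lines` marks the hamburger and the grape juice and leaves the other menu entry unchanged.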

[0036] After that, if no utterance occurs for longer than a predetermined period and the input unit 11 (handset) has not been placed on the support stand 12 either (steps S7 to S10), the control unit 7 treats the order as complete. It causes the second voice output unit 4, which is built into (the earpiece 112 of) the input unit 11 of the voice input unit 1, to output a voice message such as "Your order is one hamburger and one grape juice. If this is correct, please put the handset down." This message asks the user to confirm the order and prompts the user to end voice input (step S11).

[0037] If the user now places the input unit 11 (handset) on the support stand 12, sensors 52 and 53 both turn off, sensor 54 turns on, and the hook switch 55 turns off. The control unit 7 then determines that the input unit 11 has entered state (4) shown in FIG. 5 and recognizes that the order highlighted on the menu screen has been decided (steps S12, S13).

[0038] On the other hand, if the order is wrong and the user says, for example, "That's wrong" or "Correction", then once the utterance has been recognized by the voice recognition unit 6 and passed to the control unit 7, the control unit 7 displays a message such as "Please place your order again." on the display unit 2 and outputs it by voice from the second voice output unit 4 (steps S14, S15). This allows the order to be placed again by voice. Even if the user places the handset (input unit 11) on the support stand 12 before the message that requests confirmation of the order and prompts the end of voice input has been output, that state is determined (steps S8, S9) and the order is decided in the same way as above. When the control unit 7 recognizes that the order has been decided, it displays the order and its price on the display unit 2, for example as shown in FIG. 10 (step S16).
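The flow from step S5 to step S16 can be viewed as a three-phase state machine driven by a few events. The sketch below is a simplified model under stated assumptions, not the flowchart of FIG. 6 itself: the phase names and event names are hypothetical, and details such as the length of the silence period and what happens to the previous order after a correction are left out because the description does not give them.

```python
from enum import Enum


class Phase(Enum):
    ORDERING = "ordering"      # accepting spoken items (steps S5 to S10)
    CONFIRMING = "confirming"  # confirmation message was output (step S11)
    DECIDED = "decided"        # order fixed; show contents and price (S16)


# (current phase, event) -> next phase
TRANSITIONS = {
    (Phase.ORDERING, "silence_timeout"): Phase.CONFIRMING,    # S7 to S11
    (Phase.ORDERING, "handset_replaced"): Phase.DECIDED,      # S8, S9
    (Phase.CONFIRMING, "handset_replaced"): Phase.DECIDED,    # S12, S13
    (Phase.CONFIRMING, "correction"): Phase.ORDERING,         # S14, S15
}


def next_phase(phase: Phase, event: str) -> Phase:
    """Advance the ordering dialogue; unknown events leave the phase as is."""
    return TRANSITIONS.get((phase, event), phase)
```

In this model, "handset_replaced" corresponds to the control unit detecting state (4) of FIG. 5, and "correction" to the recognizer reporting an utterance such as "That's wrong".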

[0039] The above description assumes that, from the time state (3) shown in FIG. 5 is determined in steps S2 and S3 until the message "Please put the handset down." is output in step S11, the user keeps the earpiece 112 of the input unit 11 correctly at the ear. For this reason, voice messages were output from the second voice output unit 4 built into the input unit 11.

[0040] With this method of voice output, however, if the user takes the ear away from the earpiece 112 of the input unit 11, the voice message output from the second voice output unit 4 does not reach the user. Therefore, when a voice message is to be output, the state of the input unit 11 may be checked using sensors 52 and 53, sensor 54, and the hook switch 55. In state (8) shown in FIG. 5, where at least one of sensors 52 and 53 is on, sensor 54 is off, and the hook switch 55 is on, that is, where the input unit 11 has been picked up but no ear is at the earpiece 112, the voice message may also be output from the external first voice output unit 3 so that the user can hear it.
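The routing rule in this variation fits in a few lines. The sketch assumes the handset has already been picked up once (after steps S2 and S3), uses the hypothetical labels "internal" for the second voice output unit 4 and "external" for the first voice output unit 3, and is only one way to express the condition.

```python
def speakers_for_message(s52: bool, s53: bool, s54: bool, hook: bool) -> set:
    """Choose output speakers for a voice message during ordering."""
    speakers = {"internal"}  # default: speaker in the earpiece (unit 4)
    picked_up_not_at_ear = (s52 or s53) and not s54 and hook  # state (8)
    if picked_up_not_at_ear:
        # Ear is away from the earpiece, so also use the external speaker.
        speakers.add("external")
    return speakers
```

Keeping the internal speaker active in state (8) follows the wording that the message is "also" output externally; switching to the external speaker alone would be a different design and is not what the description states.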

[0041] Also, when the device has displayed or voiced a message and the user does not handle the input unit 11 as expected, for example when the input unit 11 does not enter state (3) shown in FIG. 5 in response to the voice output "Please pick up the handset.", the device may display or voice a message stating that the input unit 11 is being handled incorrectly, together with a guidance message prompting correct handling.

[0042] The above description covers the case where the invention is applied to a multimedia device used as an ordering machine in a hamburger shop. The present invention is not limited to this, however, and is applicable to multimedia devices in general that have a voice input unit.

【００４３】[0043]

[Effects of the Invention] As described above, according to the multimedia device of the present invention, the detection means detects the presence of the user and the various actions the user performs when using the handset-type voice input means, and the display screen and voice output content are switched based on what is detected. The device can therefore always switch dynamically to a display screen and voice output content appropriate to the user's actions. Guided by the display screen and voice output content that change with the user's own actions, the user can reliably and easily hold the handset-type voice input means properly, speak, check the display screen, and so on.

[0044] Further, according to the present invention, voice output means built into the voice input means (second voice output means) and voice output means provided outside the voice input means (first voice output means) are provided, and the output destination of the voice is switched according to the user's action state detected by the detection means. This prevents the output voice from entering the voice input means as noise and from being overheard by others, while still reliably delivering necessary information to the user.

[Brief description of drawings]

FIG. 1 is a block diagram showing a schematic configuration of a multimedia device according to an embodiment of the present invention.

FIG. 2 is a view showing the external appearance of the main body (input unit 11) of the voice input unit 1 in FIG. 1.

FIG. 3 is a view showing the external appearance of the part (support base 12) that pairs with the main body (input unit 11) of the voice input unit 1 in FIG. 1.

FIG. 4 is an external view showing the configuration of the sensors 52 and 53 in FIG. 2.

FIG. 5 is a diagram showing the relationship between the presence or absence of a user and the user's operation (action) state with respect to the input unit 11, and the states of the sensors 51 to 54 and the hook switch 55 that constitute the approach detection unit 5 in FIG. 1.

FIG. 6 is a flowchart for explaining the series of operations performed by the control unit 7 of the device of FIG. 1 when it is detected that a user is standing directly in front of the device.

FIG. 7 is a diagram showing an example of the display screen displayed when it is detected that a user is standing directly in front of the device of FIG. 1.

FIG. 8 is a diagram showing an example of the display screen displayed when the user picks up the input unit 11 of the voice input unit 1 and puts it to the ear.

FIG. 9 is a diagram showing an example of the display screen displayed when completion of an order from the user is detected.

FIG. 10 is a diagram showing an example of the display screen displayed when it is detected that the input unit 11 has been placed on the support base 12 after the order is completed.

[Explanation of symbols]

1 ... voice input unit, 2 ... display unit, 3 ... first voice output unit, 4 ... second voice output unit, 5 ... approach detection unit, 6 ... voice recognition unit, 7 ... control unit, 11 ... input unit, 12 ... support base, 51 to 54 ... sensors, 55 ... hook switch, 111 ... grip, 112 ... ear pad, 113 ... mouth pad.

Claims

[Claims]

1. A multimedia device comprising: voice input means for inputting a voice uttered by a user; display means used for displaying messages and the like; detection means for detecting the presence of the user and various actions of the user when the user uses the voice input means; and control means for displaying, on the display means, a display message corresponding to the user's action based on the detection content of the detection means, wherein the content of the display message is switched according to the user's action.

2. A multimedia device comprising: voice input means for inputting a voice uttered by a user; voice output means used for voice output of messages and the like; detection means for detecting the presence of the user and various actions of the user when the user uses the voice input means; and control means for causing the voice output means to output a voice message corresponding to the user's action based on the detection content of the detection means, wherein the content of the voice message is switched according to the user's action.

3. A multimedia device comprising: voice input means for inputting a voice uttered by a user; display means used for displaying messages and the like; voice output means used for voice output of messages and the like; detection means for detecting the presence of the user and various actions of the user when the user uses the voice input means; and control means for displaying, on the display means, a display message corresponding to the user's action and for outputting, from the voice output means, a voice message corresponding to the user's action, based on the detection content of the detection means, wherein the display message content and the voice message content are switched according to the user's action.

4. A multimedia device comprising: handset-type voice input means for inputting a voice uttered by a user; first voice output means provided outside the housing of the voice input means and used for voice output of messages and the like; second voice output means built into the housing of the voice input means so as to be located near the user's ear when the user uses the voice input means, and used for voice output of messages and the like; detection means for detecting the presence of the user and various actions of the user when the user uses the voice input means; and control means for outputting a voice message corresponding to the user's action from the first voice output means or the second voice output means based on the detection content of the detection means, wherein the message content is switched according to the user's action and the output destination of the message is also switched.

5. A multimedia device comprising: handset-type voice input means for inputting a voice uttered by a user; display means used for displaying messages and the like; first voice output means provided outside the housing of the voice input means and used for voice output of messages and the like; second voice output means built into the housing of the voice input means so as to be located near the user's ear when the user uses the voice input means, and used for voice output of messages and the like; detection means for detecting the presence of the user and various actions of the user when the user uses the voice input means; and control means for displaying, on the display means, a display message corresponding to the user's action and for outputting a voice message corresponding to the user's action from the first voice output means or the second voice output means, based on the detection content of the detection means, wherein the display message content and the voice message content are switched according to the user's action and the output destination of the voice message is also switched.

6. The multimedia device according to any one of claims 1 to 5, further comprising voice recognition means for recognizing the voice input by the voice input means, wherein the voice recognition means starts preparation for recognition in accordance with the user's action detected by the detection means.