JPH07302351A

JPH07302351A - Picture and voice response equipment and method therefor

Info

Publication number: JPH07302351A
Application number: JP6119514A
Authority: JP
Inventors: Hiroshi Hamada; 博志浜田
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1994-05-09
Filing date: 1994-05-09
Publication date: 1995-11-14

Abstract

PURPOSE: To provide an image and voice response apparatus that can act as a substitute for a person and make office work comfortable, by moving the mouth of a person image shown on a display means in accordance with the voice output from a voice response means. CONSTITUTION: A user 11 sits on a chair facing the display screen 13 of the video display part of a virtual secretary. The direction in which the user 11 is located is detected by a user detection sensor, and the screen 13, mounted on a turntable 14, is turned toward the user 11 based on the detection result. The screen 13 is electrically connected to an apparatus body 16 through a cable 15, and a communication line 17 is led out from the body 16. A secretary image is displayed on the screen 13 and its mouth moves in accordance with the voice output from a voice output means, so that the user 11 can carry out business processing such as data retrieval with only brief spoken instructions, in an atmosphere as if talking with a human secretary.

Description

Detailed Description of the Invention

【０００１】[0001]

[Field of Industrial Application] The present invention relates to an image/voice response apparatus and an image/voice response method that are used, for example, as office automation (OA) equipment, understand the words spoken by a user, and actively provide information to the user.

【０００２】[0002]

[Prior Art] Conventionally, OA equipment such as information equipment and office equipment has been provided to assist human work in the office.

[0003] For example, in computers, which are regarded as the closest to human intellectual labor, it has become possible, as typified by recent personal computers and "WINDOWS" (a registered trademark of Microsoft Corporation), to provide the user with more reference/work screens or to display a wider variety of so-called "multimedia" information. Technology is being pursued mainly in the direction of a better archive or library.

【０００４】[0004]

[Problems to Be Solved by the Invention] However, conventional OA equipment does not aim to be "evolved OA equipment that stands in for a human"; it is built strictly as "a machine that improves the human working environment". As a result, much machine operation by people still remains in the office, and such equipment has not yet been satisfactory from the standpoint of comfortable office work.

[0005] In view of the above conventional problems, an object of the present invention is to provide an image/voice response apparatus and an image/voice response method that function as a stand-in for a human and enable comfortable office work.

【０００６】[0006]

[Means for Solving the Problems] To achieve the above object, the image/voice response apparatus of the present invention comprises voice recognition means for recognizing linguistic information contained in a user's voice, voice response means for outputting by voice a response corresponding to the recognition result of the voice recognition means, and display means for displaying image/document information. A person image is displayed on the display means by computer graphics or a natural image, and the mouth portion of the person image displayed on the display means is moved in accordance with the voice output from the voice response means.

[0007] The above image/voice response apparatus may have person image storage means in which a plurality of types of person images are stored, and may display on the display means the person image that the user selects from among them.

[0008] The above image/voice response apparatus may have user detection means for detecting that a user is physically present in the vicinity of the apparatus. The display of the person image on the display means may be turned on when the user detection means detects a user and turned off when no user is detected.

[0009] In the above image/voice response apparatus, the display of the person image on the display means may be turned on by a voice uttered by the user and turned off when a predetermined time has elapsed after the voice ceases.

[0010] In the above image/voice response apparatus, the user detection means may be configured to detect the direction in which the user is located, and the display surface of the display means may be turned toward the direction detected by the user detection means.

[0011] The above image/voice response apparatus may have voice level detection means for detecting the level of the user's voice, and the voice response means may output the response by voice at a level that depends on the detection result of the voice level detection means.

[0012] The above image/voice response apparatus may comprise voiceprint registration means in which the user's voiceprint pattern is registered, voiceprint analysis means for analyzing the user's voiceprint, determination means for determining whether the analysis result of the voiceprint analysis means matches the voiceprint pattern in the voiceprint registration means, and response control means that permits voice output by the voice response means when the determination means finds a match and prohibits it when the determination means finds a mismatch.

[0013] The above image/voice response apparatus may comprise registration voice output means for outputting a registration voice when the voiceprint pattern is registered in the voiceprint registration means, and registration control means for registering in the voiceprint registration means the voiceprint pattern of the voice that the user utters following the registration voice.

[0014] To achieve the above object, the image/voice response method of the present invention displays a person image on display means by computer graphics or a natural image, recognizes linguistic information contained in a user's voice, outputs by voice a response corresponding to the recognition result, and at the same time moves the mouth portion of the person image displayed on the display means in accordance with that voice output.

【００１５】[0015]

[Operation] With the above configuration, in the image/voice response apparatus of the present invention, the voice recognition means recognizes the linguistic information contained in the user's voice, and the voice response means outputs by voice a response corresponding to the recognition result. A person image is displayed on the display means, and the mouth portion of that person image is moved in accordance with the voice output from the voice response means. As a result, the user can retrieve documents and the like with only brief spoken instructions, in an atmosphere as if conversing with, for example, a human secretary.

[0016] Furthermore, by displaying on the display means the person image that the user selects from the person image storage means, the person image can be displayed according to the user's preference.

[0017] Furthermore, by turning on the display of the person image when the user detection means detects a user and turning it off when no user is detected, the apparatus can be operated only while a user is present in its vicinity.

[0018] Furthermore, by turning on the display of the person image in response to a voice uttered by the user and turning it off when a predetermined time has elapsed after the voice ceases, operation can be stopped when it is not needed.

[0019] Furthermore, because the display surface of the display means is turned toward the direction detected by the user detection means, the user's movement, for example within the office, is not constrained.

[0020] Furthermore, because the voice response means outputs the response at a level that depends on the detection result of the voice level detection means, the user can converse with the apparatus at an appropriate voice level.

[0021] Furthermore, the voiceprint analysis means analyzes the user's voiceprint, the determination means determines whether the analysis result matches the voiceprint pattern in the voiceprint registration means, and the response control means permits voice output by the voice response means on a match and prohibits it on a mismatch. The confidentiality of the information in the apparatus is thereby preserved.

[0022] Furthermore, the registration voice output means outputs a registration voice when a voiceprint pattern is registered in the voiceprint registration means, and the registration control means registers in the voiceprint registration means the voiceprint pattern of the voice that the user utters following the registration voice. The user's voiceprint pattern can thereby be registered simply and accurately.

[0023] Furthermore, according to the image/voice response method of the present invention, a person image is displayed on the display means, the linguistic information contained in the user's voice is recognized, a response corresponding to the recognition result is output by voice, and at the same time the mouth portion of the person image displayed on the display means is moved in accordance with that voice output. As a result, the user can retrieve documents and the like with only brief spoken instructions, in an atmosphere as if conversing with, for example, a human secretary.

【００２４】[0024]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings.

[0025] FIG. 1 is a block diagram showing the schematic configuration of one embodiment of the image/voice response apparatus according to the present invention.

[0026] This image/voice response apparatus is a computer apparatus equipped with a highly automated man-machine interface. It is used as a secretary robot in the office (hereinafter called the virtual secretary) that interprets the user's spoken commands and carries out clerical processing automatically.

[0027] In the figure, reference numeral 1 denotes a CPU unit that controls the operation of the entire apparatus. A file unit 3, a communication unit 4, an operation panel 5, a voice input unit 6, a voice output unit 7, and a video display unit 8 are interconnected with the CPU unit 1 via a bus line 2.

[0028] The file unit 3 consists of RAM or the like and stores various data such as text data, image data (still and moving images), voice data, the user's voiceprint pattern, and telephone directory data. The stored image data include person images of several kinds of secretary (hereinafter called secretary images). The secretary images may be natural images (photographs or movie images), human figures produced by computer-graphics animation or the like, or other fictional character images. Clothing images for these secretary images are also stored in various styles, such as business and formal wear, for female and male figures. Several voice patterns for female and male figures are stored as voice data, and information on people the user is acquainted with is additionally registered as needed.

[0029] The communication unit 4 has a modem that modulates the transmission image data output to a communication line 4A and demodulates analog data from the communication line 4A, a network control unit (NCU) that performs line switching, and the like.

[0030] The operation panel 5 consists of various operation switches and the like. The voice input unit 6, which captures the user's spoken instructions, uses a directional sound-collecting microphone to pick up the user's voice more reliably and also has a filter mechanism that cuts ambient noise. The voice input unit 6 further includes a voice recognition device, for example of the speaker-dependent type, which recognizes the linguistic information contained in the user's voice captured by the microphone and converts it into code information that a computer can handle, such as text.

[0031] The voice output unit 7 consists of a voice response device, a speaker, voice level detection means, and the like. The voice response device converts information such as text stored in the file unit 3 (the response content) into speech, which is output through the speaker. The voice level detection means detects the level of the user's voice, and the voice output unit 7 outputs speech from the speaker at a level that depends on the detection result.
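As an illustration of paragraph [0031], the following is a minimal Python sketch of choosing an output level from the detected user level. The text does not give a mapping, so the function name `response_level`, the decibel units, the limits, and the simple "follow the user's level and clamp" rule are all assumptions made for this example.

```python
def response_level(user_level_db: float,
                   min_db: float = 40.0,
                   max_db: float = 80.0,
                   offset_db: float = 0.0) -> float:
    """Return a playback level that follows the user's measured speaking level.

    Assumption: the output simply tracks the detected level (plus an optional
    offset) and is clamped to a safe range. The patent text only says the
    output level depends on the detection result.
    """
    target = user_level_db + offset_db
    return max(min_db, min(max_db, target))


if __name__ == "__main__":
    for measured in (20.0, 60.0, 95.0):
        print(measured, "->", response_level(measured))
```

A quiet speaker therefore gets a quiet reply and a loud speaker a louder one, within the hypothetical limits.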

[0032] The video display unit 8 consists of a liquid crystal display device or the like and displays the secretary image, rendered by computer graphics or as a natural image, as well as documents.

[0033] The CPU unit 1 includes a ROM that stores control programs and the like, a RAM that stores calculation results and the like, an input circuit to which a user detection sensor 1A and other devices are connected, an output circuit, and so on. It performs the following kinds of control in accordance with the control program.

[0034] The user detection sensor 1A detects that a user is physically present in the vicinity of the apparatus and is installed at a predetermined location in the office. The CPU unit 1 turns on the display of the secretary image on the video display unit 8 when the user detection sensor 1A detects a user, and turns it off when no user is detected. In addition, while the user is speaking, the display of the secretary image on the video display unit 8 is turned on, and it is turned off when a predetermined time has elapsed after the voice ceases. Furthermore, the mouth portion of the secretary image is moved in accordance with the voice output from the voice output unit 7.
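The display on/off control in paragraph [0034] can be sketched in Python as a small state holder. This is only one possible reading: the text describes the sensor rule and the voice rule separately, and combining them with a logical OR, the class name `DisplayController`, and the 30-second default timeout are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DisplayController:
    voice_timeout_s: float = 30.0          # hypothetical "predetermined time"
    last_voice_at: Optional[float] = None  # time the user was last heard

    def update(self, now: float, user_present: bool, voice_active: bool) -> bool:
        """Return True if the secretary image should be shown at time `now`."""
        if voice_active:
            self.last_voice_at = now
        recently_spoke = (
            self.last_voice_at is not None
            and now - self.last_voice_at < self.voice_timeout_s
        )
        # Assumption: either condition alone is enough to keep the image on.
        return user_present or recently_spoke


if __name__ == "__main__":
    ctrl = DisplayController(voice_timeout_s=10.0)
    print(ctrl.update(0.0, user_present=False, voice_active=True))
    print(ctrl.update(5.0, user_present=False, voice_active=False))
    print(ctrl.update(20.0, user_present=False, voice_active=False))
```

Passing the current time in as an argument keeps the logic free of real clocks, which makes it easy to exercise with simulated sensor readings.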

[0035] The CPU unit 1 also analyzes the voiceprint of the user's voice input from the voice input unit 6 and determines whether the analysis result matches the voiceprint pattern in the file unit 3. When a match is found, it permits voice output by the voice output unit 7; when a mismatch is found, it prohibits that voice output.
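The permit/prohibit decision of paragraph [0035] could look like the following Python sketch. The patent does not say how voiceprints are represented or compared; treating them as numeric feature vectors, using cosine similarity, and the 0.9 threshold are assumptions chosen only to make the gate concrete.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def may_respond(sample: Sequence[float],
                registered: Sequence[float],
                threshold: float = 0.9) -> bool:
    """Determination step: does the analyzed voiceprint match the registered one?"""
    return cosine_similarity(sample, registered) >= threshold


def respond(text: str,
            sample: Sequence[float],
            registered: Sequence[float]) -> Optional[str]:
    """Response control: return the text to speak, or None if output is prohibited."""
    return text if may_respond(sample, registered) else None
```

A caller would pass the returned text to the speech output and stay silent on `None`, so an unregistered speaker never hears stored information.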

[0036] Although not shown, computer peripherals such as an image reader, a printer, and an optical character reader (OCR) are connected to the apparatus.

[0037] FIG. 2 is a conceptual diagram showing the virtual secretary of this embodiment in operation.

[0038] In the figure, reference numeral 11 denotes a user, who sits on a chair 12 facing the display screen 13 of the video display unit 8 of the virtual secretary. The user detection sensor 1A detects the direction in which the user is located, and the display screen 13 rotates on a turntable 14 so as to face the user 11.

[0039] The display screen 13 is electrically connected to an apparatus body 16 through a cable 15, and a communication line 17 is led out from the apparatus body 16.

[0040] A secretary image is displayed on the display screen 13, and its mouth moves in time with the voice output from the voice output unit 7. In an atmosphere as if conversing with a human secretary, the user can have clerical processing such as document retrieval carried out with only brief spoken instructions.
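One simple way to move a displayed mouth in time with the voice output is to drive it from the loudness of the outgoing audio. The patent does not specify a method, so the Python sketch below is an assumption: it splits the audio into frames, takes the RMS of each frame, and quantizes it into a hypothetical set of mouth shapes numbered from 0 (closed) upward.

```python
import math
from typing import List, Sequence


def mouth_frames(samples: Sequence[float],
                 frame_len: int,
                 n_shapes: int = 4) -> List[int]:
    """Map each audio frame to a mouth shape index (0 = closed, n_shapes - 1 = wide open).

    Assumption: samples are floats in [-1, 1] and loudness alone selects the shape.
    """
    frames = []
    for start in range(0, len(samples), frame_len):
        chunk = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        level = min(rms, 1.0)
        frames.append(min(int(level * n_shapes), n_shapes - 1))
    return frames
```

The resulting index sequence would be played back at the audio frame rate, one mouth image per frame, so the mouth stays closed during silence and opens wider on louder syllables.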

[0041] In this way, the virtual secretary of this embodiment interfaces with the user through "a virtual secretary image that the computer produces on the screen" and "spoken conversation". During a spoken conversation with the user, the virtual secretary continually controls its display screen 13, sound-collecting microphone, and speaker so that they face the direction from which the voice is loudest, thereby following the user as the user moves around the office.
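The tracking behavior of paragraph [0041] can be sketched in Python as two small steps: pick the direction with the highest measured voice level, then compute how far the turntable must rotate. The per-angle level readings and the function names are assumptions; the patent does not describe how direction is measured.

```python
from typing import Mapping


def loudest_direction(levels_by_angle: Mapping[float, float]) -> float:
    """Return the angle (degrees) whose measured voice level is highest."""
    return max(levels_by_angle, key=levels_by_angle.get)


def rotation_needed(current_deg: float, target_deg: float) -> float:
    """Signed shortest rotation in degrees, in the range [-180, 180)."""
    return (target_deg - current_deg + 180.0) % 360.0 - 180.0


if __name__ == "__main__":
    readings = {0.0: 0.2, 90.0: 0.7, 180.0: 0.1, 270.0: 0.3}  # hypothetical levels
    target = loudest_direction(readings)
    print("turn by", rotation_needed(350.0, target), "degrees")
```

Taking the shortest signed rotation avoids spinning the screen the long way around when the user crosses the 0/360 degree boundary.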

[0042] Next, an example of clerical processing using the virtual secretary of this embodiment is described.

[0043] First, the initial settings of the virtual secretary are made: (1) selection of the secretary image, (2) selection of the quality of the voice the virtual secretary speaks with, (3) setting of the voice level, (4) setting of the virtual secretary's name, and (5) registration of the user's voiceprint.

[0044] For the secretary image in (1), the several kinds of person image registered in the file unit 3 are shown on the display screen 13 and the user selects one by voice input or through the operation panel 5. For example, female image samples are shown automatically on the display screen 13 when the user is male, and male image samples when the user is female, so that the user can choose a preferred image. The user may also choose several secretary images and have them switched by time of day, daily, or monthly.
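Switching among several chosen secretary images by time of day, day, or month, as paragraph [0044] suggests, might be sketched in Python as below. The rotation rule (cycling through the list with a modulo), the mode names, and the file names in the usage example are hypothetical; the patent states only that such switching is possible.

```python
from datetime import datetime
from typing import Sequence


def pick_secretary_image(images: Sequence[str],
                         now: datetime,
                         mode: str = "daily") -> str:
    """Choose one of the user's selected images based on the current date and time."""
    if mode == "hourly":
        index = now.hour                          # changes with the time of day
    elif mode == "daily":
        index = now.toordinal()                   # changes once per calendar day
    elif mode == "monthly":
        index = now.year * 12 + (now.month - 1)   # changes once per month
    else:
        raise ValueError("unknown mode: " + mode)
    return images[index % len(images)]


if __name__ == "__main__":
    chosen = ["secretary_a.img", "secretary_b.img", "secretary_c.img"]  # hypothetical names
    print(pick_secretary_image(chosen, datetime.now(), "daily"))
```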

[0045] The clothing the secretary wears is likewise selected from the file unit 3 according to the user's preference.

[0046] Then, for example, the secretary image and a clothing image are composited on the display screen 13 to change the secretary's outfit, and the user chooses a preferred combination while viewing the result. This fixes the final secretary image.

[0047] For the voice quality in (2), the virtual secretary plays the several voice patterns registered in the file unit 3 through the voice output unit 7 and the user selects a preferred one. Specifically, depending on whether the secretary image that was set is female or male, the virtual secretary speaks several female or male voice samples as a menu and has the user choose among them.

[0048] The voice level in (3) is set in consideration of the fact that the optimal volume differs with the size of the office, each user's hearing ability, and so on. Specifically, the virtual secretary repeatedly speaks a particular word or sentence, and the user responds by verbally giving the adjustment instruction "louder" or "softer". The adjustment instruction may also be given from the operation panel 5.
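The spoken volume adjustment in paragraph [0048] amounts to nudging a level up or down per instruction. The following Python sketch assumes an integer level scale from 0 to 10, a step of 1, and the English instruction strings "louder" and "softer"; none of these values appear in the patent.

```python
def adjust_level(level: int,
                 instruction: str,
                 step: int = 1,
                 lo: int = 0,
                 hi: int = 10) -> int:
    """Apply one spoken adjustment instruction and keep the level within [lo, hi]."""
    if instruction == "louder":
        level += step
    elif instruction == "softer":
        level -= step
    # Any other utterance leaves the level unchanged.
    return max(lo, min(hi, level))


if __name__ == "__main__":
    level = 5
    for said in ("louder", "louder", "softer"):
        level = adjust_level(level, said)
    print(level)
```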

[0049] The virtual secretary's name in (4) is set by the user specifying a particular name, either by speaking the word or by entering it from the operation panel 5. Alternatively, several names may be registered in the file unit 3 in advance and the user may select from among them. In this embodiment, the virtual secretary has been given the name "Marilyn".

[0050] The user's voiceprint in (5) is registered for two purposes: to protect the confidentiality of the information stored in the apparatus by not responding to anyone whose voice differs from the registered voiceprint, and to provide the registered voice sample for use when speaker-dependent voice recognition is employed.

[0051] Specifically, the apparatus speaks first (the registration voice) and the user repeats the same words after it; in this way the user's voiceprint pattern is registered in the file unit 3.

[0052] Once the above initial settings are complete, when the user arrives at the office the virtual secretary senses a person's presence through the user detection sensor 1A and enters the operation-start state.

[0053] When the user greets it with "Good morning, Marilyn", the virtual secretary analyzes the voice quality, confirms that the speaker is the user, and returns the greeting "Good morning, XXXX."

[0054] Faxes and e-mail received automatically during the previous night are stored in the file unit 3, having been converted into text data by OCR.

[0055] The virtual secretary actively tells the user, "A fax has arrived from Mr. XXXX." and "An e-mail has arrived from Mr. XXXX." On the spoken instruction "Read it out", the virtual secretary converts these texts into speech with the voice response device of the voice output unit 7 and begins reading them aloud.

[0056] When the user verbally instructs the virtual secretary "Schedule", it recognizes the user's voiceprint, checks it against the voiceprint data registered in the file unit 3 to confirm that the speaker is its master, retrieves today's schedule data from the file unit 3, converts it into speech, and reads it aloud. If image information on a person the user is to meet is registered in the file unit 3, it is shown on the display screen 13 at the same time.

[0057] On the instruction "Type", the virtual secretary switches to dictation word-processor mode. From then on, the words the user speaks are picked up by the virtual secretary, converted into text data by the voice recognition device, and stored in the file unit 3.

[0058] Further, on the instruction "Phone, Mr. XXXX", the virtual secretary looks up Mr. XXXX's telephone number in the telephone directory data in the file unit 3 and dials automatically using the modem function of the communication unit 4. When the call connects and the other party answers, it signals the user with the spoken response "You are connected with Mr. XXXX."

[0059] After that, the user's voice picked up by the sound-collecting microphone of the voice input unit 6 is fed into the telephone signal, and the other party's voice is conveyed to the user through the speaker of the voice output unit 7, so that a hands-free telephone conversation takes place.

[0060] If Mr. XXXX's telephone number is not in the telephone directory data, the virtual secretary replies "I'm sorry, but what is the telephone number?", has the user give the number verbally, and then starts the dialing operation described above. The number is thereafter added to the telephone directory data in the file unit 3 and is used as one of the directory entries for subsequent automatic dialing.
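The lookup, fallback, and store sequence of paragraphs [0058] to [0060] can be summarized in a short Python sketch. The function name `dial_by_name` and the two callbacks (`ask_number` standing in for the spoken question, `dial` for the modem) are hypothetical, as are the names and numbers in the usage example.

```python
from typing import Callable, Dict


def dial_by_name(name: str,
                 phonebook: Dict[str, str],
                 ask_number: Callable[[str], str],
                 dial: Callable[[str], None]) -> str:
    """Dial `name`, asking the user for the number if it is not yet registered."""
    number = phonebook.get(name)
    if number is None:
        number = ask_number(name)   # e.g. the spoken "what is the telephone number?"
        phonebook[name] = number    # remembered for later automatic dialing
    dial(number)
    return number


if __name__ == "__main__":
    book = {"Sato": "03-1111-2222"}            # hypothetical directory entry
    dialed = []
    dial_by_name("Suzuki", book, lambda n: "03-3333-4444", dialed.append)
    print(book, dialed)
```

Because the new number is written back into the directory before dialing, the second call to the same person no longer needs the question.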

[0061] When the user gives the spoken command "FAX, Mr. XXXX", the virtual secretary checks Mr. XXXX's fax number registered in the telephone directory data in the file unit 3 and then replies "What would you like the content to be?" When the user answers "Dictation", the virtual secretary enters listening mode, takes down what the user says, converts it into text data, renders the text into a character image using fonts, and transmits it as a fax image to the other party (Mr. XXXX) through the communication unit 4.

[0062] Further, when the user gives the spoken command "Pull a file", the virtual secretary asks back "Title or keyword?", and the user answers with a suitable word. The virtual secretary retrieves several previously created and stored items from the file unit 3 and asks "Is it ...?" When the user selects the desired item from among them, it is sent to the other party by the same fax operation.

[0063] On the instruction "Type", the virtual secretary enters listening mode, takes down what the user says, converts it into text data, and files it in the file unit 3.
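The spoken commands in the walkthrough above ("Schedule", "Type", "Phone, ...", "FAX, ...") follow a keyword-plus-argument pattern, which could be routed as in the Python sketch below. The comma-separated parsing, the lower-casing of keywords, and the handler table are assumptions for illustration; the patent does not describe how recognized text is interpreted.

```python
from typing import Callable, Dict, Tuple


def parse_command(utterance: str) -> Tuple[str, str]:
    """Split recognized text such as 'Phone, XXXX' into (keyword, argument)."""
    keyword, _, argument = utterance.partition(",")
    return keyword.strip().lower(), argument.strip()


def dispatch(utterance: str, handlers: Dict[str, Callable[[str], str]]) -> str:
    """Route a recognized utterance to its handler and return the spoken reply."""
    keyword, argument = parse_command(utterance)
    handler = handlers.get(keyword)
    if handler is None:
        return "unrecognized: " + keyword
    return handler(argument)


if __name__ == "__main__":
    handlers = {                                   # hypothetical handler table
        "schedule": lambda arg: "reading today's schedule",
        "phone": lambda arg: "dialing " + arg,
    }
    print(dispatch("Phone, XXXX", handlers))
```

New commands are added by registering another entry in the table, without touching the parsing or routing code.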

[0064] In this embodiment, when information such as a document is shown to the user on the display screen 13, both the secretary image and the information must be displayed on the display screen 13. Possible methods include: (a) switching the display between the secretary image and the information image; (b) giving the video display unit 8 the capability of displaying in two systems and displaying the secretary image and the information image separately; and (c) performing split-screen display in a single system.

【００６５】When display method (c) is adopted, the "WINDOWS"-style display method, which is becoming mainstream in personal computers and the like, is effective.

【００６６】To give the secretary image realism, three-dimensional (3D) image display is also effective. In principle, 3D image display presents separate images from offset viewpoints to the left and right eyes.

【００６７】Specifically, usable methods include a glasses type, in which left and right images are displayed on the display screen 13 in a time-division manner and glasses with polarizing glass or liquid crystal shutters deliver a separate image to each eye; a lenticular type, in which semi-cylindrical lenses cover the entire display screen 13; or, as a simpler type, a projection display focused in mid-air by a concave mirror.

【００６８】In an office, it is undesirable to force the user to wear glasses just to converse with the virtual secretary, so a glasses-free method is preferable.

【００６９】[0069]

[Effects of the Invention] As described in detail above, according to the image/voice response device of the present invention, a person image is displayed on the display means by computer graphics or a natural image, and the mouth portion of the person image displayed on the display means is moved in accordance with the voice output from the voice response means. The user can therefore search for materials and the like with only brief verbal instructions, in an atmosphere like that of talking with a human secretary. This makes it possible to greatly reduce human machine operation in offices and the like.

【００７０】Further, by displaying on the display means a person image that the user has selected from the person image storage means, the person image can be displayed according to the user's preference.

【００７１】Further, by turning on display of the person image on the display means when the user detection means detects the user, and turning it off when no user is detected, the device can be operated only while a user is near it, which reduces power consumption when the device is not needed.

【００７２】Further, by turning on display of the person image on the display means in response to a voice uttered by the user, and turning it off once a predetermined time has elapsed after the voice ceases, operation can be stopped when it is not needed, which reduces power consumption when the device is not needed.
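The voice-triggered on/off behavior described above amounts to a timer that is restarted by each utterance. The sketch below is one possible reading, not the implementation in the specification: the class name `VoiceActivatedDisplay`, the polling method `tick`, and the 30-second default are all assumptions (the source says only "a predetermined time").

```python
class VoiceActivatedDisplay:
    """Turns the display on at speech and off after a quiet period (hypothetical model)."""

    def __init__(self, timeout_s: float = 30.0):
        self.timeout_s = timeout_s   # assumed value; the source gives no figure
        self.on = False
        self._last_voice = None      # time of the most recent utterance

    def voice_detected(self, now: float) -> None:
        # Any utterance switches the display on and restarts the silence timer.
        self.on = True
        self._last_voice = now

    def tick(self, now: float) -> bool:
        # Called periodically; switches off once the silence reaches the timeout.
        if self.on and now - self._last_voice >= self.timeout_s:
            self.on = False
        return self.on
```

Times are passed in explicitly rather than read from a clock so that the behavior can be checked deterministically.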

【００７３】Further, by turning the display surface of the display means toward the direction detected by the user detection means, the user's movement, for example within the office, need not be restricted.

【００７４】Further, because the voice response means outputs the response content by voice at a voice level corresponding to the detection result of the voice level detection means, the user can converse with the device at an appropriate voice level.
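As an illustration of choosing the output level from the detected input level, the sketch below simply mirrors the user's level and clamps it to a range. This mapping is an assumption: the specification states only that the output level corresponds to the detection result, and the function name `response_level`, the decibel units, and the 40 to 80 limits are hypothetical.

```python
def response_level(user_level_db: float,
                   floor_db: float = 40.0,
                   ceil_db: float = 80.0) -> float:
    """Return an output level that tracks the user's level, clamped to a range.

    The limits are illustrative assumptions, not values from the source.
    """
    return max(floor_db, min(ceil_db, user_level_db))
```

A quiet speaker therefore gets a quiet reply and a loud speaker a louder one, without the output ever leaving the configured range.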

【００７５】Further, the confidentiality of the information in the device can be maintained by providing: voiceprint registration means in which a voiceprint pattern of the user is registered; voiceprint analysis means for analyzing the voiceprint of the user; determination means for determining whether the analysis result of the voiceprint analysis means matches the voiceprint pattern in the voiceprint registration means; and response control means for permitting voice output by the voice response means when the determination means determines a match and prohibiting voice output by the voice response means when it determines a mismatch.
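The permit/prohibit decision described above can be sketched as a gate in front of the voice output. The specification does not say how voiceprints are represented or compared, so everything concrete here is an assumption: voiceprints are modeled as plain feature vectors, matching uses cosine similarity against a threshold of 0.9, and the names `cosine_similarity` and `gated_response` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def gated_response(voiceprint, registered, response, threshold=0.9):
    """Return the response only if the voiceprint matches a registered pattern."""
    if any(cosine_similarity(voiceprint, r) >= threshold for r in registered):
        return response   # match: voice output permitted
    return None           # mismatch: voice output prohibited
```

Returning `None` stands in for suppressing the voice output; a real device would also need a policy for what, if anything, to show an unrecognized speaker.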

【００７６】Further, by providing registration voice output means for outputting a registration voice when the voiceprint pattern is registered in the voiceprint registration means, and registration control means for registering, in the voiceprint registration means, the voiceprint pattern of the voice uttered by the user in imitation of the registration voice, the voiceprint pattern of the user can be registered simply and accurately.

【００７７】Further, according to the image/voice response method of the present invention, a person image is displayed on the display means by computer graphics or a natural image, the language information contained in the user's voice is recognized, the response content corresponding to the recognition result is output by voice, and at the same time the mouth portion of the person image displayed on the display means is moved in accordance with the voice output. The method therefore provides the same effects as the device invention described above.

[Brief description of drawings]

FIG. 1 is a block diagram showing the schematic configuration of an embodiment of the image/voice response device according to the present invention.

FIG. 2 is a conceptual diagram showing the virtual secretary in operation in this embodiment.

[Explanation of symbols]

1 CPU unit
1A User detection sensor
2 Bus line
3 File section
4 Communication unit
5 Operation panel
6 Voice input unit
7 Voice output unit
8 Video display unit
11 User
13 Display screen

Continuation of the front page (51) Int.Cl.⁶ Identification code Office reference number FI Technical display location G10L 3/00 RS 531 L C7.C8 571 H H04M 11/06

Claims

[Claims]

1. An image/voice response device comprising voice recognition means for recognizing language information contained in a user's voice, voice response means for outputting by voice a response content corresponding to the recognition result of the voice recognition means, and display means for displaying image and document information, characterized in that a person image is displayed on the display means by computer graphics or a natural image, and the mouth portion of the person image displayed on the display means is moved in accordance with the voice output from the voice response means.

2. The image/voice response device according to claim 1, further comprising person image storage means for storing a plurality of types of person images, wherein a person image selected by the user from among these person images is displayed on the display means.

3. The image/voice response device according to claim 1, further comprising user detection means for detecting the physical presence of a user near the device, wherein display of the person image on the display means is turned on when the user is detected by the user detection means and turned off when the user is not detected.

4. The image/voice response device according to claim 1, wherein display of the person image on the display means is turned on by a voice uttered by the user, and the display is turned off once a predetermined time has elapsed after the voice ceases.

5. The image/voice response device according to claim 3, wherein the user detection means is configured to detect the direction in which the user is present, and the display surface of the display means is turned toward the direction detected by the user detection means.

6. The image/voice response device according to claim 1, further comprising voice level detection means for detecting the voice level of the user, wherein the voice response means outputs the response content by voice at a voice level corresponding to the detection result of the voice level detection means.

7. The image/voice response device according to claim 1, further comprising: voiceprint registration means in which a voiceprint pattern of the user is registered; voiceprint analysis means for analyzing the voiceprint of the user; determination means for determining whether the analysis result of the voiceprint analysis means matches the voiceprint pattern in the voiceprint registration means; and response control means for permitting voice output by the voice response means when the determination means determines a match and prohibiting voice output by the voice response means when it determines a mismatch.

8. The image/voice response device according to claim 7, further comprising registration voice output means for outputting a registration voice when the voiceprint pattern is registered in the voiceprint registration means, and registration control means for registering, in the voiceprint registration means, the voiceprint pattern of the voice uttered by the user in imitation of the registration voice.

9. An image/voice response method characterized by displaying a person image on display means by computer graphics or a natural image, recognizing language information contained in a user's voice, outputting by voice a response content corresponding to the recognition result, and at the same time moving the mouth portion of the person image displayed on the display means in accordance with the voice output.