JP3621861B2

JP3621861B2 - Conversation information presentation method and immersive virtual communication environment system

Info

Publication number: JP3621861B2
Application number: JP2000037462A
Authority: JP
Inventors: 修平織田; 貴史八木; 聡石橋
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2000-02-16
Filing date: 2000-02-16
Publication date: 2005-02-16
Anticipated expiration: 2020-02-16
Also published as: JP2001228794A

Description

[0001]
[Technical field of the invention]
The present invention relates to a conversation information presentation method and an immersive virtual communication environment system. In particular, it relates to a technique that is effective when applied to conversation information presentation in which a user's utterance content is presented as a character image, for purposes such as conversation support for hearing-impaired persons, in an immersive virtual environment in which a plurality of display devices are arranged so as to surround the user.
[0002]
[Prior art]
Conventionally, there are immersive multi-screen display systems that allow a user to experience an immersive virtual environment. The immersive multi-screen display system was originally developed as a visualization environment for simulation and similar uses. In recent years, there has been active research on connecting such systems via a network and using them as a communication environment. The immersive multi-screen display system arranges a plurality of screens (display devices) in front of, behind, to the left and right of, and above and below the user so that the user is surrounded by images, which gives a strong sense of presence.
[0003]
In such an immersive virtual communication environment, the user becomes an avatar (alter ego), can walk freely around the three-dimensional virtual world, can look forward, backward, up, down, left, and right, and can hold a conversation upon encountering another avatar (another user). At this time, the conversation information of the speaking user is presented by voice.
[0004]
Such an immersive virtual communication environment is described in, for example, IEICE Technical Report, MVE99-45, pp. 1-8, 1999 (Takashi Kawano, Yuriko Suzuki, Norio Yamamoto, Shinichi Shiwa, Satoshi Ishibashi, “Immersive Virtual Communication Environment”).
[0005]
On the other hand, in a non-immersive display system that allows a user to experience a virtual environment on a single-screen display, there is a method of presenting conversation information as characters, like television subtitles, as an alternative to voice presentation of the conversation content, for example to support hearing-impaired persons. In this case, since the display lies in the user's viewing direction, the user can acquire the conversation information when character information is presented on a part of the display.
[0006]
[Problems to be solved by the invention]
However, the subtitle-like character presentation method adopted in non-immersive display systems is not suitable for an immersive multi-screen display system. In the immersive multi-screen display system, a plurality of displays are installed so as to surround the user, and the user can look forward, backward, up, down, left, and right, so the user's viewing direction is not fixed to a specific display. For this reason, a method that presents character information like a television subtitle on a part of one specific display has the problem that the user fails to acquire the conversation information when not looking at that display.
[0007]
In addition, even when the user looks at the display, there is a problem that if the speaker is not in the user's field of view, the user cannot grasp the position of the speaker and cannot perform smooth communication. Furthermore, when the user tries to place the specific display in the field of view in order to acquire conversation information, there arises a problem that the user cannot freely walk around in the virtual space.
[0008]
The present invention has been made to solve the above problems. Its object is to provide a conversation information presentation technique with which a user in an immersive virtual communication environment can easily hold conversations with other users while walking freely around the virtual space.
[0009]
The above and other objects and novel features of the present invention will become apparent from the description of this specification and the accompanying drawings.
[0010]
[Means for Solving the Problems]
Of the inventions disclosed in this application, the outline of typical ones will be briefly described as follows.
[0011]
(1) A conversation information presentation method for an immersive virtual communication environment in which a plurality of display devices (displays) are arranged so as to surround a user, the method comprising: an utterance input process of inputting the utterance content of a speaking user; a character generation process of generating character information from the input utterance content; an extraction process of extracting the three-dimensional positions and line-of-sight vectors of the speaking user and a receiving user in the immersive virtual communication environment; a distance extraction process of extracting the distance between the speaking user and the receiving user from their three-dimensional positions; a field-of-view extraction process of extracting the field of view of the receiving user from the line-of-sight vector of the receiving user and a predetermined viewing angle; a presentation position determination process which, when both a first condition that the distance between the speaking user and the receiving user is within a preset distance and a second condition that the three-dimensional position of the speaking user is within the field of view of the receiving user are satisfied, sets the vicinity of the three-dimensional position of the speaking user as the presentation position of a character display image for a certain time from the utterance, and which, when either of the two conditions is not satisfied, sets as the presentation position of the character display image a position that moves, with the time elapsed since the utterance, from the vicinity of the three-dimensional position of the speaking user in the direction of the line-of-sight vector of the speaking user; and an output process of outputting the generated character information at the determined presentation position of the character display image.
[0013]
(2) An immersive virtual communication environment system in which a plurality of display devices (displays) are arranged so as to surround a user, the system comprising: utterance input means for inputting the utterance content of a speaking user; character generation means for generating character information from the input utterance content; extraction means for extracting the three-dimensional positions and line-of-sight vectors of the speaking user and a receiving user in the immersive virtual communication environment; distance extraction means for extracting the distance between the speaking user and the receiving user from their three-dimensional positions; field-of-view extraction means for extracting the field of view of the receiving user from the line-of-sight vector of the receiving user and a predetermined viewing angle; presentation position determination means which, when both a first condition that the distance between the speaking user and the receiving user is within a preset distance and a second condition that the three-dimensional position of the speaking user is within the field of view of the receiving user are satisfied, sets the vicinity of the three-dimensional position of the speaking user as the presentation position of a character display image for a certain time from the utterance, and which, when either of the two conditions is not satisfied, sets as the presentation position of the character display image a position that moves, with the time elapsed since the utterance, from the vicinity of the three-dimensional position of the speaking user in the direction of the line-of-sight vector of the speaking user; and output means for outputting the generated character information at the determined presentation position of the character display image.
[0015]
According to the above-described means, a user (for example, a hearing-impaired user) in the immersive virtual communication environment can grasp both the position of the speaking user and the content of the utterance. The user can therefore easily hold conversations with other users while walking freely around the virtual space.
[0016]
Hereinafter, the present invention will be described in detail with reference to the drawings together with embodiments (examples) according to the present invention.
[0017]
[Embodiments of the invention]
FIG. 1 is a block diagram showing a schematic configuration of an immersive virtual communication environment system according to an embodiment (example) of the present invention.
[0018]
In FIG. 1, the utterance input means 1 inputs the utterance content of the speaking user and converts it into data. As the utterance input means 1, for example, a voice input device such as a microphone, or a motion capture input device for gesture utterances, is used.
[0019]
The character generation means 2 recognizes the utterance content data input from the utterance input means 1 and converts it into character information. As the character generation means 2, for example, a speech recognition device is used if the utterance content data is voice information, and a motion recognition device is used if the utterance content data is motion information.
[0020]
Here, the speech recognition engine REX (NTT), for example, is known as speech recognition software. A motion recognition method is described in, for example, IEICE Technical Report, MVE99-36, July 1999 (矢部愽明 et al., “Gesture Recognition Based on Network Structure Correspondence between Gesture Motion Images and Meaning-Description Word Sequences”).
[0021]
The extraction means 3-1 and 3-2 extract the three-dimensional positions and line-of-sight vectors of the speaking user and the receiving user, respectively, in the virtual environment. One example of an extraction method is to attach a position sensor to the user's body and derive the user's position in the virtual environment from it. Specifically, a magnetic sensor attached to three-dimensional (3D) glasses, or a magnetic sensor built into a three-dimensional wand (a rod-shaped interface device equipped with a magnetic sensor and switch buttons), can supply real-world position and direction information, from which a personal computer (PC) or the like that processes this information extracts the user's three-dimensional position in the virtual environment space. For the line-of-sight vector in particular, it is considered preferable to attach the sensor to the head so that the direction in which the user is looking is captured faithfully. The position sensor need not be of any particular type, though one with good detection accuracy is naturally preferable.
[0022]
Here, let the extracted three-dimensional position of the speaking user in the virtual environment be A and the line-of-sight vector at position A be vector a, and let the three-dimensional position of the receiving user be B and the line-of-sight vector at position B be vector b. The line-of-sight vector a is a direction vector starting from the speaking user's three-dimensional position A. Similarly, the line-of-sight vector b is a direction vector starting from the three-dimensional position B.
[0023]
The distance extraction means 4 receives the three-dimensional position A of the speaking user and the three-dimensional position B of the receiving user obtained by the extraction means 3-1 and 3-2, calculates the distance between the three-dimensional positions A and B, and thereby extracts the distance d between the speaking user and the receiving user.
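As an illustration only, the distance calculation can be sketched in Python as follows. The text does not state which metric is used; this sketch assumes the ordinary Euclidean distance, and the function name `extract_distance` is hypothetical.

```python
import math


def extract_distance(pos_a, pos_b):
    """Return the distance d between 3D positions A and B.

    Assumption: the Euclidean metric is used (the description only says
    "the distance between the three-dimensional positions A and B").
    """
    return math.dist(pos_a, pos_b)


# Example: speaking user at A, receiving user at B.
A = (0.0, 0.0, 0.0)
B = (3.0, 4.0, 0.0)
d = extract_distance(A, B)
```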
[0024]
The field-of-view extraction means 5 receives the line-of-sight vector b of the receiving user obtained by the extraction means 3-2 and, as shown in FIG. 2, extracts a field of view W in the shape of an infinite cone with a preset viewing angle R, using the line-of-sight vector b as the central axis. The viewing angle R is defined as the angle of the receiving user's visual field considered necessary for conversation in the virtual environment, and it is the angle of the field of view W. The viewing angle R can be adjusted freely; the larger the viewing angle R, the wider the field of view W.
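A minimal Python sketch of a membership test for such a conical field of view is given below. It is not taken from the patent: it assumes that R is the full apex angle of the cone (so a point is inside when its angle from the axis b is at most R/2), that the apex of the cone is the receiving user's position B, and the function name `in_field_of_view` is hypothetical.

```python
import math


def in_field_of_view(pos_a, pos_b, gaze_b, view_angle_deg):
    """Return True if point A lies inside the infinite cone W.

    The cone has its apex at B and its axis along the gaze vector b.
    Assumption: view_angle_deg (R) is the full apex angle, so the test
    compares the angle between b and the direction B->A with R / 2.
    """
    to_a = [a - b for a, b in zip(pos_a, pos_b)]
    norm_to_a = math.hypot(*to_a)
    norm_gaze = math.hypot(*gaze_b)
    if norm_gaze == 0.0:
        raise ValueError("gaze vector must be non-zero")
    if norm_to_a == 0.0:
        return True  # A coincides with the apex; treated as inside (assumption)
    cos_angle = sum(x * y for x, y in zip(to_a, gaze_b)) / (norm_to_a * norm_gaze)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return math.acos(cos_angle) <= math.radians(view_angle_deg) / 2.0
```

Because the cone is infinite, the test depends only on direction, not on how far A is from B; the distance limit is handled separately by the comparison of d with D.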
[0025]
The presentation position determination means 6 receives the three-dimensional position A and the line-of-sight vector a extracted by the extraction means 3-1, the distance d extracted by the distance extraction means 4, and the field of view W extracted by the field-of-view extraction means 5, and determines the presentation position of the character information based on them.
[0026]
The output means 7 receives the character information generated by the character generation means 2 and the presentation position determined by the presentation position determination means 6, and outputs the character information based on the presentation position.
[0027]
FIG. 3 is a flowchart showing the processing procedure of the presentation position determination means 6. As shown in FIG. 3, the presentation position determination means 6 first receives the distance d between the speaking user and the receiving user from the distance extraction means 4, the three-dimensional position A of the speaking user from the extraction means 3-1, and the field of view W from the field-of-view extraction means 5. The distance D is then set. The preset distance D is a distance considered suitable for conversation between users in the virtual environment. The longer the distance D, the wider the conversation range, so that a clear conversation can be held even with a distant user. The case in which the speaking user is within the distance D of the receiving user and also within the field of view W, as shown in FIG. 4, is separated from all other cases by the tests d ≦ D and A⊂W shown in FIG. 3. When both d ≦ D and A⊂W hold, the presentation position of the character information is determined to be the vicinity of the speaking user for a certain time T (set time) after the utterance.
[0028]
On the other hand, when either d ≦ D or A⊂W does not hold, the presentation position of the character information is first set to the vicinity of the speaking user. This is taken as the initial presentation position, and once a certain time T (set time) has elapsed, the presentation position of the character information is moved in the direction of the line-of-sight vector a of the speaking user.
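The two-way classification described for FIG. 3 can be sketched in Python as follows. This is an illustrative reading of the flowchart, not the patented implementation; the names `PresentationMode` and `choose_mode` are hypothetical, and the result of the A⊂W test is passed in as a boolean.

```python
from enum import Enum


class PresentationMode(Enum):
    """The two outcomes of the FIG. 3 classification (names are hypothetical)."""
    FIXED_NEAR_SPEAKER = "fixed"   # d <= D and A inside W
    MOVING_ALONG_GAZE = "moving"   # at least one condition fails


def choose_mode(d, preset_distance, a_in_view):
    """Classify the situation from distance d, preset distance D, and A in W."""
    if d <= preset_distance and a_in_view:
        return PresentationMode.FIXED_NEAR_SPEAKER
    return PresentationMode.MOVING_ALONG_GAZE
```

Only the case in which both conditions hold keeps the character information fixed near the speaking user; a speaker who is close but out of view, or in view but too far away, falls into the moving case.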
[0029]
FIGS. 5 and 6 are schematic diagrams showing output examples of the character information. FIG. 5 shows an output example for the case where both d ≦ D and A⊂W hold: a speech balloon image is output near the mouth, and the utterance content is displayed inside the balloon. When either d ≦ D or A⊂W does not hold, as shown in FIG. 6, the balloon image is displayed at the mouth, which is the initial presentation position, and the utterance content is displayed inside the balloon.
[0030]
Then, when the certain time T has elapsed since the utterance, the presentation position is moved from the initial presentation position to a position a certain distance P ahead in the direction of the speaking user's vector a. When twice the time T has elapsed, the presentation position is at twice the distance P from the initial presentation position; in general, each time the elapsed time reaches n times T (n is a natural number), the presentation position is set n times the distance P ahead of the initial presentation position. The time T and the distance P can be set freely, but they need to be balanced so that the characters in the balloon remain easy to read. Besides a speech balloon, a sphere or a floating body such as a cloud may also be used to output the character information.
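The stepwise movement can be sketched in Python as below. This is an illustrative sketch under stated assumptions: the initial presentation position is taken to be the speaking user's position A itself (the offset to the mouth is omitted), the vector a is normalized so that P is a true distance, and the function name `moving_presentation_position` is hypothetical.

```python
import math


def moving_presentation_position(pos_a, gaze_a, elapsed, interval_t, step_p):
    """Return the presentation position after `elapsed` time in the moving case.

    The position advances by distance P along the gaze direction a each
    time another interval T has fully elapsed: n = floor(elapsed / T).
    Assumption: the initial presentation position is A itself.
    """
    norm = math.hypot(*gaze_a)
    if norm == 0.0:
        raise ValueError("gaze vector must be non-zero")
    if interval_t <= 0.0:
        raise ValueError("interval T must be positive")
    n = int(elapsed // interval_t)  # number of whole intervals T elapsed
    return tuple(p + n * step_p * g / norm for p, g in zip(pos_a, gaze_a))
```

For example, with T = 1.0 and P = 0.5, the balloon stays at the initial position until one time unit has passed and is 1.0 away from it once 2.5 time units have passed. A larger P or a smaller T makes the balloon travel faster, which is the readability trade-off mentioned above.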
[0031]
That is, as shown in FIG. 1, in the conversation information presentation method of the present embodiment, in an immersive virtual communication environment in which a plurality of display devices (displays) are arranged so as to surround a user (for example, a hearing-impaired user), the utterance content of the speaking user is input to the utterance input means 1, character information is generated from the input utterance content by the character generation means 2, and the three-dimensional positions and line-of-sight vectors of the speaking user and the receiving user in the immersive virtual communication environment are extracted by the extraction means 3-1 and 3-2. The distance extraction means 4 extracts the distance between the speaking user and the receiving user from their three-dimensional positions, and the field-of-view extraction means 5 extracts the field of view of the receiving user from the line-of-sight vector of the receiving user and a predetermined viewing angle. The presentation position determination means 6 determines the presentation position of the speaking user's character information based on the distance between the speaking user and the receiving user, the field of view of the receiving user, the three-dimensional position of the speaking user, and the line-of-sight vector of the speaking user. The output means 7 then outputs the generated character information based on the determined presentation position, and the character information is displayed on the display devices (displays).
[0032]
In the presentation position determination process, as shown in FIG. 3, when the distance between the speaking user and the receiving user is within the preset distance D and the three-dimensional position of the speaking user is within the field of view of the receiving user, the vicinity of the speaking user is set as the presentation position of the character display image for a certain time T from the utterance. In all other cases, a position that moves, with the time elapsed since the utterance, from the vicinity of the speaking user in the direction of the line-of-sight vector is set as the presentation position.
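The whole presentation position determination process can be summarized in one self-contained Python sketch. It rests on several assumptions that the text does not settle: Euclidean distance, R interpreted as the full apex angle of the cone, the vicinity of the speaking user represented by A itself, non-zero line-of-sight vectors, and `None` returned once the time T has passed in the fixed case (the description does not say what happens then). The function name `presentation_position` is hypothetical.

```python
import math


def presentation_position(pos_a, gaze_a, pos_b, gaze_b, elapsed,
                          preset_d, view_angle_deg, interval_t, step_p):
    """Return the presentation position for the character display image.

    pos_a, gaze_a: speaking user's position A and line-of-sight vector a
    pos_b, gaze_b: receiving user's position B and line-of-sight vector b
    Assumes gaze_a and gaze_b are non-zero and interval_t is positive.
    """
    # First condition: d <= D (Euclidean distance assumed).
    d = math.dist(pos_a, pos_b)

    # Second condition: A inside the cone around b (R as full apex angle).
    to_a = [a - b for a, b in zip(pos_a, pos_b)]
    norm_to_a = math.hypot(*to_a)
    if norm_to_a == 0.0:
        in_view = True  # A coincides with B (assumption: counts as visible)
    else:
        cos_angle = sum(x * y for x, y in zip(to_a, gaze_b))
        cos_angle /= norm_to_a * math.hypot(*gaze_b)
        cos_angle = max(-1.0, min(1.0, cos_angle))
        in_view = math.acos(cos_angle) <= math.radians(view_angle_deg) / 2.0

    if d <= preset_d and in_view:
        # Fixed case: stay near the speaker for time T.
        # Returning None afterwards is an assumption of this sketch.
        return tuple(pos_a) if elapsed <= interval_t else None

    # Moving case: advance n * P along a, where n whole intervals T have elapsed.
    n = int(elapsed // interval_t)
    norm_a = math.hypot(*gaze_a)
    return tuple(p + n * step_p * g / norm_a for p, g in zip(pos_a, gaze_a))
```

In the moving case the character information travels away from the speaking user along the speaker's own line of sight, so a receiving user who was not looking at the speaker can still come across it and trace it back to its origin.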
[0033]
The invention made by the present inventors has been specifically described above based on the embodiment (example). However, the present invention is not limited to that embodiment, and it goes without saying that various modifications can be made without departing from the gist of the invention.
[0034]
[Effects of the invention]
As described above, according to the present invention, a user (for example, a hearing-impaired person) in an immersive virtual communication environment can easily grasp the position of a speaker and the content of the utterance, and can also more easily notice the speaker's act of speaking. Thus, for example, the user can hold conversations with other users while walking freely around the virtual space.
[0035]
The invention made by the present inventors has been specifically described above based on the embodiment. However, the present invention is not limited to that embodiment, and it goes without saying that various modifications can be made without departing from the gist of the invention.
[0036]
[Effects of the invention]
As described above, according to the present invention, a user (for example, a hearing-impaired user) in the immersive virtual communication environment can grasp both the position of the speaking user and the content of the utterance. The user can therefore easily hold conversations with other users while walking freely around the virtual space.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a schematic configuration of an immersive virtual communication environment system according to an embodiment (example) of the present invention.
FIG. 2 is a diagram illustrating an example of a field of view W of a receiving user in the present embodiment.
FIG. 3 is a flowchart illustrating the processing procedure of the presentation position determination means in the present embodiment.
FIG. 4 is a diagram showing an example of conversation range classification in the present embodiment.
FIG. 5 is a diagram illustrating an output example of character information in the present embodiment.
FIG. 6 is a diagram illustrating an output example of character information in the present embodiment.
[Explanation of symbols]
1 ... Utterance input means, 2 ... Character generation means, 3-1, 3-2 ... Extraction means, 4 ... Distance extraction means, 5 ... Field-of-view extraction means, 6 ... Presentation position determination means, 7 ... Output means.

Claims

A conversation information presentation method for an immersive virtual communication environment in which a plurality of display devices are arranged so as to surround a user, the method comprising:
an utterance input process of inputting the utterance content of a speaking user;
a character generation process of generating character information from the input utterance content;
an extraction process of extracting the three-dimensional positions and line-of-sight vectors of the speaking user and a receiving user in the immersive virtual communication environment;
a distance extraction process of extracting the distance between the speaking user and the receiving user based on the three-dimensional positions of the speaking user and the receiving user;
a field-of-view extraction process of extracting the field of view of the receiving user based on the line-of-sight vector of the receiving user and a predetermined viewing angle;
a presentation position determination process which, when both a first condition that the distance between the speaking user and the receiving user is within a preset distance and a second condition that the three-dimensional position of the speaking user is within the field of view of the receiving user are satisfied, sets the vicinity of the three-dimensional position of the speaking user as the presentation position of a character display image for a certain time from the utterance, and which, when either of the two conditions is not satisfied, sets as the presentation position of the character display image a position that moves, with the time elapsed since the utterance, from the vicinity of the three-dimensional position of the speaking user in the direction of the line-of-sight vector of the speaking user; and
an output process of outputting the generated character information at the determined presentation position of the character display image.

An immersive virtual communication environment system in which a plurality of display devices are arranged so as to surround a user, the system comprising:
utterance input means for inputting the utterance content of a speaking user;
character generation means for generating character information from the input utterance content;
extraction means for extracting the three-dimensional positions and line-of-sight vectors of the speaking user and a receiving user in the immersive virtual communication environment;
distance extraction means for extracting the distance between the speaking user and the receiving user based on the three-dimensional positions of the speaking user and the receiving user;
field-of-view extraction means for extracting the field of view of the receiving user based on the line-of-sight vector of the receiving user and a predetermined viewing angle;
presentation position determination means which, when both a first condition that the distance between the speaking user and the receiving user is within a preset distance and a second condition that the three-dimensional position of the speaking user is within the field of view of the receiving user are satisfied, sets the vicinity of the three-dimensional position of the speaking user as the presentation position of a character display image for a certain time from the utterance, and which, when either of the two conditions is not satisfied, sets as the presentation position of the character display image a position that moves, with the time elapsed since the utterance, from the vicinity of the three-dimensional position of the speaking user in the direction of the line-of-sight vector of the speaking user; and
output means for outputting the generated character information at the determined presentation position of the character display image.